Defaulting to pandas
Modin does not yet support distributed execution for every method in the pandas API. Methods that are not yet implemented run in a mode called “default to pandas”. This lets users keep using Modin even when their workloads contain functions that Modin has not yet implemented. Here is a diagram of how we convert to pandas and perform the operation:
We first convert to a pandas DataFrame, then perform the operation. Going from a partitioned Modin DataFrame to pandas carries a performance penalty, both because of the communication cost of gathering the partitions and because pandas is single-threaded. Once the pandas operation has completed, we convert the DataFrame back into a partitioned Modin DataFrame. This way, operations performed after a default-to-pandas call are again optimized by Modin.
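The round trip described above can be sketched conceptually with plain pandas. This is a simplified illustration, not Modin's actual implementation: the helper names `to_pandas`, `from_pandas`, and `default_to_pandas` are hypothetical, and a list of row-wise pandas DataFrames stands in for Modin's partitions.

```python
import pandas as pd


def to_pandas(partitions):
    """Gather row partitions into one in-memory pandas DataFrame (hypothetical helper)."""
    return pd.concat(partitions, axis=0)


def from_pandas(df, num_partitions):
    """Split a pandas DataFrame into contiguous row partitions (hypothetical helper)."""
    bounds = [i * len(df) // num_partitions for i in range(num_partitions + 1)]
    return [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


def default_to_pandas(partitions, pandas_op, *args, **kwargs):
    """Run `pandas_op` on the gathered frame, then repartition a DataFrame result."""
    full = to_pandas(partitions)               # communication cost: gather everything
    result = pandas_op(full, *args, **kwargs)  # single-threaded pandas execution
    if isinstance(result, pd.DataFrame):
        return from_pandas(result, len(partitions))  # back to a partitioned form
    return result                              # scalars and other results pass through


# Assumed toy data: 8 rows split into 4 partitions.
frame = pd.DataFrame({"a": range(8), "b": [x * 0.5 for x in range(8)]})
partitions = from_pandas(frame, 4)

# Pretend `cumsum` has no distributed implementation and must default to pandas.
result_partitions = default_to_pandas(partitions, pd.DataFrame.cumsum)
```

Because the result is repartitioned at the end, any later operation that does have a partition-aware implementation can run on `result_partitions` without another gather step.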
The exact methods we have implemented are listed in the respective subsections:
We have taken a community-driven approach to implementing new methods. We studied pandas usage to learn which APIs are used most. Based on that study, Modin currently supports 93% of the pandas API, and we are actively expanding coverage. To request the implementation of a method, file an issue at https://github.com/modin-project/modin/issues or send an email to firstname.lastname@example.org.