Core Modin Dataframe Objects
Modin partitions data to scale efficiently.
To keep track of the partitioned data, a few key classes are introduced: Dataframe, Partition, AxisPartition and PartitionManager.
Dataframe is the class conforming to Dataframe Algebra.
Partition is an element of an NxM grid of partitions which, taken together, represent the Dataframe.
AxisPartition is a joined group of Partition objects along some axis (either rows or columns).
PartitionManager is the manager that implements the primitives used for Dataframe Algebra operations over Partition objects.
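The relationship between these four roles can be illustrated with a small, self-contained sketch. It does not use Modin and is not Modin's implementation. The toy classes only borrow the names above, and every method name (`split`, `map_partitions`, `row_partitions`, `column_partitions`, `to_table`) as well as the `"rows"`/`"columns"` axis labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

Block = List[List[int]]  # a rectangular chunk of cells, stored row by row


@dataclass
class Partition:
    """Toy stand-in for one element of the N x M partition grid."""
    data: Block

    def apply(self, func: Callable[[Block], Block]) -> "Partition":
        # Run a function on this block only and wrap the result.
        return Partition(func(self.data))


@dataclass
class AxisPartition:
    """Toy stand-in for a group of partitions joined along one axis."""
    partitions: List[Partition]
    axis: str  # "rows": blocks sit side by side; "columns": blocks are stacked

    def to_block(self) -> Block:
        if self.axis == "columns":
            # Stack the blocks vertically to get one full-height strip.
            return [row for p in self.partitions for row in p.data]
        # Join the blocks side by side to get one full-width strip.
        n_rows = len(self.partitions[0].data)
        return [[v for p in self.partitions for v in p.data[i]]
                for i in range(n_rows)]


class PartitionManager:
    """Toy stand-in for the primitives that operate over the grid."""

    @staticmethod
    def split(table: Block, row_chunk: int, col_chunk: int) -> List[List[Partition]]:
        # Cut the table into a grid of blocks of at most row_chunk x col_chunk.
        grid = []
        for r in range(0, len(table), row_chunk):
            rows = table[r:r + row_chunk]
            grid.append([Partition([row[c:c + col_chunk] for row in rows])
                         for c in range(0, len(table[0]), col_chunk)])
        return grid

    @staticmethod
    def map_partitions(grid, func):
        # Apply func to every block independently.
        return [[p.apply(func) for p in row] for row in grid]

    @staticmethod
    def row_partitions(grid):
        return [AxisPartition(list(row), "rows") for row in grid]

    @staticmethod
    def column_partitions(grid):
        return [AxisPartition([row[j] for row in grid], "columns")
                for j in range(len(grid[0]))]

    @staticmethod
    def to_table(grid) -> Block:
        # Reassemble the full table by stacking the full-width row strips.
        return [r for ap in PartitionManager.row_partitions(grid)
                for r in ap.to_block()]


# A 4 x 4 table split into a 2 x 2 grid of 2 x 2 blocks.
table = [[r * 4 + c for c in range(4)] for r in range(4)]
grid = PartitionManager.split(table, row_chunk=2, col_chunk=2)
doubled = PartitionManager.map_partitions(
    grid, lambda block: [[2 * v for v in row] for row in block]
)
```

In this sketch a block-wise operation such as doubling every value never needs to see more than one block, while an operation that needs a whole row or column would first join blocks into an axis strip.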
Each storage format, each execution engine, and each execution system (storage format + execution engine) may have its own implementations of these core Dataframe entities. The current stable implementations are the following:
Base ModinDataframe defines a common interface and algebra operators for Dataframe implementations.
Storage format specific:
Modin PandasDataframe is an implementation for any frame class of the pandas storage format.
Engine specific:
Modin GenericRayDataframe is an implementation for any frame class that works on the Ray execution engine.
Modin GenericUnidistDataframe is an implementation for any frame class that works on the Unidist execution engine.
Execution system specific:
Modin PandasOnRayDataframe is a specialization of the Core Modin Dataframe for PandasOnRay execution.
Modin PandasOnDaskDataframe is a specialization of the Core Modin Dataframe for PandasOnDask execution.
Modin PandasOnPythonDataframe is a specialization of the Core Modin Dataframe for PandasOnPython execution.
Modin PandasOnUnidistDataframe is a specialization of the Core Modin Dataframe for PandasOnUnidist execution.
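The naming pattern in the list above follows from the definition of an execution system as storage format + execution engine. The short sketch below only reproduces that pattern. The helper functions `execution_name` and `dataframe_class_name` are hypothetical and are not part of Modin.

```python
def execution_name(storage_format: str, engine: str) -> str:
    """Combine a storage format and an engine into an execution-system name."""
    return f"{storage_format}On{engine}"


def dataframe_class_name(storage_format: str, engine: str) -> str:
    """Name of the Dataframe specialization for that execution system."""
    return f"{execution_name(storage_format, engine)}Dataframe"


# Engines that appear in the list above, paired with the pandas storage format.
ENGINES = ["Ray", "Dask", "Python", "Unidist"]
names = [dataframe_class_name("Pandas", engine) for engine in ENGINES]
```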
Note
At the current stage of Modin development, the base interfaces of the Dataframe objects are not yet defined. For now, all changes to the Dataframe interfaces originate in the Dataframe for the pandas storage format.