Factories Module Description#

Brief description#

Modin has several execution engines and storage formats; combining an engine with a storage format forms an execution. Calling any DataFrame API function ends up in some execution-specific method. The responsibility of dispatching high-level API calls to execution-specific functions belongs to the QueryCompiler, which is determined at the time of the dataframe’s creation by the factory of the corresponding execution. The mission of this module is to route IO function calls from the API level to their actual execution-specific implementations, which build the QueryCompiler of the appropriate execution.

Execution representation via Factories#

An execution is a combination of a storage format and an execution engine. For example, the PandasOnRay execution combines the pandas storage format with the Ray engine.
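The naming pattern used for executions on this page (PandasOnRay, PandasOnDask, and so on) can be illustrated with a minimal sketch. The helper `execution_name` is hypothetical and exists only for illustration; it is not part of Modin's API.

```python
def execution_name(storage_format: str, engine: str) -> str:
    """Join a storage format and an engine into an execution name.

    Hypothetical helper: it only mirrors the "<StorageFormat>On<Engine>"
    pattern visible in the execution names used in this document.
    """
    return f"{storage_format}On{engine}"


print(execution_name("Pandas", "Ray"))
print(execution_name("Pandas", "Dask"))
```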

Each storage format has its own Query Compiler, which compiles the most efficient queries for the corresponding Core Modin Dataframe implementation. For the PandasOnRay execution, the Query Compiler is PandasQueryCompiler and the Dataframe implementation is PandasDataframe, which is the general implementation for every execution of the pandas storage format. The actual implementation of the PandasOnRay dataframe is defined by the PandasOnRayDataframe class, which extends PandasDataframe.

In the scope of this module, each execution is represented with a factory class located in modin/core/execution/dispatching/factories/factories.py. Each factory contains a field that identifies the IO module of the corresponding execution. This IO module is responsible for dispatching calls of IO functions to their actual implementations in the underlying IO module. For more information about the IO module, visit the IO page.

Factory Dispatcher#

The FactoryDispatcher class provides public methods whose interface corresponds to pandas IO functions. The only difference is that they return the QueryCompiler of the selected storage format instead of a high-level DataFrame. FactoryDispatcher is responsible for routing these IO calls to the factory that represents the selected execution.

For example, when you call read_csv() and your execution is PandasOnRay, the call trace is the following:

[Figure: factory dispatching diagram (factory_dispatching.svg)]

modin.pandas.read_csv calls FactoryDispatcher.read_csv, which calls the ._read_csv function of the factory of the selected execution. In our case that is PandasOnRayFactory._read_csv, which in turn forwards the call to the actual implementation of read_csv, PandasOnRayIO.read_csv. modin.pandas.read_csv then returns a high-level Modin DataFrame with the appropriate QueryCompiler bound to it, which is responsible for dispatching all further function calls.
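The call chain described above can be modelled with a small, self-contained sketch. All `Toy*` classes are hypothetical stand-ins written for this illustration; they only mirror the roles named in the text (API function, dispatcher, factory, IO class, query compiler) and are not Modin's actual implementation.

```python
class ToyQueryCompiler:
    """Stand-in for an execution-specific query compiler."""

    def __init__(self, source):
        self.source = source


class ToyIO:
    """Stand-in for an execution's IO class (PandasOnRayIO in the text)."""

    @classmethod
    def read_csv(cls, path):
        # A real IO class would read the file; this one only records the path.
        return ToyQueryCompiler(path)


class ToyFactory:
    """Stand-in for a factory (PandasOnRayFactory in the text)."""

    io_cls = ToyIO

    @classmethod
    def _read_csv(cls, path):
        # Forward the call to the IO class bound to this factory.
        return cls.io_cls.read_csv(path)


class ToyDispatcher:
    """Stand-in for FactoryDispatcher: routes calls to the selected factory."""

    _factory = ToyFactory

    @classmethod
    def read_csv(cls, path):
        return cls._factory._read_csv(path)


class ToyDataFrame:
    """Stand-in for the high-level DataFrame that wraps a query compiler."""

    def __init__(self, query_compiler):
        self._query_compiler = query_compiler


def read_csv(path):
    """Stand-in for the API-level function (modin.pandas.read_csv in the text)."""
    return ToyDataFrame(query_compiler=ToyDispatcher.read_csv(path))


df = read_csv("data.csv")
print(type(df._query_compiler).__name__)
```

Swapping the execution in this model amounts to pointing `ToyDispatcher._factory` at a different factory class; the API-level function stays unchanged.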

Public API#

This module contains factories for all of the supported Modin executions.

A factory is a bridge between calls of IO functions from the high-level API and their actual implementations in the execution bound to that factory. Each execution is represented with a factory class.

class modin.core.execution.dispatching.factories.factories.BaseFactory#

Abstract factory that makes it easy to override the IO module.

This class is responsible for dispatching calls of IO functions to their actual execution-specific implementations.

io_cls#

IO module class of the underlying execution. The place to dispatch calls to.

Type:

BaseIO

classmethod get_info() FactoryInfo#

Get information about current factory.

Notes

It parses the factory name, so the name must conform to how the FactoryDispatcher class constructs factory names.
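A minimal sketch of such name parsing follows. It assumes factory names follow the pattern seen in the class names on this page, an optional "Experimental" prefix, then "<Partition>On<Engine>Factory". The function `parse_factory_name` and the regular expression are illustrative assumptions, and the local `FactoryInfo` and `NotRealFactory` definitions are simplified stand-ins for the classes documented here.

```python
import re
from typing import NamedTuple


class FactoryInfo(NamedTuple):
    """Stand-in with the same fields, in the same order, as documented here."""

    engine: str
    partition: str
    experimental: bool


class NotRealFactory(Exception):
    """Stand-in exception for names that match no factory pattern."""


# Assumed naming pattern, inferred from the class names on this page.
_NAME_PATTERN = re.compile(r"^(Experimental)?(.*)On(.*)Factory$")


def parse_factory_name(name: str) -> FactoryInfo:
    """Split a factory class name into engine, partition and experimental flag."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise NotRealFactory(name)
    experimental, partition, engine = match.groups()
    return FactoryInfo(
        engine=engine, partition=partition, experimental=bool(experimental)
    )


print(parse_factory_name("PandasOnRayFactory"))
print(parse_factory_name("ExperimentalCudfOnRayFactory"))
```

A name without the "On" separator, such as "BaseFactory", does not match the pattern, so the sketch raises `NotRealFactory` for it.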

classmethod prepare()#

Initialize Factory.

Lazily fills in the .io_cls class attribute with the underlying execution’s IO module.
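The lazy aspect can be sketched as follows: the IO class is not bound when the factory class is defined, only when prepare() runs. `ToyIO` and `ToyFactory` are hypothetical stand-ins, and the sketch does not show who calls prepare() in Modin.

```python
class ToyIO:
    """Stand-in for an execution-specific IO class."""


class ToyFactory:
    """Stand-in factory whose IO class is resolved lazily."""

    io_cls = None  # Not resolved at class-definition time.

    @classmethod
    def prepare(cls):
        # A real factory could perform a potentially expensive import of the
        # execution's IO module here; the sketch just binds the toy class.
        if cls.io_cls is None:
            cls.io_cls = ToyIO


before = ToyFactory.io_cls  # Still unset before prepare() is called.
ToyFactory.prepare()
after = ToyFactory.io_cls  # Bound to ToyIO after prepare().
print(before, after.__name__)
```

Deferring the binding means that defining or importing the factory classes does not by itself require the dependencies of every execution to be available.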

class modin.core.execution.dispatching.factories.factories.ExperimentalCudfOnRayFactory#

Factory of cuDFOnRay execution.

This class is responsible for dispatching calls of IO functions to their actual execution-specific implementations.

io_cls#

IO module class of the underlying execution. The place to dispatch calls to.

Type:

cuDFOnRayIO

classmethod prepare()#

Initialize Factory.

Fills in .io_cls class attribute with cuDFOnRayIO lazily.

class modin.core.execution.dispatching.factories.factories.ExperimentalHdkOnNativeFactory#

Factory of experimental HdkOnNative execution.

This class is responsible for dispatching calls of IO functions to their actual execution-specific implementations.

io_cls#

IO module class of the underlying execution. The place to dispatch calls to.

Type:

experimental HdkOnNativeIO

classmethod prepare()#

Initialize Factory.

Fills in .io_cls class attribute with experimental HdkOnNativeIO lazily.

class modin.core.execution.dispatching.factories.factories.FactoryInfo(engine: str, partition: str, experimental: bool)#

Structure that stores information about factory.

Parameters:
  • engine (str) – Name of underlying execution engine.

  • partition (str) – Name of the partition format.

  • experimental (bool) – Whether underlying engine is experimental-only.

engine: str#

Alias for field number 0

experimental: bool#

Alias for field number 2

partition: str#

Alias for field number 1
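The "Alias for field number N" entries indicate a named-tuple-like structure, so fields should be readable both by name and by position. A minimal sketch, assuming a `typing.NamedTuple` with the documented field order (the local `FactoryInfo` is a stand-in, not imported from Modin):

```python
from typing import NamedTuple


class FactoryInfo(NamedTuple):
    """Stand-in mirroring the documented fields and their positions."""

    engine: str  # field number 0
    partition: str  # field number 1
    experimental: bool  # field number 2


info = FactoryInfo(engine="Ray", partition="Pandas", experimental=False)

# Fields can be read by name, by index, or via tuple unpacking.
engine, partition, experimental = info
print(info.engine, info[1], experimental)
```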

exception modin.core.execution.dispatching.factories.factories.NotRealFactory#

NotRealFactory exception class.

Raised when no matching factory could be found.

class modin.core.execution.dispatching.factories.factories.PandasOnDaskFactory#

Factory of PandasOnDask execution.

This class is responsible for dispatching calls of IO functions to their actual execution-specific implementations.

io_cls#

IO module class of the underlying execution. The place to dispatch calls to.

Type:

PandasOnDaskIO

classmethod prepare()#

Initialize Factory.

Fills in .io_cls class attribute with PandasOnDaskIO lazily.

class modin.core.execution.dispatching.factories.factories.PandasOnPythonFactory#

Factory of PandasOnPython execution.

This class is responsible for dispatching calls of IO functions to their actual execution-specific implementations.

io_cls#

IO module class of the underlying execution. The place to dispatch calls to.

Type:

PandasOnPythonIO

classmethod prepare()#

Initialize Factory.

Fills in .io_cls class attribute with PandasOnPythonIO lazily.

class modin.core.execution.dispatching.factories.factories.PandasOnRayFactory#

Factory of PandasOnRay execution.

This class is responsible for dispatching calls of IO functions to their actual execution-specific implementations.

io_cls#

IO module class of the underlying execution. The place to dispatch calls to.

Type:

PandasOnRayIO

classmethod prepare()#

Initialize Factory.

Fills in .io_cls class attribute with PandasOnRayIO lazily.

class modin.core.execution.dispatching.factories.factories.PandasOnUnidistFactory#

Factory of PandasOnUnidist execution.

This class is responsible for dispatching calls of IO functions to their actual execution-specific implementations.

io_cls#

IO module class of the underlying execution. The place to dispatch calls to.

Type:

PandasOnUnidistIO

classmethod prepare()#

Initialize Factory.

Fills in .io_cls class attribute with PandasOnUnidistIO lazily.