PandasOnPythonDataframePartition#

The class is a specific implementation of PandasDataframePartition that provides the API for performing operations on a block partition using Python as the execution engine.

In addition to wrapping a pandas.DataFrame, the class also holds the following metadata:

  • length - length (number of rows) of the wrapped pandas.DataFrame

  • width - width (number of columns) of the wrapped pandas.DataFrame

An operation on a block partition can be performed in two modes:

  • immediately via apply() - in this case the accumulated call queue and the new function are executed immediately.

  • lazily via add_to_apply_calls() - in this case the function is added to the call queue and no computation is performed at that moment.
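The difference between the two modes can be illustrated with a minimal, self-contained sketch. ToyPartition below is a hypothetical stand-in, not Modin's class: it wraps a plain list instead of a pandas.DataFrame and only mimics the queueing behavior described above. The helper functions scale and shift are likewise made up for illustration.

```python
class ToyPartition:
    """Hypothetical stand-in for a block partition (not Modin's implementation)."""

    def __init__(self, data, call_queue=None):
        self._data = data
        self.call_queue = list(call_queue) if call_queue is not None else []

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Lazy mode: record the call in a new partition; nothing is computed yet.
        return ToyPartition(self._data, self.call_queue + [(func, args, kwargs)])

    def apply(self, func, *args, **kwargs):
        # Eager mode: run the accumulated queue first, then the new function.
        result = self._data
        for queued_func, queued_args, queued_kwargs in self.call_queue:
            result = queued_func(result, *queued_args, **queued_kwargs)
        return ToyPartition(func(result, *args, **kwargs))


def scale(values, factor):
    return [v * factor for v in values]


def shift(values, offset=0):
    return [v + offset for v in values]


part = ToyPartition([1, 2, 3])
lazy = part.add_to_apply_calls(scale, 10)  # queued only, data is untouched
eager = lazy.apply(shift, offset=1)        # runs scale, then shift
```

In this sketch lazy still holds the original data plus one pending call, while eager holds the fully computed result and an empty queue.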

Public API#

class modin.core.execution.python.implementations.pandas_on_python.partitioning.partition.PandasOnPythonDataframePartition(data, length=None, width=None, call_queue=None)#

Partition class with interface for pandas storage format and Python engine.

The class holds the data and metadata for a single partition and implements the methods of the parent abstract class PandasDataframePartition.

Parameters:
  • data (pandas.DataFrame) – pandas.DataFrame that should be wrapped with this class.

  • length (int, optional) – Length of data (number of rows in the input dataframe).

  • width (int, optional) – Width of data (number of columns in the input dataframe).

  • call_queue (list, optional) – Call queue of the partition (a list of entities to be called before the partition is materialized).

Notes

Objects of this class are treated as immutable by partition manager subclasses. There is no logic for updating them in place.

apply(func, *args, **kwargs)#

Apply a function to the object wrapped by this partition.

Parameters:
  • func (callable) – Function to apply.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns:

New PandasOnPythonDataframePartition object.

Return type:

PandasOnPythonDataframePartition

drain_call_queue()#

Execute all operations stored in the call queue on the object wrapped by this partition.

execution_wrapper#

alias of PythonWrapper

get()#

Flush the call_queue and return a copy of the data.

Returns:

A copy of the DataFrame that was wrapped by this partition.

Return type:

pandas.DataFrame

Notes

Since this object is a simple wrapper, it just returns a copy of the data.
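Returning a copy means a caller can modify the result without affecting what the partition holds. A minimal sketch of that contract follows; ToyPartition is a hypothetical stand-in that wraps a dict of columns rather than a pandas.DataFrame, and it omits the call queue to keep the focus on the copy.

```python
import copy


class ToyPartition:
    """Hypothetical stand-in for a partition (not Modin's implementation)."""

    def __init__(self, data):
        self._data = data

    def get(self):
        # Hand out a deep copy so callers cannot mutate the wrapped data.
        return copy.deepcopy(self._data)


part = ToyPartition({"a": [1, 2]})
first = part.get()
first["a"].append(99)  # changes only the caller's copy
second = part.get()    # still reflects the original data
```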

classmethod preprocess_func(func)#

Preprocess a function before an apply call.

Parameters:

func (callable) – Function to preprocess.

Returns:

An object that can be accepted by apply.

Return type:

callable

Notes

No special preprocessing is required, so func is returned unmodified.

classmethod put(obj)#

Create a partition containing obj.

Parameters:

obj (pandas.DataFrame) – DataFrame to be put into the new partition.

Returns:

New PandasOnPythonDataframePartition object.

Return type:

PandasOnPythonDataframePartition
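As a rough illustration of what a put-style constructor can look like, the sketch below builds a partition from an input object and records its length and width up front. ToyPartition is hypothetical, the input is a list of rows rather than a pandas.DataFrame, and both copying the input and precomputing the metadata are assumptions of this sketch rather than documented behavior.

```python
class ToyPartition:
    """Hypothetical stand-in for a partition (not Modin's implementation)."""

    def __init__(self, data, length=None, width=None):
        self._data = data
        self._length = length
        self._width = width

    @classmethod
    def put(cls, obj):
        # Assumption: copy the rows so later changes to obj do not leak in.
        rows = [list(row) for row in obj]
        width = len(rows[0]) if rows else 0
        return cls(rows, length=len(rows), width=width)


source = [[1, 2, 3], [4, 5, 6]]
part = ToyPartition.put(source)
source[0][0] = -1  # does not affect the partition's copy
```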

wait()#

Wait for completion of computations on the object wrapped by the partition.

Internally, this is done by flushing the call queue.
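With a synchronous engine there is nothing to block on, so waiting reduces to running whatever is still queued. The sketch below shows that relationship using a hypothetical ToyPartition that wraps a plain list; the record_and_double helper and the calls list exist only to make the execution observable.

```python
class ToyPartition:
    """Hypothetical stand-in for a partition (not Modin's implementation)."""

    def __init__(self, data, call_queue=None):
        self._data = data
        self.call_queue = list(call_queue) if call_queue is not None else []

    def drain_call_queue(self):
        # Execute every queued call in order, then clear the queue.
        for func, args, kwargs in self.call_queue:
            self._data = func(self._data, *args, **kwargs)
        self.call_queue = []

    def wait(self):
        # Nothing runs in the background here, so waiting is just flushing.
        self.drain_call_queue()


calls = []


def record_and_double(values):
    calls.append(len(values))  # lets us observe when the call actually ran
    return [v * 2 for v in values]


part = ToyPartition([1, 2, 3], call_queue=[(record_and_double, (), {})])
pending_before = len(part.call_queue)
part.wait()
```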