PandasOnPythonFramePartition

The class is a specific implementation of PandasFramePartition, providing the API to perform operations on a block partition using Python as the execution engine.

In addition to wrapping a pandas.DataFrame, the class also holds the following metadata:

  • length - length of the wrapped pandas.DataFrame

  • width - width of the wrapped pandas.DataFrame

An operation on a block partition can be performed in two modes:

  • immediately via apply() - in this case the accumulated call queue and the new function are executed immediately.

  • lazily via add_to_apply_calls() - in this case the function is added to the call queue and no computation is performed at that moment.
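The two modes can be illustrated with a minimal sketch. ToyPartition below is a hypothetical stand-in, not Modin's implementation: it wraps a plain list instead of a pandas.DataFrame so that the example runs without dependencies, and the helper functions double and add are invented for illustration.

```python
class ToyPartition:
    """Hypothetical stand-in for a partition; wraps a list, not a DataFrame."""

    def __init__(self, data, call_queue=None):
        self.data = data
        self.call_queue = list(call_queue or [])

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Lazy mode: only record the call; nothing is computed here.
        return ToyPartition(self.data, self.call_queue + [(func, args, kwargs)])

    def apply(self, func, *args, **kwargs):
        # Eager mode: run the queued calls first, then the new function.
        result = self.data
        for f, a, kw in self.call_queue + [(func, args, kwargs)]:
            result = f(result, *a, **kw)
        return ToyPartition(result)


calls = []  # records the order in which functions actually run


def double(rows):
    calls.append("double")
    return [x * 2 for x in rows]


def add(rows, n):
    calls.append("add")
    return [x + n for x in rows]


part = ToyPartition([1, 2, 3])
lazy = part.add_to_apply_calls(double)  # queued, not executed
calls_after_lazy = list(calls)          # snapshot of the log at this point
eager = lazy.apply(add, 10)             # runs double, then add
```

In this sketch both methods return a new object and leave the original partition untouched, which mirrors the way partitions are described as immutable in this reference.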

Public API

class modin.engines.python.pandas_on_python.frame.partition.PandasOnPythonFramePartition(data, length=None, width=None, call_queue=None)

Partition class with interface for pandas backend and Python engine.

The class holds the data and metadata for a single partition and implements the methods of the parent abstract class PandasFramePartition.

Parameters
  • data (pandas.DataFrame) – pandas.DataFrame that should be wrapped with this class.

  • length (int, optional) – Length of data (number of rows in the input dataframe).

  • width (int, optional) – Width of data (number of columns in the input dataframe).

  • call_queue (list, optional) – Call queue of the partition (list with entities that should be called before partition materialization).

Notes

Objects of this class are treated as immutable by partition manager subclasses. There is no logic for updating in-place.

add_to_apply_calls(func, *args, **kwargs)

Add a function to the call queue.

Parameters
  • func (callable) – Function to be added to the call queue.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

New PandasOnPythonFramePartition object with extended call queue.

Return type

PandasOnPythonFramePartition

apply(func, *args, **kwargs)

Apply a function to the object wrapped by this partition.

Parameters
  • func (callable) – Function to apply.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

New PandasOnPythonFramePartition object.

Return type

PandasOnPythonFramePartition

drain_call_queue()

Execute all operations stored in the call queue on the object wrapped by this partition.

classmethod empty()

Create a new partition that wraps an empty pandas DataFrame.

Returns

New PandasOnPythonFramePartition object wrapping an empty pandas DataFrame.

Return type

PandasOnPythonFramePartition

get()

Flush the call queue and return a copy of the data.

Returns

Copy of the DataFrame wrapped by this partition.

Return type

pandas.DataFrame

Notes

Since this object is a simple wrapper, it just returns a copy of the data.
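A minimal sketch of the flush-then-copy behaviour described for get(), assuming a hypothetical ToyPartition that wraps nested lists rather than a pandas.DataFrame (copy.deepcopy stands in for copying the DataFrame). It is an illustration only, not the actual Modin code.

```python
import copy


class ToyPartition:
    """Hypothetical stand-in for a partition; wraps nested lists."""

    def __init__(self, data, call_queue=None):
        self.data = data
        self.call_queue = list(call_queue or [])

    def drain_call_queue(self):
        # Execute every queued call in order, then clear the queue.
        for func, args, kwargs in self.call_queue:
            self.data = func(self.data, *args, **kwargs)
        self.call_queue = []

    def get(self):
        # Flush pending work, then hand out a copy rather than the original.
        self.drain_call_queue()
        return copy.deepcopy(self.data)


def append_row(rows):
    # Invented helper: returns a new list with one extra row.
    return rows + [[5, 6]]


part = ToyPartition([[1, 2], [3, 4]], call_queue=[(append_row, (), {})])
snapshot = part.get()
snapshot[0][0] = 99  # modifies only the copy
```

Because the caller receives a copy, changing snapshot does not alter the data held by the partition in this sketch.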

length()

Get the length of the object wrapped by this partition.

Returns

The length of the object.

Return type

int

classmethod preprocess_func(func)

Preprocess a function before an apply call.

Parameters

func (callable) – Function to preprocess.

Returns

An object that can be accepted by apply.

Return type

callable

Notes

No special preprocessing is required, so func is returned unmodified.
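As a rough illustration of a pass-through preprocessing step, the sketch below uses a hypothetical ToyPartition class and an invented head helper; it only mirrors the behaviour stated in the note and is not taken from Modin's source.

```python
class ToyPartition:
    """Hypothetical stand-in showing an identity preprocess_func."""

    @classmethod
    def preprocess_func(cls, func):
        # Nothing to prepare for this toy class: return the callable as-is.
        return func


def head(rows, n=2):
    # Invented helper: keep the first n rows.
    return rows[:n]


prepared = ToyPartition.preprocess_func(head)
first_rows = prepared([10, 20, 30])
```

Here prepared is the same object as head, so it can be passed on wherever a callable is expected.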

classmethod put(obj)

Create a partition containing obj.

Parameters

obj (pandas.DataFrame) – DataFrame to be put into the new partition.

Returns

New PandasOnPythonFramePartition object.

Return type

PandasOnPythonFramePartition

to_numpy(**kwargs)

Return a NumPy array representation of the pandas.DataFrame stored in this partition.

Parameters

**kwargs (dict) – Keyword arguments to pass to the pandas.DataFrame.to_numpy function.

Returns

Return type

np.ndarray

to_pandas()

Return a copy of the pandas.DataFrame stored in this partition.

Returns

Return type

pandas.DataFrame

Notes

Equivalent to the get method for this class.

wait()

Wait for completion of computations on the object wrapped by the partition.

Internally, this is done by flushing the call queue.

width()

Get the width of the object wrapped by the partition.

Returns

The width of the object.

Return type

int