PandasOnDaskFramePartition

This class is the Dask-specific implementation of PandasFramePartition. It provides the API for performing operations on a block partition, namely a pandas.DataFrame, using Dask as the execution engine.

In addition to wrapping a pandas DataFrame, the class also holds the following metadata:

  • length - length of the wrapped pandas DataFrame

  • width - width of the wrapped pandas DataFrame

  • ip - IP address of the node that holds the wrapped pandas DataFrame

An operation on a block partition can be performed in two modes:

  • asynchronously - via apply

  • lazily - via add_to_apply_calls

Public API

class modin.engines.dask.pandas_on_dask.frame.partition.PandasOnDaskFramePartition(future, length=None, width=None, ip=None, call_queue=None)

The class implements the interface in PandasFramePartition.

Parameters
  • future (distributed.Future) – A reference to the pandas DataFrame that needs to be wrapped with this class.

  • length (distributed.Future or int, optional) – Length of the wrapped pandas DataFrame, or a reference to it.

  • width (distributed.Future or int, optional) – Width of the wrapped pandas DataFrame, or a reference to it.

  • ip (distributed.Future or str, optional) – IP address of the node that holds the wrapped pandas DataFrame, or a reference to it.

  • call_queue (list, optional) – Call queue that needs to be executed on the wrapped pandas DataFrame.
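
The length, width and ip arguments accept either a concrete value or a future that resolves to one. The sketch below shows one way such a value could be resolved on first access and cached afterwards. It is an illustration only: concurrent.futures.Future from the standard library stands in for distributed.Future, and ToyMetadata with its _length_cache attribute is a hypothetical name, not Modin's implementation.

```python
from concurrent.futures import Future


class ToyMetadata:
    """Hypothetical holder for a value that may still be a future."""

    def __init__(self, length=None):
        # Either an int, a Future resolving to an int, or None (unknown).
        self._length_cache = length

    def length(self):
        # Resolve the future once and keep the concrete value afterwards.
        if isinstance(self._length_cache, Future):
            self._length_cache = self._length_cache.result()
        return self._length_cache


# An already completed future, standing in for a remote computation.
pending = Future()
pending.set_result(3)

meta = ToyMetadata(length=pending)
first = meta.length()   # resolves the future, then caches the int
second = meta.length()  # served from the cache
```

Passing a future lets the partition be constructed before the metadata is known, at the cost of a blocking call the first time the value is read.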

add_to_apply_calls(func, *args, **kwargs)

Add a function to the call queue.

Parameters
  • func (callable) – Function to be added to the call queue.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A new PandasOnDaskFramePartition object.

Return type

PandasOnDaskFramePartition

Notes

The keyword arguments are sent as a dictionary.

apply(func, *args, **kwargs)

Apply a function to the object wrapped by this partition.

Parameters
  • func (callable) – A function to apply.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A new PandasOnDaskFramePartition object.

Return type

PandasOnDaskFramePartition

Notes

The keyword arguments are sent as a dictionary.
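
To illustrate the idea behind apply, the following sketch schedules a function on data held behind a future and returns a new future immediately. It assumes a standard-library ThreadPoolExecutor as a stand-in for a Dask client; toy_apply and scale are hypothetical helpers, and nested lists replace the pandas DataFrame.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_apply(executor, future, func, *args, **kwargs):
    """Schedule func on the data behind `future`; return a new future at once."""

    def task():
        # kwargs travel as one dictionary and are unpacked only at call time.
        return func(future.result(), *args, **kwargs)

    return executor.submit(task)


def scale(rows, factor, offset=0):
    # Stand-in for an operation on a pandas DataFrame.
    return [[value * factor + offset for value in row] for row in rows]


with ThreadPoolExecutor(max_workers=2) as executor:
    data = executor.submit(lambda: [[1, 2], [3, 4]])  # the "remote" block
    result_future = toy_apply(executor, data, scale, 10, offset=1)
    result = result_future.result()  # blocks, comparable to get()
```

The caller receives result_future without waiting for scale to run; only the final result() call blocks.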

drain_call_queue()

Execute all operations stored in the call queue on the object wrapped by this partition.
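
The interplay between add_to_apply_calls and drain_call_queue can be pictured with a toy model. The sketch below is an assumption-laden illustration, not Modin's code: ToyLazyPartition is a hypothetical class, a plain list replaces the pandas DataFrame, and everything runs locally rather than on Dask.

```python
class ToyLazyPartition:
    """Hypothetical partition that records calls instead of running them."""

    def __init__(self, data, call_queue=None):
        self._data = data
        self.call_queue = call_queue if call_queue is not None else []

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Return a new partition with a longer queue; nothing runs here.
        return ToyLazyPartition(
            self._data, self.call_queue + [(func, args, kwargs)]
        )

    def drain_call_queue(self):
        # Run the queued calls in order, then clear the queue.
        for func, args, kwargs in self.call_queue:
            self._data = func(self._data, *args, **kwargs)
        self.call_queue = []

    def get(self):
        # In this toy model, reading the data forces the queued work.
        self.drain_call_queue()
        return self._data


def add(values, amount):
    return [v + amount for v in values]


def keep_greater(values, threshold=0):
    return [v for v in values if v > threshold]


base = ToyLazyPartition([1, 2, 3])
lazy = base.add_to_apply_calls(add, 10)
lazy = lazy.add_to_apply_calls(keep_greater, threshold=11)
queued = len(lazy.call_queue)  # two calls recorded, none executed yet
result = lazy.get()
```

Because each add_to_apply_calls call returns a new object, base keeps an empty queue and is unaffected by the operations chained onto lazy.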

classmethod empty()

Create a new partition that wraps an empty pandas DataFrame.

Returns

A new PandasOnDaskFramePartition object.

Return type

PandasOnDaskFramePartition

get()

Get the object wrapped by this partition out of the distributed memory.

Returns

The object from the distributed memory.

Return type

pandas.DataFrame

ip()

Get the node IP address of the object wrapped by this partition.

Returns

IP address of the node that holds the data.

Return type

str

length()

Get the length of the object wrapped by this partition.

Returns

The length of the object.

Return type

int

mask(row_indices, col_indices)

Lazily create a mask that extracts the indices provided.

Parameters
  • row_indices (list-like, slice or label) – The indices for the rows to extract.

  • col_indices (list-like, slice or label) – The indices for the columns to extract.

Returns

A new PandasOnDaskFramePartition object.

Return type

PandasOnDaskFramePartition
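
A lazy mask can be sketched as a queued selection whose resulting shape is known up front. The example below is a simplified illustration under stated assumptions: ToyMaskPartition and its _length_cache and _width_cache attributes are hypothetical, rows are nested lists rather than a pandas DataFrame, and only lists of integer positions are handled (not slices or labels).

```python
class ToyMaskPartition:
    """Hypothetical partition over a list of rows (not Modin's class)."""

    def __init__(self, rows, call_queue=None, length=None, width=None):
        self._rows = rows
        self.call_queue = call_queue or []
        self._length_cache = length
        self._width_cache = width

    def mask(self, row_indices, col_indices):
        def select(rows):
            return [[rows[r][c] for c in col_indices] for r in row_indices]

        # Only queue the selection. The new shape follows from the number
        # of requested positions, so it can be recorded without computing.
        return ToyMaskPartition(
            self._rows,
            self.call_queue + [select],
            length=len(row_indices),
            width=len(col_indices),
        )

    def get(self):
        # Apply the queued selections in order to a local view of the rows.
        rows = self._rows
        for func in self.call_queue:
            rows = func(rows)
        return rows


block = ToyMaskPartition([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
masked = block.mask([0, 2], [1, 2])
shape_before_compute = (masked._length_cache, masked._width_cache)
values = masked.get()
```

Recording the shape at mask time means length and width queries on the masked partition need not trigger the queued selection.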

classmethod preprocess_func(func)

Preprocess a function before an apply call.

Parameters

func (callable) – The function to preprocess.

Returns

An object that can be accepted by apply.

Return type

callable

classmethod put(obj)

Put an object into distributed memory and wrap it with a partition object.

Parameters

obj (any) – An object to be put.

Returns

A new PandasOnDaskFramePartition object.

Return type

PandasOnDaskFramePartition

to_numpy(**kwargs)

Convert the object wrapped by this partition to a NumPy array.

Parameters

**kwargs (dict) – Additional keyword arguments to be passed to to_numpy.

Returns

The wrapped object converted to a NumPy array.

Return type

np.ndarray

to_pandas()

Convert the object wrapped by this partition to a pandas DataFrame.

Returns

The wrapped object converted to a pandas DataFrame.

Return type

pandas.DataFrame

wait()

Wait for computations on the object wrapped by the partition to complete.

width()

Get the width of the object wrapped by the partition.

Returns

The width of the object.

Return type

int