PandasOnRayFramePartition

This class is the Ray-specific implementation of PandasFramePartition. It provides the API for performing operations on a block partition (a pandas.DataFrame) using Ray as the execution engine.

In addition to wrapping a pandas DataFrame, the class also holds the following metadata:

  • length - length of the wrapped pandas DataFrame

  • width - width of the wrapped pandas DataFrame

  • ip - IP address of the node that holds the wrapped pandas DataFrame

An operation on a block partition can be performed in two modes:

  • asynchronously - via apply

  • lazily - via add_to_apply_calls

Public API

class modin.engines.ray.pandas_on_ray.frame.partition.PandasOnRayFramePartition(object_id, length=None, width=None, ip=None, call_queue=None)

The class implements the interface in PandasFramePartition.

Parameters
  • object_id (ray.ObjectRef) – A reference to the pandas.DataFrame that needs to be wrapped with this class.

  • length (ray.ObjectRef or int, optional) – Length of the wrapped pandas.DataFrame, or a reference to it.

  • width (ray.ObjectRef or int, optional) – Width of the wrapped pandas.DataFrame, or a reference to it.

  • ip (ray.ObjectRef or str, optional) – IP address of the node that holds the wrapped pandas.DataFrame, or a reference to it.

  • call_queue (list, optional) – Call queue that needs to be executed on the wrapped pandas.DataFrame.

add_to_apply_calls(func, *args, **kwargs)

Add a function to the call queue.

Parameters
  • func (callable or ray.ObjectRef) – Function to be added to the call queue.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A new PandasOnRayFramePartition object.

Return type

PandasOnRayFramePartition

Notes

It does not matter whether func is a callable or a ray.ObjectRef; Ray handles both correctly. The keyword arguments are sent as a dictionary.

apply(func, *args, **kwargs)

Apply a function to the object wrapped by this partition.

Parameters
  • func (callable or ray.ObjectRef) – A function to apply.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A new PandasOnRayFramePartition object.

Return type

PandasOnRayFramePartition

Notes

It does not matter whether func is a callable or a ray.ObjectRef; Ray handles both correctly. The keyword arguments are sent as a dictionary.
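The behaviour of apply can be sketched without Ray. The toy below is not Modin's implementation: ToyAsyncPartition is a hypothetical class, a concurrent.futures.Future stands in for ray.ObjectRef, and a plain list stands in for the pandas.DataFrame. It assumes that apply schedules func and returns immediately with a new partition wrapping the pending result.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyAsyncPartition:
    """Hypothetical stand-in: wraps a Future instead of a ray.ObjectRef."""

    _pool = ThreadPoolExecutor(max_workers=2)

    def __init__(self, future):
        self.future = future

    @classmethod
    def put(cls, obj):
        # Wrap an object that is already available in a completed Future.
        done = Future()
        done.set_result(obj)
        return cls(done)

    def apply(self, func, *args, **kwargs):
        parent = self.future

        def task():
            # Resolve the wrapped data inside the worker, then call func on it.
            return func(parent.result(), *args, **kwargs)

        # Return right away; the new partition wraps the pending result.
        return type(self)(self._pool.submit(task))

    def get(self):
        # Block until the result is available and return it.
        return self.future.result()


part = ToyAsyncPartition.put([1, 2, 3])
doubled = part.apply(lambda data, factor: [x * factor for x in data], factor=2)
print(doubled.get())
```

The original partition is left unchanged: apply produces a new wrapper rather than mutating the one it was called on.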

drain_call_queue()

Execute all operations stored in the call queue on the object wrapped by this partition.
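The interplay between add_to_apply_calls and drain_call_queue can be illustrated with a minimal sketch. ToyLazyPartition is a hypothetical class, not part of Modin; it keeps a plain list in place of the pandas.DataFrame and runs everything locally, with no Ray involved. It assumes that queued calls are stored as (func, args, kwargs) entries and executed in order when the queue is drained.

```python
class ToyLazyPartition:
    """Hypothetical stand-in for a partition with a call queue."""

    def __init__(self, data, call_queue=None):
        self.data = data
        self.call_queue = call_queue or []

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Record the call instead of running it, and return a new partition.
        new_queue = self.call_queue + [(func, args, kwargs)]
        return ToyLazyPartition(self.data, new_queue)

    def drain_call_queue(self):
        # Execute the queued calls in order, feeding each result to the next.
        for func, args, kwargs in self.call_queue:
            self.data = func(self.data, *args, **kwargs)
        self.call_queue = []


executed = []  # records which calls have actually run


def add(data, n):
    executed.append(n)
    return [x + n for x in data]


lazy = ToyLazyPartition([1, 2, 3]).add_to_apply_calls(add, 10).add_to_apply_calls(add, 5)
ran_before_drain = list(executed)  # still empty: nothing has run yet

lazy.drain_call_queue()
print(lazy.data, executed)
```

Deferring work this way lets several small operations be batched into one execution step instead of launching a separate task for each.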

classmethod empty()

Create a new partition that wraps an empty pandas DataFrame.

Returns

A new PandasOnRayFramePartition object.

Return type

PandasOnRayFramePartition

get()

Get the object wrapped by this partition out of the Plasma store.

Returns

The object from the Plasma store.

Return type

pandas.DataFrame

ip()

Get the node IP address of the object wrapped by this partition.

Returns

IP address of the node that holds the data.

Return type

str

length()

Get the length of the object wrapped by this partition.

Returns

The length of the object.

Return type

int

mask(row_indices, col_indices)

Lazily create a mask that extracts the indices provided.

Parameters
  • row_indices (list-like, slice or label) – The indices for the rows to extract.

  • col_indices (list-like, slice or label) – The indices for the columns to extract.

Returns

A new PandasOnRayFramePartition object.

Return type

PandasOnRayFramePartition
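As a rough illustration of a lazy mask, the sketch below defers a row and column selection until the result is requested. The helpers select and lazy_mask are hypothetical, a list of rows stands in for the pandas.DataFrame, and the selection is assumed to be positional; label-based selection is not covered.

```python
def select(rows, row_indices, col_indices):
    """Positional selection on a list of rows (a stand-in for a DataFrame)."""

    def pick(seq, idx):
        # Accept either a slice or a list-like of positions.
        if isinstance(idx, slice):
            return seq[idx]
        return [seq[i] for i in idx]

    return [pick(row, col_indices) for row in pick(rows, row_indices)]


def lazy_mask(rows, row_indices, col_indices):
    # Return a deferred computation; no selection happens until it is called.
    return lambda: select(rows, row_indices, col_indices)


table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
pending = lazy_mask(table, [0, 2], slice(0, 2))
result = pending()
print(result)
```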

classmethod preprocess_func(func)

Put a function into the Plasma store to use in apply.

Parameters

func (callable) – A function to preprocess.

Returns

A reference to func.

Return type

ray.ObjectRef

classmethod put(obj)

Put an object into the Plasma store and wrap it with a partition object.

Parameters

obj (any) – An object to be put.

Returns

A new PandasOnRayFramePartition object.

Return type

PandasOnRayFramePartition
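The roles of put, get, and preprocess_func can be pictured with a toy in-memory store. ToyStore is a hypothetical stand-in for the Plasma store and its integer keys stand in for ray.ObjectRef; none of this is Modin or Ray code. The sketch only shows the pattern of storing both data and a function once and later resolving them by reference.

```python
import itertools


class ToyStore:
    """Hypothetical in-memory stand-in for an object store."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count()

    def put(self, obj):
        # Store the object and hand back a reference to it.
        ref = next(self._ids)
        self._objects[ref] = obj
        return ref

    def get(self, ref):
        # Resolve a reference back to the stored object.
        return self._objects[ref]


store = ToyStore()
data_ref = store.put([3, 1, 2])   # analogous to put
func_ref = store.put(sorted)      # analogous to preprocess_func

# Resolve both references and apply the stored function to the stored data.
result = store.get(func_ref)(store.get(data_ref))
print(result)
```

Storing a function once and passing its reference around avoids sending the function again for every partition it is applied to.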

to_numpy(**kwargs)

Convert the object wrapped by this partition to a NumPy array.

Parameters

**kwargs (dict) – Additional keyword arguments to be passed to to_numpy.

Returns

Return type

np.ndarray

to_pandas()

Convert the object wrapped by this partition to a pandas.DataFrame.

Returns

Return type

pandas.DataFrame

wait()

Wait for the computations on the object wrapped by the partition to complete.

width()

Get the width of the object wrapped by the partition.

Returns

The width of the object.

Return type

int
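Since length and width can be passed to the constructor or left as None, one plausible reading is that they act as cached metadata: supplied values are returned directly, and missing ones are computed on first request. The sketch below illustrates that assumption with a hypothetical ToyMeasuredPartition class over a list of rows; it is not Modin's implementation.

```python
class ToyMeasuredPartition:
    """Hypothetical partition that caches its length and width."""

    def __init__(self, rows, length=None, width=None):
        self.rows = rows
        self._length_cache = length
        self._width_cache = width
        self.computations = 0  # counts how often metadata had to be computed

    def length(self):
        if self._length_cache is None:
            self.computations += 1
            self._length_cache = len(self.rows)
        return self._length_cache

    def width(self):
        if self._width_cache is None:
            self.computations += 1
            self._width_cache = len(self.rows[0]) if self.rows else 0
        return self._width_cache


rows = [[1, 2], [3, 4], [5, 6]]

measured = ToyMeasuredPartition(rows)
shape = (measured.length(), measured.width())
measured.length()  # second call is served from the cache

known = ToyMeasuredPartition(rows, length=3, width=2)
known_shape = (known.length(), known.width())
print(shape, measured.computations, known_shape, known.computations)
```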