PandasOnRayDataframePartition#

The class is the specific implementation of PandasDataframePartition, providing the API to perform operations on a block partition, namely, pandas.DataFrame, using Ray as an execution engine.

In addition to wrapping a pandas.DataFrame, the class also holds the following metadata:

  • length - length of pandas.DataFrame wrapped

  • width - width of pandas.DataFrame wrapped

  • ip - node IP address that holds pandas.DataFrame wrapped

An operation on a block partition can be performed in two modes:

Public API#

class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframePartition(data: Union[ObjectRef, ClientObjectRef, DeferredExecution], length: int = None, width: int = None, ip: str = None, meta: MetaList = None, meta_offset: int = 0)#

The class implements the interface in PandasDataframePartition.

Parameters:
  • data (ObjectIDType or DeferredExecution) – A reference to pandas.DataFrame that needs to be wrapped with this class or a reference to DeferredExecution that needs to be executed on demand.

  • length (ObjectIDType or int, optional) – Length or reference to it of wrapped pandas.DataFrame.

  • width (ObjectIDType or int, optional) – Width or reference to it of wrapped pandas.DataFrame.

  • ip (ObjectIDType or str, optional) – Node IP address or reference to it that holds wrapped pandas.DataFrame.

  • meta (MetaList) – Meta information, containing the lengths and the worker address (the last value).

  • meta_offset (int) – The lengths offset in the meta list.

add_to_apply_calls(func: Union[Callable, ObjectRef], *args, length=None, width=None, **kwargs)#

Add a function to the call queue.

Parameters:
  • func (callable) – Function to be added to the call queue.

  • *args (iterable) – Additional positional arguments to be passed in func.

  • length (reference or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.

  • width (reference or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.

  • **kwargs (dict) – Additional keyword arguments to be passed in func.

Returns:

New PandasDataframePartition object with the function added to the call queue.

Return type:

PandasDataframePartition

Notes

This function will be executed when apply is called. It will be executed in the order inserted; apply’s func operates the last and return.

apply(func: Union[Callable, ObjectRef], *args, **kwargs)#

Apply a function to the object wrapped by this partition.

Parameters:
  • func (callable or ray.ObjectRef) – A function to apply.

  • *args (iterable) – Additional positional arguments to be passed in func.

  • **kwargs (dict) – Additional keyword arguments to be passed in func.

Returns:

A new PandasOnRayDataframePartition object.

Return type:

PandasOnRayDataframePartition

Notes

It does not matter if func is callable or an ray.ObjectRef. Ray will handle it correctly either way. The keyword arguments are sent as a dictionary.

drain_call_queue()#

Execute all operations stored in the call queue on the object wrapped by this partition.

execution_wrapper#

alias of RayWrapper

ip(materialize=True)#

Get the node IP address of the object wrapped by this partition.

Parameters:

materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If False was specified, may return a future of the result if it hasn’t been materialized yet.

Returns:

IP address of the node that holds the data.

Return type:

str

length(materialize=True)#

Get the length of the object wrapped by this partition.

Parameters:

materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If False was specified, may return a future of the result if it hasn’t been materialized yet.

Returns:

The length of the object.

Return type:

int or ray.ObjectRef

mask(row_labels, col_labels)#

Lazily create a mask that extracts the indices provided.

Parameters:
  • row_labels (list-like, slice or label) – The row labels for the rows to extract.

  • col_labels (list-like, slice or label) – The column labels for the columns to extract.

Returns:

A new PandasOnRayDataframePartition object.

Return type:

PandasOnRayDataframePartition

classmethod preprocess_func(func)#

Put a function into the Plasma store to use in apply.

Parameters:

func (callable) – A function to preprocess.

Returns:

A reference to func.

Return type:

ray.ObjectRef

classmethod put(obj: DataFrame)#

Put the data frame into Plasma store and wrap it with partition object.

Parameters:

obj (pandas.DataFrame) – A data frame to be put.

Returns:

A new PandasOnRayDataframePartition object.

Return type:

PandasOnRayDataframePartition

wait()#

Wait for completion of computations on the object wrapped by the partition.

width(materialize=True)#

Get the width of the object wrapped by the partition.

Parameters:

materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If False was specified, may return a future of the result if it hasn’t been materialized yet.

Returns:

The width of the object.

Return type:

int or ray.ObjectRef