PandasOnRayDataframeVirtualPartition#

This class is the specific implementation of PandasDataframeAxisPartition, providing the API to perform operations on an axis partition, using Ray as an execution engine. The virtual partition is a wrapper over a list of block partitions, which are stored in this class, with the capability to combine the smaller partitions into the one “virtual”.

Public API#

class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframeVirtualPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#

The class implements the interface in PandasDataframeAxisPartition.

Parameters
  • list_of_partitions (Union[list, PandasOnRayDataframePartition]) – List of PandasOnRayDataframePartition and PandasOnRayDataframeVirtualPartition objects, or a single PandasOnRayDataframePartition.

  • get_ip (bool, default: False) – Whether to get node IP addresses to conforming partitions or not.

  • full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.

  • call_queue (list, optional) – A list of tuples (callable, args, kwargs) that contains deferred calls.

  • length (ray.ObjectRef or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.

  • width (ray.ObjectRef or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.

add_to_apply_calls(func, *args, length=None, width=None, **kwargs)#

Add a function to the call queue.

Parameters
  • func (callable or ray.ObjectRef) – Function to be added to the call queue.

  • *args (iterable) – Additional positional arguments to be passed in func.

  • length (ray.ObjectRef or int, optional) – Length, or reference to it, of wrapped pandas.DataFrame.

  • width (ray.ObjectRef or int, optional) – Width, or reference to it, of wrapped pandas.DataFrame.

  • **kwargs (dict) – Additional keyword arguments to be passed in func.

Returns

A new PandasOnRayDataframeVirtualPartition object.

Return type

PandasOnRayDataframeVirtualPartition

Notes

It does not matter if func is callable or an ray.ObjectRef. Ray will handle it correctly either way. The keyword arguments are sent as a dictionary.

apply(func, *args, num_splits=None, other_axis_partition=None, maintain_partitioning=True, lengths=None, manual_partition=False, **kwargs)#

Apply a function to this axis partition along full axis.

Parameters
  • func (callable) – The function to apply.

  • *args (iterable) – Additional positional arguments to be passed in func.

  • num_splits (int, default: None) – The number of times to split the result object.

  • other_axis_partition (PandasDataframeAxisPartition, default: None) – Another PandasDataframeAxisPartition object to be applied to func. This is for operations that are between two data sets.

  • maintain_partitioning (bool, default: True) – Whether to keep the partitioning in the same orientation as it was previously or not. This is important because we may be operating on an individual AxisPartition and not touching the rest. In this case, we have to return the partitioning to its previous orientation (the lengths will remain the same). This is ignored between two axis partitions.

  • lengths (list, optional) – The list of lengths to shuffle the object.

  • manual_partition (bool, default: False) – If True, partition the result with lengths.

  • **kwargs (dict) – Additional keywords arguments to be passed in func.

Returns

A list of PandasOnRayDataframeVirtualPartition objects.

Return type

list

Notes

In older versions of Modin, args was passed internally as kwargs["args"], and deserialization was handled in a special case in _deploy_ray_func, making control flow difficult to follow. All deserialization is still handled in _deploy_ray_func, but in a more direct fashion.

classmethod deploy_axis_func(axis, func, f_args, f_kwargs, num_splits, maintain_partitioning, *partitions, lengths=None, manual_partition=False, max_retries=None)#

Deploy a function along a full axis.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

  • lengths (list, optional) – The list of lengths to shuffle the object.

  • manual_partition (bool, default: False) – If True, partition the result with lengths.

  • max_retries (int, default: None) – The max number of times to retry the func.

Returns

A list of ray.ObjectRef-s.

Return type

list

classmethod deploy_func_between_two_axis_partitions(axis, func, f_args, f_kwargs, num_splits, len_of_left, other_shape, *partitions)#

Deploy a function along a full axis between two data sets.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • len_of_left (int) – The number of values in partitions that belong to the left data set.

  • other_shape (np.ndarray) – The shape of right frame in terms of partitions, i.e. (other_shape[i-1], other_shape[i]) will indicate slice to restore i-1 axis partition.

  • *partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.

Returns

A list of ray.ObjectRef-s.

Return type

list

classmethod deploy_splitting_func(axis, func, f_args, f_kwargs, num_splits, *partitions, extract_metadata=False)#

Deploy a splitting function along a full axis.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • split_func (callable(pandas.DataFrame) -> list[pandas.DataFrame]) – The function to perform.

  • f_args (list or tuple) – Positional arguments to pass to split_func.

  • f_kwargs (dict) – Keyword arguments to pass to split_func.

  • num_splits (int) – The number of splits the split_func return.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

  • extract_metadata (bool, default: False) – Whether to return metadata (length, width, ip) of the result. Note that True value is not supported in PandasDataframeAxisPartition class.

Returns

A list of pandas DataFrames.

Return type

list

drain_call_queue(num_splits=None)#

Execute all operations stored in this partition’s call queue.

Parameters

num_splits (int, default: None) – The number of times to split the result object.

force_materialization(get_ip=False)#

Materialize partitions into a single partition.

Parameters

get_ip (bool, default: False) – Whether to get node ip address to a single partition or not.

Returns

An axis partition containing only a single materialized partition.

Return type

PandasOnRayDataframeVirtualPartition

instance_type#

alias of ObjectRef

length()#

Get the length of this partition.

Returns

The length of the partition.

Return type

int

property list_of_block_partitions: list#

Get the list of block partitions that compose this partition.

Returns

A list of PandasOnRayDataframePartition.

Return type

List

property list_of_ips#

Get the IPs holding the physical objects composing this partition.

Returns

A list of IPs as ray.ObjectRef or str.

Return type

List

mask(row_indices, col_indices)#

Create (synchronously) a mask that extracts the indices provided.

Parameters
  • row_indices (list-like, slice or label) – The row labels for the rows to extract.

  • col_indices (list-like, slice or label) – The column labels for the columns to extract.

Returns

A new PandasOnRayDataframeVirtualPartition object, materialized.

Return type

PandasOnRayDataframeVirtualPartition

partition_type#

alias of PandasOnRayDataframePartition

to_numpy()#

Convert the data in this partition to a numpy.array.

Return type

NumPy array.

to_pandas()#

Convert the data in this partition to a pandas.DataFrame.

Return type

pandas DataFrame.

wait()#

Wait completing computations on the object wrapped by the partition.

width()#

Get the width of this partition.

Returns

The width of the partition.

Return type

int

PandasOnRayDataframeColumnPartition#

Public API#

class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframeColumnPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#

PandasOnRayDataframeRowPartition#

Public API#

class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframeRowPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#

Initialize self. See help(type(self)) for accurate signature.