PandasOnUnidistDataframeVirtualPartition#

This class is the Unidist-specific implementation of PandasDataframeAxisPartition. It provides the API for performing operations on an axis partition with Unidist as the execution engine. A virtual partition wraps a list of block partitions, which are stored in this class, and can combine these smaller partitions into a single “virtual” partition.
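The wrapper idea can be illustrated with a small toy model. The sketch below does not use Modin or Unidist: `ToyVirtualPartition` and `combine` are hypothetical names, plain lists of rows stand in for pandas block partitions, and flattening nested virtual partitions into their blocks is an assumption of the sketch rather than a statement about the real class.

```python
# Toy model only: lists of rows stand in for pandas.DataFrame blocks.
class ToyVirtualPartition:
    """Hold several row-wise blocks and combine them on demand."""

    def __init__(self, list_of_partitions):
        self.blocks = []
        for part in list_of_partitions:
            if isinstance(part, ToyVirtualPartition):
                # Assumption: a nested virtual partition contributes its blocks.
                self.blocks.extend(part.blocks)
            else:
                self.blocks.append(part)

    def combine(self):
        """Concatenate all stored blocks row-wise into one 'virtual' block."""
        return [row for block in self.blocks for row in block]


top = [[1, 2], [3, 4]]
bottom = [[5, 6]]
inner = ToyVirtualPartition([top])
column = ToyVirtualPartition([inner, bottom])
combined = column.combine()
```

The blocks stay separate until `combine` is called, which mirrors the idea that the smaller partitions are only merged when an axis-wide operation needs them.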

Public API#

class modin.core.execution.unidist.implementations.pandas_on_unidist.partitioning.PandasOnUnidistDataframeVirtualPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#

The class implements the interface in PandasDataframeAxisPartition.

Parameters:
  • list_of_partitions (Union[list, PandasOnUnidistDataframePartition]) – List of PandasOnUnidistDataframePartition and PandasOnUnidistDataframeVirtualPartition objects, or a single PandasOnUnidistDataframePartition.

  • get_ip (bool, default: False) – Whether to get the IP addresses of the nodes holding the conforming partitions.

  • full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.

  • call_queue (list, optional) – A list of tuples (callable, args, kwargs) that contains deferred calls.

  • length (unidist.ObjectRef or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.

  • width (unidist.ObjectRef or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.
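As a rough illustration of the `call_queue` format, the following sketch applies a list of (callable, args, kwargs) tuples in order. It is a plain-Python stand-in, not Modin code: `drain_call_queue`, `scale`, and `take` are hypothetical helpers, and passing the data as the first positional argument of each callable is an assumption of the sketch.

```python
def drain_call_queue(data, call_queue):
    """Apply deferred (callable, args, kwargs) entries to data, in order."""
    for func, args, kwargs in call_queue:
        # Assumption: the wrapped data is the first positional argument.
        data = func(data, *args, **kwargs)
    return data


def scale(rows, factor):
    """Multiply every value in every row by factor."""
    return [[value * factor for value in row] for row in rows]


def take(rows, n=1):
    """Keep only the first n rows."""
    return rows[:n]


queue = [(scale, (10,), {}), (take, (), {"n": 2})]
result = drain_call_queue([[1, 2], [3, 4], [5, 6]], queue)
```

Deferring calls this way lets several operations be queued cheaply and executed together once the result is actually needed.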

classmethod deploy_axis_func(axis, func, f_args, f_kwargs, num_splits, maintain_partitioning, *partitions, lengths=None, manual_partition=False, max_retries=None)#

Deploy a function along a full axis.

Parameters:
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

  • lengths (list, optional) – The list of lengths used to shuffle the object.

  • manual_partition (bool, default: False) – If True, partition the result with lengths.

  • max_retries (int, default: None) – The maximum number of times to retry func.

Returns:

A list of unidist.ObjectRef-s.

Return type:

list
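The flow described by these parameters (combine the partitions, apply `func`, split the result) can be sketched in plain Python. This is a simplified toy, not the Modin implementation: `toy_deploy_axis_func` is a hypothetical name, it handles rows only, it returns ordinary lists rather than unidist.ObjectRef-s, and its rule for "keep the old partitioning if possible" (reuse the input block lengths when the result has the same total length) is an assumption.

```python
def toy_deploy_axis_func(func, f_args, f_kwargs, num_splits,
                         maintain_partitioning, *partitions, lengths=None):
    """Combine row blocks, apply func, and split the result again."""
    full_axis = [row for block in partitions for row in block]
    result = func(full_axis, *f_args, **f_kwargs)

    if lengths is None and maintain_partitioning:
        old_lengths = [len(block) for block in partitions]
        # Assumption: the old layout is reusable only if the size is unchanged.
        if sum(old_lengths) == len(result):
            lengths = old_lengths
    if lengths is None:
        # Fall back to num_splits chunks of (nearly) equal size.
        chunk = -(-len(result) // num_splits)  # ceiling division
        lengths = [chunk] * num_splits

    splits, start = [], 0
    for length in lengths:
        splits.append(result[start:start + length])
        start += length
    return splits


blocks = ([[3], [1]], [[2], [5], [4]])
kept = toy_deploy_axis_func(sorted, (), {}, 2, True, *blocks)
even = toy_deploy_axis_func(sorted, (), {}, 2, False, *blocks)
manual = toy_deploy_axis_func(sorted, (), {}, 2, False, *blocks, lengths=[1, 4])
```

Here `kept` preserves the original 2 + 3 row layout, `even` uses a fresh layout of two near-equal chunks, and `manual` follows the explicitly supplied lengths.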

classmethod deploy_func_between_two_axis_partitions(axis, func, f_args, f_kwargs, num_splits, len_of_left, other_shape, *partitions)#

Deploy a function along a full axis between two data sets.

Parameters:
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • len_of_left (int) – The number of values in partitions that belong to the left data set.

  • other_shape (np.ndarray) – The shape of the right frame in terms of partitions, i.e. (other_shape[i-1], other_shape[i]) indicates the slice needed to restore the (i-1)-th axis partition.

  • *partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.

Returns:

A list of unidist.ObjectRef-s.

Return type:

list
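The roles of `len_of_left` and `other_shape` can be shown with a short sketch. It assumes, based on the parameter description above, that `other_shape` holds cumulative offsets into the right-hand partitions. `split_left_and_right` is a hypothetical helper, strings stand in for partitions, and a plain list replaces the np.ndarray to keep the example dependency-free.

```python
def split_left_and_right(len_of_left, other_shape, *partitions):
    """Separate the flat partition list into left blocks and right groups."""
    left = list(partitions[:len_of_left])
    right = partitions[len_of_left:]
    # Each consecutive pair of offsets selects one right axis partition.
    right_axis_partitions = [
        list(right[other_shape[i - 1]:other_shape[i]])
        for i in range(1, len(other_shape))
    ]
    return left, right_axis_partitions


parts = ("L0", "L1", "R0a", "R0b", "R1a", "R1b")
left, right = split_left_and_right(2, [0, 2, 4], *parts)
```

With offsets `[0, 2, 4]`, the four right-hand partitions are regrouped into two axis partitions of two blocks each.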

classmethod deploy_splitting_func(axis, func, f_args, f_kwargs, num_splits, *partitions, extract_metadata=False)#

Deploy a splitting function along a full axis.

Parameters:
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable(pandas.DataFrame) -> list[pandas.DataFrame]) – The splitting function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits that func returns.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

  • extract_metadata (bool, default: False) – Whether to return metadata (length, width, ip) of the result. Note that a value of True is not supported in the PandasDataframeAxisPartition class.

Returns:

A list of pandas DataFrames.

Return type:

list
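A splitting function differs from an ordinary axis function in that it returns the pieces itself. The toy sketch below shows that contract in plain Python; `toy_deploy_splitting_func` and `split_by_threshold` are hypothetical names, lists of rows stand in for pandas DataFrames, and the check against `num_splits` is an assumption added for illustration.

```python
def toy_deploy_splitting_func(func, f_args, f_kwargs, num_splits, *partitions):
    """Combine row blocks and let func split them into num_splits pieces."""
    full_axis = [row for block in partitions for row in block]
    splits = func(full_axis, *f_args, **f_kwargs)
    # Assumption: a mismatch with num_splits is treated as an error here.
    if len(splits) != num_splits:
        raise ValueError(f"expected {num_splits} splits, got {len(splits)}")
    return splits


def split_by_threshold(rows, threshold):
    """Return two pieces: rows below the threshold, then the rest."""
    low = [row for row in rows if row[0] < threshold]
    high = [row for row in rows if row[0] >= threshold]
    return [low, high]


pieces = toy_deploy_splitting_func(
    split_by_threshold, (3,), {}, 2, [[1], [4]], [[2], [5]]
)
```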

property list_of_ips#

Get the IPs holding the physical objects composing this partition.

Returns:

A list of IPs as unidist.ObjectRef or str.

Return type:

list

partition_type#

alias of PandasOnUnidistDataframePartition

wait()#

Wait for computations on the object wrapped by the partition to complete.

PandasOnUnidistDataframeColumnPartition#

Public API#

class modin.core.execution.unidist.implementations.pandas_on_unidist.partitioning.PandasOnUnidistDataframeColumnPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#


PandasOnUnidistDataframeRowPartition#

Public API#

class modin.core.execution.unidist.implementations.pandas_on_unidist.partitioning.PandasOnUnidistDataframeRowPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#
