PandasOnDaskDataframeVirtualPartition#

The class is the Dask-specific implementation of PandasDataframeAxisPartition, providing the API to perform operations on an axis (column or row) partition using Dask as the execution engine. The axis partition is a wrapper over a list of block partitions that are stored in this class.

Public API#

class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframeVirtualPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#

The class implements the interface in PandasDataframeAxisPartition.

Parameters:
  • list_of_partitions (Union[list, PandasOnDaskDataframePartition]) – List of PandasOnDaskDataframePartition and PandasOnDaskDataframeVirtualPartition objects, or a single PandasOnDaskDataframePartition.

  • get_ip (bool, default: False) – Whether to get the node IP addresses of the conforming partitions.

  • full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.

  • call_queue (list, optional) – A list of tuples (callable, args, kwargs) that contains deferred calls.

  • length (distributed.Future or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.

  • width (distributed.Future or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.
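To make the call_queue format concrete, the sketch below drains a list of (callable, args, kwargs) tuples in order. It is a conceptual, pure-Python stand-in and not Modin's implementation: the helper drain_call_queue and the sample functions scale and shift are hypothetical, and passing the data as the first positional argument of each deferred call is an assumption made for illustration.

```python
def drain_call_queue(data, call_queue):
    """Apply deferred calls to ``data`` in order (hypothetical helper)."""
    for func, args, kwargs in call_queue:
        # Each entry is a (callable, args, kwargs) tuple.
        # Assumption: the data is passed as the first positional argument.
        data = func(data, *args, **kwargs)
    return data


def scale(values, factor):
    """Multiply every value by ``factor``."""
    return [v * factor for v in values]


def shift(values, offset=0):
    """Add ``offset`` to every value."""
    return [v + offset for v in values]


# Two deferred calls: first scale by 2, then shift by 1.
call_queue = [(scale, (2,), {}), (shift, (), {"offset": 1})]
result = drain_call_queue([1, 2, 3], call_queue)
```

Deferring calls this way lets several operations be queued cheaply and executed together once the result is actually needed.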

classmethod deploy_axis_func(axis, func, f_args, f_kwargs, num_splits, maintain_partitioning, *partitions, min_block_size, lengths=None, manual_partition=False)#

Deploy a function along a full axis.

Parameters:
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

  • min_block_size (int) – Minimum number of rows/columns in a single split.

  • lengths (iterable, default: None) – The list of lengths to shuffle the partition into.

  • manual_partition (bool, default: False) – If True, partition the result with lengths.

Returns:

A list of distributed.Future.

Return type:

list
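The interplay of num_splits, min_block_size, lengths and manual_partition can be sketched with plain lists. The splitting policy below (even chunks by ceiling division, never smaller than min_block_size) is an assumption for illustration only; the actual behaviour is defined by split_result_of_axis_func_pandas. The helpers split_lengths and split_by_lengths are hypothetical names.

```python
def split_lengths(total, num_splits, min_block_size):
    """Return sizes of ``num_splits`` consecutive splits covering ``total`` items."""
    # Ceiling division gives an even chunk size; never go below min_block_size.
    chunk = max(-(-total // num_splits), min_block_size)
    sizes = []
    remaining = total
    for _ in range(num_splits):
        take = min(chunk, remaining)
        sizes.append(take)  # trailing splits may be empty
        remaining -= take
    return sizes


def split_by_lengths(items, lengths):
    """Cut ``items`` into consecutive pieces with the given lengths."""
    pieces, start = [], 0
    for n in lengths:
        pieces.append(items[start:start + n])
        start += n
    return pieces


rows = list(range(10))

# Automatic layout: sizes derived from num_splits and min_block_size.
even = split_by_lengths(rows, split_lengths(len(rows), 4, min_block_size=2))

# Manual layout: the caller supplies the lengths directly,
# analogous to manual_partition=True with lengths=[1, 4, 5].
manual = split_by_lengths(rows, [1, 4, 5])
```

Under this assumed policy, a large min_block_size relative to the data concentrates everything in the first split and leaves the remaining splits empty.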

classmethod deploy_func_between_two_axis_partitions(axis, func, f_args, f_kwargs, num_splits, len_of_left, other_shape, *partitions, min_block_size)#

Deploy a function along a full axis between two data sets.

Parameters:
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • len_of_left (int) – The number of values in partitions that belong to the left data set.

  • other_shape (np.ndarray) – The shape of the right frame in terms of partitions, i.e. (other_shape[i-1], other_shape[i]) indicates the slice that restores axis partition i-1.

  • *partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.

  • min_block_size (int) – Minimum number of rows/columns in a single split.

Returns:

A list of distributed.Future.

Return type:

list
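The roles of len_of_left and other_shape can be illustrated with a flat list of placeholder partitions. This is a sketch under stated assumptions, not Modin's code: separate_partitions is a hypothetical helper, strings stand in for real partitions, and other_shape is assumed to hold cumulative offsets starting at 0 (a plain list is used here instead of np.ndarray).

```python
def separate_partitions(partitions, len_of_left, other_shape):
    """Split a flat partition list into the left set and grouped right set."""
    # The first len_of_left entries belong to the left data set.
    left = list(partitions[:len_of_left])
    right_flat = list(partitions[len_of_left:])
    # Consecutive offsets (other_shape[i-1], other_shape[i]) bound
    # the slice that restores axis partition i-1 of the right frame.
    right = [
        right_flat[other_shape[i - 1]:other_shape[i]]
        for i in range(1, len(other_shape))
    ]
    return left, right


# Two left partitions followed by five right partitions.
partitions = ["L0", "L1", "R0a", "R0b", "R1a", "R1b", "R1c"]
left, right = separate_partitions(partitions, len_of_left=2, other_shape=[0, 2, 5])
```

Here the right frame is restored as two axis partitions, the first built from two blocks and the second from three.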

classmethod deploy_splitting_func(axis, func, f_args, f_kwargs, num_splits, *partitions, extract_metadata=False)#

Deploy a splitting function along a full axis.

Parameters:
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable(pandas.DataFrame) -> list[pandas.DataFrame]) – The splitting function to perform.

  • f_args (list or tuple) – Positional arguments to pass to func.

  • f_kwargs (dict) – Keyword arguments to pass to func.

  • num_splits (int) – The number of splits that func returns.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

  • extract_metadata (bool, default: False) – Whether to return metadata (length, width, ip) of the result. Note that a value of True is not supported in the PandasDataframeAxisPartition class.

Returns:

A list of pandas DataFrames.

Return type:

list

instance_type#

alias of Future

property list_of_ips#

Get the IPs holding the physical objects composing this partition.

Returns:

A list of IPs as distributed.Future or str.

Return type:

list

partition_type#

alias of PandasOnDaskDataframePartition

wait()#

Wait for the computations on the object wrapped by the partition to complete.

PandasOnDaskDataframeColumnPartition#

Public API#

class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframeColumnPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#

PandasOnDaskDataframeRowPartition#

Public API#

class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframeRowPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#
