PandasOnDaskDataframeAxisPartition

This class is the Dask-specific implementation of PandasDataframeAxisPartition. It provides the API for performing operations on an axis (column or row) partition, using Dask as the execution engine. An axis partition is a wrapper over a list of block partitions, which this class stores.

Public API

class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.virtual_partition.PandasOnDaskDataframeAxisPartition(list_of_blocks, get_ip=False, full_axis=True)

The class implements the interface defined in PandasDataframeAxisPartition.

Parameters
  • list_of_blocks (list) – List of PandasOnDaskDataframePartition objects.

  • get_ip (bool, default: False) – Whether to get node IP addresses of conforming partitions or not.

  • full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.
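As a rough mental model of the wrapper described above, the sketch below stores a list of blocks and concatenates them along one axis on demand. It is a hypothetical, simplified illustration rather than Modin's code: Block and AxisPartitionSketch are invented names, plain lists of rows stand in for pandas DataFrames and Dask futures, the axis is passed explicitly, and get_ip is omitted.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for a block partition: a rectangular chunk of a
# frame stored as a list of rows (the real class wraps a Dask future).
@dataclass
class Block:
    rows: List[List[int]]


# Hypothetical, simplified axis partition: it only keeps the blocks it wraps
# and the axis along which those blocks line up.
@dataclass
class AxisPartitionSketch:
    list_of_blocks: List[Block]
    axis: int  # 0: blocks stacked vertically, 1: blocks side by side
    full_axis: bool = True

    def to_rows(self) -> List[List[int]]:
        """Materialize the wrapped blocks into one chunk along ``axis``."""
        if self.axis == 0:
            # Stack blocks on top of each other: concatenate their rows.
            return [row for block in self.list_of_blocks for row in block.rows]
        # Place blocks side by side: join the i-th row of every block.
        n_rows = len(self.list_of_blocks[0].rows)
        return [
            [value for block in self.list_of_blocks for value in block.rows[i]]
            for i in range(n_rows)
        ]
```

For example, wrapping blocks with rows [[1, 2]] and [[3, 4], [5, 6]] along axis 0 materializes as [[1, 2], [3, 4], [5, 6]].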

classmethod deploy_axis_func(axis, func, num_splits, maintain_partitioning, *partitions, **kwargs)

Deploy a function along a full axis.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of distributed.Future.

Return type

list
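The concatenate, apply, split pattern behind deploy_axis_func can be sketched in plain Python. This is a hypothetical illustration, not Modin's implementation: lists of rows stand in for pandas DataFrames, only axis=0 is handled, the pieces are returned directly instead of as Dask futures, and split_evenly is an invented helper that approximates, but is not, split_result_of_axis_func_pandas. The maintain_partitioning branch encodes one plausible reading of "keep the old partitioning if possible": reuse the input block lengths when they still add up to the result's length.

```python
from typing import Any, Callable, List

# Hypothetical stand-in for a pandas DataFrame: a list of rows.
Frame = List[list]


def split_evenly(result: Frame, num_splits: int) -> List[Frame]:
    """Cut ``result`` into ``num_splits`` pieces of (almost) equal length."""
    chunk = -(-len(result) // num_splits)  # ceiling division
    return [result[i * chunk:(i + 1) * chunk] for i in range(num_splits)]


def deploy_axis_func_sketch(
    axis: int,
    func: Callable[..., Frame],
    num_splits: int,
    maintain_partitioning: bool,
    *partitions: Frame,
    **kwargs: Any,
) -> List[Frame]:
    if axis != 0:
        raise NotImplementedError("this sketch only concatenates along axis 0")
    # 1. Concatenate all blocks of the axis into one full-axis frame.
    full = [row for part in partitions for row in part]
    # 2. Run the user function once over the whole axis, forwarding kwargs.
    result = func(full, **kwargs)
    # 3a. Keep the old block lengths if requested and they still add up.
    if maintain_partitioning:
        lengths = [len(part) for part in partitions]
        if sum(lengths) == len(result):
            pieces, start = [], 0
            for n in lengths:
                pieces.append(result[start:start + n])
                start += n
            return pieces
    # 3b. Otherwise create a new layout with `num_splits` pieces.
    return split_evenly(result, num_splits)
```

If func filters out rows, the old lengths no longer sum to the result's length, so the sketch falls back to a fresh layout of num_splits pieces.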

classmethod deploy_func_between_two_axis_partitions(axis, func, num_splits, len_of_left, other_shape, *partitions, **kwargs)

Deploy a function along a full axis between two data sets.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • len_of_left (int) – The number of values in partitions that belong to the left data set.

  • other_shape (np.ndarray) – The shape of the right frame in terms of partitions, i.e. the slice (other_shape[i-1], other_shape[i]) restores the axis partition with index i-1.

  • *partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of distributed.Future.

Return type

list
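The roles of len_of_left and other_shape can be illustrated by how a flat *partitions tuple might be regrouped before func is applied. The sketch below is hypothetical (regroup_partitions is an invented name, and plain strings stand in for block partitions). It assumes other_shape holds cumulative block offsets starting at 0, such as [0, 2, 4], so that the slice (other_shape[i-1], other_shape[i]) picks out the blocks of one right-hand axis partition, as the parameter description states.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def regroup_partitions(
    len_of_left: int, other_shape: Sequence[int], partitions: Sequence[T]
) -> Tuple[List[T], List[List[T]]]:
    """Split a flat ``*partitions`` tuple into left blocks and right axis parts."""
    # The first `len_of_left` entries are the left data set's blocks.
    left = list(partitions[:len_of_left])
    right_flat = partitions[len_of_left:]
    # Consecutive entries of `other_shape` bound the slice that holds the
    # blocks of one right-hand axis partition (assumed cumulative offsets).
    right = [
        list(right_flat[other_shape[i - 1]:other_shape[i]])
        for i in range(1, len(other_shape))
    ]
    return left, right
```

With len_of_left=2, other_shape=[0, 2, 4] and partitions ("L0", "L1", "R00", "R01", "R10", "R11"), the left group is ["L0", "L1"] and the right groups are ["R00", "R01"] and ["R10", "R11"]. A full implementation would then concatenate each group into a frame, call func with the left and right frames, and split the result into num_splits pieces.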

instance_type

alias of Future

partition_type

alias of PandasOnDaskDataframePartition

PandasOnDaskDataframeColumnPartition

Public API

class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.virtual_partition.PandasOnDaskDataframeColumnPartition(list_of_blocks, get_ip=False, full_axis=True)

The column partition implementation.

All of the implementation lives in the parent class; this class only defines the axis along which the computation is performed.

Parameters
  • list_of_blocks (list) – List of PandasOnDaskDataframePartition objects.

  • get_ip (bool, default: False) – Whether to get node IP addresses of conforming partitions or not.

  • full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.

PandasOnDaskDataframeRowPartition#

Public API

class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.virtual_partition.PandasOnDaskDataframeRowPartition(list_of_blocks, get_ip=False, full_axis=True)

The row partition implementation.

All of the implementation lives in the parent class; this class only defines the axis along which the computation is performed.

Parameters
  • list_of_blocks (list) – List of PandasOnDaskDataframePartition objects.

  • get_ip (bool, default: False) – Whether to get node IP addresses of conforming partitions or not.

  • full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.
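The column and row classes add no logic of their own; they only fix the axis that the shared parent logic works over. A minimal sketch of that pattern follows. All names (AxisPartitionBase, ColumnPartitionSketch, RowPartitionSketch, length_along_axis) are hypothetical, and blocks are reduced to (n_rows, n_cols) shapes. It assumes, as the class names suggest, that a column partition combines its blocks along axis 0 and a row partition along axis 1.

```python
from typing import List, Optional, Tuple

Shape = Tuple[int, int]  # (n_rows, n_cols) of one hypothetical block


class AxisPartitionBase:
    # Subclasses set this; the parent class holds all of the logic.
    axis: Optional[int] = None

    def __init__(self, block_shapes: List[Shape]):
        if self.axis is None:
            raise TypeError("instantiate a subclass that defines `axis`")
        self.block_shapes = block_shapes

    def length_along_axis(self) -> int:
        # Blocks are concatenated along `self.axis`, so their sizes on that
        # axis add up: axis 0 counts rows, axis 1 counts columns.
        return sum(shape[self.axis] for shape in self.block_shapes)


class ColumnPartitionSketch(AxisPartitionBase):
    axis = 0  # a column of blocks, stacked vertically


class RowPartitionSketch(AxisPartitionBase):
    axis = 1  # a row of blocks, placed side by side
```

For instance, a column partition over blocks of shape (100, 4) and (50, 4) spans 150 rows, while a row partition over blocks of shape (100, 4) and (100, 3) spans 7 columns.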