PandasOnDaskDataframeVirtualPartition
The class is the specific implementation of PandasDataframeAxisPartition, providing the API to perform operations on an axis (column or row) partition using Dask as the execution engine.
The axis partition is a wrapper over a list of block partitions that are stored in this class.
Public API
- class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframeVirtualPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)
The class implements the interface in PandasDataframeAxisPartition.
- Parameters
list_of_partitions (Union[list, PandasOnDaskDataframePartition]) – List of PandasOnDaskDataframePartition and PandasOnDaskDataframeVirtualPartition objects, or a single PandasOnDaskDataframePartition.
get_ip (bool, default: False) – Whether to get node IP addresses of conforming partitions or not.
full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.
call_queue (list, optional) – A list of tuples (callable, args, kwargs) that contains deferred calls.
length (distributed.Future or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.
width (distributed.Future or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.
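The constructor parameters describe a wrapper that holds block partitions together with a queue of deferred `(callable, args, kwargs)` calls and optional cached length/width. The following is a minimal, hypothetical sketch of that idea in plain Python. It assumes a "frame" is a list of rows, and it uses toy classes (`ToyBlock`, `ToyVirtualPartition`) and a method name (`add_to_queue`) that are illustrative only and not part of Modin. Dask, `get_ip`, and the real semantics of `full_axis` are left out.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

Rows = List[List[Any]]  # hypothetical: a "frame" is a list of rows
Call = Tuple[Callable[..., Any], tuple, Dict[str, Any]]


class ToyBlock:
    """Hypothetical stand-in for a block partition holding one chunk of rows."""

    def __init__(self, rows: Rows):
        self.rows = rows


class ToyVirtualPartition:
    """Hypothetical sketch of an axis (virtual) partition; not Modin's code."""

    def __init__(
        self,
        list_of_partitions: Union[List[ToyBlock], ToyBlock],
        full_axis: bool = True,
        call_queue: Optional[List[Call]] = None,
        length: Optional[int] = None,
        width: Optional[int] = None,
    ):
        # A single block is accepted and wrapped in a list, mirroring the docs.
        if isinstance(list_of_partitions, ToyBlock):
            list_of_partitions = [list_of_partitions]
        self.list_of_partitions = list(list_of_partitions)
        self.full_axis = full_axis
        self.call_queue = list(call_queue) if call_queue else []
        self._length_cache = length
        self._width_cache = width

    def add_to_queue(self, func: Callable[..., Any], *args, **kwargs) -> "ToyVirtualPartition":
        # Record the call as a (callable, args, kwargs) tuple instead of running it.
        return ToyVirtualPartition(
            self.list_of_partitions,
            self.full_axis,
            self.call_queue + [(func, args, kwargs)],
        )

    def materialize(self) -> Rows:
        # Concatenate the blocks along the row axis, then replay deferred calls in order.
        rows = [row for block in self.list_of_partitions for row in block.rows]
        for func, args, kwargs in self.call_queue:
            rows = func(rows, *args, **kwargs)
        return rows

    def length(self) -> int:
        # Use the cached length when supplied; otherwise compute it once and cache it.
        if self._length_cache is None:
            self._length_cache = len(self.materialize())
        return self._length_cache

    def width(self) -> int:
        # Same lazy caching for the number of columns.
        if self._width_cache is None:
            rows = self.materialize()
            self._width_cache = len(rows[0]) if rows else 0
        return self._width_cache
```

Returning a new object from `add_to_queue` leaves the original partition's queue untouched, so two derived partitions never share a mutable queue.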
- classmethod deploy_axis_func(axis, func, f_args, f_kwargs, num_splits, maintain_partitioning, *partitions, lengths=None, manual_partition=False)
Deploy a function along a full axis.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
func (callable) – The function to perform.
f_args (list or tuple) – Positional arguments to pass to func.
f_kwargs (dict) – Keyword arguments to pass to func.
num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).
maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.
*partitions (iterable) – All partitions that make up the full axis (row or column).
lengths (iterable, default: None) – The list of lengths to shuffle the partition into.
manual_partition (bool, default: False) – If True, partition the result with lengths.
- Returns
A list of distributed.Future.
- Return type
list
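The interaction between `num_splits`, `maintain_partitioning`, `lengths`, and `manual_partition` is easier to follow in code. Below is a minimal local sketch, modeled loosely on how pandas-based axis partitions concatenate the partitions, apply `func`, and re-split the result. Frames are toy lists of rows. `concat` and `split` are hypothetical helpers standing in for `pandas.concat` and `split_result_of_axis_func_pandas`. The sketch runs in-process, so it returns frames rather than `distributed.Future` objects.

```python
from itertools import accumulate, chain
from typing import Any, Callable, List, Optional, Sequence

Frame = List[List[Any]]  # hypothetical: a frame is a list of rows


def width(frame: Frame) -> int:
    return len(frame[0]) if frame else 0


def concat(frames: Sequence[Frame], axis: int) -> Frame:
    if axis == 0:
        # Stack blocks vertically (one full column made of row blocks).
        return [row for f in frames for row in f]
    # Place blocks side by side by joining matching rows.
    return [list(chain.from_iterable(rows)) for rows in zip(*frames)]


def split(frame: Frame, axis: int, num_splits: int,
          lengths: Optional[Sequence[int]] = None) -> List[Frame]:
    size = len(frame) if axis == 0 else width(frame)
    if lengths is None:
        # Ceiling-divide the axis into num_splits roughly equal chunks.
        chunk = -(-size // num_splits)
        lengths = [min(chunk, max(size - i * chunk, 0)) for i in range(num_splits)]
    bounds = list(accumulate(lengths, initial=0))
    if axis == 0:
        return [frame[a:b] for a, b in zip(bounds, bounds[1:])]
    return [[row[a:b] for row in frame] for a, b in zip(bounds, bounds[1:])]


def deploy_axis_func(axis: int, func: Callable[..., Frame], f_args, f_kwargs,
                     num_splits: int, maintain_partitioning: bool, *partitions: Frame,
                     lengths: Optional[Sequence[int]] = None,
                     manual_partition: bool = False) -> List[Frame]:
    result = func(concat(partitions, axis), *f_args, **f_kwargs)
    if num_splits == 1:
        return [result]
    if manual_partition:
        # The caller dictates the output layout.
        lengths = list(lengths)
    elif num_splits != len(partitions) or not maintain_partitioning:
        lengths = None  # build a fresh, even layout
    else:
        # Try to reuse the input layout; fall back if the result changed size.
        lengths = [len(p) if axis == 0 else width(p) for p in partitions]
        if sum(lengths) != (len(result) if axis == 0 else width(result)):
            lengths = None
    return split(result, axis, num_splits, lengths)
```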
- classmethod deploy_func_between_two_axis_partitions(axis, func, f_args, f_kwargs, num_splits, len_of_left, other_shape, *partitions)
Deploy a function along a full axis between two data sets.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
func (callable) – The function to perform.
f_args (list or tuple) – Positional arguments to pass to func.
f_kwargs (dict) – Keyword arguments to pass to func.
num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).
len_of_left (int) – The number of values in partitions that belong to the left data set.
other_shape (np.ndarray) – The shape of the right frame in terms of partitions, i.e. (other_shape[i-1], other_shape[i]) indicates the slice that restores the (i-1)-th axis partition.
*partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.
- Returns
A list of distributed.Future.
- Return type
list
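The subtle parameter here is `other_shape`. Per its description, consecutive entries bound a slice of the flattened right-hand partitions that rebuilds one axis partition. A minimal local sketch of that regrouping follows, assuming frames are toy lists of rows. `combine_two` and `concat` are hypothetical names. `f_args`, `f_kwargs`, and the final split into `num_splits` pieces are omitted for brevity, so the function returns the unsplit result.

```python
from itertools import chain
from typing import Any, Callable, List, Sequence

Frame = List[List[Any]]  # hypothetical: a frame is a list of rows


def concat(frames: Sequence[Frame], axis: int) -> Frame:
    if axis == 0:
        return [row for f in frames for row in f]
    return [list(chain.from_iterable(rows)) for rows in zip(*frames)]


def combine_two(axis: int, func: Callable[[Frame, Frame], Frame], len_of_left: int,
                other_shape: Sequence[int], *partitions: Frame) -> Frame:
    # The first len_of_left partitions form the left frame along `axis`.
    lt_frame = concat(partitions[:len_of_left], axis)
    rt_parts = partitions[len_of_left:]
    # Rebuild each right-hand axis partition from its (other_shape[i-1], other_shape[i]) slice.
    axis_parts = [
        concat(rt_parts[other_shape[i - 1]:other_shape[i]], axis)
        for i in range(1, len(other_shape))
    ]
    # Lay the rebuilt axis partitions out along the other axis (0 <-> 1).
    rt_frame = concat(axis_parts, axis ^ 1)
    return func(lt_frame, rt_frame)
```

For example, with `other_shape=[0, 2, 4]` the four right-hand blocks are regrouped into two column partitions of two blocks each, which are then placed side by side.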
- classmethod deploy_splitting_func(axis, func, f_args, f_kwargs, num_splits, *partitions, extract_metadata=False)
Deploy a splitting function along a full axis.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
func (callable(pandas.DataFrame) -> list[pandas.DataFrame]) – The splitting function to perform.
f_args (list or tuple) – Positional arguments to pass to func.
f_kwargs (dict) – Keyword arguments to pass to func.
num_splits (int) – The number of splits that func returns.
*partitions (iterable) – All partitions that make up the full axis (row or column).
extract_metadata (bool, default: False) – Whether to return metadata (length, width, ip) of the result. Note that a True value is not supported by the PandasDataframeAxisPartition class.
- Returns
A list of pandas DataFrames.
- Return type
list
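The contract of a splitting function is that it receives the whole axis and returns exactly `num_splits` frames. The minimal local sketch below illustrates that contract. Frames are toy lists of rows, and `split_by_key` is a hypothetical hash-style splitter. When `extract_metadata` is set, the sketch returns `(piece, length, width)` tuples. That layout is an assumption made for illustration. The IP is omitted because nothing here runs on a cluster.

```python
from itertools import chain
from typing import Any, List, Sequence

Frame = List[List[Any]]  # hypothetical: a frame is a list of rows


def concat(frames: Sequence[Frame], axis: int) -> Frame:
    if axis == 0:
        return [row for f in frames for row in f]
    return [list(chain.from_iterable(rows)) for rows in zip(*frames)]


def deploy_splitting_func(axis, func, f_args, f_kwargs, num_splits, *partitions,
                          extract_metadata=False):
    pieces = func(concat(partitions, axis), *f_args, **f_kwargs)
    # The splitting function must honor num_splits.
    if len(pieces) != num_splits:
        raise ValueError(f"expected {num_splits} splits, got {len(pieces)}")
    if not extract_metadata:
        return pieces
    # Hypothetical metadata layout: (piece, length, width); no IP in this local sketch.
    return [(p, len(p), len(p[0]) if p else 0) for p in pieces]


def split_by_key(frame: Frame, n: int) -> List[Frame]:
    # Hypothetical split: route each row by its first value modulo n.
    return [[row for row in frame if row[0] % n == k] for k in range(n)]
```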
- instance_type
alias of Future
- property list_of_ips
Get the IPs holding the physical objects composing this partition.
- Returns
A list of IPs as distributed.Future or str.
- Return type
List
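Because each entry may be either a future or an already-known string, a consumer has to resolve the two cases differently. `distributed.Future` exposes a `result()` method for this purpose. The minimal sketch below uses the standard library's `concurrent.futures.Future` as a local stand-in. `resolve_ips` is a hypothetical helper, not a Modin function.

```python
from concurrent.futures import Future
from typing import List, Union


def resolve_ips(ips: List[Union[Future, str]]) -> List[str]:
    # Futures are materialized with .result() (blocking); plain strings pass through.
    return [ip.result() if isinstance(ip, Future) else ip for ip in ips]
```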
- partition_type
alias of PandasOnDaskDataframePartition
- wait()
Wait for the computations on the object wrapped by the partition to complete.
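Conceptually, waiting on a partition means blocking until every future backing it has finished. Dask's `distributed.wait` plays that role for `distributed.Future` objects. The minimal local sketch below uses the standard library's `concurrent.futures.wait`, whose default `return_when` is `ALL_COMPLETED`. `ToyFuturePartition` is a hypothetical class and not part of Modin.

```python
from concurrent.futures import Future, wait
from typing import List


class ToyFuturePartition:
    """Hypothetical partition whose data lives in concurrent.futures.Future objects."""

    def __init__(self, futures: List[Future]):
        self.futures = futures

    def wait(self) -> None:
        # Block until every future has finished (default return_when=ALL_COMPLETED).
        wait(self.futures)
```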
PandasOnDaskDataframeColumnPartition
Public API
- class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframeColumnPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)
PandasOnDaskDataframeRowPartition
Public API
- class modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframeRowPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)
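Both subclasses share the parent's signature and add no parameters. A plausible reading, assumed here rather than stated on this page, is that each subclass only pins the axis along which its blocks are joined: the column partition joins them along axis 0 and the row partition along axis 1. The minimal sketch below illustrates that pattern with hypothetical toy classes over frames stored as lists of rows.

```python
from itertools import chain
from typing import Any, List, Optional

Frame = List[List[Any]]  # hypothetical: a frame is a list of rows


class ToyAxisPartition:
    """Hypothetical base class: subclasses fix which axis the blocks are joined along."""

    axis: Optional[int] = None  # set by subclasses

    def __init__(self, blocks: List[Frame]):
        self.blocks = blocks

    def to_frame(self) -> Frame:
        if self.axis == 0:
            # Column partition: blocks are stacked vertically.
            return [row for b in self.blocks for row in b]
        # Row partition: blocks sit side by side.
        return [list(chain.from_iterable(rows)) for rows in zip(*self.blocks)]


class ToyColumnPartition(ToyAxisPartition):
    axis = 0


class ToyRowPartition(ToyAxisPartition):
    axis = 1
```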