PandasOnUnidistDataframeVirtualPartition#
This class is the specific implementation of PandasDataframeAxisPartition, providing the API to perform operations on an axis partition using Unidist as the execution engine. The virtual partition is a wrapper over a list of block partitions, which are stored in this class, with the capability to combine the smaller partitions into a single "virtual" one.
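The wrapping idea can be sketched without Modin or Unidist. The class below is a hypothetical stand-in (ToyVirtualPartition and combine are illustrative names, not part of the Modin API), and it uses plain lists of rows where the real class holds block partitions:

```python
from typing import List

Block = List[List[int]]  # a toy block partition: a small list of rows


class ToyVirtualPartition:
    """Hypothetical stand-in for a virtual partition; not Modin's class."""

    def __init__(self, blocks: List[Block], axis: int = 0) -> None:
        self.blocks = blocks  # the wrapped blocks are stored unchanged
        self.axis = axis      # 0: blocks stacked vertically, 1: side by side

    def combine(self) -> Block:
        """Concatenate the wrapped blocks into one block along the axis."""
        if self.axis == 0:
            # Stack the rows of every block on top of each other.
            return [row for block in self.blocks for row in block]
        # Glue the i-th rows of all blocks together, left to right.
        return [sum(rows, []) for rows in zip(*self.blocks)]


stacked = ToyVirtualPartition([[[1, 2]], [[3, 4]]], axis=0).combine()
side_by_side = ToyVirtualPartition([[[1], [3]], [[2], [4]]], axis=1).combine()
```

Both calls rebuild the same 2x2 frame from two smaller blocks. The blocks stay separate until combine is called.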
Public API#
- class modin.core.execution.unidist.implementations.pandas_on_unidist.partitioning.PandasOnUnidistDataframeVirtualPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#
The class implements the interface in PandasDataframeAxisPartition.
- Parameters
list_of_partitions (Union[list, PandasOnUnidistDataframePartition]) – List of PandasOnUnidistDataframePartition and PandasOnUnidistDataframeVirtualPartition objects, or a single PandasOnUnidistDataframePartition.
get_ip (bool, default: False) – Whether to get node IP addresses to conforming partitions or not.
full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.
call_queue (list, optional) – A list of tuples (callable, args, kwargs) that contains deferred calls.
length (unidist.ObjectRef or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.
width (unidist.ObjectRef or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.
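The call_queue format described above, a list of (callable, args, kwargs) tuples, can be illustrated with a minimal sketch. The helper drain_call_queue is hypothetical, and passing the data as the first positional argument of each callable is an assumption of this sketch, not something stated on this page:

```python
from typing import Any, Callable, Dict, List, Tuple

# One deferred call: (callable, positional args, keyword args).
Call = Tuple[Callable[..., Any], tuple, Dict[str, Any]]


def drain_call_queue(data: Any, call_queue: List[Call]) -> Any:
    """Apply each deferred call to the data, in queue order.

    Assumption: every callable takes the data as its first argument.
    """
    for func, args, kwargs in call_queue:
        data = func(data, *args, **kwargs)
    return data


def scale(values, factor):
    return [v * factor for v in values]


def shift(values, offset=0):
    return [v + offset for v in values]


# Nothing runs when the queue is built; work happens only on drain.
queue = [(scale, (10,), {}), (shift, (), {"offset": 1})]
result = drain_call_queue([1, 2, 3], queue)
```

Deferring calls this way lets several operations be batched into a single pass over the data.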
- classmethod deploy_axis_func(axis, func, f_args, f_kwargs, num_splits, maintain_partitioning, *partitions, lengths=None, manual_partition=False, max_retries=None)#
Deploy a function along a full axis.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
func (callable) – The function to perform.
f_args (list or tuple) – Positional arguments to pass to func.
f_kwargs (dict) – Keyword arguments to pass to func.
num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).
maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.
*partitions (iterable) – All partitions that make up the full axis (row or column).
lengths (list, optional) – The list of lengths to shuffle the object.
manual_partition (bool, default: False) – If True, partition the result with lengths.
max_retries (int, default: None) – The max number of times to retry the func.
- Returns
A list of unidist.ObjectRef objects.
- Return type
list
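The roles of num_splits, lengths and manual_partition can be sketched with plain lists. Everything below is hypothetical (toy_deploy_axis_func, split_evenly and split_by_lengths are illustrative names). The even chunking is an assumption and does not reproduce split_result_of_axis_func_pandas. maintain_partitioning and max_retries are omitted:

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def split_evenly(rows: list, num_splits: int) -> List[list]:
    """Split rows into num_splits contiguous chunks of near-equal size."""
    chunk = -(-len(rows) // num_splits)  # ceiling division
    return [rows[i * chunk:(i + 1) * chunk] for i in range(num_splits)]


def split_by_lengths(rows: list, lengths: Sequence[int]) -> List[list]:
    """Split rows into consecutive chunks with the given lengths."""
    out, start = [], 0
    for n in lengths:
        out.append(rows[start:start + n])
        start += n
    return out


def toy_deploy_axis_func(
    func: Callable[..., list],
    f_args: tuple,
    f_kwargs: Dict[str, Any],
    num_splits: int,
    *partitions: list,
    lengths: Optional[Sequence[int]] = None,
    manual_partition: bool = False,
) -> List[list]:
    # Rebuild the full axis from its partitions, then apply func once.
    full_axis = [row for part in partitions for row in part]
    result = func(full_axis, *f_args, **f_kwargs)
    # With manual_partition, the caller-supplied lengths decide the layout.
    if manual_partition and lengths is not None:
        return split_by_lengths(result, lengths)
    return split_evenly(result, num_splits)


parts = ([3, 1], [2], [5, 4])
even = toy_deploy_axis_func(sorted, (), {"reverse": True}, 2, *parts)
manual = toy_deploy_axis_func(
    sorted, (), {"reverse": True}, 2, *parts,
    lengths=[1, 4], manual_partition=True,
)
```

In this sketch the function sees the whole axis at once, which is why a global operation such as sorting works across partition boundaries.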
- classmethod deploy_func_between_two_axis_partitions(axis, func, f_args, f_kwargs, num_splits, len_of_left, other_shape, *partitions)#
Deploy a function along a full axis between two data sets.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
func (callable) – The function to perform.
f_args (list or tuple) – Positional arguments to pass to func.
f_kwargs (dict) – Keyword arguments to pass to func.
num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).
len_of_left (int) – The number of values in partitions that belong to the left data set.
other_shape (np.ndarray) – The shape of the right frame in terms of partitions, i.e. (other_shape[i-1], other_shape[i]) indicates the slice needed to restore the (i-1)-th axis partition.
*partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.
- Returns
A list of unidist.ObjectRef objects.
- Return type
list
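How len_of_left and other_shape carve up the flat partitions argument can be sketched as follows. The helper split_left_right is hypothetical, and it assumes other_shape holds cumulative offsets into the right-hand partitions, which is one reading of the slice description above:

```python
from typing import List, Sequence, Tuple


def split_left_right(
    partitions: Sequence[str],
    len_of_left: int,
    other_shape: Sequence[int],
) -> Tuple[List[str], List[List[str]]]:
    """Separate left partitions from right ones and regroup the right side.

    Assumption: other_shape is a sequence of cumulative offsets, so the
    pair (other_shape[i-1], other_shape[i]) bounds the (i-1)-th right
    axis partition.
    """
    left = list(partitions[:len_of_left])
    right_flat = list(partitions[len_of_left:])
    right = [
        right_flat[other_shape[i - 1]:other_shape[i]]
        for i in range(1, len(other_shape))
    ]
    return left, right


# Two left blocks, then three right blocks forming two axis partitions.
flat = ["L0", "L1", "R0a", "R0b", "R1a"]
left, right = split_left_right(flat, len_of_left=2, other_shape=[0, 2, 3])
```

Passing both data sets through a single *partitions argument keeps the call signature flat, at the cost of needing these two extra values to undo the flattening.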
- classmethod deploy_splitting_func(axis, func, f_args, f_kwargs, num_splits, *partitions, extract_metadata=False)#
Deploy a splitting function along a full axis.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
split_func (callable(pandas.DataFrame) -> list[pandas.DataFrame]) – The function to perform.
f_args (list or tuple) – Positional arguments to pass to split_func.
f_kwargs (dict) – Keyword arguments to pass to split_func.
num_splits (int) – The number of splits that split_func returns.
*partitions (iterable) – All partitions that make up the full axis (row or column).
extract_metadata (bool, default: False) – Whether to return metadata (length, width, ip) of the result. Note that a True value is not supported in the PandasDataframeAxisPartition class.
- Returns
A list of pandas DataFrames.
- Return type
list
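A splitting function differs from an ordinary axis function in that it returns the pieces itself. The sketch below uses plain lists in place of pandas DataFrames. toy_deploy_splitting_func and split_at are hypothetical names, and the check against num_splits is an assumption of this sketch, not documented behaviour:

```python
from typing import Any, Callable, Dict, List


def toy_deploy_splitting_func(
    split_func: Callable[..., List[list]],
    f_args: tuple,
    f_kwargs: Dict[str, Any],
    num_splits: int,
    *partitions: list,
) -> List[list]:
    # Rebuild the full axis, then let split_func decide the pieces.
    full_axis = [row for part in partitions for row in part]
    pieces = split_func(full_axis, *f_args, **f_kwargs)
    # Assumption: the caller's num_splits must match what came back.
    if len(pieces) != num_splits:
        raise ValueError(f"expected {num_splits} pieces, got {len(pieces)}")
    return pieces


def split_at(rows: list, pivot: int) -> List[list]:
    """Split rows into values below the pivot and values at or above it."""
    return [[r for r in rows if r < pivot], [r for r in rows if r >= pivot]]


pieces = toy_deploy_splitting_func(split_at, (3,), {}, 2, [4, 1], [3, 2])
```

Here the pieces are sized by the data (values below and above the pivot), not by an even chunking of the result.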
- instance_type#
alias of ObjectRef
- property list_of_ips#
Get the IPs holding the physical objects composing this partition.
- Returns
A list of IPs as unidist.ObjectRef or str.
- Return type
List
- partition_type#
alias of PandasOnUnidistDataframePartition
- wait()#
Wait for computations on the object wrapped by the partition to complete.
PandasOnUnidistDataframeColumnPartition#
Public API#
- class modin.core.execution.unidist.implementations.pandas_on_unidist.partitioning.PandasOnUnidistDataframeColumnPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#
Initialize self. See help(type(self)) for accurate signature.
PandasOnUnidistDataframeRowPartition#
Public API#
- class modin.core.execution.unidist.implementations.pandas_on_unidist.partitioning.PandasOnUnidistDataframeRowPartition(list_of_partitions, get_ip=False, full_axis=True, call_queue=None, length=None, width=None)#
Initialize self. See help(type(self)) for accurate signature.