PandasOnRayDataframeVirtualPartition#
This class is the specific implementation of PandasDataframeAxisPartition, providing the API to perform operations on an axis partition, using Ray as an execution engine. The virtual partition is a wrapper over a list of block partitions, which are stored in this class, and it can combine those smaller partitions into a single “virtual” partition.
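The wrapping idea can be illustrated without Ray. The following is only a minimal sketch, assuming plain in-memory pandas blocks; `ToyVirtualPartition` is a hypothetical stand-in and not the Modin class, which holds references to remote objects instead.

```python
import pandas as pd


class ToyVirtualPartition:
    """Hypothetical stand-in: keeps the blocks and combines them on demand."""

    def __init__(self, list_of_blocks, axis):
        self.list_of_blocks = list(list_of_blocks)
        # Assumed convention: 0 stacks blocks vertically, 1 places them side by side.
        self.axis = axis

    def to_pandas(self):
        # Combine the smaller blocks into one "virtual" frame.
        return pd.concat(self.list_of_blocks, axis=self.axis)


top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"a": [5, 6], "b": [7, 8]}, index=[2, 3])

column_part = ToyVirtualPartition([top, bottom], axis=0)
combined = column_part.to_pandas()
```

The blocks stay separate until `to_pandas()` is called, which mirrors the "wrapper over a list" description above.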
Public API#
- class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.virtual_partition.PandasOnRayDataframeVirtualPartition(list_of_blocks, get_ip=False, full_axis=True, call_queue=None)#
The class implements the interface in PandasDataframeAxisPartition.
- Parameters
list_of_blocks (Union[list, PandasOnRayDataframePartition]) – List of PandasOnRayDataframePartition and PandasOnRayDataframeVirtualPartition objects, or a single PandasOnRayDataframePartition.
get_ip (bool, default: False) – Whether to get node IP addresses to conforming partitions or not.
full_axis (bool, default: True) – Whether or not the virtual partition encompasses the whole axis.
call_queue (list, optional) – A list of tuples (callable, args, kwargs) that contains deferred calls.
- add_to_apply_calls(func, *args, **kwargs)#
Add a function to the call queue.
- Parameters
func (callable or ray.ObjectRef) – Function to be added to the call queue.
*args (iterable) – Additional positional arguments to be passed in func.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
A new PandasOnRayDataframeVirtualPartition object.
- Return type
PandasOnRayDataframeVirtualPartition
Notes
It does not matter whether func is a callable or a ray.ObjectRef; Ray handles it correctly either way. The keyword arguments are sent as a dictionary.
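The deferred-call pattern behind the call queue can be sketched in plain Python. This is an illustration under stated assumptions: `ToyLazyPartition`, `scale`, and `shift` are hypothetical names, the data is an ordinary list, and nothing here touches Ray. It only shows a queue of (callable, args, kwargs) tuples that is extended by returning a new object and executed later.

```python
class ToyLazyPartition:
    """Hypothetical stand-in that defers calls instead of running them."""

    def __init__(self, data, call_queue=None):
        self.data = data
        # Each entry is a (callable, args, kwargs) tuple.
        self.call_queue = list(call_queue or [])

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Return a new object; the original partition and its queue stay untouched.
        return ToyLazyPartition(self.data, self.call_queue + [(func, args, kwargs)])

    def drain_call_queue(self):
        # Run the deferred calls in order, then clear the queue.
        for func, args, kwargs in self.call_queue:
            self.data = func(self.data, *args, **kwargs)
        self.call_queue = []


def scale(values, factor=1):
    return [v * factor for v in values]


def shift(values, offset):
    return [v + offset for v in values]


base = ToyLazyPartition([1, 2, 3])
queued = base.add_to_apply_calls(scale, factor=10).add_to_apply_calls(shift, 1)
pending = len(queued.call_queue)
queued.drain_call_queue()
```

Nothing is computed until `drain_call_queue()` runs, so several cheap calls can be chained first and executed in one pass.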
- apply(func, *args, num_splits=None, other_axis_partition=None, maintain_partitioning=True, **kwargs)#
Apply a function to this axis partition along full axis.
- Parameters
func (callable) – The function to apply.
*args (iterable) – Additional positional arguments to be passed in func.
num_splits (int, default: None) – The number of times to split the result object.
other_axis_partition (PandasDataframeAxisPartition, default: None) – Another PandasDataframeAxisPartition object to be applied to func. This is for operations that are between two data sets.
maintain_partitioning (bool, default: True) – Whether to keep the partitioning in the same orientation as it was previously or not. This is important because we may be operating on an individual AxisPartition and not touching the rest. In this case, we have to return the partitioning to its previous orientation (the lengths will remain the same). This is ignored between two axis partitions.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
A list of PandasOnRayDataframeVirtualPartition objects.
- Return type
list
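To make the role of num_splits concrete, here is a rough sketch of a full-axis apply on local pandas blocks. It is not Modin's implementation: `apply_along_full_axis` is a hypothetical helper, and its even chunking is a simplification assumed for illustration, so it can return fewer pieces than num_splits when the result does not divide evenly.

```python
import pandas as pd


def apply_along_full_axis(blocks, func, num_splits, axis=0):
    """Hypothetical helper: combine blocks, apply func, split the result again."""
    full = pd.concat(blocks, axis=axis)
    result = func(full)
    size = result.shape[axis]
    # Ceiling division gives the chunk length; never smaller than 1.
    chunk = max(1, -(-size // num_splits))
    pieces = []
    for start in range(0, size, chunk):
        if axis == 0:
            pieces.append(result.iloc[start:start + chunk])
        else:
            pieces.append(result.iloc[:, start:start + chunk])
    return pieces


blocks = [
    pd.DataFrame({"a": [3, 1]}),
    pd.DataFrame({"a": [4, 2]}, index=[2, 3]),
]
# Sorting needs every row of the column, so it must see the full axis.
pieces = apply_along_full_axis(blocks, lambda df: df.sort_values("a"), num_splits=2)
values_per_piece = [list(piece["a"]) for piece in pieces]
```

Sorting is a typical case where a per-block apply would give the wrong answer, which is why the function sees the combined axis before the result is split.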
- classmethod deploy_axis_func(axis, func, num_splits, maintain_partitioning, *partitions, **kwargs)#
Deploy a function along a full axis.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
func (callable) – The function to perform.
num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).
maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.
*partitions (iterable) – All partitions that make up the full axis (row or column).
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
A list of ray.ObjectRef objects.
- Return type
list
- classmethod deploy_func_between_two_axis_partitions(axis, func, num_splits, len_of_left, other_shape, *partitions, **kwargs)#
Deploy a function along a full axis between two data sets.
- Parameters
axis ({0, 1}) – The axis to perform the function along.
func (callable) – The function to perform.
num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).
len_of_left (int) – The number of values in partitions that belong to the left data set.
other_shape (np.ndarray) – The shape of right frame in terms of partitions, i.e. (other_shape[i-1], other_shape[i]) will indicate slice to restore i-1 axis partition.
*partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
A list of ray.ObjectRef objects.
- Return type
list
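The len_of_left and other_shape parameters describe how one flat sequence of partitions is divided again. The sketch below reads the parameter descriptions literally, assuming other_shape holds cumulative offsets into the right-hand partitions; `split_flat_partitions` is a hypothetical helper, and strings stand in for the partition objects.

```python
def split_flat_partitions(partitions, len_of_left, other_shape):
    """Hypothetical helper: recover left blocks and right axis partitions."""
    # The first len_of_left entries belong to the left data set.
    left = list(partitions[:len_of_left])
    right_flat = list(partitions[len_of_left:])
    # The pair (other_shape[i - 1], other_shape[i]) is the slice that
    # restores axis partition i - 1 of the right data set.
    right_groups = [
        right_flat[other_shape[i - 1]:other_shape[i]]
        for i in range(1, len(other_shape))
    ]
    return left, right_groups


partitions = ["L0", "L1", "R0a", "R0b", "R1a", "R1b", "R1c"]
left, right_groups = split_flat_partitions(partitions, 2, [0, 2, 5])
```

Here the right frame has two axis partitions, made of two and three blocks respectively, so the offsets are 0, 2, and 5.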
- drain_call_queue(num_splits=None)#
Execute all operations stored in this partition’s call queue.
- Parameters
num_splits (int, default: None) – The number of times to split the result object.
- force_materialization(get_ip=False)#
Materialize partitions into a single partition.
- Parameters
get_ip (bool, default: False) – Whether to get the node IP address for the single partition or not.
- Returns
An axis partition containing only a single materialized partition.
- Return type
PandasOnRayDataframeVirtualPartition
- instance_type#
alias of ObjectRef
- length()#
Get the length of this partition.
- Returns
The length of the partition.
- Return type
int
- property list_of_blocks#
Get the list of physical partition objects that compose this partition.
- Returns
A list of ray.ObjectRef.
- Return type
List
- property list_of_ips#
Get the IPs holding the physical objects composing this partition.
- Returns
A list of IPs as ray.ObjectRef or str.
- Return type
List
- mask(row_indices, col_indices)#
Create (synchronously) a mask that extracts the indices provided.
- Parameters
row_indices (list-like, slice or label) – The row labels for the rows to extract.
col_indices (list-like, slice or label) – The column labels for the columns to extract.
- Returns
A new PandasOnRayDataframeVirtualPartition object, materialized.
- Return type
PandasOnRayDataframeVirtualPartition
- partition_type#
alias of PandasOnRayDataframePartition
- to_pandas()#
Convert the data in this partition to a pandas.DataFrame.
- Return type
pandas.DataFrame
- wait()#
Wait for the computations on the object wrapped by the partition to complete.
- width()#
Get the width of this partition.
- Returns
The width of the partition.
- Return type
int
PandasOnRayDataframeColumnPartition#
Public API#
- class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.virtual_partition.PandasOnRayDataframeColumnPartition(list_of_blocks, get_ip=False, full_axis=True, call_queue=None)#
The column partition implementation.
All of the implementation for this class is in the parent class, and this class defines the axis to perform the computation over.
- Parameters
list_of_blocks (list) – List of PandasOnRayDataframePartition objects.
get_ip (bool, default: False) – Whether to get node IP addresses to conforming partitions or not.
full_axis (bool, default: True) – Whether this partition spans an entire axis of the dataframe.
call_queue (list, default: None) – Call queue that needs to be executed on the partition.
PandasOnRayDataframeRowPartition#
Public API#
- class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.virtual_partition.PandasOnRayDataframeRowPartition(list_of_blocks, get_ip=False, full_axis=True, call_queue=None)#
The row partition implementation.
All of the implementation for this class is in the parent class, and this class defines the axis to perform the computation over.
- Parameters
list_of_blocks (list) – List of PandasOnRayDataframePartition objects.
get_ip (bool, default: False) – Whether to get node IP addresses to conforming partitions or not.
full_axis (bool, default: True) – Whether this partition spans an entire axis of the dataframe.
call_queue (list, default: None) – Call queue that needs to be executed on the partition.
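The column and row classes are described as differing only in the axis they compute over. A small sketch of that design follows, with hypothetical `Toy*` classes and nested lists standing in for blocks; the axis values 0 (column partition) and 1 (row partition) are an assumption made for this illustration.

```python
class ToyAxisPartition:
    """Hypothetical parent: all logic lives here, subclasses only set axis."""

    axis = None

    def __init__(self, list_of_blocks):
        # Each block is a list of rows; each row is a list of values.
        self.list_of_blocks = list_of_blocks

    def combine(self):
        if self.axis == 0:
            # Stack the blocks vertically: append the rows of each block.
            return [row for block in self.list_of_blocks for row in block]
        # Place the blocks side by side: join matching rows across blocks.
        return [sum(rows, []) for rows in zip(*self.list_of_blocks)]


class ToyColumnPartition(ToyAxisPartition):
    axis = 0


class ToyRowPartition(ToyAxisPartition):
    axis = 1


blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
stacked = ToyColumnPartition(blocks).combine()
side_by_side = ToyRowPartition(blocks).combine()
```

Keeping the logic in the parent means each subclass is a one-line declaration, which matches the note that all of the implementation is in the parent class.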