PandasOnRayDataframePartitionManager#
This class is the specific implementation of GenericRayDataframePartitionManager
using Ray distributed engine. This class is responsible for partition manipulation and applying a function to
block/row/column partitions.
Public API#
- class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframePartitionManager#
The class implements the interface in PandasDataframePartitionManager.
- classmethod apply_func_to_indices_both_axis(partitions, func, row_partitions_list, col_partitions_list, item_to_distribute=NoDefault.no_default, row_lengths=None, col_widths=None)#
Apply a function along both axes.
- Parameters
partitions (np.ndarray) – The partitions to which the func will apply.
func (callable) – The function to apply.
row_partitions_list (list) – List of row partitions.
col_partitions_list (list) – List of column partitions.
item_to_distribute (np.ndarray or scalar, default: no_default) – The item to split up so it can be applied over both axes.
row_lengths (list of ints, optional) – Lengths of partitions for every row. If not specified this information is extracted from partitions itself.
col_widths (list of ints, optional) – Widths of partitions for every column. If not specified this information is extracted from partitions itself.
- Returns
A NumPy array with partitions.
- Return type
np.ndarray
Notes
For your func to operate directly on the indices provided, it must use
row_internal_indices
andcol_internal_indices
as keyword arguments.
- classmethod apply_func_to_select_indices(axis, partitions, func, indices, keep_remaining=False)#
Apply a func to select indices of partitions.
- Parameters
axis ({0, 1}) – Axis to apply the func over.
partitions (np.ndarray) – The partitions to which the func will apply.
func (callable) – The function to apply to these indices of partitions.
indices (dict) – The indices to apply the function to.
keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.
- Returns
A NumPy array with partitions.
- Return type
np.ndarray
Notes
Your internal function must take a kwarg internal_indices for this to work correctly. This prevents information leakage of the internal index to the external representation.
- classmethod apply_func_to_select_indices_along_full_axis(axis, partitions, func, indices, keep_remaining=False)#
Apply a func to a select subset of full columns/rows.
- Parameters
axis ({0, 1}) – The axis to apply the func over.
partitions (np.ndarray) – The partitions to which the func will apply.
func (callable) – The function to apply.
indices (list-like) – The global indices to apply the func to.
keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.
- Returns
A NumPy array with partitions.
- Return type
np.ndarray
Notes
This should be used when you need to apply a function that relies on some global information for the entire column/row, but only need to apply a function to a subset. For your func to operate directly on the indices provided, it must use internal_indices as a keyword argument.
- classmethod binary_operation(axis, left, func, right)#
Apply a function that requires partitions of two
PandasOnRayDataframe
objects.- Parameters
axis ({0, 1}) – The axis to apply the function over (0 - rows, 1 - columns).
left (np.ndarray) – The partitions of left
PandasOnRayDataframe
.func (callable) – The function to apply.
right (np.ndarray) – The partitions of right
PandasOnRayDataframe
.
- Returns
A NumPy array with new partitions.
- Return type
np.ndarray
- classmethod concat(axis, left_parts, right_parts)#
Concatenate the blocks of partitions with another set of blocks.
- Parameters
axis (int) – The axis to concatenate to.
left_parts (np.ndarray) – NumPy array of partitions to concatenate with.
right_parts (np.ndarray or list) – NumPy array of partitions to be concatenated.
- Returns
A new NumPy array with concatenated partitions.
- Return type
np.ndarray
Notes
Assumes that the left_parts and right_parts blocks are already the same shape on the dimension (opposite axis) as the one being concatenated. A
ValueError
will be thrown if this condition is not met.
- classmethod get_objects_from_partitions(partitions)#
Get the objects wrapped by partitions in parallel.
- Parameters
partitions (np.ndarray) – NumPy array with
PandasDataframePartition
-s.- Returns
The objects wrapped by partitions.
- Return type
list
- classmethod lazy_map_partitions(partitions, map_func)#
Apply map_func to every partition in partitions lazily.
- Parameters
partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.
map_func (callable) – Function to apply.
- Returns
A NumPy array of partitions.
- Return type
np.ndarray
- classmethod map_axis_partitions(axis, partitions, map_func, keep_partitioning=False, lengths=None, enumerate_partitions=False, **kwargs)#
Apply map_func to every partition in partitions along given axis.
- Parameters
axis ({0, 1}) – Axis to perform the map across (0 - index, 1 - columns).
partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.
map_func (callable) – Function to apply.
keep_partitioning (bool, default: False) – Whether to keep partitioning for Modin Frame. Setting it to True prevents data shuffling between partitions.
lengths (list of ints, default: None) – List of lengths to shuffle the object.
enumerate_partitions (bool, default: False) – Whether or not to pass partition index into map_func. Note that map_func must be able to accept partition_idx kwarg.
**kwargs (dict) – Additional options that could be used by different engines.
- Returns
A NumPy array of new partitions for Modin Frame.
- Return type
np.ndarray
Notes
This method should be used in the case when map_func relies on some global information about the axis.
- classmethod map_partitions(partitions, map_func)#
Apply map_func to every partition in partitions.
- Parameters
partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.
map_func (callable) – Function to apply.
- Returns
A NumPy array of partitions.
- Return type
np.ndarray
- classmethod rebalance_partitions(partitions)#
Rebalance a 2-d array of partitions.
Rebalance the partitions by building a new array of partitions out of the original ones so that:
If all partitions have a length, each new partition has roughly the same number of rows.
Otherwise, each new partition spans roughly the same number of old partitions.
- Parameters
partitions (np.ndarray) – The 2-d array of partitions to rebalance.
- Returns
A new NumPy array with rebalanced partitions.
- Return type
np.ndarray