PandasOnRayFramePartitionManager¶
This class is the specific implementation of PandasFramePartitionManager
using Ray distributed engine. This class is responsible for partition manipulation and applying a funcion to
block/row/column partitions.
Public API¶
- class modin.engines.ray.pandas_on_ray.frame.partition_manager.PandasOnRayFramePartitionManager¶
The class implements the interface in PandasFramePartitionManager.
- classmethod apply_func_to_indices_both_axis(partitions, func, row_partitions_list, col_partitions_list, item_to_distribute=None)¶
Apply a function along both axes.
- Parameters
partitions (np.ndarray) – The partitions to which the func will apply.
func (callable) – The function to apply.
row_partitions_list (list) – List of row partitions.
col_partitions_list (list) – List of column partitions.
item_to_distribute (item, optional) – The item to split up so it can be applied over both axes.
- Returns
A NumPy array with partitions.
- Return type
np.ndarray
Notes
For your func to operate directly on the indices provided, it must use
row_internal_indices
andcol_internal_indices
as keyword arguments.
- classmethod apply_func_to_select_indices(axis, partitions, func, indices, keep_remaining=False)¶
Apply a func to select indices of partitions.
- Parameters
axis ({0, 1}) – Axis to apply the func over.
partitions (np.ndarray) – The partitions to which the func will apply.
func (callable) – The function to apply to these indices of partitions.
indices (dict) – The indices to apply the function to.
keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.
- Returns
A NumPy array with partitions.
- Return type
np.ndarray
Notes
Your internal function must take a kwarg internal_indices for this to work correctly. This prevents information leakage of the internal index to the external representation.
- classmethod apply_func_to_select_indices_along_full_axis(axis, partitions, func, indices, keep_remaining=False)¶
Apply a func to a select subset of full columns/rows.
- Parameters
axis ({0, 1}) – The axis to apply the func over.
partitions (np.ndarray) – The partitions to which the func will apply.
func (callable) – The function to apply.
indices (list-like) – The global indices to apply the func to.
keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.
- Returns
A NumPy array with partitions.
- Return type
np.ndarray
Notes
This should be used when you need to apply a function that relies on some global information for the entire column/row, but only need to apply a function to a subset. For your func to operate directly on the indices provided, it must use internal_indices as a keyword argument.
- classmethod binary_operation(axis, left, func, right)¶
Apply a function that requires partitions of two
PandasOnRayFrame
objects.- Parameters
axis ({0, 1}) – The axis to apply the function over (0 - rows, 1 - columns).
left (np.ndarray) – The partitions of left
PandasOnRayFrame
.func (callable) – The function to apply.
right (np.ndarray) – The partitions of right
PandasOnRayFrame
.
- Returns
A NumPy array with new partitions.
- Return type
np.ndarray
- classmethod broadcast_apply(axis, apply_func, left, right, other_name='r')¶
Broadcast the right partitions to left and apply apply_func to selected indices.
- Parameters
axis ({0, 1}) – Axis to apply and broadcast over.
apply_func (callable) – Function to apply.
left (np.ndarray) – NumPy 2D array of left partitions.
right (np.ndarray) – NumPy 2D array of right partitions.
other_name (str, default: "r") – Name of key-value argument for apply_func that is used to pass right to apply_func.
- Returns
An array of partition objects.
- Return type
np.ndarray
- classmethod get_indices(axis, partitions, index_func=None)¶
Get the internal indices stored in the partitions.
- Parameters
axis ({0, 1}) – Axis to extract the labels over.
partitions (np.ndarray) – NumPy array with
PandasFramePartition
-s.index_func (callable, default: None) – The function to be used to extract the indices.
- Returns
A
pandas.Index
object.- Return type
pandas.Index
Notes
These are the global indices of the object. This is mostly useful when you have deleted rows/columns internally, but do not know which ones were deleted.
- classmethod lazy_map_partitions(partitions, map_func)¶
Apply map_func to every partition in partitions lazily.
- Parameters
partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.
map_func (callable) – Function to apply.
- Returns
A NumPy array of partitions.
- Return type
np.ndarray
- classmethod map_axis_partitions(axis, partitions, map_func, keep_partitioning=False, lengths=None, enumerate_partitions=False)¶
Apply map_func to every partition in partitions along given axis.
- Parameters
axis ({0, 1}) – Axis to perform the map across (0 - index, 1 - columns).
partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.
map_func (callable) – Function to apply.
keep_partitioning (bool, default: False) – Whether to keep partitioning for Modin Frame. Setting it to True prevents data shuffling between partitions.
lengths (list of ints, default: None) – List of lengths to shuffle the object.
enumerate_partitions (bool, default: False) – Whether or not to pass partition index into map_func. Note that map_func must be able to accept partition_idx kwarg.
- Returns
A NumPy array of new partitions for Modin Frame.
- Return type
np.ndarray
Notes
This method should be used in the case when map_func relies on some global information about the axis.
- classmethod map_partitions(partitions, map_func)¶
Apply map_func to every partition in partitions.
- Parameters
partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.
map_func (callable) – Function to apply.
- Returns
A NumPy array of partitions.
- Return type
np.ndarray