PandasOnRayDataframePartitionManager#

This class is the specific implementation of GenericRayDataframePartitionManager using Ray distributed engine. This class is responsible for partition manipulation and applying a funcion to block/row/column partitions.

Public API#

class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframePartitionManager#

The class implements the interface in PandasDataframePartitionManager.

classmethod apply_func_to_indices_both_axis(partitions, func, row_partitions_list, col_partitions_list, item_to_distribute=NoDefault.no_default, row_lengths=None, col_widths=None)#

Apply a function along both axes.

Parameters
  • partitions (np.ndarray) – The partitions to which the func will apply.

  • func (callable) – The function to apply.

  • row_partitions_list (list) – List of row partitions.

  • col_partitions_list (list) – List of column partitions.

  • item_to_distribute (np.ndarray or scalar, default: no_default) – The item to split up so it can be applied over both axes.

  • row_lengths (list of ints, optional) – Lengths of partitions for every row. If not specified this information is extracted from partitions itself.

  • col_widths (list of ints, optional) – Widths of partitions for every column. If not specified this information is extracted from partitions itself.

Returns

A NumPy array with partitions.

Return type

np.ndarray

Notes

For your func to operate directly on the indices provided, it must use row_internal_indices and col_internal_indices as keyword arguments.

classmethod apply_func_to_select_indices(axis, partitions, func, indices, keep_remaining=False)#

Apply a func to select indices of partitions.

Parameters
  • axis ({0, 1}) – Axis to apply the func over.

  • partitions (np.ndarray) – The partitions to which the func will apply.

  • func (callable) – The function to apply to these indices of partitions.

  • indices (dict) – The indices to apply the function to.

  • keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.

Returns

A NumPy array with partitions.

Return type

np.ndarray

Notes

Your internal function must take a kwarg internal_indices for this to work correctly. This prevents information leakage of the internal index to the external representation.

classmethod apply_func_to_select_indices_along_full_axis(axis, partitions, func, indices, keep_remaining=False)#

Apply a func to a select subset of full columns/rows.

Parameters
  • axis ({0, 1}) – The axis to apply the func over.

  • partitions (np.ndarray) – The partitions to which the func will apply.

  • func (callable) – The function to apply.

  • indices (list-like) – The global indices to apply the func to.

  • keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.

Returns

A NumPy array with partitions.

Return type

np.ndarray

Notes

This should be used when you need to apply a function that relies on some global information for the entire column/row, but only need to apply a function to a subset. For your func to operate directly on the indices provided, it must use internal_indices as a keyword argument.

classmethod binary_operation(axis, left, func, right)#

Apply a function that requires partitions of two PandasOnRayDataframe objects.

Parameters
  • axis ({0, 1}) – The axis to apply the function over (0 - rows, 1 - columns).

  • left (np.ndarray) – The partitions of left PandasOnRayDataframe.

  • func (callable) – The function to apply.

  • right (np.ndarray) – The partitions of right PandasOnRayDataframe.

Returns

A NumPy array with new partitions.

Return type

np.ndarray

classmethod concat(axis, left_parts, right_parts)#

Concatenate the blocks of partitions with another set of blocks.

Parameters
  • axis (int) – The axis to concatenate to.

  • left_parts (np.ndarray) – NumPy array of partitions to concatenate with.

  • right_parts (np.ndarray or list) – NumPy array of partitions to be concatenated.

Returns

A new NumPy array with concatenated partitions.

Return type

np.ndarray

Notes

Assumes that the left_parts and right_parts blocks are already the same shape on the dimension (opposite axis) as the one being concatenated. A ValueError will be thrown if this condition is not met.

classmethod get_objects_from_partitions(partitions)#

Get the objects wrapped by partitions in parallel.

Parameters

partitions (np.ndarray) – NumPy array with PandasDataframePartition-s.

Returns

The objects wrapped by partitions.

Return type

list

classmethod lazy_map_partitions(partitions, map_func)#

Apply map_func to every partition in partitions lazily.

Parameters
  • partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.

  • map_func (callable) – Function to apply.

Returns

A NumPy array of partitions.

Return type

np.ndarray

classmethod map_axis_partitions(axis, partitions, map_func, keep_partitioning=False, lengths=None, enumerate_partitions=False, **kwargs)#

Apply map_func to every partition in partitions along given axis.

Parameters
  • axis ({0, 1}) – Axis to perform the map across (0 - index, 1 - columns).

  • partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.

  • map_func (callable) – Function to apply.

  • keep_partitioning (bool, default: False) – Whether to keep partitioning for Modin Frame. Setting it to True prevents data shuffling between partitions.

  • lengths (list of ints, default: None) – List of lengths to shuffle the object.

  • enumerate_partitions (bool, default: False) – Whether or not to pass partition index into map_func. Note that map_func must be able to accept partition_idx kwarg.

  • **kwargs (dict) – Additional options that could be used by different engines.

Returns

A NumPy array of new partitions for Modin Frame.

Return type

np.ndarray

Notes

This method should be used in the case when map_func relies on some global information about the axis.

classmethod map_partitions(partitions, map_func)#

Apply map_func to every partition in partitions.

Parameters
  • partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.

  • map_func (callable) – Function to apply.

Returns

A NumPy array of partitions.

Return type

np.ndarray

classmethod rebalance_partitions(partitions)#

Rebalance a 2-d array of partitions.

Rebalance the partitions by building a new array of partitions out of the original ones so that:

  • If all partitions have a length, each new partition has roughly the same number of rows.

  • Otherwise, each new partition spans roughly the same number of old partitions.

Parameters

partitions (np.ndarray) – The 2-d array of partitions to rebalance.

Returns

A new NumPy array with rebalanced partitions.

Return type

np.ndarray