PandasOnRayFramePartitionManager

This class is the specific implementation of PandasFramePartitionManager using Ray distributed engine. This class is responsible for partition manipulation and applying a funcion to block/row/column partitions.

Public API

class modin.engines.ray.pandas_on_ray.frame.partition_manager.PandasOnRayFramePartitionManager

The class implements the interface in PandasFramePartitionManager.

classmethod apply_func_to_indices_both_axis(partitions, func, row_partitions_list, col_partitions_list, item_to_distribute=None)

Apply a function along both axes.

Parameters
  • partitions (np.ndarray) – The partitions to which the func will apply.

  • func (callable) – The function to apply.

  • row_partitions_list (list) – List of row partitions.

  • col_partitions_list (list) – List of column partitions.

  • item_to_distribute (item, optional) – The item to split up so it can be applied over both axes.

Returns

A NumPy array with partitions.

Return type

np.ndarray

Notes

For your func to operate directly on the indices provided, it must use row_internal_indices and col_internal_indices as keyword arguments.

classmethod apply_func_to_select_indices(axis, partitions, func, indices, keep_remaining=False)

Apply a func to select indices of partitions.

Parameters
  • axis ({0, 1}) – Axis to apply the func over.

  • partitions (np.ndarray) – The partitions to which the func will apply.

  • func (callable) – The function to apply to these indices of partitions.

  • indices (dict) – The indices to apply the function to.

  • keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.

Returns

A NumPy array with partitions.

Return type

np.ndarray

Notes

Your internal function must take a kwarg internal_indices for this to work correctly. This prevents information leakage of the internal index to the external representation.

classmethod apply_func_to_select_indices_along_full_axis(axis, partitions, func, indices, keep_remaining=False)

Apply a func to a select subset of full columns/rows.

Parameters
  • axis ({0, 1}) – The axis to apply the func over.

  • partitions (np.ndarray) – The partitions to which the func will apply.

  • func (callable) – The function to apply.

  • indices (list-like) – The global indices to apply the func to.

  • keep_remaining (bool, default: False) – Whether or not to keep the other partitions. Some operations may want to drop the remaining partitions and keep only the results.

Returns

A NumPy array with partitions.

Return type

np.ndarray

Notes

This should be used when you need to apply a function that relies on some global information for the entire column/row, but only need to apply a function to a subset. For your func to operate directly on the indices provided, it must use internal_indices as a keyword argument.

classmethod binary_operation(axis, left, func, right)

Apply a function that requires partitions of two PandasOnRayFrame objects.

Parameters
  • axis ({0, 1}) – The axis to apply the function over (0 - rows, 1 - columns).

  • left (np.ndarray) – The partitions of left PandasOnRayFrame.

  • func (callable) – The function to apply.

  • right (np.ndarray) – The partitions of right PandasOnRayFrame.

Returns

A NumPy array with new partitions.

Return type

np.ndarray

classmethod broadcast_apply(axis, apply_func, left, right, other_name='r')

Broadcast the right partitions to left and apply apply_func to selected indices.

Parameters
  • axis ({0, 1}) – Axis to apply and broadcast over.

  • apply_func (callable) – Function to apply.

  • left (np.ndarray) – NumPy 2D array of left partitions.

  • right (np.ndarray) – NumPy 2D array of right partitions.

  • other_name (str, default: "r") – Name of key-value argument for apply_func that is used to pass right to apply_func.

Returns

An array of partition objects.

Return type

np.ndarray

classmethod get_indices(axis, partitions, index_func=None)

Get the internal indices stored in the partitions.

Parameters
  • axis ({0, 1}) – Axis to extract the labels over.

  • partitions (np.ndarray) – NumPy array with PandasFramePartition-s.

  • index_func (callable, default: None) – The function to be used to extract the indices.

Returns

A pandas.Index object.

Return type

pandas.Index

Notes

These are the global indices of the object. This is mostly useful when you have deleted rows/columns internally, but do not know which ones were deleted.

classmethod lazy_map_partitions(partitions, map_func)

Apply map_func to every partition in partitions lazily.

Parameters
  • partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.

  • map_func (callable) – Function to apply.

Returns

A NumPy array of partitions.

Return type

np.ndarray

classmethod map_axis_partitions(axis, partitions, map_func, keep_partitioning=False, lengths=None, enumerate_partitions=False)

Apply map_func to every partition in partitions along given axis.

Parameters
  • axis ({0, 1}) – Axis to perform the map across (0 - index, 1 - columns).

  • partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.

  • map_func (callable) – Function to apply.

  • keep_partitioning (bool, default: False) – Whether to keep partitioning for Modin Frame. Setting it to True prevents data shuffling between partitions.

  • lengths (list of ints, default: None) – List of lengths to shuffle the object.

  • enumerate_partitions (bool, default: False) – Whether or not to pass partition index into map_func. Note that map_func must be able to accept partition_idx kwarg.

Returns

A NumPy array of new partitions for Modin Frame.

Return type

np.ndarray

Notes

This method should be used in the case when map_func relies on some global information about the axis.

classmethod map_partitions(partitions, map_func)

Apply map_func to every partition in partitions.

Parameters
  • partitions (np.ndarray) – A NumPy 2D array of partitions to perform operation on.

  • map_func (callable) – Function to apply.

Returns

A NumPy array of partitions.

Return type

np.ndarray