BaseFrameAxisPartition

This class is the base for every axis partition class. It is the last level at which operations passed down from the partition manager are performed on an entire column or row.

The class provides an API that child classes must override in order to manipulate the list of block partitions (which make up a column or row partition) that they store.

The procedures that use this class and its methods assume that they have global knowledge of the entire axis. An implementation may therefore need to concatenate or append the block partitions in the list.
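To illustrate why full-axis knowledge matters, the sketch below computes a cumulative sum over a column stored as several row blocks. It uses plain pandas rather than Modin, and the block contents and the helper name apply_full_axis are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical column partition stored as three row blocks.
blocks = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4]}),
    pd.DataFrame({"a": [5]}),
]


def apply_full_axis(blocks, func):
    """Concatenate the row blocks into one frame, then apply func once."""
    full = pd.concat(blocks, ignore_index=True)
    return func(full)


# Block-by-block application restarts the running total in every block.
per_block = pd.concat([b.cumsum() for b in blocks], ignore_index=True)

# Full-axis application sees every row, so the total carries across blocks.
full_axis = apply_full_axis(blocks, lambda df: df.cumsum())

print(per_block["a"].tolist())
print(full_axis["a"].tolist())
```

The two printed lists differ because only the concatenated frame carries the running total across block boundaries.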

The PandasFramePartitionManager object that controls these objects (through the API exposed here) has an invariant: an axis partition object is never returned from a function. The manager assumes that PandasFramePartition objects are always what is stored, and structures itself accordingly.

Public API

class modin.engines.base.frame.axis_partition.BaseFrameAxisPartition

An abstract class that represents the parent class for any axis partition class.

This class is intended to simplify the way that operations are performed.

apply(func, num_splits=None, other_axis_partition=None, maintain_partitioning=True, **kwargs)

Apply a function to this axis partition along the full axis.

Parameters
  • func (callable) – The function to apply. This will be preprocessed according to the corresponding PandasFramePartition objects.

  • num_splits (int, default: None) – The number of times to split the result object.

  • other_axis_partition (BaseFrameAxisPartition, default: None) – Another BaseFrameAxisPartition object to pass to func. This is for operations between two data sets.

  • maintain_partitioning (bool, default: True) – Whether to keep the partitioning in the same orientation as it was previously or not. This is important because we may be operating on an individual axis partition and not touching the rest. In this case, we have to return the partitioning to its previous orientation (the lengths will remain the same). This is ignored between two axis partitions.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of PandasFramePartition objects.

Return type

list

Notes

The procedures that invoke this method assume full axis knowledge. Implement this method accordingly.

You must return a list of PandasFramePartition objects from this method.
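As a minimal sketch of this contract, the toy classes below join all blocks before calling func and always return a list of partition objects. ToyPartition, ToyBaseAxisPartition and ToyAxisPartition are hypothetical stand-ins, not Modin classes, and the even chunking is an assumption about how a result might be split.

```python
from abc import ABC, abstractmethod


class ToyPartition:
    """Hypothetical stand-in for a block partition; wraps a list of values."""

    def __init__(self, data):
        self.data = list(data)


class ToyBaseAxisPartition(ABC):
    """Hypothetical mirror of the abstract interface described above."""

    @abstractmethod
    def apply(self, func, num_splits=None, **kwargs):
        ...


class ToyAxisPartition(ToyBaseAxisPartition):
    def __init__(self, list_of_blocks):
        self.list_of_blocks = list_of_blocks

    def apply(self, func, num_splits=None, **kwargs):
        # Full-axis knowledge: join every block before calling func.
        full = [v for block in self.list_of_blocks for v in block.data]
        result = func(full, **kwargs)
        if num_splits is None:
            num_splits = len(self.list_of_blocks)
        # Cut into at most num_splits chunks of near-equal size.
        size = max(1, -(-len(result) // num_splits))  # ceiling division
        chunks = [result[i:i + size] for i in range(0, len(result), size)]
        # Contract: always return a list of partition objects.
        return [ToyPartition(c) for c in chunks]


axis_part = ToyAxisPartition([ToyPartition([3, 1]), ToyPartition([2, 0])])
out = axis_part.apply(sorted, num_splits=2)
print([p.data for p in out])
```

Sorting is used here because it cannot be done correctly one block at a time, which is exactly the situation the callers of apply assume is handled.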

force_materialization(get_ip=False)

Materialize axis partitions into a single partition.

Parameters

get_ip (bool, default: False) – Whether to also get the IP address of the node that holds the single partition.

Returns

An axis partition containing only a single materialized partition.

Return type

BaseFrameAxisPartition

shuffle(func, lengths, **kwargs)

Shuffle the order of the data in this axis partition based on the lengths.

Parameters
  • func (callable) – The function to apply before splitting.

  • lengths (list) – The list of partition lengths to split the result into.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of PandasFramePartition objects split by lengths.

Return type

list
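The behaviour described for shuffle can be sketched with plain pandas: apply func to the whole axis, then cut the result into consecutive pieces of the requested lengths. The helper name shuffle_sketch and the sample data are assumptions; the real method returns partition objects rather than DataFrames.

```python
import pandas as pd


def shuffle_sketch(blocks, func, lengths, **kwargs):
    """Concatenate row blocks, apply func, then cut the result by lengths."""
    full = pd.concat(blocks, ignore_index=True)
    result = func(full, **kwargs)
    pieces, start = [], 0
    for n in lengths:
        pieces.append(result.iloc[start:start + n])
        start += n
    return pieces


blocks = [pd.DataFrame({"a": [4, 1, 3]}), pd.DataFrame({"a": [2, 0]})]

# Sort across the whole axis, then repartition into pieces of 2 and 3 rows.
parts = shuffle_sketch(blocks, lambda df: df.sort_values("a"), lengths=[2, 3])
print([p["a"].tolist() for p in parts])
```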

unwrap(squeeze=False, get_ip=False)

Unwrap partitions from this axis partition.

Parameters
  • squeeze (bool, default: False) – Flag used to unwrap only one partition.

  • get_ip (bool, default: False) – Whether to also get the node IP address for each partition.

Returns

List of partitions from this axis partition.

Return type

list

Notes

If get_ip=True and Ray or Dask is used as the engine, a list of tuples is returned. Each tuple pairs a Ray.ObjectRef/Dask.Future for the node IP address with a Ray.ObjectRef/Dask.Future for the unwrapped partition, in that order (i.e. [(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), …]).
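The return shapes can be sketched with a toy class that uses plain strings in place of engine futures. The class name, the list_of_ips attribute, and the reading of squeeze (collapse the list only when exactly one block is stored) are assumptions made for illustration, not a statement of the library's implementation.

```python
class ToyAxisPartition:
    """Hypothetical axis partition holding block handles and node addresses."""

    def __init__(self, list_of_blocks, list_of_ips):
        self.list_of_blocks = list_of_blocks
        self.list_of_ips = list_of_ips

    def unwrap(self, squeeze=False, get_ip=False):
        # Assumed reading of squeeze: return the bare block only when there
        # is exactly one block; otherwise the flag has no effect.
        if squeeze and len(self.list_of_blocks) == 1:
            if get_ip:
                return self.list_of_ips[0], self.list_of_blocks[0]
            return self.list_of_blocks[0]
        if get_ip:
            # One (ip, block) tuple per block, IP address first.
            return list(zip(self.list_of_ips, self.list_of_blocks))
        return self.list_of_blocks


two = ToyAxisPartition(["block0", "block1"], ["10.0.0.1", "10.0.0.2"])
one = ToyAxisPartition(["block0"], ["10.0.0.1"])

print(two.unwrap())
print(two.unwrap(get_ip=True))
print(one.unwrap(squeeze=True))
```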

PandasFrameAxisPartition

This class is the base for every axis partition class of the pandas backend.

Subclasses must implement list_of_blocks, which represents the data wrapped by the PandasFramePartition objects and yields something interpretable as a pandas DataFrame.

See modin.engines.ray.pandas_on_ray.axis_partition.PandasOnRayFrameAxisPartition for an example of how to override or use this class when the implementation needs to be augmented.

Public API

class modin.engines.base.frame.axis_partition.PandasFrameAxisPartition

An abstract class created to simplify and consolidate the code for axis partitions that run pandas.

Because much of the code is similar, this allows us to reuse this code.

apply(func, num_splits=None, other_axis_partition=None, maintain_partitioning=True, **kwargs)

Apply a function to this axis partition along the full axis.

Parameters
  • func (callable) – The function to apply.

  • num_splits (int, default: None) – The number of times to split the result object.

  • other_axis_partition (PandasFrameAxisPartition, default: None) – Another PandasFrameAxisPartition object to pass to func. This is for operations between two data sets.

  • maintain_partitioning (bool, default: True) – Whether to keep the partitioning in the same orientation as it was previously or not. This is important because we may be operating on an individual axis partition and not touching the rest. In this case, we have to return the partitioning to its previous orientation (the lengths will remain the same). This is ignored between two axis partitions.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of PandasFramePartition objects.

Return type

list

classmethod deploy_axis_func(axis, func, num_splits, kwargs, maintain_partitioning, *partitions)

Deploy a function along a full axis.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • kwargs (dict) – Additional keyword arguments to be passed to func.

  • maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

Returns

A list of pandas DataFrames.

Return type

list
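A rough sketch of the flow described above, written with plain pandas: concatenate the blocks along the axis, apply func once, then split the result. The function names, the rule for deciding when the old layout can be kept, and the near-even fallback split are assumptions made for illustration; they are not taken from the library's code.

```python
import pandas as pd


def split_by_lengths(result, axis, lengths):
    """Cut result along axis into consecutive pieces of the given lengths."""
    pieces, start = [], 0
    for n in lengths:
        stop = start + n
        if axis == 0:
            pieces.append(result.iloc[start:stop])
        else:
            pieces.append(result.iloc[:, start:stop])
        start = stop
    return pieces


def deploy_axis_func_sketch(axis, func, num_splits, kwargs,
                            maintain_partitioning, *partitions):
    full = pd.concat(list(partitions), axis=axis)
    result = func(full, **kwargs)
    total = result.shape[axis]
    old_lengths = [p.shape[axis] for p in partitions]
    if maintain_partitioning and sum(old_lengths) == total:
        # The result kept its size along the axis: reuse the old layout.
        lengths = old_lengths
    else:
        # Otherwise build a new, near-even layout of num_splits pieces.
        base, extra = divmod(total, num_splits)
        lengths = [base + (i < extra) for i in range(num_splits)]
    return split_by_lengths(result, axis, lengths)


parts = (pd.DataFrame({"a": [1, 2, 3]}), pd.DataFrame({"a": [4]}))
kept = deploy_axis_func_sketch(0, lambda df: df * 10, 2, {}, True, *parts)
even = deploy_axis_func_sketch(0, lambda df: df * 10, 2, {}, False, *parts)
print([len(p) for p in kept])
print([len(p) for p in even])
```

With maintain_partitioning=True the sketch preserves the original 3/1 row layout; with False it rebalances the four rows into two equal pieces.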

classmethod deploy_func_between_two_axis_partitions(axis, func, num_splits, len_of_left, other_shape, kwargs, *partitions)

Deploy a function along a full axis between two data sets.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • len_of_left (int) – The number of values in partitions that belong to the left data set.

  • other_shape (np.ndarray) – The shape of the right frame in terms of partitions, i.e. (other_shape[i-1], other_shape[i]) indicates the slice needed to restore the (i-1)-th axis partition.

  • kwargs (dict) – Additional keyword arguments to be passed to func.

  • *partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.

Returns

A list of pandas DataFrames.

Return type

list
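The roles of len_of_left and other_shape can be sketched as follows. The sketch assumes that other_shape holds cumulative offsets into the flattened right-hand blocks (for example [0, 2, 4] for two axis partitions of two blocks each); the helper name rebuild_frames and the sample frames are hypothetical.

```python
import numpy as np
import pandas as pd


def rebuild_frames(axis, len_of_left, other_shape, *partitions):
    """Rebuild the left and right frames from one flat sequence of blocks."""
    left = pd.concat(list(partitions[:len_of_left]), axis=axis)
    right_blocks = partitions[len_of_left:]
    # Assumed layout: blocks other_shape[i-1]:other_shape[i] make up the
    # (i-1)-th axis partition of the right frame.
    right_axis_parts = [
        pd.concat(list(right_blocks[other_shape[i - 1]:other_shape[i]]), axis=axis)
        for i in range(1, len(other_shape))
    ]
    # Join the restored axis partitions along the other axis.
    right = pd.concat(right_axis_parts, axis=axis ^ 1)
    return left, right


left_blocks = [
    pd.DataFrame({"a": [1, 2]}, index=[0, 1]),
    pd.DataFrame({"a": [3, 4]}, index=[2, 3]),
]
right_blocks = [
    pd.DataFrame({"x": [10, 20]}, index=[0, 1]),
    pd.DataFrame({"x": [30, 40]}, index=[2, 3]),
    pd.DataFrame({"y": [5, 6]}, index=[0, 1]),
    pd.DataFrame({"y": [7, 8]}, index=[2, 3]),
]

left, right = rebuild_frames(
    0, len(left_blocks), np.array([0, 2, 4]), *left_blocks, *right_blocks
)
# A binary operation can now see both complete frames.
result = left["a"] + right["x"] + right["y"]
print(result.tolist())
```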

shuffle(func, lengths, **kwargs)

Shuffle the order of the data in this axis partition based on the lengths.

Parameters
  • func (callable) – The function to apply before splitting.

  • lengths (list) – The list of partition lengths to split the result into.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of PandasFramePartition objects split by lengths.

Return type

list