PandasDataframeAxisPartition

This class is the base for any axis partition class of the pandas storage format.

Subclasses must implement list_of_blocks, which represents the data wrapped by the PandasDataframePartition objects and creates something interpretable as a pandas.DataFrame.

See PandasOnRayDataframeAxisPartition for an example of how to override or use this class when the implementation needs to be augmented.

The PandasDataframeAxisPartition object has an invariant: it is never returned from a function. It assumes that there will always be a PandasDataframeAxisPartition object stored and structures itself accordingly.

Public API

class modin.core.dataframe.pandas.partitioning.axis_partition.PandasDataframeAxisPartition

An abstract class that simplifies and consolidates the code for axis partitions that run pandas.

Because much of the code is similar, consolidating it here allows it to be reused.

apply(func, num_splits=None, other_axis_partition=None, maintain_partitioning=True, **kwargs)

Apply a function to this axis partition along full axis.

Parameters
  • func (callable) – The function to apply.

  • num_splits (int, default: None) – The number of times to split the result object.

  • other_axis_partition (PandasDataframeAxisPartition, default: None) – Another PandasDataframeAxisPartition object to pass to func. This is for operations between two data sets.

  • maintain_partitioning (bool, default: True) – Whether to keep the partitioning in the same orientation as it was previously or not. This is important because we may be operating on an individual AxisPartition and not touching the rest. In this case, we have to return the partitioning to its previous orientation (the lengths will remain the same). This is ignored between two axis partitions.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of PandasDataframePartition objects.

Return type

list
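The effect of maintain_partitioning on the result layout can be illustrated with plain pandas. This is only a sketch under stated assumptions, not Modin's implementation: split_rows is a hypothetical helper, the split is along rows only, and the rule "reuse the old lengths when they still add up to the result length" is an assumption about how the previous layout might be kept.

```python
from itertools import accumulate

import pandas as pd


def split_rows(result, num_splits, original_lengths=None):
    """Hypothetical helper: split `result` by rows.

    Reuses `original_lengths` when they still match the result length
    (the "maintain partitioning" case); otherwise splits into
    `num_splits` roughly even chunks.
    """
    if original_lengths is not None and sum(original_lengths) == len(result):
        lengths = list(original_lengths)
    else:
        chunk = -(-len(result) // num_splits)  # ceiling division
        lengths = [
            max(0, min(chunk, len(result) - i * chunk)) for i in range(num_splits)
        ]
    bounds = [0, *accumulate(lengths)]
    return [result.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]


result = pd.DataFrame({"a": range(6)})
kept = split_rows(result, num_splits=2, original_lengths=[1, 5])
fresh = split_rows(result, num_splits=2)
```

Here kept preserves the uneven 1 and 5 row layout, while fresh falls back to two chunks of 3 rows each.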

classmethod deploy_axis_func(axis, func, num_splits, kwargs, maintain_partitioning, *partitions)

Deploy a function along a full axis.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • kwargs (dict) – Additional keyword arguments to be passed to func.

  • maintain_partitioning (bool) – If True, keep the old partitioning if possible. If False, create a new partition layout.

  • *partitions (iterable) – All partitions that make up the full axis (row or column).

Returns

A list of pandas DataFrames.

Return type

list
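Conceptually, deploying a function along a full axis means joining the blocks, applying func to the joined frame, and splitting the result again. The following is a minimal sketch of that idea in plain pandas; apply_along_full_axis is a hypothetical name, the even splitting rule is an assumption, and maintain_partitioning is left out for brevity.

```python
import pandas as pd


def apply_along_full_axis(axis, func, num_splits, kwargs, *blocks):
    """Hypothetical stand-in: join blocks, apply func, split the result."""
    # Join the blocks so that func sees the whole row or column.
    full = pd.concat(list(blocks), axis=axis)
    result = func(full, **kwargs)
    # Split the result into num_splits roughly even pieces along `axis`.
    total = result.shape[axis]
    chunk = -(-total // num_splits)  # ceiling division
    pieces = []
    for i in range(num_splits):
        part = slice(i * chunk, (i + 1) * chunk)
        pieces.append(result.iloc[part] if axis == 0 else result.iloc[:, part])
    return pieces


top = pd.DataFrame({"a": [1, 2, 3]})
bottom = pd.DataFrame({"a": [4, 5, 6]}, index=[3, 4, 5])
# A cumulative sum needs the full column, so it cannot run per block.
parts = apply_along_full_axis(0, pd.DataFrame.cumsum, 3, {}, top, bottom)
```

A cumulative sum is a useful test case because the value in the last piece depends on every earlier block.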

classmethod deploy_func_between_two_axis_partitions(axis, func, num_splits, len_of_left, other_shape, kwargs, *partitions)

Deploy a function along a full axis between two data sets.

Parameters
  • axis ({0, 1}) – The axis to perform the function along.

  • func (callable) – The function to perform.

  • num_splits (int) – The number of splits to return (see split_result_of_axis_func_pandas).

  • len_of_left (int) – The number of values in partitions that belong to the left data set.

  • other_shape (np.ndarray) – The shape of the right frame in terms of partitions, i.e. the slice (other_shape[i-1], other_shape[i]) selects the blocks needed to restore axis partition i-1.

  • kwargs (dict) – Additional keyword arguments to be passed to func.

  • *partitions (iterable) – All partitions that make up the full axis (row or column) for both data sets.

Returns

A list of pandas DataFrames.

Return type

list
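The roles of len_of_left and other_shape are easier to see on a small example. The sketch below shows one way the two frames could be rebuilt from a flat sequence of blocks; rebuild_two_frames is a hypothetical helper, and treating other_shape as an array of cumulative offsets into the right-hand blocks is an assumption based on the parameter description above.

```python
import numpy as np
import pandas as pd


def rebuild_two_frames(axis, len_of_left, other_shape, *partitions):
    """Hypothetical helper: rebuild the left and right frames from flat blocks."""
    # The first len_of_left blocks form the left axis partition.
    left = pd.concat(list(partitions[:len_of_left]), axis=axis)
    right_blocks = partitions[len_of_left:]
    # Each (other_shape[i-1], other_shape[i]) slice restores one axis partition.
    right_axis_parts = [
        pd.concat(list(right_blocks[other_shape[i - 1]:other_shape[i]]), axis=axis)
        for i in range(1, len(other_shape))
    ]
    # The restored axis partitions sit side by side along the other axis.
    right = pd.concat(right_axis_parts, axis=axis ^ 1)
    return left, right


left_blocks = [
    pd.DataFrame({"a": [1, 2]}, index=[0, 1]),
    pd.DataFrame({"a": [3, 4]}, index=[2, 3]),
]
right_blocks = [
    pd.DataFrame({"x": [10, 20]}, index=[0, 1]),
    pd.DataFrame({"x": [30, 40]}, index=[2, 3]),
    pd.DataFrame({"y": [50, 60]}, index=[0, 1]),
    pd.DataFrame({"y": [70, 80]}, index=[2, 3]),
]
left, right = rebuild_two_frames(
    0, len(left_blocks), np.array([0, 2, 4]), *left_blocks, *right_blocks
)
```

With both frames restored, func can be called as func(left, right, **kwargs) and the result split into num_splits pieces.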

shuffle(func, lengths, **kwargs)

Shuffle the order of the data in this axis partition based on the lengths.

Parameters
  • func (callable) – The function to apply before splitting.

  • lengths (list) – The list of partition lengths to split the result into.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of PandasDataframePartition objects split by lengths.

Return type

list
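As an illustration of the apply-then-split behavior described for shuffle, the sketch below applies a sort to the joined blocks and cuts the result by explicit lengths. shuffle_then_split is a hypothetical name, and the example assumes row-wise blocks of plain pandas DataFrames rather than PandasDataframePartition objects.

```python
from itertools import accumulate

import pandas as pd


def shuffle_then_split(blocks, func, lengths, **kwargs):
    """Hypothetical sketch: apply func to the joined blocks, then split by lengths."""
    full = pd.concat(list(blocks), axis=0)
    result = func(full, **kwargs)
    # Turn the lengths into slice boundaries: [1, 3] -> [0, 1, 4].
    bounds = [0, *accumulate(lengths)]
    return [result.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]


blocks = [
    pd.DataFrame({"a": [3, 1]}, index=[0, 1]),
    pd.DataFrame({"a": [2, 0]}, index=[2, 3]),
]
# Sort across both blocks, then cut the result into pieces of 1 and 3 rows.
pieces = shuffle_then_split(blocks, pd.DataFrame.sort_values, [1, 3], by="a")
```

Rows move between pieces here because the sort is applied before the split, which is why the lengths describe the output layout rather than the input blocks.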