BaseDataframeAxisPartition#

This class is the base for every axis partition class. It is the last level at which operations passed down from the partition manager are performed on an entire column or row.

Note: modin.core.dataframe.base intentionally does not describe any particular partition interface. That is the partition manager's responsibility (if a partition manager is implemented); it is too low-level to appear at the base, abstract level.

The class provides an API that child classes must override in order to manipulate the list of block partitions (which make up a column or row partition) that they store.

The procedures that use this class and its methods assume that they have some global knowledge about the entire axis. This may require the implementation to use concatenation or append on the list of block partitions.
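The concatenate-then-split pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions: blocks are modeled as ordinary lists rather than real block partitions, and `apply_along_full_axis` is a hypothetical helper name, not part of Modin.

```python
from itertools import chain


def apply_along_full_axis(blocks, func, num_splits):
    """Toy stand-in: each block is a list holding a slice of one column."""
    # Concatenate the blocks so that func sees the entire axis at once.
    full_axis = list(chain.from_iterable(blocks))
    result = func(full_axis)
    # Split the result back into num_splits blocks of near-equal size.
    size, extra = divmod(len(result), num_splits)
    out, start = [], 0
    for i in range(num_splits):
        stop = start + size + (1 if i < extra else 0)
        out.append(result[start:stop])
        start = stop
    return out


blocks = [[3, 1], [4, 1, 5]]
# Sorting needs global knowledge of the axis, so it cannot run per block.
sorted_blocks = apply_along_full_axis(blocks, sorted, num_splits=2)
```

Sorting each block separately would give a different (and incorrect) result, which is why procedures at this level assume access to the whole axis.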

Public API#

class modin.core.dataframe.base.partitioning.axis_partition.BaseDataframeAxisPartition#

An abstract class that represents the parent class for any axis partition class.

This class is intended to simplify the way that operations are performed.

apply(func, num_splits=None, other_axis_partition=None, maintain_partitioning=True, **kwargs)#

Apply a function to this axis partition along the full axis.

Parameters
  • func (callable) – The function to apply. This will be preprocessed according to the corresponding BaseDataframePartition objects.

  • num_splits (int, default: None) – The number of times to split the result object.

  • other_axis_partition (BaseDataframeAxisPartition, default: None) – Another BaseDataframeAxisPartition object to pass to func. This is for operations between two data sets.

  • maintain_partitioning (bool, default: True) – Whether to keep the partitioning in the same orientation as it was previously. This matters because we may be operating on an individual axis partition without touching the rest. In that case, we have to return the partitioning to its previous orientation (the lengths will remain the same). This is ignored for operations between two axis partitions.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of BaseDataframePartition objects.

Return type

list

Notes

The procedures that invoke this method assume full axis knowledge. Implement this method accordingly.

You must return a list of BaseDataframePartition objects from this method.
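As a rough sketch of how a child class might honor this signature, the toy class below mirrors the documented parameters. It rests on several assumptions: `ToyAxisPartition` is hypothetical and does not inherit from the real base class, blocks are plain lists instead of BaseDataframePartition objects, and the fallback to an even split when the result length changes is an illustrative choice rather than documented behavior.

```python
from itertools import chain


class ToyAxisPartition:
    """Hypothetical stand-in for an axis partition; blocks are plain lists."""

    def __init__(self, blocks):
        self.blocks = [list(b) for b in blocks]

    def _concat(self):
        # Full-axis view of the stored blocks.
        return list(chain.from_iterable(self.blocks))

    def apply(self, func, num_splits=None, other_axis_partition=None,
              maintain_partitioning=True, **kwargs):
        if other_axis_partition is not None:
            # Binary operation: func receives both full axes, and
            # maintain_partitioning is ignored.
            result = func(self._concat(), other_axis_partition._concat(), **kwargs)
            lengths = None
        else:
            result = func(self._concat(), **kwargs)
            # Keep the previous block lengths when asked to.
            lengths = [len(b) for b in self.blocks] if maintain_partitioning else None
        if lengths is None or sum(lengths) != len(result):
            # Assumption: fall back to an even split into num_splits blocks.
            n = num_splits or len(self.blocks)
            size, extra = divmod(len(result), n)
            lengths = [size + (1 if i < extra else 0) for i in range(n)]
        out, start = [], 0
        for length in lengths:
            out.append(result[start:start + length])
            start += length
        return out  # always a list of blocks


part = ToyAxisPartition([[3, 1], [4, 1, 5]])
other = ToyAxisPartition([[10, 20, 30], [40, 50]])

kept = part.apply(sorted)
resplit = part.apply(sorted, num_splits=5, maintain_partitioning=False)
added = part.apply(lambda a, b: [x + y for x, y in zip(a, b)],
                   other_axis_partition=other)
```

In this sketch `kept` preserves the original block lengths (2 and 3), `resplit` yields five one-element blocks, and `added` combines the two axes element-wise before splitting.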

force_materialization(get_ip=False)#

Materialize axis partitions into a single partition.

Parameters

get_ip (bool, default: False) – Whether to also get the node IP address of the single partition.

Returns

An axis partition containing only a single materialized partition.

Return type

BaseDataframeAxisPartition
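A minimal sketch of the idea, assuming blocks are plain lists and ignoring get_ip: the stored blocks are merged into one, and a new axis partition of the same type wraps that single block. `ToyMaterializingPartition` is a hypothetical name used only for illustration.

```python
from itertools import chain


class ToyMaterializingPartition:
    """Hypothetical axis partition whose blocks are plain lists."""

    def __init__(self, blocks):
        self.blocks = [list(b) for b in blocks]

    def force_materialization(self):
        # Merge every block into one and wrap it in a new axis partition.
        merged = list(chain.from_iterable(self.blocks))
        return type(self)([merged])


axis_part = ToyMaterializingPartition([[1, 2], [3], [4, 5]])
single = axis_part.force_materialization()
```

The original object is left unchanged in this sketch; only the returned axis partition holds the single merged block.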

shuffle(func, lengths, **kwargs)#

Shuffle the order of the data in this axis partition based on the lengths.

Parameters
  • func (callable) – The function to apply before splitting.

  • lengths (list) – The list of partition lengths to split the result into.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns

A list of BaseDataframePartition objects split by lengths.

Return type

list
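The apply-then-split behavior can be illustrated with a short sketch. It assumes list-based blocks, and `shuffle_blocks` is a hypothetical free function rather than the real method; raising an error when the lengths do not add up is likewise an assumption made for the example.

```python
from itertools import chain


def shuffle_blocks(blocks, func, lengths, **kwargs):
    """Apply func to the full axis, then split the result by lengths."""
    full_axis = list(chain.from_iterable(blocks))
    result = func(full_axis, **kwargs)
    if sum(lengths) != len(result):
        # Assumption: the sketch refuses lengths that do not cover the result.
        raise ValueError("lengths must sum to the length of the result")
    out, start = [], 0
    for length in lengths:
        out.append(result[start:start + length])
        start += length
    return out


blocks = [[3, 1], [4, 1, 5]]
# reverse=True is forwarded to sorted through **kwargs.
shuffled = shuffle_blocks(blocks, sorted, lengths=[1, 4], reverse=True)
```

Unlike an even split, the caller controls the size of every output block through lengths.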

unwrap(squeeze=False, get_ip=False)#

Unwrap partitions from this axis partition.

Parameters
  • squeeze (bool, default: False) – Flag used to unwrap only one partition.

  • get_ip (bool, default: False) – Whether to also get the node IP address of each partition.

Returns

List of partitions from this axis partition.

Return type

list

Notes

If get_ip=True and Ray or Dask is used as the engine, a list of tuples is returned. Each tuple pairs a Ray.ObjectRef/Dask.Future to the node IP address with a Ray.ObjectRef/Dask.Future to the unwrapped partition (i.e. [(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), …]).
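The shapes of the return value can be sketched as follows. This is an illustration under assumptions: `ToyBlock` and `unwrap_blocks` are hypothetical, plain values stand in for Ray.ObjectRef/Dask.Future handles, and treating squeeze as "return the lone item when exactly one block is stored" is one reading of the flag, not confirmed behavior.

```python
class ToyBlock:
    """Hypothetical block holding its data and the IP of the node storing it."""

    def __init__(self, data, ip):
        self.data = data
        self.ip = ip


def unwrap_blocks(blocks, squeeze=False, get_ip=False):
    if get_ip:
        # Pair each partition with its node IP: (ip, partition).
        items = [(b.ip, b.data) for b in blocks]
    else:
        items = [b.data for b in blocks]
    # Assumption: squeeze only has an effect when a single block is stored.
    if squeeze and len(items) == 1:
        return items[0]
    return items


one = [ToyBlock([1, 2], "10.0.0.1")]
two = one + [ToyBlock([3], "10.0.0.2")]

plain = unwrap_blocks(two)
with_ips = unwrap_blocks(two, get_ip=True)
squeezed = unwrap_blocks(one, squeeze=True)
squeezed_with_ip = unwrap_blocks(one, squeeze=True, get_ip=True)
```

Here `plain` is a list of partitions, `with_ips` is a list of (ip, partition) tuples, and the squeezed variants return the single item without the enclosing list.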