BaseDataframeAxisPartition#

This class is the base for any axis partition class. It is the last level at which operations passed down from the partition manager are performed on an entire column or row.

Note: modin.core.dataframe.base intentionally does not describe any particular partition interface. Defining one is the responsibility of the partition manager (if a partition manager is implemented), and it is too low-level to belong to the base, abstract level.

The class provides an API that child classes have to override in order to manipulate the list of block partitions (which make up a column or row partition) that they store.

The procedures that use this class and its methods assume that they have some global knowledge about the entire axis. This may require the implementation to use concatenation or append on the list of block partitions.
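To illustrate why axis-wide knowledge can force a concatenation, here is a minimal sketch that uses plain Python lists as stand-ins for block partitions (an assumption made to keep the example dependency-free; real blocks would hold DataFrame data). It compares a result computed block by block with one computed after joining the blocks.

```python
from statistics import median

# Hypothetical column partition: three row-wise blocks of one column.
blocks = [[3, 1], [4, 1, 5], [9]]

# Block-local view: each block only knows its own values.
per_block_medians = [median(block) for block in blocks]

# Axis-wide view: concatenate the blocks first, then compute once.
full_column = [value for block in blocks for value in block]
global_median = median(full_column)

# Combining block-local results does not recover the axis-wide answer.
median_of_medians = median(per_block_medians)
print(global_median, median_of_medians)
```

The two values differ, which is why an operation such as a median cannot be answered from independent per-block results and needs the whole axis at once.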

Public API#

class modin.core.dataframe.base.partitioning.axis_partition.BaseDataframeAxisPartition#

An abstract class that represents the parent class for any axis partition class.

This class is intended to simplify the way that operations are performed.

_PARTITIONS_METADATA_LEN#

The number of metadata values that the object of partition_type consumes.

Type:

int

apply(func: Callable, *args: Iterable, num_splits: Optional[int] = None, other_axis_partition: Optional[BaseDataframeAxisPartition] = None, maintain_partitioning: bool = True, lengths: Optional[Iterable] = None, manual_partition: bool = False, **kwargs: dict) Any#

Apply a function to this axis partition along the full axis.

Parameters:
  • func (callable) – The function to apply. This will be preprocessed according to the corresponding BaseDataframePartition objects.

  • *args (iterable) – Positional arguments to pass to func.

  • num_splits (int, default: None) – The number of times to split the result object.

  • other_axis_partition (BaseDataframeAxisPartition, default: None) – Another BaseDataframeAxisPartition object to pass to func. Used for operations between two data sets.

  • maintain_partitioning (bool, default: True) – Whether to keep the partitioning in its previous orientation. This matters because an operation may touch a single axis partition and leave the rest alone; in that case the partitioning has to be returned to its previous orientation (the lengths will remain the same). This is ignored when operating between two axis partitions.

  • lengths (iterable, default: None) – The list of lengths to shuffle the partition into.

  • manual_partition (bool, default: False) – If True, partition the result with lengths.

  • **kwargs (dict) – Additional keyword arguments to pass to func.

Returns:

A list of BaseDataframePartition objects.

Return type:

list

Notes

The procedures that invoke this method assume full axis knowledge. Implement this method accordingly.

You must return a list of BaseDataframePartition objects from this method.
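The contract described above (join the blocks, call func on the full axis, return a list of blocks) can be sketched as follows. ToyAxisPartition is a hypothetical class, not part of Modin, and its blocks are plain lists rather than BaseDataframePartition objects. The even-chunk splitting rule is an assumption chosen for illustration, and other_axis_partition and maintain_partitioning are not modeled.

```python
from itertools import accumulate
from typing import Callable, List, Optional, Sequence


class ToyAxisPartition:
    """Hypothetical stand-in for an axis partition; blocks are plain lists."""

    def __init__(self, blocks: List[list]):
        self._blocks = blocks

    @property
    def list_of_blocks(self) -> List[list]:
        return self._blocks

    def apply(
        self,
        func: Callable,
        *args,
        num_splits: Optional[int] = None,
        lengths: Optional[Sequence[int]] = None,
        manual_partition: bool = False,
        **kwargs,
    ) -> List[list]:
        # Full-axis knowledge: join every block before calling func.
        full_axis = [value for block in self._blocks for value in block]
        result = func(full_axis, *args, **kwargs)

        if manual_partition and lengths is not None:
            # Split the result at the caller-supplied lengths.
            bounds = [0, *accumulate(lengths)]
        else:
            # Assumed default: as many near-even chunks as requested,
            # falling back to the current number of blocks.
            n = num_splits or max(len(self._blocks), 1)
            chunk = -(-len(result) // n)  # ceiling division
            bounds = [min(i * chunk, len(result)) for i in range(n + 1)]

        # Always hand back a list of blocks, as the notes require.
        return [result[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


part = ToyAxisPartition([[3, 1], [4, 1, 5], [9]])
even_blocks = part.apply(sorted)
two_blocks = part.apply(sorted, num_splits=2)
manual_blocks = part.apply(sorted, lengths=[1, 5], manual_partition=True)
```

Sorting is used here because it is an operation whose result depends on every value along the axis, so it cannot be done block by block.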

force_materialization(get_ip: bool = False) BaseDataframeAxisPartition#

Materialize axis partitions into a single partition.

Parameters:

get_ip (bool, default: False) – Whether to also get the node IP address of the single partition.

Returns:

An axis partition containing only a single materialized partition.

Return type:

BaseDataframeAxisPartition
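As a rough illustration of what "a single materialized partition" means, the sketch below collapses several blocks into one. The function name materialize_single_block is hypothetical, plain lists stand in for block partitions, and the get_ip option is not modeled; a real engine would perform the join on a worker rather than locally.

```python
from typing import List


def materialize_single_block(blocks: List[list]) -> List[list]:
    """Join all blocks into one and return a one-element block list."""
    merged = [value for block in blocks for value in block]
    return [merged]


single = materialize_single_block([[1, 2], [3], [4, 5]])
```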

abstract property list_of_blocks: list#

Get the list of physical partition objects that compose this partition.

unwrap(squeeze: bool = False, get_ip: bool = False) Union[list, Tuple[list, list]]#

Unwrap partitions from this axis partition.

Parameters:
  • squeeze (bool, default: False) – Flag used to unwrap only one partition.

  • get_ip (bool, default: False) – Whether to also get the node IP address of each partition.

Returns:

List of partitions from this axis partition.

Return type:

list

Notes

If get_ip=True and Ray or Dask is used as the engine, a tuple of lists of Ray.ObjectRef/Dask.Future to node IP addresses and unwrapped partitions, respectively, is returned (i.e. [(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), …]).
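One plausible reading of the squeeze and get_ip flags is sketched below. The function unwrap_blocks and its ips argument are hypothetical, strings stand in for the futures a real engine would return, and the rule that squeeze only takes effect when exactly one block is present is an assumption rather than something this page states.

```python
from typing import List


def unwrap_blocks(
    blocks: List[str],
    ips: List[str],
    squeeze: bool = False,
    get_ip: bool = False,
):
    """Return the stored blocks, optionally paired with node IP addresses."""
    # Assumption: squeeze only applies when there is exactly one block.
    if squeeze and len(blocks) == 1:
        return (ips[0], blocks[0]) if get_ip else blocks[0]
    if get_ip:
        # Pair each IP with its block, IP first.
        return list(zip(ips, blocks))
    return blocks


blocks = ["block_0", "block_1"]
ips = ["10.0.0.1", "10.0.0.2"]

plain = unwrap_blocks(blocks, ips)
with_ips = unwrap_blocks(blocks, ips, get_ip=True)
squeezed = unwrap_blocks(["block_0"], ["10.0.0.1"], squeeze=True)
```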