Pandas partitioning API#

This page contains a description of the API to extract partitions from and build Modin Dataframes.

unwrap_partitions#

modin.distributed.dataframe.pandas.unwrap_partitions(api_layer_object: Union[DataFrame, Series], axis: Optional[int] = None, get_ip: bool = False) list#

Unwrap partitions of the api_layer_object.

Parameters:
  • api_layer_object (DataFrame or Series) – The API layer object.

  • axis ({None, 0, 1}, default: None) – The axis to unwrap partitions for (0 - row partitions, 1 - column partitions). If axis is None, the partitions are unwrapped as they are currently stored.

  • get_ip (bool, default: False) – Whether to get node ip address to each partition or not.

Returns:

A list of Ray.ObjectRef/Dask.Future to partitions of the api_layer_object if Ray/Dask is used as an engine.

Return type:

list

Notes

If get_ip=True, a list of tuples of Ray.ObjectRef/Dask.Future to node ip addresses and partitions of the api_layer_object, respectively, is returned if Ray/Dask is used as an engine (i.e. [(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), ...]).

from_partitions#

modin.distributed.dataframe.pandas.from_partitions(partitions: list, axis: Optional[int], index: Optional[Union[ExtensionArray, ndarray, Index, Series, SequenceNotStr, range]] = None, columns: Optional[Union[ExtensionArray, ndarray, Index, Series, SequenceNotStr, range]] = None, row_lengths: Optional[list] = None, column_widths: Optional[list] = None) DataFrame#

Create DataFrame from remote partitions.

Parameters:
  • partitions (list) – A list of Ray.ObjectRef/Dask.Future to partitions depending on the engine used. Or a list of tuples of Ray.ObjectRef/Dask.Future to node ip addresses and partitions depending on the engine used (i.e. [(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), ...]).

  • axis ({None, 0 or 1}) –

    The axis parameter is used to identify what are the partitions passed. You have to set:

    • axis=0 if you want to create DataFrame from row partitions

    • axis=1 if you want to create DataFrame from column partitions

    • axis=None if you want to create DataFrame from 2D list of partitions

  • index (sequence, optional) – The index for the DataFrame. Is computed if not provided.

  • columns (sequence, optional) – The columns for the DataFrame. Is computed if not provided.

  • row_lengths (list, optional) – The length of each partition in the rows. The “height” of each of the block partitions. Is computed if not provided.

  • column_widths (list, optional) – The width of each partition in the columns. The “width” of each of the block partitions. Is computed if not provided.

Returns:

DataFrame instance created from remote partitions.

Return type:

modin.pandas.DataFrame

Notes

Pass index, columns, row_lengths and column_widths to avoid triggering extra computations of the metadata when creating a DataFrame.

Example#

import modin.pandas as pd
from modin.distributed.dataframe.pandas import unwrap_partitions, from_partitions
import numpy as np
data = np.random.randint(0, 100, size=(2 ** 10, 2 ** 8))
df = pd.DataFrame(data)
partitions = unwrap_partitions(df, axis=0, get_ip=True)
print(partitions)
new_df = from_partitions(partitions, axis=0)
print(new_df)