Pandas Partition API

This page contains a description of the API to extract partitions from and build Modin Dataframes.

unwrap_partitions

modin.distributed.dataframe.pandas.unwrap_partitions(api_layer_object, axis=None, get_ip=False)

Unwrap partitions of the api_layer_object.

Parameters
  • api_layer_object (DataFrame or Series) – The API layer object.

  • axis ({None, 0, 1}, default: None) – The axis to unwrap partitions for (0 - row partitions, 1 - column partitions). If axis is None, the partitions are unwrapped as they are currently stored.

  • get_ip (bool, default: False) – Whether to get node ip address to each partition or not.

Returns

A list of Ray.ObjectRef/Dask.Future to partitions of the api_layer_object if Ray/Dask is used as an engine.

Return type

list

Notes

If get_ip=True, a list of tuples of Ray.ObjectRef/Dask.Future to node ip addresses and partitions of the api_layer_object, respectively, is returned if Ray/Dask is used as an engine (i.e. [(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), ...]).

from_partitions

modin.distributed.dataframe.pandas.from_partitions(partitions, axis)

Create DataFrame from remote partitions.

Parameters
  • partitions (list) – A list of Ray.ObjectRef/Dask.Future to partitions depending on the engine used. Or a list of tuples of Ray.ObjectRef/Dask.Future to node ip addresses and partitions depending on the engine used (i.e. [(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), ...]).

  • axis ({None, 0 or 1}) –

    The axis parameter is used to identify what are the partitions passed. You have to set:

    • axis=0 if you want to create DataFrame from row partitions

    • axis=1 if you want to create DataFrame from column partitions

    • axis=None if you want to create DataFrame from 2D list of partitions

Returns

DataFrame instance created from remote partitions.

Return type

modin.pandas.DataFrame

Example

import modin.pandas as pd
from modin.distributed.dataframe.pandas import unwrap_partitions, from_partitions
import numpy as np
data = np.random.randint(0, 100, size=(2 ** 10, 2 ** 8))
df = pd.DataFrame(data)
partitions = unwrap_partitions(df, axis=0, get_ip=True)
print(partitions)
new_df = from_partitions(partitions, axis=0)
print(new_df)