Pandas partitioning API#
This page contains a description of the API to extract partitions from and build Modin Dataframes.
unwrap_partitions#
- modin.distributed.dataframe.pandas.unwrap_partitions(api_layer_object: Union[DataFrame, Series], axis: Optional[int] = None, get_ip: bool = False) list #
Unwrap partitions of the
api_layer_object
.- Parameters:
api_layer_object (DataFrame or Series) – The API layer object.
axis ({None, 0, 1}, default: None) – The axis to unwrap partitions for (0 - row partitions, 1 - column partitions). If
axis is None
, the partitions are unwrapped as they are currently stored.get_ip (bool, default: False) – Whether to get node ip address to each partition or not.
- Returns:
A list of Ray.ObjectRef/Dask.Future to partitions of the
api_layer_object
if Ray/Dask is used as an engine.- Return type:
list
Notes
If
get_ip=True
, a list of tuples of Ray.ObjectRef/Dask.Future to node ip addresses and partitions of theapi_layer_object
, respectively, is returned if Ray/Dask is used as an engine (i.e.[(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), ...]
).
from_partitions#
- modin.distributed.dataframe.pandas.from_partitions(partitions: list, axis: Optional[int], index: Optional[Union[ExtensionArray, ndarray, Index, Series, SequenceNotStr, range]] = None, columns: Optional[Union[ExtensionArray, ndarray, Index, Series, SequenceNotStr, range]] = None, row_lengths: Optional[list] = None, column_widths: Optional[list] = None) DataFrame #
Create DataFrame from remote partitions.
- Parameters:
partitions (list) – A list of Ray.ObjectRef/Dask.Future to partitions depending on the engine used. Or a list of tuples of Ray.ObjectRef/Dask.Future to node ip addresses and partitions depending on the engine used (i.e.
[(Ray.ObjectRef/Dask.Future, Ray.ObjectRef/Dask.Future), ...]
).axis ({None, 0 or 1}) –
The
axis
parameter is used to identify what are the partitions passed. You have to set:axis=0
if you want to create DataFrame from row partitionsaxis=1
if you want to create DataFrame from column partitionsaxis=None
if you want to create DataFrame from 2D list of partitions
index (sequence, optional) – The index for the DataFrame. Is computed if not provided.
columns (sequence, optional) – The columns for the DataFrame. Is computed if not provided.
row_lengths (list, optional) – The length of each partition in the rows. The “height” of each of the block partitions. Is computed if not provided.
column_widths (list, optional) – The width of each partition in the columns. The “width” of each of the block partitions. Is computed if not provided.
- Returns:
DataFrame instance created from remote partitions.
- Return type:
modin.pandas.DataFrame
Notes
Pass index, columns, row_lengths and column_widths to avoid triggering extra computations of the metadata when creating a DataFrame.
Example#
import modin.pandas as pd
from modin.distributed.dataframe.pandas import unwrap_partitions, from_partitions
import numpy as np
data = np.random.randint(0, 100, size=(2 ** 10, 2 ** 8))
df = pd.DataFrame(data)
partitions = unwrap_partitions(df, axis=0, get_ip=True)
print(partitions)
new_df = from_partitions(partitions, axis=0)
print(new_df)