PandasOnUnidistDataframe

This class is a specific implementation of the PandasDataframe class that uses the unidist distributed engine. It serves as an intermediate layer between PandasQueryCompiler and PandasOnUnidistDataframePartitionManager.

Public API

class modin.core.execution.unidist.implementations.pandas_on_unidist.dataframe.PandasOnUnidistDataframe(partitions, index=None, columns=None, row_lengths=None, column_widths=None, dtypes: Optional[Union[Series, ModinDtypes, Callable]] = None, pandas_backend: Optional[str] = None)

The class implements the interface in PandasDataframe using unidist.

Parameters:

partitions (np.ndarray) – A 2D NumPy array of partitions.
index (sequence) – The index for the dataframe. Converted to a pandas.Index.
columns (sequence) – The columns object for the dataframe. Converted to a pandas.Index.
row_lengths (list, optional) – The length of each partition along the rows, i.e. the “height” of each block partition. Computed if not provided.
column_widths (list, optional) – The width of each partition along the columns, i.e. the “width” of each block partition. Computed if not provided.
dtypes (pandas.Series, ModinDtypes or callable, optional) – The data types for the dataframe columns.
pandas_backend ({"pyarrow", None}, optional) – Backend used by pandas. None means the default NumPy backend.
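To make the relationship between partitions, row_lengths and column_widths concrete, the following is a minimal plain-Python sketch. It is not Modin's implementation: the helpers split_into_blocks, compute_row_lengths and compute_column_widths are hypothetical names introduced here, and each "partition" is just a nested list rather than a real partition object. It only illustrates how the "height" and "width" of each block can be derived from a 2D grid when they are not supplied.

```python
# Plain-Python stand-in for a 2D grid of block partitions.
# Assumption: each block is a list of rows; this is NOT Modin's partition class.


def split_into_blocks(rows, row_lengths, column_widths):
    """Cut a row-major table into a 2D grid of blocks (hypothetical helper)."""
    grid = []
    row_start = 0
    for height in row_lengths:
        band = rows[row_start:row_start + height]
        grid_row = []
        col_start = 0
        for width in column_widths:
            # Slice the same column range out of every row in this band.
            grid_row.append([r[col_start:col_start + width] for r in band])
            col_start += width
        grid.append(grid_row)
        row_start += height
    return grid


def compute_row_lengths(grid):
    """Height of each row of blocks, read from the first block in that row."""
    return [len(grid_row[0]) for grid_row in grid]


def compute_column_widths(grid):
    """Width of each column of blocks, read from the first row of blocks."""
    return [len(block[0]) if block else 0 for block in grid[0]]


# A 7 x 5 table split into a 2 x 3 grid of blocks.
table = [[r * 10 + c for c in range(5)] for r in range(7)]
grid = split_into_blocks(table, row_lengths=[4, 3], column_widths=[2, 2, 1])

row_lengths = compute_row_lengths(grid)
column_widths = compute_column_widths(grid)
```

In this sketch the row lengths sum to the number of rows in the table and the column widths sum to the number of columns, which is the invariant the two optional parameters describe.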

property engine: str

The engine for this frame.

Returns: The engine.
Return type: str

support_materialization_in_worker_process() → bool

Whether it is possible to call the to_pandas function during pickling, at the moment the object is recreated.

Return type: bool