PandasOnDaskDataframe

This class implements the dataframe algebra for the Dask execution engine. It serves as an intermediate layer between the pandas query compiler and PandasOnDaskDataframePartitionManager.

Public API

class modin.core.execution.dask.implementations.pandas_on_dask.dataframe.dataframe.PandasOnDaskDataframe(partitions, index, columns, row_lengths=None, column_widths=None, dtypes=None)

The class implements the interface in PandasDataframe.

Parameters
  • partitions (np.ndarray) – A 2D NumPy array of partitions.

  • index (sequence) – The index for the dataframe. Converted to a pandas.Index.

  • columns (sequence) – The columns object for the dataframe. Converted to a pandas.Index.

  • row_lengths (list, optional) – The number of rows in each row of partitions, i.e. the “height” of each block partition. Computed if not provided.

  • column_widths (list, optional) – The number of columns in each column of partitions, i.e. the “width” of each block partition. Computed if not provided.

  • dtypes (pandas.Series, optional) – The data types for the dataframe columns.
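
To make the relationship between partitions, row_lengths, and column_widths concrete, the sketch below builds a 2D grid of blocks from a plain pandas frame and derives the two lists from it. It uses only NumPy and pandas, not Modin or Dask. The helper split_into_blocks is hypothetical and exists only for illustration, and plain pandas.DataFrame objects stand in for the partition objects, so this is not how the class computes these values internally.

```python
import numpy as np
import pandas as pd


def split_into_blocks(df, row_splits, col_splits):
    """Hypothetical helper: cut ``df`` into a 2D object array of pandas blocks."""
    row_chunks = np.array_split(np.arange(len(df.index)), row_splits)
    col_chunks = np.array_split(np.arange(len(df.columns)), col_splits)
    blocks = np.empty((row_splits, col_splits), dtype=object)
    for i, rows in enumerate(row_chunks):
        for j, cols in enumerate(col_chunks):
            # Each cell holds one rectangular block of the original frame.
            blocks[i, j] = df.iloc[rows, cols]
    return blocks


df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list("abcd"))
blocks = split_into_blocks(df, row_splits=2, col_splits=2)

# Heights are read down the first column of the grid,
# widths across the first row of the grid.
row_lengths = [len(block.index) for block in blocks[:, 0]]
column_widths = [len(block.columns) for block in blocks[0, :]]

print(row_lengths)    # heights of the block rows
print(column_widths)  # widths of the block columns
```

In this sketch the heights sum to len(df.index) and the widths sum to len(df.columns). That consistency between the lengths and the index and columns is what the row_lengths and column_widths arguments are meant to describe.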