cuDFOnRayDataframe

This class is a specific implementation of the PandasDataframe class that uses the Ray distributed engine. It serves as an intermediate layer between cuDFQueryCompiler and cuDFOnRayDataframePartitionManager.

Public API

class modin.core.execution.ray.implementations.cudf_on_ray.dataframe.dataframe.cuDFOnRayDataframe(partitions, index, columns, row_lengths=None, column_widths=None, dtypes=None)

The class implements the interface in PandasOnRayDataframe using cuDF.

Parameters

partitions (np.ndarray) – A 2D NumPy array of partitions.
index (sequence) – The index for the dataframe. Converted to a pandas.Index.
columns (sequence) – The columns object for the dataframe. Converted to a pandas.Index.
row_lengths (list, optional) – The number of rows in each row of block partitions (the “height” of each block). Computed if not provided.
column_widths (list, optional) – The number of columns in each column of block partitions (the “width” of each block). Computed if not provided.
dtypes (pandas.Series, optional) – The data types for the dataframe columns.
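
As an illustration of what “computed if not provided” can mean for row_lengths and column_widths, the following minimal sketch derives both lists from a 2D grid of blocks. It is not Modin's implementation: plain nested lists stand in for the cuDF partitions, and the helper names block_shape and compute_lengths are hypothetical.

```python
# Hypothetical stand-in: each "partition" is a plain list of rows
# (each row a list of values), not a real cuDF-on-Ray partition object.


def block_shape(block):
    """Return (n_rows, n_cols) for a block stored as a list of rows."""
    n_rows = len(block)
    n_cols = len(block[0]) if n_rows else 0
    return n_rows, n_cols


def compute_lengths(grid):
    """Derive row_lengths from the first column of blocks and
    column_widths from the first row of blocks."""
    row_lengths = [block_shape(row_of_blocks[0])[0] for row_of_blocks in grid]
    column_widths = [block_shape(block)[1] for block in grid[0]]
    return row_lengths, column_widths


# A 2x2 grid of blocks that together cover a 3x5 frame.
grid = [
    [[[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10]]],  # blocks that are 2 rows tall
    [[[11, 12, 13]], [[14, 15]]],                 # blocks that are 1 row tall
]

row_lengths, column_widths = compute_lengths(grid)
print(row_lengths, column_widths)  # [2, 1] [3, 2]
```

Only the first block in each row and column is inspected, which relies on the assumption that all blocks in a grid row share a height and all blocks in a grid column share a width.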

mask(row_indices=None, row_numeric_idx=None, col_indices=None, col_numeric_idx=None)

Lazily select columns or rows from given indices.

Parameters

row_indices (list of hashable, optional) – The row labels to extract.
row_numeric_idx (list of int, optional) – The row indices to extract.
col_indices (list of hashable, optional) – The column labels to extract.
col_numeric_idx (list of int, optional) – The column indices to extract.

Returns

A new cuDFOnRayDataframe from the mask provided.

Return type

cuDFOnRayDataframe

Notes

If both row_indices and row_numeric_idx are set, row_indices will be used. The same rule applies to col_indices and col_numeric_idx.
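
The precedence rule in the note can be sketched for a single axis as follows. This is a minimal illustration, not the library's code: resolve_positions is a hypothetical helper, labels are assumed to be unique, and keeping every position when neither argument is given is an assumption made here for completeness.

```python
def resolve_positions(axis_labels, labels=None, positions=None):
    """Hypothetical helper: return the integer positions to keep on one axis.

    Labels take precedence over positions when both are given.
    Assumption: with neither argument, every position is kept.
    """
    if labels is not None:
        # Map each label to its position (assumes unique labels).
        lookup = {label: i for i, label in enumerate(axis_labels)}
        return [lookup[label] for label in labels]
    if positions is not None:
        return list(positions)
    return list(range(len(axis_labels)))


index = ["a", "b", "c", "d"]

# Both given: the labels win and the positions argument is ignored.
both = resolve_positions(index, labels=["d", "b"], positions=[0])
# Only positions given: they are used as-is.
only_positions = resolve_positions(index, positions=[0, 2])
# Neither given: nothing is filtered on this axis.
neither = resolve_positions(index)

print(both, only_positions, neither)  # [3, 1] [0, 2] [0, 1, 2, 3]
```

The same helper would be applied once with the row arguments and once with the column arguments.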

synchronize_labels(axis=None)

Synchronize labels by applying the index object (Index or Columns) to the partitions eagerly.

Parameters

axis ({0, 1, None}, default: None) – The axis to apply to. If None, it applies to both axes.
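
One way to picture applying an index object to the partitions is to cut the flat label sequence into one chunk per partition along the chosen axis. The sketch below shows only that slicing step under stated assumptions: split_labels and synchronize are hypothetical names, plain lists replace the index objects, and no partition is actually modified.

```python
from itertools import accumulate


def split_labels(labels, lengths):
    """Split a flat label sequence into consecutive chunks of the given lengths."""
    if sum(lengths) != len(labels):
        raise ValueError("partition lengths do not add up to the number of labels")
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [list(labels[start:end]) for start, end in zip(starts, ends)]


def synchronize(labels_by_axis, lengths_by_axis, axis=None):
    """Return per-partition label chunks for one axis, or for both when axis is None."""
    axes = (0, 1) if axis is None else (axis,)
    return {ax: split_labels(labels_by_axis[ax], lengths_by_axis[ax]) for ax in axes}


labels_by_axis = {0: ["r0", "r1", "r2"], 1: ["a", "b", "c", "d", "e"]}
lengths_by_axis = {0: [2, 1], 1: [3, 2]}  # row_lengths and column_widths

both_axes = synchronize(labels_by_axis, lengths_by_axis)
columns_only = synchronize(labels_by_axis, lengths_by_axis, axis=1)

print(both_axes)     # {0: [['r0', 'r1'], ['r2']], 1: [['a', 'b', 'c'], ['d', 'e']]}
print(columns_only)  # {1: [['a', 'b', 'c'], ['d', 'e']]}
```

In a real eager synchronization, each chunk would then be assigned as the labels of the corresponding partition; that step is omitted here because it depends on the partition API.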