cuDFOnRayDataframePartition#

This class is the implementation of PandasDataframePartition that provides the API for performing operations on a block partition, namely a cudf.DataFrame, using Ray as the execution engine.

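An operation on a block partition can be performed asynchronously in two ways:

  • apply() returns a ray.ObjectRef that refers to the integer key of the result in the internal storage of self.gpu_manager.

  • add_to_apply_calls() returns a new cuDFOnRayDataframePartition based on the result of the operation.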

Public API#

class modin.core.execution.ray.implementations.cudf_on_ray.partitioning.cuDFOnRayDataframePartition(gpu_manager, key, length=None, width=None)#

The class implements the interface in PandasDataframePartition using cuDF on Ray.

Parameters:
  • gpu_manager (modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager) – A GPU manager that stores cuDF dataframes.

  • key (ray.ObjectRef or int) – An integer key (or a reference to the key) associated with the cudf.DataFrame stored in gpu_manager.

  • length (ray.ObjectRef or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.

  • width (ray.ObjectRef or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.
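The relationship between a partition, its key, and the manager can be illustrated with a minimal pure-Python sketch. ToyGPUManager, ToyPartition, store, and lookup are hypothetical names invented for this illustration and are not part of Modin; a list of dicts stands in for a cudf.DataFrame, and no Ray or GPU is involved.

```python
import itertools


class ToyGPUManager:
    """Hypothetical stand-in for GPUManager: maps integer keys to stored objects."""

    def __init__(self):
        self._store = {}
        self._next_key = itertools.count()

    def store(self, obj):
        # Save the object under a fresh integer key and return that key.
        key = next(self._next_key)
        self._store[key] = obj
        return key

    def lookup(self, key):
        # Return the object stored under the given key.
        return self._store[key]


class ToyPartition:
    """Hypothetical partition: holds a manager handle and a key, not the data."""

    def __init__(self, gpu_manager, key, length=None, width=None):
        self.gpu_manager = gpu_manager
        self.key = key
        self._length_cache = length
        self._width_cache = width


manager = ToyGPUManager()
rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]  # plays the role of a cudf.DataFrame
key = manager.store(rows)
part = ToyPartition(manager, key, length=len(rows), width=len(rows[0]))
```

The partition itself stays lightweight: the data lives in the manager, and the partition only records where to find it and, optionally, its cached shape.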

add_to_apply_calls(func, length=None, width=None, *args, **kwargs)#

Apply func to this partition and create a new partition.

Parameters:
  • func (callable) – A function to apply.

  • length (ray.ObjectRef or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.

  • width (ray.ObjectRef or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.

  • *args (tuple) – Positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns:

New partition based on result of func.

Return type:

cuDFOnRayDataframePartition

Notes

The application of func is scheduled eagerly, and a new cuDFOnRayDataframePartition is produced from its result.
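The difference between getting back a new partition and getting back only a key can be sketched with toy classes. This is a simplified, synchronous illustration under stated assumptions: ToyManager and ToyPartition are hypothetical, plain integers stand in for ray.ObjectRef, and the toy add_to_apply_calls omits the length and width parameters of the real method.

```python
class ToyManager:
    """Hypothetical stand-in for GPUManager with dict storage."""

    def __init__(self):
        self._store = {}
        self._next_key = 0

    def put(self, obj):
        # Store obj under a fresh integer key.
        key = self._next_key
        self._next_key += 1
        self._store[key] = obj
        return key

    def apply(self, key, func, *args, **kwargs):
        # Run func on the stored object and store the result under a new key.
        return self.put(func(self._store[key], *args, **kwargs))

    def get(self, key):
        return self._store[key]


class ToyPartition:
    """Hypothetical partition that only holds a manager and a key."""

    def __init__(self, manager, key):
        self.manager = manager
        self.key = key

    def apply(self, func, *args, **kwargs):
        # Returns only the key of the result.
        return self.manager.apply(self.key, func, *args, **kwargs)

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Runs func right away and wraps the result key in a new partition.
        return ToyPartition(self.manager, self.apply(func, *args, **kwargs))


manager = ToyManager()
part = ToyPartition(manager, manager.put([1, 2, 3]))
doubled = part.add_to_apply_calls(lambda xs, k: [x * k for x in xs], 2)
result_key = part.apply(sum)
```

In this sketch the original partition is left untouched by either call; each result is stored under its own key.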

apply(func, *args, **kwargs)#

Apply func to this partition.

Parameters:
  • func (callable) – A function to apply.

  • *args (iterable) – Additional positional arguments to be passed to func.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns:

A reference to the integer key of the result in the internal dict storage of self.gpu_manager.

Return type:

ray.ObjectRef

apply_result_not_dataframe(func, **kwargs)#

Apply func to this partition.

Parameters:
  • func (callable) – A function to apply.

  • **kwargs (dict) – Additional keyword arguments to be passed to func.

Returns:

A reference to the integer key of the result in the internal dict storage of self.gpu_manager.

Return type:

ray.ObjectRef

copy()#

Create a full copy of this object.

Return type:

cuDFOnRayDataframePartition

free()#

Free the DataFrame and its associated self.key from self.gpu_manager.
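Freeing a partition amounts to removing its entry from the manager's key-to-object storage. A minimal sketch, assuming a hypothetical ToyManager with plain dict storage (not the Modin GPUManager):

```python
class ToyManager:
    """Hypothetical stand-in for GPUManager with dict storage."""

    def __init__(self):
        self._store = {}
        self._next_key = 0

    def put(self, obj):
        # Store obj under a fresh integer key.
        key = self._next_key
        self._next_key += 1
        self._store[key] = obj
        return key

    def free(self, key):
        # Drop the stored object; a missing key is ignored.
        self._store.pop(key, None)

    def keys(self):
        return sorted(self._store)


manager = ToyManager()
first = manager.put(["a", "b"])
second = manager.put(["c"])
manager.free(first)
remaining = manager.keys()  # only the key of the second object is left
```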

get()#

Get object stored by this partition from self.gpu_manager.

Return type:

ray.ObjectRef

get_gpu_manager()#

Get gpu manager associated with this partition.

Returns:

GPUManager associated with this object.

Return type:

modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager

get_key()#

Get integer key of this partition in dict-storage of self.gpu_manager.

Return type:

int

get_object_id()#

Get object stored for this partition from self.gpu_manager.

Return type:

ray.ObjectRef

length(materialize=True)#

Get the length of the object wrapped by this partition.

Parameters:

materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If False, a future of the result may be returned if it has not been materialized yet.

Returns:

The length (or reference to length) of the object.

Return type:

int or ray.ObjectRef
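The effect of materialize can be sketched with the standard library alone. In this illustration, concurrent.futures.Future plays the role of ray.ObjectRef, and ToyPartition with its _length_cache attribute is a hypothetical stand-in rather than the Modin implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyPartition:
    """Hypothetical partition whose length is computed asynchronously."""

    def __init__(self, data, executor):
        self._data = data
        self._executor = executor
        self._length_cache = None

    def length(self, materialize=True):
        if self._length_cache is None:
            # Schedule the computation; the cache holds a future for now.
            self._length_cache = self._executor.submit(len, self._data)
        if isinstance(self._length_cache, Future) and materialize:
            # Block until the value is ready and cache the plain integer.
            self._length_cache = self._length_cache.result()
        return self._length_cache


with ThreadPoolExecutor(max_workers=1) as executor:
    part = ToyPartition([10, 20, 30], executor)
    lazy = part.length(materialize=False)  # a Future, possibly still pending
    eager = part.length()  # blocks and returns an int
```

Passing materialize=False lets a caller keep building a chain of asynchronous work without blocking; once the value has been materialized, later calls return the cached integer either way.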

mask(row_labels, col_labels)#

Select rows and columns by the given labels.

Parameters:
  • row_labels (list of hashable) – The row labels to extract.

  • col_labels (list of hashable) – The column labels to extract.

Returns:

A reference to the integer key of the result in the internal dict storage of self.gpu_manager.

Return type:

ray.ObjectRef

classmethod preprocess_func(func)#

Put func into the Ray object store.

Parameters:

func (callable) – Function to put.

Returns:

A reference to func in the Ray object store.

Return type:

ray.ObjectRef

classmethod put(gpu_manager, pandas_dataframe)#

Put pandas_dataframe into gpu_manager.

Parameters:
  • gpu_manager (modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager) – A GPU manager that stores cuDF dataframes.

  • pandas_dataframe (pandas.DataFrame/pandas.Series) – A pandas.DataFrame/pandas.Series to put.

Returns:

A reference to the integer key under which the added pandas.DataFrame is stored in the internal dict storage of gpu_manager.

Return type:

ray.ObjectRef

to_numpy()#

Convert this partition to a NumPy array.

Return type:

NumPy array

to_pandas()#

Convert this partition to a pandas.DataFrame.

Return type:

pandas.DataFrame

width(materialize=True)#

Get the width of the object wrapped by this partition.

Parameters:

materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If False, a future of the result may be returned if it has not been materialized yet.

Returns:

The width (or reference to width) of the object.

Return type:

int or ray.ObjectRef