cuDFOnRayDataframePartition#
This class is the implementation of PandasDataframePartition that provides the API for performing operations on a block partition, namely a cudf.DataFrame, using Ray as the execution engine.
An operation on a block partition can be performed asynchronously in two ways:
- apply() returns a ray.ObjectRef with the integer key of the operation result in internal storage.
- add_to_apply_calls() returns a new cuDFOnRayDataframePartition object that is based on the result of the operation.
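The difference between the two ways can be illustrated with a minimal, synchronous sketch. Everything in it is an assumption made for illustration: ToyGPUManager and ToyPartition are hypothetical stand-ins, not Modin classes, and plain integers take the place of ray.ObjectRef. The real class runs on Ray and stores cudf.DataFrame objects, and its add_to_apply_calls() also accepts length and width, which the sketch leaves out.

```python
import itertools


class ToyGPUManager:
    """Hypothetical stand-in for GPUManager: keeps objects under integer keys."""

    def __init__(self):
        self._store = {}
        self._next_key = itertools.count()

    def put(self, obj):
        # Store obj and hand back the integer key it was stored under.
        key = next(self._next_key)
        self._store[key] = obj
        return key

    def apply(self, key, func, *args, **kwargs):
        # Run func on the stored object and store the result under a new key.
        return self.put(func(self._store[key], *args, **kwargs))

    def get(self, key):
        return self._store[key]


class ToyPartition:
    """Hypothetical stand-in for cuDFOnRayDataframePartition."""

    def __init__(self, gpu_manager, key):
        self.gpu_manager = gpu_manager
        self.key = key

    def apply(self, func, *args, **kwargs):
        # Way 1: return only the key of the result in the manager's storage.
        return self.gpu_manager.apply(self.key, func, *args, **kwargs)

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Way 2: wrap the key of the result in a new partition object.
        return ToyPartition(self.gpu_manager, self.apply(func, *args, **kwargs))

    def get(self):
        return self.gpu_manager.get(self.key)


manager = ToyGPUManager()
part = ToyPartition(manager, manager.put([1, 2, 3]))

result_key = part.apply(lambda rows: [x * 2 for x in rows])
new_part = part.add_to_apply_calls(lambda rows: [x + 1 for x in rows])
```

In this toy model, apply() suits callers that only need to track the result key, while add_to_apply_calls() suits callers that want to keep chaining partition-level operations on the result.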
Public API#
- class modin.core.execution.ray.implementations.cudf_on_ray.partitioning.cuDFOnRayDataframePartition(gpu_manager, key, length=None, width=None)#
The class implements the interface in PandasDataframePartition using cuDF on Ray.
- Parameters
gpu_manager (modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager) – A gpu manager to store cuDF dataframes.
key (ray.ObjectRef or int) – An integer key (or reference to key) associated with cudf.DataFrame stored in gpu_manager.
length (ray.ObjectRef or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.
width (ray.ObjectRef or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.
- add_to_apply_calls(func, length=None, width=None, *args, **kwargs)#
Apply func to this partition and create a new one.
- Parameters
func (callable) – A function to apply.
length (ray.ObjectRef or int, optional) – Length, or reference to length, of wrapped pandas.DataFrame.
width (ray.ObjectRef or int, optional) – Width, or reference to width, of wrapped pandas.DataFrame.
*args (tuple) – Positional arguments to be passed in func.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
New partition based on result of func.
- Return type
cuDFOnRayDataframePartition
Notes
We eagerly schedule the apply func and produce a new cuDFOnRayDataframePartition.
- apply(func, *args, **kwargs)#
Apply func to this partition.
- Parameters
func (callable) – A function to apply.
*args (iterable) – Additional positional arguments to be passed in func.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
A reference to integer key of result in internal dict-storage of self.gpu_manager.
- Return type
ray.ObjectRef
- apply_result_not_dataframe(func, **kwargs)#
Apply func to this partition.
- Parameters
func (callable) – A function to apply.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
A reference to integer key of result in internal dict-storage of self.gpu_manager.
- Return type
ray.ObjectRef
- copy()#
Create a full copy of this object.
- Return type
cuDFOnRayDataframePartition
- free()#
Free the DataFrame and the associated self.key from self.gpu_manager.
- get()#
Get object stored by this partition from self.gpu_manager.
- Return type
ray.ObjectRef
- get_gpu_manager()#
Get gpu manager associated with this partition.
- Returns
GPUManager associated with this object.
- Return type
modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager
- get_key()#
Get integer key of this partition in dict-storage of self.gpu_manager.
- Return type
int
- get_object_id()#
Get object stored for this partition from self.gpu_manager.
- Return type
ray.ObjectRef
- length(materialize=True)#
Get the length of the object wrapped by this partition.
- Parameters
materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If False is specified, a future of the result may be returned if it hasn’t been materialized yet.
- Returns
The length (or reference to length) of the object.
- Return type
int or ray.ObjectRef
- mask(row_labels, col_labels)#
Select columns or rows from given indices.
- Parameters
row_labels (list of hashable) – The row labels to extract.
col_labels (list of hashable) – The column labels to extract.
- Returns
A reference to integer key of result in internal dict-storage of self.gpu_manager.
- Return type
ray.ObjectRef
- classmethod preprocess_func(func)#
Put func to Ray object store.
- Parameters
func (callable) – Function to put.
- Returns
A reference to func in Ray object store.
- Return type
ray.ObjectRef
- classmethod put(gpu_manager, pandas_dataframe)#
Put pandas_dataframe to gpu_manager.
- Parameters
gpu_manager (modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager) – A gpu manager to store cuDF dataframes.
pandas_dataframe (pandas.DataFrame/pandas.Series) – A pandas.DataFrame/pandas.Series to put.
- Returns
A reference to the integer key of the pandas.DataFrame added to the internal dict-storage in gpu_manager.
- Return type
ray.ObjectRef
- to_numpy()#
Convert this partition to NumPy array.
- Return type
NumPy array
- to_pandas()#
Convert this partition to pandas.DataFrame.
- Return type
pandas.DataFrame
- width(materialize=True)#
Get the width of the object wrapped by this partition.
- Parameters
materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If False is specified, a future of the result may be returned if it hasn’t been materialized yet.
- Returns
The width (or reference to width) of the object.
- Return type
int or ray.ObjectRef
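The materialize flag described for length() and width() can be sketched as follows. This is only an illustration under stated assumptions: ToyLengthCache is a hypothetical class, and concurrent.futures.Future from the standard library stands in for ray.ObjectRef. It is not how Modin implements the methods.

```python
from concurrent.futures import Future


class ToyLengthCache:
    """Hypothetical holder for a length that may still be a pending future."""

    def __init__(self, length):
        # Either an int, or a Future that resolves to an int.
        self._length = length

    def length(self, materialize=True):
        if isinstance(self._length, Future) and materialize:
            # Block until the value is available, then cache the plain integer.
            self._length = self._length.result()
        return self._length


pending = Future()
pending.set_result(5)  # pretend a remote task has finished computing the length
cache = ToyLengthCache(pending)

lazy = cache.length(materialize=False)  # still the Future, nothing was forced
eager = cache.length()                  # resolved to a plain int and cached
```

Once the value has been materialized, later calls return the cached integer even with materialize=False, which matches the "if it hasn’t been materialized yet" wording above.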