cuDFOnRayDataframePartition
The class is the specific implementation of PandasDataframePartition, providing the API to perform operations on a block partition, namely, cudf.DataFrame, using Ray as an execution engine.
An operation on a block partition can be performed asynchronously in two ways:
- apply() returns a ray.ObjectRef with the integer key of the operation result in internal storage.
- add_to_apply_calls() returns a new cuDFOnRayDataframePartition object that is based on the result of the operation.
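The difference between the two calls can be sketched with a toy model. This is a minimal illustration, not Modin's implementation: `ToyGPUManager` and `ToyPartition` are hypothetical stand-ins, a `concurrent.futures.Future` plays the role of `ray.ObjectRef`, and a plain list stands in for `cudf.DataFrame`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import itertools


class ToyGPUManager:
    """Hypothetical stand-in for GPUManager: objects live in a dict under integer keys."""

    def __init__(self):
        self._store = {}
        self._keys = itertools.count()
        # A single worker runs tasks in submission order, loosely like one actor.
        self._pool = ThreadPoolExecutor(max_workers=1)

    @staticmethod
    def _resolve(key):
        # A key may be a plain int or a Future (the ObjectRef stand-in).
        return key.result() if isinstance(key, Future) else key

    def _store_new(self, obj):
        key = next(self._keys)
        self._store[key] = obj
        return key

    def put(self, obj):
        # Returns a Future that resolves to the integer key of the stored object.
        return self._pool.submit(self._store_new, obj)

    def apply(self, key, func, **kwargs):
        def task():
            result = func(self._store[self._resolve(key)], **kwargs)
            return self._store_new(result)

        return self._pool.submit(task)

    def get(self, key):
        return self._pool.submit(lambda: self._store[self._resolve(key)])


class ToyPartition:
    """Hypothetical partition holding only a manager and a key (or a reference to one)."""

    def __init__(self, gpu_manager, key):
        self.gpu_manager = gpu_manager
        self.key = key

    def apply(self, func, **kwargs):
        # Way 1: hand back a reference to the integer key of the result.
        return self.gpu_manager.apply(self.key, func, **kwargs)

    def add_to_apply_calls(self, func, **kwargs):
        # Way 2: schedule the call right away and wrap the new key in a new partition.
        return ToyPartition(self.gpu_manager, self.apply(func, **kwargs))

    def get(self):
        return self.gpu_manager.get(self.key)


manager = ToyGPUManager()
part = ToyPartition(manager, manager.put([1, 2, 3]))

key_ref = part.apply(lambda rows, scale: [r * scale for r in rows], scale=10)
new_part = part.add_to_apply_calls(lambda rows: rows[::-1])

result_key = key_ref.result()            # integer key of the scaled rows
reversed_rows = new_part.get().result()  # data behind the new partition
```

In this sketch the caller of `apply()` still has to look the result up by key, whereas `add_to_apply_calls()` hands back an object that can be chained further without touching keys directly.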
Public API
- class modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition.cuDFOnRayDataframePartition(gpu_manager, key, length=None, width=None)
The class implements the interface in PandasDataframePartition using cuDF on Ray.
- Parameters
gpu_manager (modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager) – A GPU manager to store cuDF dataframes.
key (ray.ObjectRef or int) – An integer key (or a reference to the key) associated with the cudf.DataFrame stored in gpu_manager.
length (ray.ObjectRef or int, optional) – Length, or a reference to it, of the wrapped pandas.DataFrame.
width (ray.ObjectRef or int, optional) – Width, or a reference to it, of the wrapped pandas.DataFrame.
- add_to_apply_calls(func, **kwargs)
Apply func to this partition and create a new partition from the result.
- Parameters
func (callable) – A function to apply.
**kwargs (dict) – Additional keyword arguments to pass to func.
- Returns
New partition based on result of func.
- Return type
cuDFOnRayDataframePartition
Notes
We eagerly schedule the application of func and produce a new cuDFOnRayDataframePartition.
- apply(func, **kwargs)
Apply func to this partition.
- Parameters
func (callable) – A function to apply.
**kwargs (dict) – Additional keyword arguments to pass to func.
- Returns
A reference to the integer key of the result in the internal dict-storage of self.gpu_manager.
- Return type
ray.ObjectRef
- apply_result_not_dataframe(func, **kwargs)
Apply func to this partition.
- Parameters
func (callable) – A function to apply.
**kwargs (dict) – Additional keyword arguments to pass to func.
- Returns
A reference to the integer key of the result in the internal dict-storage of self.gpu_manager.
- Return type
ray.ObjectRef
- copy()
Create a full copy of this object.
- Returns
A copy of this object.
- Return type
cuDFOnRayDataframePartition
- free()
Free the DataFrame and the associated self.key from self.gpu_manager.
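The effect of freeing a key can be illustrated with a plain dictionary. The sketch below assumes a hypothetical `ToyStorage` class in place of the real GPU manager and runs synchronously, without Ray. It only shows the bookkeeping idea: once a key is freed, the object behind it is no longer reachable.

```python
class ToyStorage:
    """Hypothetical dict-storage keyed by integers, standing in for GPUManager."""

    def __init__(self):
        self._store = {}
        self._next_key = 0

    def put(self, obj):
        key = self._next_key
        self._store[key] = obj
        self._next_key += 1
        return key

    def get(self, key):
        return self._store[key]

    def free(self, key):
        # Drop the object; a key that was already freed is silently ignored.
        self._store.pop(key, None)


storage = ToyStorage()
key = storage.put({"a": [1, 2]})
storage.free(key)

try:
    storage.get(key)
    still_stored = True
except KeyError:
    still_stored = False
```

Whether a real manager ignores an already freed key or raises is a design choice made here for the sketch only.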
- get()
Get the object stored by this partition from self.gpu_manager.
- Returns
- Return type
ray.ObjectRef
- get_gpu_manager()
Get the GPU manager associated with this partition.
- Returns
GPUManager associated with this object.
- Return type
modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager
- get_key()
Get the integer key of this partition in the dict-storage of self.gpu_manager.
- Returns
- Return type
int
- get_object_id()
Get the object stored for this partition from self.gpu_manager.
- Returns
- Return type
ray.ObjectRef
- length()
Get the length of the object wrapped by this partition.
- Returns
The length (or reference to length) of the object.
- Return type
int or ray.ObjectRef
- mask(row_indices, col_indices)
Select columns or rows from the given indices.
- Parameters
row_indices (list of hashable) – The row labels to extract.
col_indices (list of hashable) – The column labels to extract.
- Returns
A reference to the integer key of the result in the internal dict-storage of self.gpu_manager.
- Return type
ray.ObjectRef
- classmethod preprocess_func(func)
Put func into the Ray object store.
- Parameters
func (callable) – Function to put.
- Returns
A reference to func in the Ray object store.
- Return type
ray.ObjectRef
- classmethod put(gpu_manager, pandas_dataframe)
Put pandas_dataframe into gpu_manager.
- Parameters
gpu_manager (modin.core.execution.ray.implementations.cudf_on_ray.partitioning.GPUManager) – A GPU manager to store cuDF dataframes.
pandas_dataframe (pandas.DataFrame/pandas.Series) – A pandas.DataFrame/pandas.Series to put.
- Returns
A reference to the integer key of the pandas.DataFrame added to the internal dict-storage in gpu_manager.
- Return type
ray.ObjectRef
- to_numpy()
Convert this partition to a NumPy array.
- Returns
- Return type
NumPy array
- to_pandas()
Convert this partition to pandas.DataFrame.
- Returns
- Return type
pandas.DataFrame
- width()
Get the width of the object wrapped by this partition.
- Returns
The width (or reference to width) of the object.
- Return type
int or ray.ObjectRef
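Because length() and width() may return either a plain integer or a reference to one, calling code usually has to normalize the value before using it. The helper below is a hypothetical sketch of that pattern. It uses `concurrent.futures.Future` as a stand-in for `ray.ObjectRef` so that it runs without Ray. With Ray, the analogous check would be `isinstance(value, ray.ObjectRef)` followed by `ray.get(value)`.

```python
from concurrent.futures import Future


def resolve_dimension(value):
    """Return a concrete int from either an int or a reference to an int.

    Hypothetical helper: Future stands in for ray.ObjectRef here.
    """
    if isinstance(value, Future):
        # Blocks until the referenced value is available.
        return value.result()
    return value


# One dimension is already known, the other arrives through a reference.
pending_width = Future()
pending_width.set_result(4)

dims = [resolve_dimension(3), resolve_dimension(pending_width)]
```

Resolving a reference blocks the caller, so it is generally worth deferring this step until the concrete number is actually needed.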