PandasOnRayDataframePartition
This class is the Ray-specific implementation of PandasDataframePartition. It provides the API for performing operations on a block partition, namely a pandas.DataFrame, using Ray as the execution engine.
In addition to wrapping a pandas.DataFrame, the class also holds the following metadata:
- length - length of the wrapped pandas.DataFrame
- width - width of the wrapped pandas.DataFrame
- ip - IP address of the node that holds the wrapped pandas.DataFrame
An operation on a block partition can be performed in two modes:
- asynchronously - via apply()
- lazily - via add_to_apply_calls()
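The difference between the two modes can be illustrated with a toy model. The sketch below is not Modin's implementation: `ToyPartition` and `_run_queue` are hypothetical names, a `ThreadPoolExecutor` stands in for Ray, and a plain list of rows stands in for `pandas.DataFrame`. It only shows that `apply()` submits work right away, while `add_to_apply_calls()` merely records it.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the Ray cluster; an assumption made for this illustration only.
_executor = ThreadPoolExecutor(max_workers=2)


def _run_queue(data, call_queue):
    """Run every queued (func, args, kwargs) triple, in order, on the data."""
    for func, args, kwargs in call_queue:
        data = func(data, *args, **kwargs)
    return data


class ToyPartition:
    """Hypothetical partition: a future plus a queue of pending calls."""

    def __init__(self, future, call_queue=None):
        self._future = future
        self.call_queue = call_queue or []

    @classmethod
    def put(cls, obj):
        future = Future()
        future.set_result(obj)
        return cls(future)

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Lazy mode: nothing runs, the call is only recorded.
        return ToyPartition(self._future, self.call_queue + [(func, args, kwargs)])

    def apply(self, func, *args, **kwargs):
        # Asynchronous mode: pending calls plus the new one are submitted now,
        # and a new partition is returned without waiting for the result.
        queue = self.call_queue + [(func, args, kwargs)]
        parent = self._future
        future = _executor.submit(lambda: _run_queue(parent.result(), queue))
        return ToyPartition(future)

    def get(self):
        # Block until the data is ready, running any still-pending calls.
        return _run_queue(self._future.result(), self.call_queue)


def add_column(rows, value=0):
    return [row + [value] for row in rows]


def drop_first_row(rows):
    return rows[1:]


part = ToyPartition.put([[1, 2], [3, 4]])
lazy = part.add_to_apply_calls(add_column, value=9)  # recorded, not executed
done = lazy.apply(drop_first_row)                    # submitted immediately
result = done.get()
```

In this model both methods return a new partition and leave the original untouched, which mirrors the "A new PandasOnRayDataframePartition object" wording used in the method descriptions.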
Public API
- class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.partition.PandasOnRayDataframePartition(object_id, length=None, width=None, ip=None, call_queue=None)
The class implements the interface in PandasDataframePartition.
- Parameters
object_id (ray.ObjectRef) – A reference to the pandas.DataFrame that needs to be wrapped with this class.
length (ray.ObjectRef or int, optional) – Length of the wrapped pandas.DataFrame, or a reference to it.
width (ray.ObjectRef or int, optional) – Width of the wrapped pandas.DataFrame, or a reference to it.
ip (ray.ObjectRef or str, optional) – IP address of the node that holds the wrapped pandas.DataFrame, or a reference to it.
call_queue (list) – Call queue that needs to be executed on the wrapped pandas.DataFrame.
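Because length, width, and ip may each be passed as a concrete value or as a reference to one, an accessor has to tell the two cases apart. The following is a minimal sketch of that idea, assuming a `concurrent.futures.Future` as a stand-in for `ray.ObjectRef`; the `ToyMetadata` class and the `resolve` helper are hypothetical and are not part of Modin.

```python
from concurrent.futures import Future


def resolve(value):
    """Return a concrete value, waiting on it if it is still a reference."""
    if isinstance(value, Future):  # stand-in for ray.ObjectRef
        return value.result()
    return value


class ToyMetadata:
    """Hypothetical holder for metadata that may arrive as references."""

    def __init__(self, length=None, width=None, ip=None):
        self._length = length
        self._width = width
        self._ip = ip

    def length(self):
        # Cache the concrete value so the reference is resolved only once.
        self._length = resolve(self._length)
        return self._length

    def width(self):
        self._width = resolve(self._width)
        return self._width

    def ip(self):
        self._ip = resolve(self._ip)
        return self._ip


# The length arrives as a (already completed) reference, the rest as values.
pending_length = Future()
pending_length.set_result(100)

meta = ToyMetadata(length=pending_length, width=4, ip="10.0.0.1")
shape = (meta.length(), meta.width())
```

The sketch does not cover the case where a field is left as None; how the real class obtains a missing value is not shown here.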
- add_to_apply_calls(func, *args, **kwargs)
Add a function to the call queue.
- Parameters
func (callable or ray.ObjectRef) – Function to be added to the call queue.
*args (iterable) – Additional positional arguments to be passed to func.
**kwargs (dict) – Additional keyword arguments to be passed to func.
- Returns
A new PandasOnRayDataframePartition object.
- Return type
PandasOnRayDataframePartition
Notes
It does not matter whether func is a callable or a ray.ObjectRef; Ray handles it correctly either way. The keyword arguments are sent as a dictionary.
- apply(func, *args, **kwargs)
Apply a function to the object wrapped by this partition.
- Parameters
func (callable or ray.ObjectRef) – A function to apply.
*args (iterable) – Additional positional arguments to be passed to func.
**kwargs (dict) – Additional keyword arguments to be passed to func.
- Returns
A new PandasOnRayDataframePartition object.
- Return type
PandasOnRayDataframePartition
Notes
It does not matter whether func is a callable or a ray.ObjectRef; Ray handles it correctly either way. The keyword arguments are sent as a dictionary.
- drain_call_queue()
Execute all operations stored in the call queue on the object wrapped by this partition.
- classmethod empty()
Create a new partition that wraps an empty pandas DataFrame.
- Returns
A new PandasOnRayDataframePartition object.
- Return type
PandasOnRayDataframePartition
- get()
Get the object wrapped by this partition out of the Plasma store.
- Returns
The object from the Plasma store.
- Return type
pandas.DataFrame
- ip()
Get the node IP address of the object wrapped by this partition.
- Returns
IP address of the node that holds the data.
- Return type
str
- length()
Get the length of the object wrapped by this partition.
- Returns
The length of the object.
- Return type
int
- mask(row_indices, col_indices)
Lazily create a mask that extracts the indices provided.
- Parameters
row_indices (list-like, slice or label) – The indices for the rows to extract.
col_indices (list-like, slice or label) – The indices for the columns to extract.
- Returns
A new PandasOnRayDataframePartition object.
- Return type
PandasOnRayDataframePartition
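What "lazily" means for a mask can be sketched without Ray or pandas. In the toy example below, a list of rows stands in for the wrapped `pandas.DataFrame`, only positional indices (a slice or a list of positions) are supported, and the helper names `_select`, `extract`, and `lazy_mask` are hypothetical. It is an illustration of deferring the extraction, not Modin's implementation.

```python
from functools import partial


def _select(seq, indices):
    """Pick positions from a sequence using a slice or a list of positions."""
    if isinstance(indices, slice):
        return list(seq[indices])
    return [seq[i] for i in indices]


def extract(rows, row_indices, col_indices):
    """Keep only the requested rows, then only the requested columns."""
    return [_select(row, col_indices) for row in _select(rows, row_indices)]


def lazy_mask(row_indices, col_indices):
    # Nothing is extracted here; the selection is only described.
    return partial(extract, row_indices=row_indices, col_indices=col_indices)


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
deferred = lazy_mask(slice(0, 2), [0, 2])
masked = deferred(rows)  # the extraction happens only at this call
```

Building the mask is cheap because it touches no data; the cost is paid only when the deferred call finally runs.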
- classmethod preprocess_func(func)
Put a function into the Plasma store to use in apply.
- Parameters
func (callable) – A function to preprocess.
- Returns
A reference to func.
- Return type
ray.ObjectRef
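The point of storing a function ahead of time is that it is placed in the store once and then referred to by reference for every partition. The sketch below models that with a hypothetical in-memory `ToyObjectStore`; it is an assumption-laden stand-in for the Plasma store and does not use Ray.

```python
import itertools


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a shared object store."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count()
        self.put_count = 0

    def put(self, obj):
        # Store the object and hand back a small reference to it.
        ref = next(self._ids)
        self._objects[ref] = obj
        self.put_count += 1
        return ref

    def get(self, ref):
        return self._objects[ref]


store = ToyObjectStore()


def double(values):
    return [v * 2 for v in values]


# Store the function once, then reuse the reference for every partition.
func_ref = store.put(double)
partition_refs = [store.put([1, 2]), store.put([3, 4])]
results = [store.get(func_ref)(store.get(ref)) for ref in partition_refs]
```

Here the function is put into the store a single time regardless of how many partitions it is applied to.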
- classmethod put(obj)
Put an object into the Plasma store and wrap it with a partition object.
- Parameters
obj (any) – An object to be put.
- Returns
A new PandasOnRayDataframePartition object.
- Return type
PandasOnRayDataframePartition
- to_numpy(**kwargs)
Convert the object wrapped by this partition to a NumPy array.
- Parameters
**kwargs (dict) – Additional keyword arguments to be passed to to_numpy.
- Return type
np.ndarray
- to_pandas()
Convert the object wrapped by this partition to a pandas.DataFrame.
- Return type
pandas.DataFrame
- wait()
Wait for the computations on the object wrapped by the partition to complete.
- width()
Get the width of the object wrapped by the partition.
- Returns
The width of the object.
- Return type
int