PandasOnRayDataframePartition#
The class is the specific implementation of PandasDataframePartition
,
providing the API to perform operations on a block partition, namely, pandas.DataFrame
, using Ray as an execution engine.
In addition to wrapping a pandas.DataFrame
, the class also holds the following metadata:
length
- length ofpandas.DataFrame
wrappedwidth
- width ofpandas.DataFrame
wrappedip
- node IP address that holdspandas.DataFrame
wrapped
An operation on a block partition can be performed in two modes:
asynchronously - via
apply()
lazily - via
add_to_apply_calls()
Public API#
- class modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframePartition(data: Union[ObjectRef, ClientObjectRef, DeferredExecution], length: int = None, width: int = None, ip: str = None, meta: MetaList = None, meta_offset: int = 0)#
The class implements the interface in
PandasDataframePartition
.- Parameters:
data (ObjectIDType or DeferredExecution) – A reference to
pandas.DataFrame
that needs to be wrapped with this class or a reference to DeferredExecution that needs to be executed on demand.length (ObjectIDType or int, optional) – Length or reference to it of wrapped
pandas.DataFrame
.width (ObjectIDType or int, optional) – Width or reference to it of wrapped
pandas.DataFrame
.ip (ObjectIDType or str, optional) – Node IP address or reference to it that holds wrapped
pandas.DataFrame
.meta (MetaList) – Meta information, containing the lengths and the worker address (the last value).
meta_offset (int) – The lengths offset in the meta list.
- add_to_apply_calls(func: Union[Callable, ObjectRef], *args, length=None, width=None, **kwargs)#
Add a function to the call queue.
- Parameters:
func (callable) – Function to be added to the call queue.
*args (iterable) – Additional positional arguments to be passed in func.
length (reference or int, optional) – Length, or reference to length, of wrapped
pandas.DataFrame
.width (reference or int, optional) – Width, or reference to width, of wrapped
pandas.DataFrame
.**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns:
New PandasDataframePartition object with the function added to the call queue.
- Return type:
Notes
This function will be executed when apply is called. It will be executed in the order inserted; apply’s func operates the last and return.
- apply(func: Union[Callable, ObjectRef], *args, **kwargs)#
Apply a function to the object wrapped by this partition.
- Parameters:
func (callable or ray.ObjectRef) – A function to apply.
*args (iterable) – Additional positional arguments to be passed in func.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns:
A new
PandasOnRayDataframePartition
object.- Return type:
Notes
It does not matter if func is callable or an
ray.ObjectRef
. Ray will handle it correctly either way. The keyword arguments are sent as a dictionary.
- drain_call_queue()#
Execute all operations stored in the call queue on the object wrapped by this partition.
- execution_wrapper#
alias of
RayWrapper
- ip(materialize=True)#
Get the node IP address of the object wrapped by this partition.
- Parameters:
materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If
False
was specified, may return a future of the result if it hasn’t been materialized yet.- Returns:
IP address of the node that holds the data.
- Return type:
str
- length(materialize=True)#
Get the length of the object wrapped by this partition.
- Parameters:
materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If
False
was specified, may return a future of the result if it hasn’t been materialized yet.- Returns:
The length of the object.
- Return type:
int or ray.ObjectRef
- mask(row_labels, col_labels)#
Lazily create a mask that extracts the indices provided.
- Parameters:
row_labels (list-like, slice or label) – The row labels for the rows to extract.
col_labels (list-like, slice or label) – The column labels for the columns to extract.
- Returns:
A new
PandasOnRayDataframePartition
object.- Return type:
- classmethod preprocess_func(func)#
Put a function into the Plasma store to use in
apply
.- Parameters:
func (callable) – A function to preprocess.
- Returns:
A reference to func.
- Return type:
ray.ObjectRef
- classmethod put(obj: DataFrame)#
Put the data frame into Plasma store and wrap it with partition object.
- Parameters:
obj (pandas.DataFrame) – A data frame to be put.
- Returns:
A new
PandasOnRayDataframePartition
object.- Return type:
- wait()#
Wait for completion of computations on the object wrapped by the partition.
- width(materialize=True)#
Get the width of the object wrapped by the partition.
- Parameters:
materialize (bool, default: True) – Whether to forcibly materialize the result into an integer. If
False
was specified, may return a future of the result if it hasn’t been materialized yet.- Returns:
The width of the object.
- Return type:
int or ray.ObjectRef