PandasOnPythonDataframePartition#
The class is a specific implementation of PandasDataframePartition,
providing the API to perform operations on a block partition using Python as the execution engine.
In addition to wrapping a pandas.DataFrame, the class also holds the following metadata:
- length: length of the wrapped pandas.DataFrame
- width: width of the wrapped pandas.DataFrame
An operation on a block partition can be performed in two modes:
- immediately via apply(): the accumulated call queue and the new function are executed immediately.
- lazily via add_to_apply_calls(): the function is added to the call queue and no computation is performed at that moment.
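The difference between the two modes can be illustrated with a toy model. The sketch below is not Modin's implementation: `ToyPartition`, `double`, `add`, and the `executed` log are hypothetical names, and the class wraps a plain list instead of a pandas.DataFrame. It only mirrors the two method names described above to show when queued functions actually run.

```python
class ToyPartition:
    """Hypothetical stand-in for a partition; wraps a list, not a DataFrame."""

    def __init__(self, data, call_queue=None):
        self._data = data
        self.call_queue = list(call_queue or [])

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Lazy mode: only record the call; nothing is computed here.
        return ToyPartition(self._data, self.call_queue + [(func, args, kwargs)])

    def apply(self, func, *args, **kwargs):
        # Eager mode: run the accumulated queue first, then the new function.
        result = self._data
        for queued, q_args, q_kwargs in self.call_queue:
            result = queued(result, *q_args, **q_kwargs)
        return ToyPartition(func(result, *args, **kwargs))


executed = []  # records the order in which functions actually run


def double(rows):
    executed.append("double")
    return [x * 2 for x in rows]


def add(rows, n):
    executed.append("add")
    return [x + n for x in rows]


part = ToyPartition([1, 2, 3])
lazy = part.add_to_apply_calls(double)  # queued, not executed
ran_after_lazy = list(executed)         # snapshot of the log at this point
eager = lazy.apply(add, 10)             # runs double, then add
```

In this sketch the log is still empty after the lazy call, and both functions run, in queue order, only once `apply` is invoked.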
Public API#
- class modin.core.execution.python.implementations.pandas_on_python.partitioning.partition.PandasOnPythonDataframePartition(data, length=None, width=None, call_queue=None)#
Partition class with interface for pandas storage format and Python engine.
Class holds the data and metadata for a single partition and implements methods of the parent abstract class PandasDataframePartition.
- Parameters
data (pandas.DataFrame) – pandas.DataFrame that should be wrapped with this class.
length (int, optional) – Length of data (number of rows in the input dataframe).
width (int, optional) – Width of data (number of columns in the input dataframe).
call_queue (list, optional) – Call queue of the partition (list with entities that should be called before partition materialization).
Notes
Objects of this class are treated as immutable by partition manager subclasses. There is no logic for updating in-place.
- add_to_apply_calls(func, *args, length=None, width=None, **kwargs)#
Add a function to the call queue.
- Parameters
func (callable) – Function to be added to the call queue.
*args (iterable) – Additional positional arguments to be passed in func.
length (int, optional) – Length of wrapped pandas.DataFrame.
width (int, optional) – Width of wrapped pandas.DataFrame.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
New PandasOnPythonDataframePartition object with extended call queue.
- Return type
PandasOnPythonDataframePartition
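A minimal sketch of the "new object with extended call queue" behavior, using a hypothetical `QueuedPartition` stand-in rather than the real class. The original object's queue is left untouched. How the `length` and `width` hints are used is an assumption of this sketch: they are simply stored on the new object.

```python
class QueuedPartition:
    """Hypothetical stand-in; not the Modin class."""

    def __init__(self, data, length=None, width=None, call_queue=None):
        self._data = data
        self._length_cache = length
        self._width_cache = width
        self.call_queue = call_queue if call_queue is not None else []

    def add_to_apply_calls(self, func, *args, length=None, width=None, **kwargs):
        # Build a new list instead of appending, so the original queue
        # is never mutated.
        new_queue = self.call_queue + [(func, args, kwargs)]
        # Assumption of this sketch: the caller-supplied length/width describe
        # the data after `func` runs, so they are stored on the new object.
        return QueuedPartition(
            self._data, length=length, width=width, call_queue=new_queue
        )


def drop_first_row(rows):
    return rows[1:]


original = QueuedPartition([[1, 2], [3, 4], [5, 6]], length=3, width=2)
extended = original.add_to_apply_calls(drop_first_row, length=2, width=2)
```

Returning a fresh object instead of appending in place is consistent with the note that partitions are treated as immutable.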
- apply(func, *args, **kwargs)#
Apply a function to the object wrapped by this partition.
- Parameters
func (callable) – Function to apply.
*args (iterable) – Additional positional arguments to be passed in func.
**kwargs (dict) – Additional keyword arguments to be passed in func.
- Returns
New PandasOnPythonDataframePartition object.
- Return type
PandasOnPythonDataframePartition
- drain_call_queue()#
Execute all operations stored in the call queue on the object wrapped by this partition.
- get()#
Flush the call_queue and return a copy of the data.
- Returns
Copy of DataFrame that was wrapped by this partition.
- Return type
pandas.DataFrame
Notes
Since this object is a simple wrapper, it just returns a copy of the data.
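The flush-then-copy behavior of get() can be sketched with a toy class. `CopyingPartition` and `negate` are hypothetical names, the wrapped object is a nested list rather than a pandas.DataFrame, and `copy.deepcopy` stands in for whatever copy the real class makes.

```python
import copy


class CopyingPartition:
    """Hypothetical stand-in; wraps nested lists instead of a DataFrame."""

    def __init__(self, data, call_queue=None):
        self._data = data
        self.call_queue = list(call_queue or [])

    def drain_call_queue(self):
        # Run every queued call in order and keep the final result.
        for func, args, kwargs in self.call_queue:
            self._data = func(self._data, *args, **kwargs)
        self.call_queue = []

    def get(self):
        # Flush pending work, then hand back a copy so callers cannot
        # modify the wrapped data through the returned object.
        self.drain_call_queue()
        return copy.deepcopy(self._data)


def negate(rows):
    return [[-x for x in row] for row in rows]


part = CopyingPartition([[1, 2], [3, 4]], call_queue=[(negate, (), {})])
result = part.get()
result[0][0] = 999   # mutate the returned copy only
second = part.get()  # the wrapped data is unaffected
```

Because the caller receives a copy, the mutation of `result` does not show up in `second`.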
- classmethod preprocess_func(func)#
Preprocess a function before an apply call.
- Parameters
func (callable) – Function to preprocess.
- Returns
An object that can be accepted by apply.
- Return type
callable
Notes
No special preprocessing is required, so func is returned unmodified.
- classmethod put(obj)#
Create a partition containing obj.
- Parameters
obj (pandas.DataFrame) – DataFrame to be put into the new partition.
- Returns
New PandasOnPythonDataframePartition object.
- Return type
PandasOnPythonDataframePartition
- wait()#
Wait for completion of computations on the object wrapped by the partition.
Internally, this is done by flushing the call queue.