PandasOnPythonDataframePartition
The class is a specific implementation of PandasDataframePartition, providing the API to perform operations on a block partition using Python as the execution engine.
In addition to wrapping a pandas.DataFrame, the class also holds the following metadata:
length - length of the wrapped pandas.DataFrame
width - width of the wrapped pandas.DataFrame
An operation on a block partition can be performed in two modes:
immediately via apply() - in this case the accumulated call queue and the new function will be executed immediately.
lazily via add_to_apply_calls() - in this case the function will be added to the call queue and no computations will be done at the moment.
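The difference between the two modes can be illustrated with a minimal, pure-Python sketch. ToyPartition and the helpers add and scale are hypothetical names introduced only for this illustration. The class wraps a plain list rather than a pandas.DataFrame and is not Modin's implementation; it only mirrors the call-queue behaviour described above.

```python
class ToyPartition:
    """Hypothetical stand-in for a partition; not Modin's implementation."""

    def __init__(self, data, call_queue=None):
        self._data = data
        self.call_queue = list(call_queue or [])

    def add_to_apply_calls(self, func, *args, **kwargs):
        # Lazy mode: record the call, compute nothing, return a new partition.
        return ToyPartition(self._data, self.call_queue + [(func, args, kwargs)])

    def drain_call_queue(self):
        # Execute every queued call in order, then clear the queue.
        for func, args, kwargs in self.call_queue:
            self._data = func(self._data, *args, **kwargs)
        self.call_queue = []

    def apply(self, func, *args, **kwargs):
        # Eager mode: flush the queue, then run the new function right away.
        self.drain_call_queue()
        return ToyPartition(func(self._data, *args, **kwargs))


def add(values, n):
    return [v + n for v in values]


def scale(values, k):
    return [v * k for v in values]


part = ToyPartition([1, 2, 3])
lazy = part.add_to_apply_calls(add, 10)  # queued only, nothing is computed
queued_before = len(lazy.call_queue)
eager = lazy.apply(scale, 2)             # runs the queued add, then scale
result = eager._data
```

In this sketch the queued add call runs only when apply() is invoked on the lazy partition, and the original part is left untouched.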
Public API
- class modin.core.execution.python.implementations.pandas_on_python.partitioning.partition.PandasOnPythonDataframePartition(data, length=None, width=None, call_queue=None)
Partition class with interface for pandas storage format and Python engine.
Class holds the data and metadata for a single partition and implements methods of parent abstract class PandasDataframePartition.
- Parameters
data (pandas.DataFrame) – pandas.DataFrame that should be wrapped with this class.
length (int, optional) – Length of data (number of rows in the input dataframe).
width (int, optional) – Width of data (number of columns in the input dataframe).
call_queue (list, optional) – Call queue of the partition (list with entities that should be called before partition materialization).
Notes
Objects of this class are treated as immutable by partition manager subclasses. There is no logic for updating in-place.
- apply(func, *args, **kwargs)
Apply a function to the object wrapped by this partition.
- Parameters
func (callable) – Function to apply.
*args (iterable) – Additional positional arguments to be passed to func.
**kwargs (dict) – Additional keyword arguments to be passed to func.
- Returns
New PandasOnPythonDataframePartition object.
- Return type
PandasOnPythonDataframePartition
- drain_call_queue()
Execute all operations stored in the call queue on the object wrapped by this partition.
- execution_wrapper
alias of PythonWrapper
- get()
Flush the call queue and return a copy of the data.
- Returns
Copy of DataFrame that was wrapped by this partition.
- Return type
pandas.DataFrame
Notes
Since this object is a simple wrapper, it just returns a copy of the data.
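The copy semantics of get() can be sketched in plain Python. ToyPartition here is a hypothetical stand-in that stores a list of dicts instead of a pandas.DataFrame and uses copy.deepcopy where the real class would copy the DataFrame. It is an illustration under those assumptions, not Modin's code.

```python
import copy


class ToyPartition:
    """Hypothetical stand-in; rows are plain dicts rather than a DataFrame."""

    def __init__(self, data):
        self._data = data
        self.call_queue = []

    def drain_call_queue(self):
        # Run any pending single-argument functions, then clear the queue.
        for func in self.call_queue:
            self._data = func(self._data)
        self.call_queue = []

    def get(self):
        # Flush pending work first, then hand out a copy so that callers
        # cannot mutate the data held by the partition.
        self.drain_call_queue()
        return copy.deepcopy(self._data)


part = ToyPartition([{"a": 1}, {"a": 2}])
first = part.get()
first[0]["a"] = 99   # mutates only the returned copy
second = part.get()  # still reflects the partition's own data
```

Because each call returns a fresh copy, the change made to first is not visible in second.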
- classmethod preprocess_func(func)
Preprocess a function before an apply call.
- Parameters
func (callable) – Function to preprocess.
- Returns
An object that can be accepted by apply.
- Return type
callable
Notes
No special preprocessing action is required, so func is returned unmodified.
- classmethod put(obj)
Create a partition containing obj.
- Parameters
obj (pandas.DataFrame) – DataFrame to be put into the new partition.
- Returns
New PandasOnPythonDataframePartition object.
- Return type
PandasOnPythonDataframePartition
- wait()
Wait for completion of computations on the object wrapped by the partition.
Internally, this is done by flushing the call queue.