TransformMapper#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.TransformMapper(op)#
A helper class for
InputMapper
.This class is used to map column references to expressions used for their computation. This mapper is used to fold expressions from multiple
TransformNode
-s into a single expression.- Parameters:
op (TransformNode) – Transformation used for mapping.
- _op#
Transformation used for mapping.
- Type:
FrameMapper#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.FrameMapper(frame)#
A helper class for
InputMapper
.This class is used to map column references to another frame. This mapper is used to replace input frame in expressions.
- Parameters:
frame (HdkOnNativeDataframe) – Target frame.
- _frame#
Target frame.
- Type:
InputMapper#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.InputMapper#
Input reference mapper.
This class is used for input translation/replacement in expressions via
BaseExpr.translate_input
method.Translation is performed using column mappers registered via add_mapper method. Each input frame can have at most one mapper. References to frames with no registered mapper are not translated.
- _mappers#
Column mappers to use for translation.
- Type:
dict
- add_mapper(frame, mapper)#
Register a mapper for a frame.
- Parameters:
frame (HdkOnNativeDataframe) – A frame for which a mapper is registered.
mapper (object) – A mapper to register.
- translate(ref)#
Translate column reference by its name.
- Parameters:
ref (InputRefExpr) – A column reference to translate.
- Returns:
Translated expression.
- Return type:
DFAlgNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.DFAlgNode#
A base class for dataframe algebra tree node.
A dataframe algebra tree is used to describe how dataframe is computed.
- can_execute_arrow() bool #
Check for possibility of Arrow execution.
Check if the computation can be executed using the Arrow API instead of HDK query.
- Return type:
bool
- can_execute_hdk() bool #
Check for possibility of HDK execution.
Check if the computation can be executed using an HDK query.
- Return type:
bool
- collect_frames()#
Collect all frames participating in a tree.
- Returns:
A list of collected frames.
- Return type:
list
- collect_partitions()#
Collect all partitions participating in a tree.
- Returns:
A list of collected partitions.
- Return type:
list
- dump(prefix='')#
Dump the tree.
- Parameters:
prefix (str, default: '') – A prefix to add at each string of the dump.
- dumps(prefix='')#
Return a string representation of the tree.
- Parameters:
prefix (str, default: '') – A prefix to add at each string of the dump.
- Return type:
str
- execute_arrow(arrow_input: Union[None, Table, List[Table]]) Table #
Compute the frame data using the Arrow API.
- Parameters:
arrow_input (None, pa.Table or list of pa.Table) – The input, converted to arrow.
- Returns:
The resulting table.
- Return type:
pyarrow.Table
- require_executed_base() bool #
Check if materialization of input frames is required.
- Return type:
bool
- walk_dfs(cb, *args, **kwargs)#
Perform a depth-first walk over a tree.
Walk over an input in the depth-first order and call a callback function for each node.
- Parameters:
cb (callable) – A callback function.
*args (list) – Arguments for the callback.
**kwargs (dict) – Keyword arguments for the callback.
FrameNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.FrameNode(modin_frame: HdkOnNativeDataframe)#
A node to reference a materialized frame.
- Parameters:
modin_frame (HdkOnNativeDataframe) – Referenced frame.
- modin_frame#
Referenced frame.
- Type:
- can_execute_arrow() bool #
Check for possibility of Arrow execution.
Check if the computation can be executed using the Arrow API instead of HDK query.
- Return type:
bool
- execute_arrow(ignore=None) Union[DbTable, Table, DataFrame] #
Materialized frame.
If can_execute_arrow returns True, this method returns an arrow table, otherwise - a pandas Dataframe or DbTable.
- Parameters:
ignore (None, pa.Table or list of pa.Table, default: None) –
- Return type:
DbTable or pa.Table or pandas.Dataframe
MaskNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.MaskNode(base: HdkOnNativeDataframe, row_labels: List[str] = None, row_positions: List[int] = None)#
A filtering node which filters rows by index values or row id.
- Parameters:
base (HdkOnNativeDataframe) – A filtered frame.
row_labels (list, optional) – List of row labels to select.
row_positions (list of int, optional) – List of rows ids to select.
- input#
Holds a single filtered frame.
- Type:
list of HdkOnNativeDataframe
- row_labels#
List of row labels to select.
- Type:
list or None
- row_positions#
List of rows ids to select.
- Type:
list of int or None
- can_execute_arrow() bool #
Check for possibility of Arrow execution.
Check if the computation can be executed using the Arrow API instead of HDK query.
- Return type:
bool
- execute_arrow(table: Table) Table #
Perform row selection on the frame using Arrow API.
- Parameters:
table (pa.Table) –
- Returns:
The resulting table.
- Return type:
pyarrow.Table
- require_executed_base() bool #
Check if materialization of input frames is required.
- Return type:
bool
GroupbyAggNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.GroupbyAggNode(base, by, agg_exprs, groupby_opts)#
A node to represent a groupby aggregation operation.
- Parameters:
base (DFAlgNode) – An aggregated frame.
by (list of str) – A list of columns used for grouping.
agg_exprs (dict) – Aggregates to compute.
groupby_opts (dict) – Additional groupby parameters.
- by#
A list of columns used for grouping.
- Type:
list of str
- agg_exprs#
Aggregates to compute.
- Type:
dict
- groupby_opts#
Additional groupby parameters.
- Type:
dict
- copy()#
Make a shallow copy of the node.
- Return type:
TransformNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.TransformNode(base: HdkOnNativeDataframe, exprs: Dict[str, Union[InputRefExpr, LiteralExpr, OpExpr]], fold: bool = True)#
A node to represent a projection of a single frame.
Provides expressions to compute each column of the projection.
- Parameters:
base (HdkOnNativeDataframe) – A transformed frame.
exprs (dict) – Expressions for frame’s columns computation.
fold (bool) –
- input#
Holds a single projected frame.
- Type:
list of HdkOnNativeDataframe
- exprs#
Expressions used to compute frame’s columns.
- Type:
dict
- can_execute_arrow() bool #
Check for possibility of Arrow execution.
Check if the computation can be executed using the Arrow API instead of HDK query.
- Return type:
bool
- can_execute_hdk() bool #
Check for possibility of HDK execution.
Check if the computation can be executed using an HDK query.
- Return type:
bool
- copy()#
Make a shallow copy of the node.
- Return type:
- execute_arrow(table: Table) Table #
Perform column selection on the frame using Arrow API.
- Parameters:
table (pa.Table) –
- Returns:
The resulting table.
- Return type:
pyarrow.Table
- is_simple_select()#
Check if transform node is a simple selection.
Simple selection can only use InputRefExpr expressions.
- Returns:
True for simple select and False otherwise.
- Return type:
bool
JoinNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.JoinNode(left, right, how='inner', exprs=None, condition=None)#
A node to represent a join of two frames.
- Parameters:
- input#
Holds joined frames. The first frame in the list is considered as the left join operand.
- Type:
list of DFAlgNode
- how#
A type of join.
- Type:
str
- exprs#
Expressions for the resulting frame’s columns.
- Type:
dict
UnionNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.UnionNode(frames: List[HdkOnNativeDataframe], columns: Dict[str, dtype], ignore_index: bool)#
A node to represent rows union of input frames.
- Parameters:
frames (list of HdkOnNativeDataframe) – Input frames.
columns (dict) – Column names and dtypes.
ignore_index (bool) –
- input#
Input frames.
- Type:
list of HdkOnNativeDataframe
- can_execute_arrow() bool #
Check for possibility of Arrow execution.
Check if the computation can be executed using the Arrow API instead of HDK query.
- Return type:
bool
- can_execute_hdk() bool #
Check for possibility of HDK execution.
Check if the computation can be executed using an HDK query.
- Return type:
bool
- execute_arrow(tables: Union[Table, List[Table]]) Table #
Concat frames’ rows using Arrow API.
- Parameters:
tables (pa.Table or list of pa.Table) –
- Returns:
The resulting table.
- Return type:
pyarrow.Table
- require_executed_base() bool #
Check if materialization of input frames is required.
- Return type:
bool
SortNode#
Public API#
- class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.SortNode(frame, columns, ascending, na_position)#
A sort node to order frame’s rows in a specified order.
- Parameters:
frame (DFAlgNode) – Sorted frame.
columns (list of str) – A list of key columns for a sort.
ascending (list of bool) – Ascending or descending sort.
na_position ({"first", "last"}) – “first” to put NULLs at the start of the result, “last” to put NULLs at the end of the result.
- columns#
A list of key columns for a sort.
- Type:
list of str
- ascending#
Ascending or descending sort.
- Type:
list of bool
- na_position#
“first” to put NULLs at the start of the result, “last” to put NULLs at the end of the result.
- Type:
{“first”, “last”}
FilterNode#
Public API#
Utilities#
Public API#
- modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.translate_exprs_to_base(exprs, base)#
Fold expressions.
Fold expressions with their input nodes until base frame is the only input frame.
- Parameters:
exprs (dict) – Expressions to translate.
base (HdkOnNativeDataframe) – Required input frame for translated expressions.
- Returns:
Translated expressions.
- Return type:
dict
- modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.replace_frame_in_exprs(exprs, old_frame, new_frame)#
Translate input expression replacing an input frame in them.
- Parameters:
exprs (dict) – Expressions to translate.
old_frame (HdkOnNativeDataframe) – An input frame to replace.
new_frame (HdkOnNativeDataframe) – A new input frame to use.
- Returns:
Translated expressions.
- Return type:
dict