TransformMapper#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.TransformMapper(op)#

A helper class for InputMapper.

This class is used to map column references to expressions used for their computation. This mapper is used to fold expressions from multiple TransformNode-s into a single expression.

Parameters:

op (TransformNode) – Transformation used for mapping.

_op#

Transformation used for mapping.

Type:

TransformNode

translate(col)#

Translate column reference by its name.

Parameters:

col (str) – A name of the column to translate.

Returns:

Translated expression.

Return type:

BaseExpr

FrameMapper#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.FrameMapper(frame)#

A helper class for InputMapper.

This class is used to map column references to another frame. This mapper is used to replace input frame in expressions.

Parameters:

frame (HdkOnNativeDataframe) – Target frame.

_frame#

Target frame.

Type:

HdkOnNativeDataframe

translate(col)#

Translate column reference by its name.

Parameters:

col (str) – A name of the column to translate.

Returns:

Translated expression.

Return type:

BaseExpr

InputMapper#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.InputMapper#

Input reference mapper.

This class is used for input translation/replacement in expressions via BaseExpr.translate_input method.

Translation is performed using column mappers registered via add_mapper method. Each input frame can have at most one mapper. References to frames with no registered mapper are not translated.

_mappers#

Column mappers to use for translation.

Type:

dict

add_mapper(frame, mapper)#

Register a mapper for a frame.

Parameters:
  • frame (HdkOnNativeDataframe) – A frame for which a mapper is registered.

  • mapper (object) – A mapper to register.

translate(ref)#

Translate column reference by its name.

Parameters:

ref (InputRefExpr) – A column reference to translate.

Returns:

Translated expression.

Return type:

BaseExpr

DFAlgNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.DFAlgNode#

A base class for dataframe algebra tree node.

A dataframe algebra tree is used to describe how dataframe is computed.

input#

Holds child nodes.

Type:

list of DFAlgNode, optional

can_execute_arrow() bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

can_execute_hdk() bool#

Check for possibility of HDK execution.

Check if the computation can be executed using an HDK query.

Return type:

bool

collect_frames()#

Collect all frames participating in a tree.

Returns:

A list of collected frames.

Return type:

list

collect_partitions()#

Collect all partitions participating in a tree.

Returns:

A list of collected partitions.

Return type:

list

abstract copy()#

Make a shallow copy of the node.

Return type:

DFAlgNode

dump(prefix='')#

Dump the tree.

Parameters:

prefix (str, default: '') – A prefix to add at each string of the dump.

dumps(prefix='')#

Return a string representation of the tree.

Parameters:

prefix (str, default: '') – A prefix to add at each string of the dump.

Return type:

str

execute_arrow(arrow_input: Union[None, Table, List[Table]]) Table#

Compute the frame data using the Arrow API.

Parameters:

arrow_input (None, pa.Table or list of pa.Table) – The input, converted to arrow.

Returns:

The resulting table.

Return type:

pyarrow.Table

require_executed_base() bool#

Check if materialization of input frames is required.

Return type:

bool

walk_dfs(cb, *args, **kwargs)#

Perform a depth-first walk over a tree.

Walk over an input in the depth-first order and call a callback function for each node.

Parameters:
  • cb (callable) – A callback function.

  • *args (list) – Arguments for the callback.

  • **kwargs (dict) – Keyword arguments for the callback.

FrameNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.FrameNode(modin_frame: HdkOnNativeDataframe)#

A node to reference a materialized frame.

Parameters:

modin_frame (HdkOnNativeDataframe) – Referenced frame.

modin_frame#

Referenced frame.

Type:

HdkOnNativeDataframe

can_execute_arrow() bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

copy()#

Make a shallow copy of the node.

Return type:

FrameNode

execute_arrow(ignore=None) Union[DbTable, Table, DataFrame]#

Materialized frame.

If can_execute_arrow returns True, this method returns an arrow table, otherwise - a pandas Dataframe or DbTable.

Parameters:

ignore (None, pa.Table or list of pa.Table, default: None) –

Return type:

DbTable or pa.Table or pandas.Dataframe

MaskNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.MaskNode(base: HdkOnNativeDataframe, row_labels: List[str] = None, row_positions: List[int] = None)#

A filtering node which filters rows by index values or row id.

Parameters:
  • base (HdkOnNativeDataframe) – A filtered frame.

  • row_labels (list, optional) – List of row labels to select.

  • row_positions (list of int, optional) – List of rows ids to select.

input#

Holds a single filtered frame.

Type:

list of HdkOnNativeDataframe

row_labels#

List of row labels to select.

Type:

list or None

row_positions#

List of rows ids to select.

Type:

list of int or None

can_execute_arrow() bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

copy()#

Make a shallow copy of the node.

Return type:

MaskNode

execute_arrow(table: Table) Table#

Perform row selection on the frame using Arrow API.

Parameters:

table (pa.Table) –

Returns:

The resulting table.

Return type:

pyarrow.Table

require_executed_base() bool#

Check if materialization of input frames is required.

Return type:

bool

GroupbyAggNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.GroupbyAggNode(base, by, agg_exprs, groupby_opts)#

A node to represent a groupby aggregation operation.

Parameters:
  • base (DFAlgNode) – An aggregated frame.

  • by (list of str) – A list of columns used for grouping.

  • agg_exprs (dict) – Aggregates to compute.

  • groupby_opts (dict) – Additional groupby parameters.

input#

Holds a single aggregated frame.

Type:

list of DFAlgNode

by#

A list of columns used for grouping.

Type:

list of str

agg_exprs#

Aggregates to compute.

Type:

dict

groupby_opts#

Additional groupby parameters.

Type:

dict

copy()#

Make a shallow copy of the node.

Return type:

GroupbyAggNode

TransformNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.TransformNode(base: HdkOnNativeDataframe, exprs: Dict[str, Union[InputRefExpr, LiteralExpr, OpExpr]], fold: bool = True)#

A node to represent a projection of a single frame.

Provides expressions to compute each column of the projection.

Parameters:
  • base (HdkOnNativeDataframe) – A transformed frame.

  • exprs (dict) – Expressions for frame’s columns computation.

  • fold (bool) –

input#

Holds a single projected frame.

Type:

list of HdkOnNativeDataframe

exprs#

Expressions used to compute frame’s columns.

Type:

dict

can_execute_arrow() bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

can_execute_hdk() bool#

Check for possibility of HDK execution.

Check if the computation can be executed using an HDK query.

Return type:

bool

copy()#

Make a shallow copy of the node.

Return type:

TransformNode

execute_arrow(table: Table) Table#

Perform column selection on the frame using Arrow API.

Parameters:

table (pa.Table) –

Returns:

The resulting table.

Return type:

pyarrow.Table

is_simple_select()#

Check if transform node is a simple selection.

Simple selection can only use InputRefExpr expressions.

Returns:

True for simple select and False otherwise.

Return type:

bool

JoinNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.JoinNode(left, right, how='inner', exprs=None, condition=None)#

A node to represent a join of two frames.

Parameters:
  • left (DFAlgNode) – A left frame to join.

  • right (DFAlgNode) – A right frame to join.

  • how (str, default: "inner") – A type of join.

  • exprs (dict, default: None) – Expressions for the resulting frame’s columns.

  • condition (BaseExpr, default: None) – Join condition.

input#

Holds joined frames. The first frame in the list is considered as the left join operand.

Type:

list of DFAlgNode

how#

A type of join.

Type:

str

exprs#

Expressions for the resulting frame’s columns.

Type:

dict

condition#

Join condition.

Type:

BaseExpr

property by_rowid#

Return True if this is a join by the rowid column.

Return type:

bool

can_execute_arrow() bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

copy()#

Make a shallow copy of the node.

Return type:

JoinNode

execute_arrow(tables: List[Table]) Table#

Compute the frame data using the Arrow API.

Parameters:

arrow_input (None, pa.Table or list of pa.Table) – The input, converted to arrow.

Returns:

The resulting table.

Return type:

pyarrow.Table

require_executed_base() bool#

Check if materialization of input frames is required.

Return type:

bool

UnionNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.UnionNode(frames: List[HdkOnNativeDataframe], columns: Dict[str, dtype], ignore_index: bool)#

A node to represent rows union of input frames.

Parameters:
  • frames (list of HdkOnNativeDataframe) – Input frames.

  • columns (dict) – Column names and dtypes.

  • ignore_index (bool) –

input#

Input frames.

Type:

list of HdkOnNativeDataframe

can_execute_arrow() bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

can_execute_hdk() bool#

Check for possibility of HDK execution.

Check if the computation can be executed using an HDK query.

Return type:

bool

copy()#

Make a shallow copy of the node.

Return type:

UnionNode

execute_arrow(tables: Union[Table, List[Table]]) Table#

Concat frames’ rows using Arrow API.

Parameters:

tables (pa.Table or list of pa.Table) –

Returns:

The resulting table.

Return type:

pyarrow.Table

require_executed_base() bool#

Check if materialization of input frames is required.

Return type:

bool

SortNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.SortNode(frame, columns, ascending, na_position)#

A sort node to order frame’s rows in a specified order.

Parameters:
  • frame (DFAlgNode) – Sorted frame.

  • columns (list of str) – A list of key columns for a sort.

  • ascending (list of bool) – Ascending or descending sort.

  • na_position ({"first", "last"}) – “first” to put NULLs at the start of the result, “last” to put NULLs at the end of the result.

input#

Holds a single sorted frame.

Type:

list of DFAlgNode

columns#

A list of key columns for a sort.

Type:

list of str

ascending#

Ascending or descending sort.

Type:

list of bool

na_position#

“first” to put NULLs at the start of the result, “last” to put NULLs at the end of the result.

Type:

{“first”, “last”}

copy()#

Make a shallow copy of the node.

Return type:

SortNode

FilterNode#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.FilterNode(frame, condition)#

A node for generic rows filtering.

For rows filter by row id a MaskNode should be preferred.

Parameters:
input#

Holds a single filtered frame.

Type:

list of DFAlgNode

condition#

Filter condition.

Type:

BaseExpr

copy()#

Make a shallow copy of the node.

Return type:

FilterNode

Utilities#

Public API#

modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.translate_exprs_to_base(exprs, base)#

Fold expressions.

Fold expressions with their input nodes until base frame is the only input frame.

Parameters:
  • exprs (dict) – Expressions to translate.

  • base (HdkOnNativeDataframe) – Required input frame for translated expressions.

Returns:

Translated expressions.

Return type:

dict

modin.experimental.core.execution.native.implementations.hdk_on_native.df_algebra.replace_frame_in_exprs(exprs, old_frame, new_frame)#

Translate input expression replacing an input frame in them.

Parameters:
Returns:

Translated expressions.

Return type:

dict