BaseExpr#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.expr.BaseExpr#

An abstract base class for expression tree nodes.

An expression tree is used to describe how a single column of a dataframe is computed.

Each node can belong to multiple trees and therefore should be immutable until proven to have no parent nodes (e.g. by making a copy).
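The sharing rule above can be illustrated with a minimal sketch. The `Node` class and the operation names below are hypothetical stand-ins, not Modin's classes; the sketch only shows why a node reachable from several trees should be copied before it is changed.

```python
class Node:
    """Toy expression node (hypothetical; not Modin's BaseExpr)."""

    def __init__(self, op, operands=()):
        self.op = op
        self.operands = list(operands)

    def copy(self):
        # Shallow copy: a new node with its own operand list,
        # but the same child objects.
        return Node(self.op, self.operands)


col_a = Node("ref:a")
shared = Node("add", [col_a, Node("lit:1")])

# Two trees (two output columns) reuse the same subexpression.
tree_x = Node("mul", [shared, Node("lit:2")])
tree_y = Node("sub", [shared, Node("lit:3")])

# To change the subexpression for tree_y only, copy it first and edit the copy.
patched = shared.copy()
patched.operands[1] = Node("lit:10")
tree_y = Node("sub", [patched, tree_y.operands[1]])
```

After this, `tree_x` still sees the original `add` node, while `tree_y` uses the patched copy.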

operands#

Holds child nodes. Leaf nodes shouldn’t have an operands attribute.

Type:

list of BaseExpr, optional

add(other)#

Build an add expression.

Parameters:

other (BaseExpr) – The second operand.

Returns:

The resulting add expression.

Return type:

BaseExpr

bin_op(other, op_name)#

Build a binary operation expression.

Parameters:
  • other (BaseExpr) – The second operand.

  • op_name (str) – A binary operation name.

Returns:

The resulting binary operation expression.

Return type:

BaseExpr
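As a rough illustration of how named helpers such as add or mul could delegate to a generic bin_op, here is a self-contained sketch. The `Expr` class and its string operation names are assumptions made for the example; the real classes also track data types, which the sketch omits.

```python
class Expr:
    """Toy expression node (hypothetical; not Modin's BaseExpr)."""

    def __init__(self, op, operands=()):
        self.op = op
        self.operands = list(operands)

    def bin_op(self, other, op_name):
        # Build a new node; neither operand is modified.
        return Expr(op_name, [self, other])

    def add(self, other):
        return self.bin_op(other, "add")

    def mul(self, other):
        return self.bin_op(other, "mul")

    def __repr__(self):
        if not self.operands:
            return self.op
        return f"{self.op}({', '.join(map(repr, self.operands))})"


a, b, c = Expr("a"), Expr("b"), Expr("c")
expr = a.add(b).mul(c)
print(expr)  # mul(add(a, b), c)
```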

can_execute_arrow() → bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

can_execute_hdk() → bool#

Check for possibility of HDK execution.

Check if the computation can be executed using an HDK query.

Return type:

bool

cast(res_type)#

Build a cast expression.

Parameters:

res_type (dtype) – A data type to cast to.

Returns:

The cast expression.

Return type:

BaseExpr

cmp(op, other)#

Build a comparison expression with other.

Parameters:
  • op (str) – A comparison operation.

  • other (BaseExpr or scalar) – An operand to compare with.

Returns:

The resulting comparison expression.

Return type:

BaseExpr
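Because other may be either an expression or a scalar, a comparison builder has to normalize scalars into nodes first. The sketch below shows one way to do that; the `Expr` class, the `"literal"` node kind, and the operator strings are all assumptions for illustration.

```python
class Expr:
    """Toy expression node (hypothetical; not Modin's BaseExpr)."""

    def __init__(self, op, operands=(), value=None):
        self.op = op
        self.operands = list(operands)
        self.value = value

    def cmp(self, op, other):
        # Wrap plain scalars so both operands are expression nodes.
        if not isinstance(other, Expr):
            other = Expr("literal", value=other)
        return Expr(op, [self, other])

    # Convenience wrappers; the operator strings are illustrative.
    def eq(self, other):
        return self.cmp("=", other)

    def ge(self, other):
        return self.cmp(">=", other)

    def le(self, other):
        return self.cmp("<=", other)


price = Expr("ref:price")
at_least_ten = price.ge(10)               # scalar operand gets wrapped
same_as_cost = price.eq(Expr("ref:cost"))  # expression operand is used as-is
```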

collect_frames(frames)#

Recursively collect all frames participating in the expression.

Collected frames are put into the frames set. The default implementation collects frames from the operands of the expression. Derived classes that hold frames directly should provide their own implementations.

Parameters:

frames (set) – Output set of collected frames.
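The recursion described above can be sketched as follows. The `Toy*` classes are hypothetical, and frames are represented by plain strings so the example stays self-contained.

```python
class ToyExpr:
    def collect_frames(self, frames):
        # Default behaviour: recurse into operands. Leaf nodes have no
        # `operands` attribute, hence the getattr default.
        for child in getattr(self, "operands", []):
            child.collect_frames(frames)


class ToyOp(ToyExpr):
    def __init__(self, *operands):
        self.operands = list(operands)


class ToyLiteral(ToyExpr):
    def __init__(self, val):
        self.val = val


class ToyInputRef(ToyExpr):
    def __init__(self, frame, col):
        self.frame = frame
        self.col = col

    def collect_frames(self, frames):
        # This node holds a frame directly, so it overrides the default.
        frames.add(self.frame)


expr = ToyOp(
    ToyInputRef("frame_1", "a"),
    ToyOp(ToyInputRef("frame_2", "b"), ToyLiteral(1)),
    ToyInputRef("frame_1", "c"),
)
frames = set()
expr.collect_frames(frames)
print(sorted(frames))  # ['frame_1', 'frame_2']
```

Using a set as the output parameter means a frame referenced by several columns is recorded once.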

abstract copy()#

Make a shallow copy of the expression.

Return type:

BaseExpr

eq(other)#

Build an equality comparison of self with other.

Parameters:

other (BaseExpr or scalar) – An operand to compare with.

Returns:

The resulting comparison expression.

Return type:

BaseExpr

execute_arrow(table: Table) → ChunkedArray#

Compute the column data using the Arrow API.

Parameters:

table (pa.Table) –

Return type:

pa.ChunkedArray

floor()#

Build a floor expression.

Returns:

The resulting floor expression.

Return type:

BaseExpr

floordiv(other)#

Build a floordiv expression.

The result always has an integer data type.

Parameters:

other (BaseExpr) – The second operand.

Returns:

The resulting floordiv expression.

Return type:

BaseExpr

fold()#

Fold the operands.

This operation is used by TransformNode when translating to base.

Return type:

BaseExpr
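The text does not spell out what folding involves. The sketch below assumes it means collapsing operations whose operands are all literals into a single literal (constant folding); this is an assumption, and the `Lit`, `Ref`, and `Op` classes and the operation table are hypothetical.

```python
import operator

# Operations this toy folder knows how to evaluate (illustrative set).
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


class Lit:
    def __init__(self, val):
        self.val = val

    def fold(self):
        return self


class Ref:
    def __init__(self, col):
        self.col = col

    def fold(self):
        return self


class Op:
    def __init__(self, op, operands):
        self.op = op
        self.operands = list(operands)

    def fold(self):
        # Fold children first, then collapse this node if it became constant.
        folded = [child.fold() for child in self.operands]
        if self.op in FOLDABLE and all(isinstance(c, Lit) for c in folded):
            result = folded[0].val
            for child in folded[1:]:
                result = FOLDABLE[self.op](result, child.val)
            return Lit(result)
        return Op(self.op, folded)


expr = Op("mul", [Ref("a"), Op("add", [Lit(2), Lit(3)])])
folded = expr.fold()  # equivalent to mul(a, 5)
```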

ge(other)#

Build a greater-than-or-equal comparison with other.

Parameters:

other (BaseExpr or scalar) – An operand to compare with.

Returns:

The resulting comparison expression.

Return type:

BaseExpr

invert() → OpExpr#

Build a bitwise inverse expression.

Returns:

The resulting bitwise inverse expression.

Return type:

OpExpr

is_not_null()#

Build a NOT NULL check expression.

Returns:

The NOT NULL check expression.

Return type:

BaseExpr

is_null()#

Build a NULL check expression.

Returns:

The NULL check expression.

Return type:

BaseExpr

le(other)#

Build a less-than-or-equal comparison with other.

Parameters:

other (BaseExpr or scalar) – An operand to compare with.

Returns:

The resulting comparison expression.

Return type:

BaseExpr

mod(other)#

Build a mod expression.

Parameters:

other (BaseExpr) – The second operand.

Returns:

The resulting mod expression.

Return type:

BaseExpr

mul(other)#

Build a mul expression.

Parameters:

other (BaseExpr) – The second operand.

Returns:

The resulting mul expression.

Return type:

BaseExpr

nested_expressions() → Generator[Type[BaseExpr], Type[BaseExpr], Type[BaseExpr]]#

Return a generator that allows iterating over and replacing the nested expressions.

If the generator receives a new expression, it creates a copy of self and replaces the expression in the copy. The copy is returned to the sender.

Return type:

Generator
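The iterate-and-replace protocol relies on the generator's send() method. The following sketch shows one way such a generator could behave under the description above; the `Op` class is a hypothetical stand-in, and details such as copying only on the first replacement are assumptions.

```python
class Op:
    """Toy node with operands (hypothetical; not Modin's OpExpr)."""

    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)

    def copy(self):
        return Op(self.name, self.operands)

    def nested_expressions(self):
        expr = self
        for i, child in enumerate(self.operands):
            replacement = yield child
            if replacement is not None:
                if replacement is not child:
                    if expr is self:
                        # Copy lazily so the original node stays untouched.
                        expr = self.copy()
                    expr.operands[i] = replacement
                # This value is what the caller's send() returns.
                yield expr


root = Op("add", [Op("a"), Op("b")])
result = root
gen = root.nested_expressions()
for child in gen:
    if child.name == "b":
        # Replace operand "b" with "c"; the copy comes back from send().
        result = gen.send(Op("c"))

print([op.name for op in result.operands])  # ['a', 'c']
print([op.name for op in root.operands])    # ['a', 'b']
```

The caller keeps using an ordinary for loop and only calls send() when it wants to substitute the operand it has just received.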

pow(other)#

Build a power expression.

Parameters:

other (BaseExpr) – The power operand.

Returns:

The resulting power expression.

Return type:

BaseExpr

sub(other)#

Build a sub expression.

Parameters:

other (BaseExpr) – The second operand.

Returns:

The resulting sub expression.

Return type:

BaseExpr

translate_input(mapper)#

Make a deep copy of the expression translating input nodes using mapper.

The default implementation builds a copy and recursively runs translation for all its operands. For leaf expressions, _translate_input is called.

Parameters:

mapper (InputMapper) – A mapper to use for input columns translation.

Returns:

The expression copy with translated input columns.

Return type:

BaseExpr
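A minimal sketch of this translation, assuming a mapper that exposes a `translate` method for input references. `ToyMapper`, `Ref`, and `Op` are hypothetical names, and the lookup-table mapper is only one possible design.

```python
class Ref:
    """Toy leaf node referencing a column of a frame (hypothetical)."""

    def __init__(self, frame, col):
        self.frame = frame
        self.col = col

    def translate_input(self, mapper):
        # Leaves delegate to the mapper instead of recursing.
        return mapper.translate(self)


class Op:
    """Toy inner node (hypothetical)."""

    def __init__(self, name, operands):
        self.name = name
        self.operands = list(operands)

    def copy(self):
        return Op(self.name, self.operands)

    def translate_input(self, mapper):
        # Copy this node, then translate every operand recursively.
        res = self.copy()
        res.operands = [op.translate_input(mapper) for op in self.operands]
        return res


class ToyMapper:
    """Hypothetical mapper backed by a (frame, column) lookup table."""

    def __init__(self, table):
        self.table = table

    def translate(self, ref):
        frame, col = self.table[(ref.frame, ref.col)]
        return Ref(frame, col)


expr = Op("add", [Ref("f1", "a"), Op("mul", [Ref("f1", "b"), Ref("f1", "a")])])
mapper = ToyMapper({("f1", "a"): ("f2", "x"), ("f1", "b"): ("f2", "y")})
translated = expr.translate_input(mapper)
```

The original tree still refers to `f1`; every node on the path to a translated leaf is a fresh object.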

truediv(other)#

Build a truediv expression.

The result always has a float data type.

Parameters:

other (BaseExpr) – The second operand.

Returns:

The resulting truediv expression.

Return type:

BaseExpr

InputRefExpr#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.expr.InputRefExpr(frame, col, dtype)#

An expression tree node to represent an input frame column.

Parameters:
  • frame (HdkOnNativeDataframe) – An input frame.

  • col (str) – An input column name.

  • dtype (dtype) – Input column data type.

modin_frame#

An input frame.

Type:

HdkOnNativeDataframe

column#

An input column name.

Type:

str

_dtype#

Input column data type.

Type:

dtype

can_execute_arrow() → bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

collect_frames(frames)#

Add referenced frame to the frames set.

Parameters:

frames (set) – Output set of collected frames.

copy()#

Make a shallow copy of the expression.

Return type:

InputRefExpr

execute_arrow(table: Table) → ChunkedArray#

Compute the column data using the Arrow API.

Parameters:

table (pa.Table) –

Return type:

pa.ChunkedArray

fold()#

Fold the operands.

This operation is used by TransformNode when translating to base.

Return type:

BaseExpr

LiteralExpr#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.expr.LiteralExpr(val, dtype=None)#

An expression tree node to represent a literal value.

Parameters:
  • val (int, np.int, float, bool, str, np.datetime64 or None) – Literal value.

  • dtype (None or dtype, default: None) – Value dtype.

val#

Literal value.

Type:

int, np.int, float, bool, str, np.datetime64 or None

_dtype#

Literal data type.

Type:

dtype

can_execute_arrow() → bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

cast(res_type)#

Build a cast expression.

Parameters:

res_type (dtype) – A data type to cast to.

Returns:

The cast expression.

Return type:

BaseExpr

copy()#

Make a shallow copy of the expression.

Return type:

LiteralExpr

execute_arrow(table: Table) → ChunkedArray#

Compute the column data using the Arrow API.

Parameters:

table (pa.Table) –

Return type:

pa.ChunkedArray

fold()#

Fold the operands.

This operation is used by TransformNode when translating to base.

Return type:

BaseExpr

is_not_null()#

Build a NOT NULL check expression.

Returns:

The NOT NULL check expression.

Return type:

BaseExpr

is_null()#

Build a NULL check expression.

Returns:

The NULL check expression.

Return type:

BaseExpr

OpExpr#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.expr.OpExpr(op, operands, dtype)#

A generic operation expression.

Used for arithmetic, comparisons, conditional operations, etc.

Parameters:
  • op (str) – Operation name.

  • operands (list of BaseExpr) – Operation operands.

  • dtype (dtype) – Result data type.

op#

Operation name.

Type:

str

operands#

Operation operands.

Type:

list of BaseExpr

_dtype#

Result data type.

Type:

dtype

partition_keys#

This attribute is used with window functions only and contains a list of column expressions to partition the result set.

Type:

list of BaseExpr, optional

order_keys#

This attribute is used with window functions only and contains order clauses.

Type:

list of dict, optional

lower_bound#

Lower bound for windowed aggregates.

Type:

dict, optional

upper_bound#

Upper bound for windowed aggregates.

Type:

dict, optional

can_execute_arrow() → bool#

Check for possibility of Arrow execution.

Check if the computation can be executed using the Arrow API instead of HDK query.

Return type:

bool

can_execute_hdk() → bool#

Check for possibility of HDK execution.

Check if the computation can be executed using an HDK query.

Return type:

bool

copy()#

Make a shallow copy of the expression.

Return type:

OpExpr

execute_arrow(table: Table) → ChunkedArray#

Compute the column data using the Arrow API.

Parameters:

table (pa.Table) –

Return type:

pa.ChunkedArray

fold()#

Fold the operands.

This operation is used by TransformNode when translating to base.

Return type:

BaseExpr

nested_expressions() → Generator[Type[BaseExpr], Type[BaseExpr], Type[BaseExpr]]#

Return a generator that allows iterating over and replacing the nested expressions.

If the generator receives a new expression, it creates a copy of self and replaces the expression in the copy. The copy is returned to the sender.

Return type:

Generator

set_window_opts(partition_keys, order_keys, order_ascending, na_pos)#

Set the window function options.

Parameters:
  • partition_keys (list of BaseExpr) –

  • order_keys (list of BaseExpr) –

  • order_ascending (list of bool) –

  • na_pos ({"FIRST", "LAST"}) –
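The parameters above are lists of expressions and flags, while the order_keys attribute is documented as a list of dict. The sketch below shows one plausible way to combine them. `WindowOp` is a hypothetical class, columns are plain strings, and the dict key names (`field`, `direction`, `nulls`) are assumptions rather than the library's actual layout.

```python
class WindowOp:
    """Toy node carrying window options (hypothetical; not Modin's OpExpr)."""

    def __init__(self, op):
        self.op = op
        self.partition_keys = None
        self.order_keys = None

    def set_window_opts(self, partition_keys, order_keys, order_ascending, na_pos):
        if na_pos not in ("FIRST", "LAST"):
            raise ValueError(f"na_pos must be 'FIRST' or 'LAST', got {na_pos!r}")
        self.partition_keys = list(partition_keys)
        # One dict per ordering key; the key names are illustrative assumptions.
        self.order_keys = [
            {"field": key, "direction": "ASC" if asc else "DESC", "nulls": na_pos}
            for key, asc in zip(order_keys, order_ascending)
        ]


row_number = WindowOp("ROW_NUMBER")
row_number.set_window_opts(
    partition_keys=["dept"],
    order_keys=["salary", "name"],
    order_ascending=[False, True],
    na_pos="LAST",
)
```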

AggregateExpr#

Public API#

class modin.experimental.core.execution.native.implementations.hdk_on_native.expr.AggregateExpr(agg, op, distinct=False, dtype=None)#

An aggregate operation expression.

Parameters:
  • agg (str) – Aggregate name.

  • op (BaseExpr or list of BaseExpr) – Aggregate operand.

  • distinct (bool, default: False) – Distinct modifier for ‘count’ aggregate.

  • dtype (dtype, optional) – Aggregate data type. Computed if not specified.

agg#

Aggregate name.

Type:

str

operands#

Aggregate operands.

Type:

list of BaseExpr

distinct#

Distinct modifier for ‘count’ aggregate.

Type:

bool

_dtype#

Aggregate data type.

Type:

dtype

copy()#

Make a shallow copy of the expression.

Return type:

AggregateExpr

Utilities#

Public API#

modin.experimental.core.execution.native.implementations.hdk_on_native.expr.is_cmp_op(op)#

Check if operation is a comparison.

Parameters:

op (str) – Operation to check.

Returns:

True for comparison operations and False otherwise.

Return type:

bool
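A check like this typically reduces to a membership test against a fixed set of names. In the sketch below, both the function name `toy_is_cmp_op` and the operation names in the set are assumptions; the names the library actually recognizes are defined in its source.

```python
# Hypothetical comparison operation names, chosen for illustration only.
_TOY_CMP_OPS = frozenset({"eq", "ne", "lt", "le", "gt", "ge"})


def toy_is_cmp_op(op):
    """Return True if `op` names a comparison in this toy vocabulary."""
    return op in _TOY_CMP_OPS


checks = {name: toy_is_cmp_op(name) for name in ("eq", "add", "ge")}
print(checks)  # {'eq': True, 'add': False, 'ge': True}
```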

modin.experimental.core.execution.native.implementations.hdk_on_native.expr.build_row_idx_filter_expr(row_idx, row_col)#

Build an expression to filter rows by rowid.

Parameters:
  • row_idx (int or list of int) – The row numeric indices to select.

  • row_col (InputRefExpr) – The rowid column reference expression.

Returns:

The resulting filtering expression.

Return type:

BaseExpr
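To make the two accepted shapes of row_idx concrete, here is a sketch that builds an equality test for a single index and an OR of equality tests for a list. Expressions are modelled as plain tuples and evaluated against dict rows; `toy_row_idx_filter` and `matches` are hypothetical helpers, and the real function may build a different expression.

```python
def toy_row_idx_filter(row_idx, row_col):
    """Build a toy filter expression selecting rows by their row id."""
    if isinstance(row_idx, int):
        return ("eq", row_col, row_idx)
    # A list of indices becomes a disjunction of equality checks.
    return ("or", [("eq", row_col, idx) for idx in row_idx])


def matches(expr, row):
    """Evaluate a toy filter expression against one row (a dict)."""
    if expr[0] == "eq":
        _, col, value = expr
        return row[col] == value
    _, branches = expr
    return any(matches(branch, row) for branch in branches)


rows = [{"rowid": i, "x": i * 10} for i in range(5)]
flt = toy_row_idx_filter([1, 3], "rowid")
selected = [row["x"] for row in rows if matches(flt, row)]
print(selected)  # [10, 30]
```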

modin.experimental.core.execution.native.implementations.hdk_on_native.expr.build_if_then_else(cond, then_val, else_val, res_type)#

Build a conditional operator expression.

Parameters:
  • cond (BaseExpr) – A condition to check.

  • then_val (BaseExpr) – A value to use when the condition holds.

  • else_val (BaseExpr) – A value to use when the condition fails.

  • res_type (dtype) – The result data type.

Returns:

The conditional operator expression.

Return type:

BaseExpr
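The semantics of the conditional operator can be sketched with a toy node and evaluator. Operands are modelled as callables taking a row, and res_type as a Python type; these modelling choices and the `"CASE"` label are assumptions for illustration.

```python
def toy_if_then_else(cond, then_val, else_val, res_type):
    """Build a toy conditional node (hypothetical representation)."""
    return {"op": "CASE", "operands": [cond, then_val, else_val], "dtype": res_type}


def evaluate(expr, row):
    """Evaluate the toy conditional node for a single row."""
    cond, then_val, else_val = expr["operands"]
    chosen = then_val if cond(row) else else_val
    # Cast the selected branch to the declared result type.
    return expr["dtype"](chosen(row))


expr = toy_if_then_else(
    cond=lambda row: row["x"] > 0,
    then_val=lambda row: row["x"],
    else_val=lambda row: 0,
    res_type=float,
)
values = [evaluate(expr, {"x": x}) for x in (-2, 5)]
print(values)  # [0.0, 5.0]
```

Only the selected branch is evaluated, and the result is cast to the declared type regardless of which branch was taken.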

modin.experimental.core.execution.native.implementations.hdk_on_native.expr.build_dt_expr(dt_operation, col_expr)#

Build a datetime extraction expression.

Parameters:
  • dt_operation (str) – Datetime field to extract.

  • col_expr (BaseExpr) – An expression to extract from.

Returns:

The extract expression.

Return type:

BaseExpr
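Datetime extraction pulls one field out of a timestamp. The sketch below models that with Python's datetime; the field names in `_EXTRACTORS` and the tuple representation are assumptions, not the set of operations the library supports.

```python
from datetime import datetime

# Hypothetical field names mapped to extractor functions.
_EXTRACTORS = {
    "year": lambda ts: ts.year,
    "month": lambda ts: ts.month,
    "day": lambda ts: ts.day,
    "hour": lambda ts: ts.hour,
}


def toy_dt_expr(dt_operation, col_name):
    """Build a toy extraction expression for a datetime column."""
    if dt_operation not in _EXTRACTORS:
        raise ValueError(f"unsupported datetime field: {dt_operation!r}")
    return ("extract", dt_operation, col_name)


def evaluate(expr, row):
    """Apply the toy extraction expression to one row (a dict)."""
    _, field, col_name = expr
    return _EXTRACTORS[field](row[col_name])


row = {"ts": datetime(2023, 4, 17, 9, 30)}
month = evaluate(toy_dt_expr("month", "ts"), row)
hour = evaluate(toy_dt_expr("hour", "ts"), row)
print(month, hour)  # 4 9
```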