OmnisciOnNativeFrame

Public API

class modin.experimental.engines.omnisci_on_native.frame.data.OmnisciOnNativeFrame(partitions=None, index=None, columns=None, row_lengths=None, column_widths=None, dtypes=None, op=None, index_cols=None, uses_rowid=False, force_execution_mode=None, has_unsupported_data=False)

Lazy dataframe based on Arrow table representation and embedded OmniSci backend.

Currently, a materialized dataframe always has a single partition. This partition can hold either an Arrow table or a pandas dataframe.

Operations on a dataframe are not executed immediately; they build an operations tree instead. When the frame’s data is accessed, this tree is transformed into a query that is executed in the OmniSci backend. For simple transformations, the Arrow API can be used instead of the OmniSci backend.

Since frames are used as an input for other frames, all operations produce new frames and are not executed in-place.
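The lazy model described above can be illustrated with a small, self-contained sketch. It does not use Modin's classes: ToyNode and ToyFrame are hypothetical names, and a plain Python list stands in for the Arrow table. The sketch only shows the general idea of building a tree and executing it on data access.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToyNode:
    """One node of a hypothetical operations tree (not Modin's DFAlgNode)."""
    name: str
    func: Optional[Callable[[List[int]], List[int]]] = None
    child: Optional["ToyNode"] = None
    data: Optional[tuple] = None  # set only for leaf ("materialized") nodes


class ToyFrame:
    """Toy lazy frame: operations extend the tree, nothing runs until `values`."""

    def __init__(self, op: ToyNode):
        self._op = op

    @classmethod
    def from_list(cls, data):
        return cls(ToyNode("frame", data=tuple(data)))

    def map(self, name, func):
        # Return a NEW frame whose tree wraps this frame's tree.
        return ToyFrame(ToyNode(name, func=func, child=self._op))

    @property
    def values(self):
        # Accessing the data walks the tree and executes it.
        return self._execute(self._op)

    def _execute(self, node):
        if node.child is None:
            return list(node.data)
        return node.func(self._execute(node.child))


base = ToyFrame.from_list([1, 2, 3])
doubled = base.map("double", lambda xs: [x * 2 for x in xs])
shifted = doubled.map("shift", lambda xs: [x + 1 for x in xs])

print(shifted.values)  # [3, 5, 7]
print(base.values)     # [1, 2, 3] (the source frame is unchanged)
```

Because every operation returns a new frame, base can safely be reused as the input of other frames.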

Parameters
  • partitions (np.ndarray, optional) – Partitions of the frame.

  • index (pandas.Index, optional) – Index of the frame to be used as an index cache. If None then will be computed on demand.

  • columns (pandas.Index, optional) – Columns of the frame.

  • row_lengths (np.ndarray, optional) – Partition lengths. Should be None if lengths are unknown.

  • column_widths (np.ndarray, optional) – Partition widths. Should be None if widths are unknown.

  • dtypes (pandas.Index, optional) – Column data types.

  • op (DFAlgNode, optional) – A tree describing how the frame is computed. For materialized frames it is always FrameNode.

  • index_cols (list of str, optional) – A list of columns included in the frame’s index. None means a default index (row id is used as an index).

  • uses_rowid (bool, default: False) – True for frames which require access to the virtual ‘rowid’ column for their execution.

  • force_execution_mode (str or None) – Used by tests to control frame’s execution process.

  • has_unsupported_data (bool) – True for frames holding data not supported by Arrow or OmniSci backend.

id

ID of the frame. Used for debug prints only.

Type

int

_op

A tree to be used to compute the frame. For materialized frames it is always FrameNode.

Type

DFAlgNode

_partitions

Partitions of the frame. For materialized dataframes it holds a single partition. None for frames requiring execution.

Type

numpy.ndarray or None

_index_cols

Names of index columns. None for default index. Index columns have mangled names to handle labels which cannot be directly used as an OmniSci table column name (e.g. non-string labels, SQL keywords etc.).

Type

list of str or None

_table_cols

A list of all the frame’s columns. It includes index columns, if any. Index columns are always at the head of the list.

Type

list of str

_index_cache

Materialized index of the frame or None when index is not materialized.

Type

pandas.Index or None

_has_unsupported_data

True for frames holding data not supported by Arrow or OmniSci backend. Operations on such frames are not allowed and should be defaulted to pandas instead.

Type

bool

_dtypes

Column types.

Type

pandas.Series

_uses_rowid

True for frames which require access to the virtual ‘rowid’ column for their execution.

Type

bool

_force_execution_mode

Used by tests to control the frame’s execution process. The value “lazy” raises RuntimeError if execution is triggered for the frame. The value “arrow” raises RuntimeError if execution is triggered and cannot be done using the Arrow API (i.e. OmniSci would have to be used for execution).

Type

str or None
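The two test modes can be pictured as a guard that runs before execution. The function below is a minimal sketch of that documented behaviour only; check_execution_allowed and its arrow_can_execute argument are hypothetical and are not part of Modin's API.

```python
from typing import Optional


def check_execution_allowed(force_execution_mode: Optional[str],
                            arrow_can_execute: bool) -> str:
    """Hypothetical guard mirroring the documented "lazy"/"arrow" modes.

    Returns the name of the engine that would run the query.
    """
    if force_execution_mode == "lazy":
        # Any execution attempt is an error for a frame forced to stay lazy.
        raise RuntimeError("unexpected execution triggered on a lazy frame")
    if arrow_can_execute:
        return "arrow"
    if force_execution_mode == "arrow":
        # Arrow cannot handle the query and falling back is not allowed.
        raise RuntimeError("execution requires OmniSci but mode is 'arrow'")
    return "omnisci"


print(check_execution_allowed(None, arrow_can_execute=False))    # omnisci
print(check_execution_allowed("arrow", arrow_can_execute=True))  # arrow
```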

agg(agg)

Perform specified aggregation along columns.

Parameters

agg (str) – Name of the aggregation function to perform.

Returns

New frame containing the result of aggregation.

Return type

OmnisciOnNativeFrame

astype(col_dtypes, **kwargs)

Cast frame columns to specified types.

Parameters
  • col_dtypes (dict) – Maps column names to new data types.

  • **kwargs (dict) – Keyword args. Not used.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

bin_op(other, op_name, **kwargs)

Perform binary operation.

An arithmetic binary operation or a comparison operation to perform on columns.

Parameters
  • other (scalar, list-like, or OmnisciOnNativeFrame) – The second operand.

  • op_name (str) – An operation to perform.

  • **kwargs (dict) – Keyword args.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

cat_codes()

Extract codes for a category column.

The frame should have a single data column.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

property columns

Return column labels of the frame.

Returns

Return type

pandas.Index

concat(axis, other_modin_frames, join='outer', sort=False, ignore_index=False)

Concatenate frames along a particular axis.

Parameters
  • axis (0 or 1) – The axis to concatenate along.

  • other_modin_frames (list of OmnisciOnNativeFrame) – Frames to concat.

  • join ({"outer", "inner"}, default: "outer") – How to handle mismatched indexes on other axis.

  • sort (bool, default: False) – Sort non-concatenation axis if it is not already aligned when join is ‘outer’.

  • ignore_index (bool, default: False) – Ignore index along the concatenation axis.

Returns

The new frame.

Return type

OmnisciOnNativeFrame
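As an illustration of the join and sort parameters for a row-wise concat (axis 0), the sketch below computes only the resulting column labels. It assumes pandas-like semantics; concat_columns is a hypothetical helper, not the method's implementation.

```python
def concat_columns(frames_columns, join="outer", sort=False):
    """Column labels produced by a row-wise concat of several frames."""
    first, *rest = frames_columns
    if join == "outer":
        # Union of all labels, in order of first appearance.
        result = list(first)
        for cols in rest:
            for col in cols:
                if col not in result:
                    result.append(col)
        if sort:
            result = sorted(result)
    elif join == "inner":
        # Only labels present in every frame, in the first frame's order.
        result = [col for col in first if all(col in cols for cols in rest)]
    else:
        raise ValueError(f"unsupported join: {join!r}")
    return result


frames = [["b", "a"], ["a", "c"]]
print(concat_columns(frames, join="outer"))             # ['b', 'a', 'c']
print(concat_columns(frames, join="outer", sort=True))  # ['a', 'b', 'c']
print(concat_columns(frames, join="inner"))             # ['a']
```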

dropna(subset, how='any')

Drop rows with NULLs.

Parameters
  • subset (list of str) – Columns to check.

  • how ({"any", "all"}, default: "any") – Whether a row is removed when at least one of the checked values is NULL ("any") or only when all of them are NULL ("all").

Returns

The new frame.

Return type

OmnisciOnNativeFrame
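The difference between the two how values can be shown on plain Python rows, with None standing in for NULL. This is a sketch of the semantics only; drop_null_rows is a hypothetical helper and does not reflect how the query is built for OmniSci.

```python
def drop_null_rows(rows, subset, how="any"):
    """Return the rows that survive a dropna over the `subset` columns."""
    if how not in ("any", "all"):
        raise ValueError(f"unsupported how: {how!r}")
    test = any if how == "any" else all
    return [row for row in rows
            if not test(row[col] is None for col in subset)]


rows = [
    {"a": 1, "b": 2},
    {"a": None, "b": 3},
    {"a": None, "b": None},
]
print(drop_null_rows(rows, ["a", "b"], how="any"))
# [{'a': 1, 'b': 2}]
print(drop_null_rows(rows, ["a", "b"], how="all"))
# [{'a': 1, 'b': 2}, {'a': None, 'b': 3}]
```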

dt_extract(obj)

Extract a date or a time unit from a datetime value.

Parameters

obj (str) – Datetime unit to extract.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

property dtypes

Return column data types.

Returns

A pandas Series containing the data types for this dataframe.

Return type

pandas.Series

fillna(value=None, method=None, axis=None, limit=None, downcast=None)

Replace NULLs with the specified value.

Parameters
  • value (dict or scalar, optional) – A value to replace NULLs with. Can be a dictionary to assign different values to columns.

  • method (None, optional) – Should be None.

  • axis ({0, 1}, optional) – Should be 0.

  • limit (None, optional) – Should be None.

  • downcast (None, optional) – Should be None.

Returns

The new frame.

Return type

OmnisciOnNativeFrame
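The scalar and dictionary forms of value behave differently: a scalar applies to every column, while a dictionary applies per column. The sketch below shows this on plain Python rows, with None standing in for NULL; fill_nulls is a hypothetical helper, and leaving columns absent from the dictionary untouched is assumed from pandas behaviour.

```python
def fill_nulls(rows, value):
    """Replace None cells with a scalar, or per column when `value` is a dict."""
    filled = []
    for row in rows:
        new_row = {}
        for col, cell in row.items():
            if cell is None:
                # Columns missing from the dict keep their NULL.
                cell = value.get(col) if isinstance(value, dict) else value
            new_row[col] = cell
        filled.append(new_row)
    return filled


rows = [{"a": None, "b": 1}, {"a": 2, "b": None}]
print(fill_nulls(rows, 0))          # [{'a': 0, 'b': 1}, {'a': 2, 'b': 0}]
print(fill_nulls(rows, {"a": -1}))  # [{'a': -1, 'b': 1}, {'a': 2, 'b': None}]
```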

filter(key)

Filter rows by a boolean key column.

Parameters

key (OmnisciOnNativeFrame) – A frame with a single bool data column used as a filter.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

classmethod from_arrow(at, index_cols=None, index=None)

Build a frame from an Arrow table.

Parameters
  • at (pyarrow.Table) – Source table.

  • index_cols (list of str, optional) – List of index columns in the source table which are ignored in transformation.

  • index (pandas.Index, optional) – An index to be used by the new frame. Should be present if index_cols is not None.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

classmethod from_pandas(df)

Build a frame from a pandas.DataFrame.

Parameters

df (pandas.DataFrame) – Source frame.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

get_dtype(col)

Get data type for a column.

Parameters

col (str) – Column name.

Returns

Return type

dtype

get_index_name()

Get the name of the index column.

Returns None for default index and multi-index.

Returns

Return type

str or None

get_index_names()

Get index column names.

Returns

Return type

list of str

groupby_agg(by, axis, agg, groupby_args, **kwargs)

Groupby with aggregation operation.

Parameters
  • by (DFAlgQueryCompiler or list-like of str) – Grouping keys.

  • axis ({0, 1}) – Only rows groupby is supported, so should be 0.

  • agg (str or dict) – Aggregates to compute.

  • groupby_args (dict) – Additional groupby args.

  • **kwargs (dict) – Keyword args. Currently ignored.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

has_multiindex()

Check for multi-index usage.

Return True if the frame has a multi-index (index with multiple columns) and False otherwise.

Returns

Return type

bool

id_str()

Return string identifier of the frame.

Used for debug dumps.

Returns

Return type

str

property index

Get the index of the frame in pandas format.

Materializes the frame if required.

Returns

Return type

pandas.Index

insert(loc, column, value)

Insert a constant column.

Parameters
  • loc (int) – Inserted column location.

  • column (str) – Inserted column name.

  • value (scalar) – Inserted column value.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

join(other, how='inner', left_on=None, right_on=None, sort=False, suffixes=('_x', '_y'))

Join operation.

Parameters
  • other (OmnisciOnNativeFrame) – A frame to join with.

  • how (str, default: "inner") – A type of join.

  • left_on (list of str, optional) – A list of columns for the left frame to join on.

  • right_on (list of str, optional) – A list of columns for the right frame to join on.

  • sort (bool, default: False) – Sort the result by join keys.

  • suffixes (list-like of str, default: ("_x", "_y")) – A length-2 sequence of suffixes to add to overlapping column names of left and right operands respectively.

Returns

The new frame.

Return type

OmnisciOnNativeFrame
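The effect of suffixes on overlapping column names can be sketched without any dataframe library. The helper below is hypothetical and assumes pandas-like naming with key columns that share the same names in both frames; the exact column order produced by the OmniSci backend is not confirmed here.

```python
def resolve_join_columns(left_cols, right_cols, on, suffixes=("_x", "_y")):
    """Output column names, suffixing non-key names present in both frames."""
    overlap = (set(left_cols) & set(right_cols)) - set(on)
    left_out = [c + suffixes[0] if c in overlap else c for c in left_cols]
    # Key columns appear once, so they are skipped on the right side.
    right_out = [c + suffixes[1] if c in overlap else c
                 for c in right_cols if c not in on]
    return left_out + right_out


print(resolve_join_columns(["id", "value", "x"], ["id", "value", "y"], on=["id"]))
# ['id', 'value_x', 'x', 'value_y', 'y']
```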

mask(row_indices=None, row_numeric_idx=None, col_indices=None, col_numeric_idx=None)

Mask operation.

Parameters
  • row_indices (list, optional) – Indices of rows to select.

  • row_numeric_idx (list of int, optional) – Numeric indices of rows to select.

  • col_indices (list, optional) – Indices of columns to select.

  • col_numeric_idx (list of int, optional) – Numeric indices of columns to select.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

ref(col)

Return an expression referencing a frame’s column.

Parameters

col (str) – Column name.

Returns

Return type

InputRefExpr

reset_index(drop)

Set the default index for the frame.

Parameters

drop (bool) – If True then drop current index columns, otherwise make them data columns.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

set_index_name(name)

Set new name for the index column.

Shouldn’t be called for frames with multi-index.

Parameters

name (str or None) – New index name.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

set_index_names(names)

Set index labels for frames with multi-index.

Parameters

names (list of str) – New index labels.

Returns

The new frame.

Return type

OmnisciOnNativeFrame

sort_rows(columns, ascending, ignore_index, na_position)

Sort rows of the frame.

Parameters
  • columns (str or list of str) – Sorting keys.

  • ascending (bool or list of bool) – Sort order.

  • ignore_index (bool) – Drop index columns.

  • na_position ({"first", "last"}) – NULLs position.

Returns

The new frame.

Return type

OmnisciOnNativeFrame
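The interaction of ascending and na_position can be shown on a single list of values, with None standing in for NULL. This is a sketch of the expected ordering only; sort_with_nulls is a hypothetical helper, not part of the frame's API.

```python
def sort_with_nulls(values, ascending=True, na_position="last"):
    """Sort values, placing None entries first or last regardless of order."""
    if na_position not in ("first", "last"):
        raise ValueError(f"unsupported na_position: {na_position!r}")
    nulls = [v for v in values if v is None]
    present = sorted((v for v in values if v is not None),
                     reverse=not ascending)
    return nulls + present if na_position == "first" else present + nulls


print(sort_with_nulls([3, None, 1, 2]))
# [1, 2, 3, None]
print(sort_with_nulls([3, None, 1, 2], ascending=False, na_position="first"))
# [None, 3, 2, 1]
```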

to_pandas()

Transform the frame to pandas format.

Returns

Return type

pandas.DataFrame