Base Query Compiler¶

Brief description¶

BaseQueryCompiler is an abstract query compiler class that defines the common interface every other query compiler implementation in Modin must follow. The base class contains basic implementations of most interface methods, all of which default to pandas.

Subclassing `BaseQueryCompiler`¶

If you want to add a new type of query compiler to Modin, the new class needs to inherit from BaseQueryCompiler and implement the abstract methods:

from_pandas() – build a query compiler from a pandas DataFrame.
from_arrow() – build a query compiler from an Arrow Table.
to_pandas() – get the query compiler representation as a pandas DataFrame.
default_to_pandas() – fall back to pandas for the passed function.
finalize() – finalize object construction.
free() – trigger memory cleanup.

(Refer to the code documentation for the full description of these functions.)

This is the minimum set of operations a new query compiler needs in order to function in the Modin architecture; the rest of the API can safely default to the pandas implementation provided by the base class. To add a backend-specific implementation of a query compiler operation, override the corresponding method in your query compiler class.
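The division of labor described above (a few abstract methods, a pandas fallback for everything else, and optional overrides) can be pictured with a small plain-Python model. This is only a sketch of the pattern, not Modin code: `ToyBaseCompiler`, `ListCompiler`, `to_native` and `default_to_native` are hypothetical names, and a plain list stands in for the pandas DataFrame.

```python
from abc import ABC, abstractmethod


class ToyBaseCompiler(ABC):
    """Hypothetical stand-in for a base query compiler (not Modin code)."""

    @abstractmethod
    def to_native(self):
        """Return the data as a plain list (plays the role of to_pandas)."""

    @abstractmethod
    def default_to_native(self, op, *args, **kwargs):
        """Run `op` on the plain list and wrap the result (plays the role of default_to_pandas)."""

    # Non-abstract API: written once in the base class on top of the fallback.
    def abs(self):
        return self.default_to_native(lambda data: [abs(x) for x in data])

    def sum(self):
        return self.default_to_native(lambda data: [sum(data)])


class ListCompiler(ToyBaseCompiler):
    def __init__(self, data):
        self._data = list(data)
        self.fallback_calls = 0  # counts how often the generic fallback runs

    def to_native(self):
        return list(self._data)

    def default_to_native(self, op, *args, **kwargs):
        self.fallback_calls += 1
        return type(self)(op(self.to_native(), *args, **kwargs))

    # Backend-specific override: works on the internal data directly
    # and never touches the fallback path.
    def sum(self):
        return type(self)([sum(self._data)])


qc = ListCompiler([-1, 2, -3])
abs_result = qc.abs().to_native()  # inherited method, goes through the fallback
sum_result = qc.sum().to_native()  # overridden method, skips the fallback
```

In this model only `abs()` increments `fallback_calls`; `sum()` is served by the override, which is the effect a backend-specific method has in a real query compiler subclass.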

Example¶

As an exercise, let’s define a new query compiler in Modin, just to see how easy it is. Usually, the query compiler routes formed queries to the underlying frame class, which submits operators to an execution engine. To keep this example simple and self-contained, our execution engine will be pandas itself.

We need to derive a new class from BaseQueryCompiler and implement all of the abstract methods. With pandas as the execution engine, this is trivial:

from modin.backends import BaseQueryCompiler

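class DefaultToPandasQueryCompiler(BaseQueryCompiler):
    def __init__(self, pandas_df):
        # The wrapped pandas DataFrame is the only state this compiler keeps
        self._pandas_df = pandas_df

    @classmethod
    def from_pandas(cls, df, *args, **kwargs):
        return cls(df)

    @classmethod
    def from_arrow(cls, at, *args, **kwargs):
        # Convert the Arrow Table to pandas first, then wrap it
        return cls(at.to_pandas())

    def to_pandas(self):
        # Return a copy so callers cannot mutate the internal frame
        return self._pandas_df.copy()

    def default_to_pandas(self, pandas_op, *args, **kwargs):
        # Apply the pandas function and wrap the result in a new query compiler
        return type(self)(pandas_op(self.to_pandas(), *args, **kwargs))

    def finalize(self):
        # Nothing to finalize: pandas executes eagerly
        pass

    def free(self):
        # Nothing to free explicitly
        pass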

All done! Now you’ve got a fully functional query compiler, which is ready for extension and can already be used in a Modin DataFrame:

import pandas
pandas_df = pandas.DataFrame({"col1": [1, 2, 2, 1], "col2": [10, 2, 3, 40]})
# Building our query compiler from pandas object
qc = DefaultToPandasQueryCompiler.from_pandas(pandas_df)

import modin.pandas as pd
# Building Modin DataFrame from newly created query compiler
modin_df = pd.DataFrame(query_compiler=qc)

# Got fully functional Modin DataFrame
>>> print(modin_df.groupby("col1").sum().reset_index())
   col1  col2
0     1    50
1     2     5

To be able to select this query compiler as the default via modin.config, you also need to define the combination of your query compiler and the pandas execution engine as a backend by adding the corresponding factory. For more information about factories, see the corresponding section of the flow documentation.

Query Compiler API¶

class modin.backends.base.query_compiler.BaseQueryCompiler¶

Abstract class that handles the queries to Modin dataframes.

This class defines the common query compiler API. Most of the methods are already implemented and default to pandas.

lazy_execution¶

Whether the underlying execution engine is designed to be executed in lazy mode only. If True, such a QueryCompiler is handled differently at the front end in order to reduce execution triggering as much as possible.

Type: bool
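To make the role of this flag concrete, here is a small plain-Python sketch of a front end that consults `lazy_execution` before doing something that would force computation. It is an illustration under assumptions, not Modin's front-end logic: `EagerCompiler`, `LazyCompiler`, `row_count` and `summary` are hypothetical names, and counting rows stands in for any operation that triggers execution.

```python
class EagerCompiler:
    """Hypothetical compiler whose data is always materialized."""

    lazy_execution = False

    def __init__(self, rows):
        self._rows = list(rows)
        self.executions = 0  # counts how often execution was triggered

    def row_count(self):
        # Stands in for any call that forces the engine to execute.
        self.executions += 1
        return len(self._rows)


class LazyCompiler(EagerCompiler):
    """Hypothetical compiler for an engine that should stay lazy."""

    lazy_execution = True


def summary(qc):
    """Hypothetical front-end helper: describe a frame without forcing lazy engines."""
    if qc.lazy_execution:
        # Skip anything that would trigger execution.
        return "<frame: row count not computed>"
    return f"<frame: {qc.row_count()} rows>"


eager = EagerCompiler([1, 2, 3])
lazy = LazyCompiler([1, 2, 3])
eager_summary = summary(eager)
lazy_summary = summary(lazy)
```

Here the eager compiler is asked for its row count, while the lazy one is described without triggering any execution.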

Notes

See the Abstract Methods and Fields section immediately below this for a list of requirements for subclassing this object.

abs()¶

Get absolute numeric value of each element.

Returns: QueryCompiler with absolute numeric value of each element.
Return type: BaseQueryCompiler

add(other, **kwargs)¶

Perform element-wise addition (self + other).

If axes are not equal, perform frames alignment first.

Parameters

other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, however we can’t distinguish them at the query compiler level, so this parameter is a hint that is passed from a high-level API.
level (int or label) – In case of a MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for 1D other (list or QueryCompiler that represents Series). 0 is for index, 1 is for columns.
fill_value (float or None) – Value to fill missing elements during frame alignment.
**kwargs (dict) – Kept for compatibility. Does not affect the result.
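The interplay of frame alignment and fill_value can be pictured with ordinary dictionaries. The sketch below is a plain-Python model of the idea, not the query compiler's implementation: `aligned_add` is a hypothetical helper, dictionary keys stand in for index labels, and None stands in for a missing value.

```python
def aligned_add(left, right, fill_value=None):
    """Add two label -> number mappings after aligning them on the union of labels."""
    result = {}
    # Keep the labels of `left` in order, then append labels found only in `right`.
    labels = list(left) + [label for label in right if label not in left]
    for label in labels:
        a = left.get(label, fill_value)
        b = right.get(label, fill_value)
        # Without a fill_value, a label missing on either side stays missing (None).
        result[label] = None if a is None or b is None else a + b
    return result


left = {"a": 1, "b": 2}
right = {"b": 10, "c": 20}
no_fill = aligned_add(left, right)
with_fill = aligned_add(left, right, fill_value=0)
```

Labels present in only one operand produce a missing result unless fill_value supplies a substitute for the absent side.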

Returns

Result of binary operation.

Return type
