PandasQueryCompiler

PandasQueryCompiler is responsible for compiling a set of known predefined functions and pairing those with dataframe algebra operators in the PandasDataframe, specifically for dataframes backed by pandas.DataFrame objects.

Each PandasQueryCompiler contains an instance of PandasDataframe which it queries to get the result.

PandasQueryCompiler supports methods built by the algebra module. If you want to add an implementation for a query compiler method, visit the algebra module documentation to see whether the new operation fits one of the existing function templates and can be easily implemented with them.
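As a rough illustration of what a function template does, the plain-Python sketch below turns an ordinary per-partition function into a query compiler method. The names ToyFrame, ToyQueryCompiler, map_template and abs_partition are hypothetical and are not part of Modin; the real templates live in the algebra module and operate on PandasDataframe.

```python
class ToyFrame:
    """Hypothetical stand-in for PandasDataframe: a list of partitions."""

    def __init__(self, partitions):
        self.partitions = partitions  # each partition is a list of numbers

    def map(self, func):
        # Algebra-style operator: apply func to every partition independently.
        return ToyFrame([func(part) for part in self.partitions])


def map_template(func):
    """Turn a per-partition function into a query compiler method."""

    def method(self, *args, **kwargs):
        new_frame = self._frame.map(lambda part: func(part, *args, **kwargs))
        return type(self)(new_frame)

    return method


def abs_partition(part):
    # The predefined function that gets paired with the map operator.
    return [abs(value) for value in part]


class ToyQueryCompiler:
    """Hypothetical query compiler that holds a frame and queries it."""

    def __init__(self, frame):
        self._frame = frame

    abs = map_template(abs_partition)


qc = ToyQueryCompiler(ToyFrame([[-1, 2], [-3, 4]]))
result = qc.abs()
print(result._frame.partitions)  # [[1, 2], [3, 4]]
```

The point of the template is that the method body never touches the data directly: it only tells the frame which operator to run and with which function.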

Public API

PandasQueryCompiler implements the common query compiler API defined by BaseQueryCompiler. Some functionality is inherited from the base class; the following section presents only the overridden methods.

class modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler(modin_frame)#

Query compiler for the pandas storage format.

This class translates the common query compiler API into DataFrame Algebra queries, which are supposed to be executed by PandasDataframe.

Parameters

modin_frame (PandasDataframe) – Modin Frame to query with the compiled queries.

abs(*args, **kwargs)#

Execute Map function against passed query compiler.

add(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

add_prefix(prefix, axis=1)#

Add string prefix to the index labels along specified axis.

Parameters
  • prefix (str) – The string to add before each label.

  • axis ({0, 1}, default: 1) – Axis to add prefix along. 0 is for index and 1 is for columns.

Returns

New query compiler with updated labels.

Return type

BaseQueryCompiler

add_suffix(suffix, axis=1)#

Add string suffix to the index labels along specified axis.

Parameters
  • suffix (str) – The string to add after each label.

  • axis ({0, 1}, default: 1) – Axis to add suffix along. 0 is for index and 1 is for columns.

Returns

New query compiler with updated labels.

Return type

BaseQueryCompiler
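The axis convention shared by add_prefix and add_suffix can be sketched on plain label lists. This is only an illustration under the assumption that labels are formatted as strings; the helper _relabel and the (index, columns) tuple representation are hypothetical and do not reflect how Modin stores labels.

```python
def _relabel(index, columns, transform, axis):
    """Apply transform to the labels of one axis and copy the other axis."""
    if axis == 0:
        return [transform(label) for label in index], list(columns)
    return list(index), [transform(label) for label in columns]


def add_prefix(index, columns, prefix, axis=1):
    # axis=1 (default) renames columns, axis=0 renames index labels.
    return _relabel(index, columns, lambda label: f"{prefix}{label}", axis)


def add_suffix(index, columns, suffix, axis=1):
    return _relabel(index, columns, lambda label: f"{label}{suffix}", axis)


new_index, new_columns = add_prefix([0, 1], ["a", "b"], "col_")
```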

all(*args, **kwargs)#

Execute TreeReduce function against passed query compiler.

any(*args, **kwargs)#

Execute TreeReduce function against passed query compiler.
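The TreeReduce pattern named in all and any can be pictured as two steps: reduce each partition on its own, then combine the partial results. The sketch below is a plain-Python illustration of that idea; tree_reduce is a hypothetical helper and the flat lists stand in for partitions, so it should not be read as Modin's implementation.

```python
from functools import reduce


def tree_reduce(partitions, map_func, reduce_func):
    """Reduce every partition first, then combine the partial results."""
    partials = [map_func(part) for part in partitions]
    return reduce(reduce_func, partials)


partitions = [[True, True], [True, False], [True]]

# all: every partition must be all-True, and so must the partial results.
all_result = tree_reduce(partitions, all, lambda a, b: a and b)

# any: a single True in any partition is enough.
any_result = tree_reduce(partitions, any, lambda a, b: a or b)

# A count-like reduction uses different functions for the two steps:
# count present values per partition, then add the counts together.
values = [[1, None], [3], [None, 5, 6]]
count_result = tree_reduce(
    values,
    lambda part: sum(v is not None for v in part),
    lambda a, b: a + b,
)
```

The count case shows why the two steps are kept separate: the function that combines partial results is not always the same as the one applied to raw data.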

apply(func, axis, *args, **kwargs)#

Apply passed function across given axis.

Parameters
  • func (callable(pandas.Series) -> scalar, str, list or dict of such) – The function to apply to each column or row.

  • axis ({0, 1}) – Target axis to apply the function along. 0 is for index, 1 is for columns.

  • raw (bool, default: False) – Whether to pass a high-level Series object (False) or a raw representation of the data (True).

  • result_type ({"expand", "reduce", "broadcast", None}, default: None) –

    Determines how to treat list-like return type of the func (works only if a single function was passed):

    • "expand": expand list-like result into columns.

    • "reduce": keep the result in a single cell (opposite of "expand").

    • "broadcast": broadcast result to original data shape (overwrite the existing column/row with the function result).

    • None: use "expand" strategy if Series is returned, "reduce" otherwise.

  • *args (iterable) – Positional arguments to pass to func.

  • **kwargs (dict) – Keyword arguments to pass to func.

Returns

QueryCompiler that contains the results of execution and is built by the following rules:

  • Index of the specified axis contains:

    • the names of the passed functions, if multiple functions are passed;

    • otherwise, the indices of the func result if the "expand" strategy is used, the indices of the original frame if the "broadcast" strategy is used, or a single label MODIN_UNNAMED_SERIES_LABEL if the "reduce" strategy is used.

  • Labels of the opposite axis are preserved.

  • Each element is the result of execution of func against corresponding row/column.

Return type

BaseQueryCompiler

apply_on_series(func, *args, **kwargs)#

Apply passed function on underlying Series.

Parameters
  • func (callable(pandas.Series) -> scalar, str, list or dict of such) – The function to apply to each row.

  • *args (iterable) – Positional arguments to pass to func.

  • **kwargs (dict) – Keyword arguments to pass to func.

Return type

BaseQueryCompiler

applymap(*args, **kwargs)#

Execute Map function against passed query compiler.

astype(col_dtypes, errors: str = 'raise')#

Convert columns dtypes to given dtypes.

Parameters
  • col_dtypes (dict) – Map for column names and new dtypes.

  • errors ({'raise', 'ignore'}, default: 'raise') –

    Control raising of exceptions on invalid data for provided dtype:

    • 'raise': allow exceptions to be raised.

    • 'ignore': suppress exceptions. On error return original object.

Returns

New QueryCompiler with updated dtypes.

Return type

BaseQueryCompiler
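The interplay of col_dtypes and errors can be sketched on a dict-of-lists frame, with Python types standing in for dtypes. This is a simplified illustration, not Modin's code path; the free function astype and the dict-of-lists representation are assumptions made for the example.

```python
def astype(columns, col_dtypes, errors="raise"):
    """Cast the listed columns of a dict-of-lists frame to new Python types."""
    try:
        return {
            name: [col_dtypes[name](v) for v in values]
            if name in col_dtypes
            else list(values)
            for name, values in columns.items()
        }
    except (TypeError, ValueError):
        if errors == "ignore":
            return columns  # on error, return the original object
        raise


frame = {"a": ["1", "2"], "b": ["x", "y"]}
converted = astype(frame, {"a": int})
unchanged = astype(frame, {"b": int}, errors="ignore")
```

Columns that are not named in col_dtypes keep their values, and with errors="ignore" a failed conversion hands back the input untouched rather than a partially converted frame.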

cat_codes()#

Convert underlying categories data into its codes.

Returns

New QueryCompiler containing the integer codes of the underlying categories.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Series.cat.codes for more information about parameters and output format.

Warning

This method is supported only by one-column query compilers.

clip(lower, upper, **kwargs)#

Trim values at input threshold.

Parameters
  • lower (float or list-like) –

  • upper (float or list-like) –

  • axis ({0, 1}) –

  • inplace ({False}) – This parameter serves the compatibility purpose. Always has to be False.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler with values limited by the specified thresholds.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.clip for more information about parameters and output format.

columnarize()#

Transpose this QueryCompiler if it has a single row but multiple columns.

This method should be called for QueryCompilers representing a Series object, i.e. self.is_series_like() should be True.

Returns

Transposed new QueryCompiler or self.

Return type

BaseQueryCompiler
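The rule stated above (transpose only when there is a single row but several columns) can be sketched on a list-of-rows frame. The function below is a hypothetical plain-Python illustration of that one condition and leaves out any label handling the real method may perform.

```python
def columnarize(rows):
    """Transpose a single-row, multi-column frame; otherwise return it as is."""
    if len(rows) == 1 and len(rows[0]) > 1:
        return [[value] for value in rows[0]]
    return rows


row_like = [[1, 2, 3]]      # one row, three columns
column_like = [[1], [2]]    # already a single column

transposed = columnarize(row_like)
same = columnarize(column_like)
```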

combine(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

combine_first(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

compare(other, **kwargs)#

Compare data of two QueryCompilers and highlight the difference.

Parameters
  • other (BaseQueryCompiler) – Query compiler to compare with. Has to have the same shape and the same labeling as self.

  • align_axis ({0, 1}) –

  • keep_shape (bool) –

  • keep_equal (bool) –

  • result_names (tuple) –

Returns

New QueryCompiler containing the differences between self and passed query compiler.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.compare for more information about parameters and output format.

concat(axis, other, **kwargs)#

Concatenate self with passed query compilers along specified axis.

Parameters
  • axis ({0, 1}) – Axis to concatenate along. 0 is for index and 1 is for columns.

  • other (BaseQueryCompiler or list of such) – Objects to concatenate with self.

  • join ({'outer', 'inner', 'right', 'left'}, default: 'outer') – Type of join that will be used if indices on the other axis are different. (note: if specified, has to be passed as join=value).

  • ignore_index (bool, default: False) – If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. (note: if specified, has to be passed as ignore_index=value).

  • sort (bool, default: False) – Whether or not to sort non-concatenation axis. (note: if specified, has to be passed as sort=value).

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

Concatenated objects.

Return type

BaseQueryCompiler

conj(*args, **kwargs)#

Execute Map function against passed query compiler.

convert_dtypes(*args, **kwargs)#

Execute Map function against passed query compiler.

copy()#

Make a copy of this object.

Returns

Copy of self.

Return type

BaseQueryCompiler

Notes

All of the metadata is copied as well, so that modifying the metadata of this object later does not also modify the metadata of its copies.

corr(method='pearson', min_periods=1)#

Compute pairwise correlation of columns, excluding NA/null values.

Parameters
  • method ({'pearson', 'kendall', 'spearman'} or callable(pandas.Series, pandas.Series) -> pandas.Series) – Correlation method.

  • min_periods (int) – Minimum number of observations required per pair of columns to have a valid result. If fewer than min_periods non-NA values are present the result will be NA.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

Correlation matrix.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.corr for more information about parameters and output format.
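The effect of min_periods on pairwise correlation can be sketched in plain Python for the 'pearson' method. The helpers pearson and corr below are hypothetical, use None for missing values, and work on a dict-of-lists frame; they illustrate the documented rule (fewer than min_periods valid pairs gives NA) rather than the actual implementation.

```python
import math


def pearson(x, y, min_periods=1):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < max(min_periods, 2):
        return math.nan  # too few observations for a valid result
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def corr(columns, min_periods=1):
    """Build the full correlation matrix as a nested dict."""
    names = list(columns)
    return {
        r: {c: pearson(columns[r], columns[c], min_periods) for c in names}
        for r in names
    }


frame = {"a": [1, 2, 3, None], "b": [2, 4, 6, 8], "c": [3, 2, 1, 0]}
matrix = corr(frame)
strict = corr(frame, min_periods=4)
```

Column "a" has only three valid observations, so every pair involving it becomes NA once min_periods is raised to 4, while the pair of "b" and "c" is unaffected.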

count(*args, **kwargs)#

Execute TreeReduce function against passed query compiler.

cov(min_periods=None)#

Compute pairwise covariance of columns, excluding NA/null values.

Parameters
  • min_periods (int) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

Covariance matrix.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.cov for more information about parameters and output format.

cummax(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler
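Cumulative operations such as cummax need every earlier value along the axis, which is presumably why they run against full column or row partitions rather than independent blocks. The sketch below is a plain-Python reading of the fold_axis=0 case under that assumption; fold_columns is a hypothetical helper and the nested lists stand in for row partitions.

```python
from itertools import accumulate


def fold_columns(row_partitions, func):
    """Join row partitions into full columns, then apply func to each column."""
    full_rows = [row for part in row_partitions for row in part]
    columns = list(zip(*full_rows))
    folded = [list(func(col)) for col in columns]
    # Turn the folded columns back into rows.
    return [list(row) for row in zip(*folded)]


# Two row partitions of a frame with two columns.
row_partitions = [[[1, 5], [3, 2]], [[2, 7]]]

cummax = fold_columns(row_partitions, lambda col: accumulate(col, max))
cumsum = fold_columns(row_partitions, lambda col: accumulate(col))
```

Applying the same function to each row partition separately would restart the running maximum at the partition boundary, which is the failure mode the full-axis view avoids.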

cummin(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

cumprod(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

cumsum(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

default_to_pandas(pandas_op, *args, **kwargs)#

Do fallback to pandas for the passed function.

Parameters
  • pandas_op (callable(pandas.DataFrame) -> object) – Function to apply to the frame after it has been cast to pandas.

  • *args (iterable) – Positional arguments to pass to pandas_op.

  • **kwargs (dict) – Key-value arguments to pass to pandas_op.

Returns

The result of the pandas_op, converted back to BaseQueryCompiler.

Return type

BaseQueryCompiler

describe(**kwargs)#

Generate descriptive statistics.

Parameters
  • percentiles (list-like) –

  • include ("all" or list of dtypes, optional) –

  • exclude (list of dtypes, optional) –

  • datetime_is_numeric (bool) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler object containing the descriptive statistics of the underlying data.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.describe for more information about parameters and output format.

df_update(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

diff(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

dot(other, squeeze_self=None, squeeze_other=None)#

Compute the matrix multiplication of self and other.

Parameters
  • other (BaseQueryCompiler or NumPy array) – The other query compiler or NumPy array to matrix multiply with self.

  • squeeze_self (boolean) – If self is a one-column query compiler, indicates whether it represents a Series object.

  • squeeze_other (boolean) – If other is a one-column query compiler, indicates whether it represents a Series object.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

A new query compiler that contains the result of the matrix multiplication.

Return type

BaseQueryCompiler

drop(index=None, columns=None, errors: str = 'raise')#

Drop specified rows or columns.

Parameters
  • index (list of labels, optional) – Labels of rows to drop.

  • columns (list of labels, optional) – Labels of columns to drop.

  • errors (str, default: "raise") – If "ignore", suppress the error and drop only the existing labels.

Returns

New QueryCompiler with removed data.

Return type

BaseQueryCompiler
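The errors parameter of drop can be sketched on a plain list of labels. The helper drop_labels is hypothetical and handles a single axis; it assumes that a missing label raises KeyError under "raise", which mirrors pandas behaviour but is stated here as an assumption about this method.

```python
def drop_labels(labels, to_drop, errors="raise"):
    """Return labels without the ones in to_drop."""
    missing = [label for label in to_drop if label not in labels]
    if missing and errors != "ignore":
        raise KeyError(f"{missing} not found in axis")
    # With errors="ignore", unknown labels are skipped silently.
    return [label for label in labels if label not in to_drop]


kept = drop_labels(["a", "b", "c"], ["b"])
lenient = drop_labels(["a", "b"], ["z", "a"], errors="ignore")
```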

dropna(**kwargs)#

Remove missing values.

Parameters
  • axis ({0, 1}) –

  • how ({"any", "all"}) –

  • thresh (int, optional) –

  • subset (list of labels) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler with null values dropped along given axis.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.dropna for more information about parameters and output format.
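Since the how and thresh parameters are listed without descriptions here, the sketch below spells out their meaning as defined for pandas DataFrame.dropna, applied to rows only. The function dropna_rows is hypothetical, uses None for missing values, and ignores the axis and subset parameters.

```python
def dropna_rows(rows, how="any", thresh=None):
    """Drop rows with missing values from a list-of-rows frame."""

    def keep(row):
        present = sum(value is not None for value in row)
        if thresh is not None:
            return present >= thresh  # require at least thresh present values
        if how == "all":
            return present > 0        # drop only if every value is missing
        return present == len(row)    # "any": drop if anything is missing

    return [row for row in rows if keep(row)]


rows = [[1, 2], [None, 3], [None, None]]
drop_any = dropna_rows(rows, how="any")
drop_all = dropna_rows(rows, how="all")
at_least_one = dropna_rows(rows, thresh=1)
```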

dt_ceil(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_date(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_day(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_day_name(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_dayofweek(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_dayofyear(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_days(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_days_in_month(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_daysinmonth(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_end_time(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_floor(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_freq()#

Get the time frequency of the underlying time-series data.

Returns

QueryCompiler containing a single value, the frequency of the data.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Series.dt.freq for more information about parameters and output format.

Warning

This method is supported only by one-column query compilers.

dt_hour(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_is_leap_year(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_is_month_end(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_is_month_start(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_is_quarter_end(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_is_quarter_start(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_is_year_end(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_is_year_start(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_microsecond(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_microseconds(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_minute(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_month(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_month_name(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_nanosecond(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_nanoseconds(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_normalize(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_quarter(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_qyear(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_round(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_second(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_seconds(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_start_time(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_strftime(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_time(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_timetz(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_to_period(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_to_pydatetime(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_to_pytimedelta(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_to_timestamp(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_total_seconds(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_tz()#

Get the time-zone of the underlying time-series data.

Returns

QueryCompiler containing a single value, time-zone of the data.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Series.dt.tz for more information about parameters and output format.

Warning

This method is supported only by one-column query compilers.

dt_tz_convert(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_tz_localize(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_week(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_weekday(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_weekofyear(*args, **kwargs)#

Execute Map function against passed query compiler.

dt_year(*args, **kwargs)#

Execute Map function against passed query compiler.

property dtypes#

Get columns dtypes.

Returns

Series with dtypes of each column.

Return type

pandas.Series

eq(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

eval(expr, **kwargs)#

Evaluate string expression on QueryCompiler columns.

Parameters
  • expr (str) –

  • **kwargs (dict) –

Returns

QueryCompiler containing the result of evaluation.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.eval for more information about parameters and output format.

explode(column)#

Explode the given columns.

Parameters

column (Union[Hashable, Sequence[Hashable]]) – The columns to explode.

Returns

QueryCompiler that contains the results of execution. For each row in the input QueryCompiler, if the selected columns each contain M items, there will be M rows created by exploding the columns.

Return type

BaseQueryCompiler
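The row multiplication described in Returns can be sketched for a single column on a list-of-dicts frame. The function explode below is a hypothetical plain-Python illustration; it treats only Python lists as list-like and follows the pandas convention of producing one row with a missing value for an empty list.

```python
def explode(rows, column):
    """Create one output row per item of the list stored in column."""
    out = []
    for row in rows:
        value = row[column]
        if isinstance(value, list):
            items = value or [None]  # an empty list yields one missing value
        else:
            items = [value]          # scalars pass through unchanged
        for item in items:
            new_row = dict(row)
            new_row[column] = item
            out.append(new_row)
    return out


rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": "c"}]
exploded = explode(rows, "tags")
```

Values in the other columns are repeated for every row generated from the same input row.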

fillna(**kwargs)#

Replace NaN values using provided method.

Parameters
  • value (scalar or dict) –

  • method ({"backfill", "bfill", "pad", "ffill", None}) –

  • axis ({0, 1}) –

  • inplace ({False}) – This parameter serves the compatibility purpose. Always has to be False.

  • limit (int, optional) –

  • downcast (dict, optional) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler with all null values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.fillna for more information about parameters and output format.
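The difference between filling with a value and filling with a method can be sketched on a single column. The function below is a hypothetical plain-Python illustration that uses None for missing values and follows the pandas meaning of "ffill"/"pad" and "bfill"/"backfill"; it ignores axis, limit and downcast.

```python
def fillna(values, value=None, method=None):
    """Fill missing entries of one column with a value or a fill method."""
    if method in ("ffill", "pad"):
        out, last = [], None
        for v in values:
            last = v if v is not None else last  # carry last valid value forward
            out.append(last)
        return out
    if method in ("bfill", "backfill"):
        # A backward fill is a forward fill on the reversed column.
        return fillna(values[::-1], method="ffill")[::-1]
    return [value if v is None else v for v in values]


column = [None, 1, None, 3]
with_value = fillna(column, value=0)
forward = fillna(column, method="ffill")
backward = fillna(column, method="bfill")
```

A leading gap stays missing under a forward fill because there is no earlier valid value to propagate.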

finalize()#

Finalize constructing the dataframe by calling all deferred functions which were used to build it.

first_valid_index()#

Return index label of first non-NaN/NULL value.

Return type

scalar

floordiv(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

free()#

Trigger a cleanup of this object.

classmethod from_arrow(at, data_cls)#

Build QueryCompiler from Arrow Table.

Parameters
  • at (Arrow Table) – The Arrow Table to convert from.

  • data_cls (type) – PandasDataframe class (or its descendant) to convert to.

Returns

QueryCompiler containing data from the Arrow Table.

Return type

BaseQueryCompiler

classmethod from_dataframe(df, data_cls)#

Build QueryCompiler from a DataFrame object supporting the dataframe exchange protocol __dataframe__().

Parameters
  • df (DataFrame) – The DataFrame object supporting the dataframe exchange protocol.

  • data_cls (type) – PandasDataframe class (or its descendant) to convert to.

Returns

QueryCompiler containing data from the DataFrame.

Return type

BaseQueryCompiler

classmethod from_pandas(df, data_cls)#

Build QueryCompiler from pandas DataFrame.

Parameters
  • df (pandas.DataFrame) – The pandas DataFrame to convert from.

  • data_cls (type) – PandasDataframe class (or its descendant) to convert to.

Returns

QueryCompiler containing data from the pandas DataFrame.

Return type

BaseQueryCompiler

ge(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

get_dummies(columns, **kwargs)#

Convert categorical variables to dummy variables for certain columns.

Parameters
  • columns (label or list of such) – Columns to convert.

  • prefix (str or list of such) –

  • prefix_sep (str) –

  • dummy_na (bool) –

  • drop_first (bool) –

  • dtype (dtype) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler with categorical variables converted to dummy.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.get_dummies for more information about parameters and output format.

getitem_array(key)#

Mask QueryCompiler with key.

Parameters

key (BaseQueryCompiler, np.ndarray or list of column labels) – Boolean mask represented by QueryCompiler or np.ndarray of the same shape as self, or enumerable of columns to pick.

Returns

New masked QueryCompiler.

Return type

BaseQueryCompiler

getitem_column_array(key, numeric=False)#

Get column data for target labels.

Parameters
  • key (list-like) – Target labels by which to retrieve data.

  • numeric (bool, default: False) – Whether or not the key passed in represents the numeric index or the named index.

Returns

New QueryCompiler that contains specified columns.

Return type

BaseQueryCompiler
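The numeric flag decides whether key holds positions or labels. The sketch below shows that distinction on a dict-of-lists frame; select_columns is a hypothetical helper and relies on Python dicts preserving insertion order.

```python
def select_columns(frame, key, numeric=False):
    """Pick columns by label, or by position when numeric is True."""
    names = list(frame)
    picked = [names[i] for i in key] if numeric else list(key)
    return {name: frame[name] for name in picked}


frame = {"a": [1], "b": [2], "c": [3]}
by_label = select_columns(frame, ["c", "a"])
by_position = select_columns(frame, [0, 2], numeric=True)
```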

getitem_row_array(key)#

Get row data for target indices.

Parameters

key (list-like) – Numeric indices of the rows to pick.

Returns

New QueryCompiler that contains specified rows.

Return type

BaseQueryCompiler

groupby_agg(by, agg_func, axis, groupby_kwargs, agg_args, agg_kwargs, how='axis_wise', drop=False)#

Group QueryCompiler data and apply passed aggregation function.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • agg_func (str, dict or callable(Series | DataFrame) -> scalar | Series | DataFrame) – Function to apply to the GroupBy object.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • how ({'axis_wise', 'group_wise', 'transform'}, default: 'axis_wise') –

    How to apply passed agg_func:
    • 'axis_wise': apply the function against each row/column.

    • 'group_wise': apply the function against every group.

    • 'transform': apply the function against every group and broadcast the result to the original Query Compiler shape.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

QueryCompiler containing the result of groupby aggregation.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.GroupBy.aggregate for more information about parameters and output format.

groupby_all(**kwargs)#

Group QueryCompiler data and check whether all elements are True for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default ones: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of QueryCompiler is the boolean of whether all elements are True for the corresponding group and column/row.

  • .. warningmap_args and reduce_args parameters are deprecated. They’re leaked here from PandasQueryCompiler.groupby_*, pandas storage format implements groupby via TreeReduce approach, but for other storage formats these parameters make no sense, and so they’ll be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.all for more information about parameters and output format.

groupby_any(**kwargs)#

Group QueryCompiler data and check whether any element is True for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is a boolean indicating whether any element is True for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.any for more information about parameters and output format.

groupby_count(**kwargs)#

Group QueryCompiler data and count non-null values for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the number of non-null values for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.count for more information about parameters and output format.
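The TreeReduce idea mentioned for the pandas storage format can be sketched for a grouped count: a map phase counts non-null values inside each row partition, and a reduce phase sums the partial counts. The sketch below is a conceptual illustration in plain Python, not Modin's implementation; map_count, reduce_counts and the list-of-tuples partition layout are assumptions made for the example.

```python
from collections import Counter


def map_count(partition):
    """Map phase: count non-null values per group inside one row partition."""
    counts = Counter()
    for key, value in partition:
        # Adding 0 keeps groups that only hold nulls in this partition.
        counts[key] += 0 if value is None else 1
    return counts


def reduce_counts(partials):
    """Reduce phase: sum the per-partition counts for every group."""
    total = Counter()
    for partial in partials:
        for key, count in partial.items():
            total[key] += count
    return dict(total)


# Two row partitions of (group key, value) pairs; None marks a null value.
partitions = [
    [("a", 1), ("b", None)],
    [("a", 5), ("b", 2), ("a", None)],
]
result = reduce_counts(map_count(part) for part in partitions)
print(result)  # {'a': 2, 'b': 1}
```

Counting works with this scheme because the partial results can be combined with a simple sum.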

groupby_dtypes(by, axis, groupby_kwargs, agg_args, agg_kwargs, drop=False)#

Group QueryCompiler data and get data types for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the data type for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.dtypes for more information about parameters and output format.

groupby_max(**kwargs)#

Group QueryCompiler data and get the maximum value for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the maximum value for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.max for more information about parameters and output format.

groupby_mean(by, axis, groupby_kwargs, agg_args, agg_kwargs, drop=False)#

Group QueryCompiler data and compute the mean value for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the mean value for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.mean for more information about parameters and output format.
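A grouped mean cannot be obtained by simply averaging per-partition means, because partitions may hold different numbers of rows per group. One general way to compute it over partitioned data is to carry (sum, count) pairs and divide at the end. The sketch below illustrates that general technique in plain Python; it is not taken from Modin's source, and the helper names and data layout are assumptions.

```python
def map_sum_count(partition):
    """Map phase: accumulate a (sum, count) pair per group in one partition."""
    partial = {}
    for key, value in partition:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def reduce_mean(partials):
    """Reduce phase: merge the (sum, count) pairs, then divide once per group."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            seen_total, seen_count = merged.get(key, (0.0, 0))
            merged[key] = (seen_total + total, seen_count + count)
    return {key: total / count for key, (total, count) in merged.items()}


# Group "a" has two rows in the first partition and one in the second.
partitions = [
    [("a", 1.0), ("a", 2.0)],
    [("a", 6.0), ("b", 4.0)],
]
means = reduce_mean(map_sum_count(part) for part in partitions)
print(means)  # {'a': 3.0, 'b': 4.0}
```

Averaging the two partition means for group "a" (1.5 and 6.0) would give 3.75, which differs from the true mean of 3.0.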

groupby_min(**kwargs)#

Group QueryCompiler data and get the minimum value for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the minimum value for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.min for more information about parameters and output format.

groupby_prod(**kwargs)#

Group QueryCompiler data and compute product for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the product for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.prod for more information about parameters and output format.

groupby_size(by, axis, groupby_kwargs, agg_args, agg_kwargs, drop=False)#

Group QueryCompiler data and get the number of elements for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the number of elements for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.size for more information about parameters and output format.

groupby_sum(**kwargs)#

Group QueryCompiler data and compute sum for every group.

Parameters
  • by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.

  • axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index, 1 is for columns.

  • groupby_kwargs (dict) – GroupBy parameters as expected by modin.pandas.DataFrame.groupby signature.

  • agg_args (list-like) – Positional arguments to pass to the agg_func.

  • agg_kwargs (dict) – Keyword arguments to pass to the agg_func.

  • drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not the by-data came from self.

Returns

  • BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:

    • Labels on the opposite of specified axis are preserved.

    • If groupby_kwargs["as_index"] is True, then labels on the specified axis are the group names; otherwise the labels are the default: 0, 1 … n.

    • If groupby_kwargs["as_index"] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.

    • Each element of the QueryCompiler is the sum for the corresponding group and column/row.

Warning

The map_args and reduce_args parameters are deprecated. They leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.

Notes

Please refer to modin.pandas.GroupBy.sum for more information about parameters and output format.

gt(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

idxmax(*args, **kwargs)#

Execute Reduce function against passed query compiler.

idxmin(*args, **kwargs)#

Execute Reduce function against passed query compiler.

infer_objects()#

Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

Returns

New query compiler with updated dtypes.

Return type

BaseQueryCompiler

insert(loc, column, value)#

Insert new column.

Parameters
  • loc (int) – Insertion position.

  • column (label) – Label of the new column.

  • value (One-column BaseQueryCompiler, 1D array or scalar) – Data to fill new column with.

Returns

QueryCompiler with new column inserted.

Return type

BaseQueryCompiler

invert(*args, **kwargs)#

Execute Map function against passed query compiler.

is_monotonic_decreasing()#

Return whether values in the object are monotonically decreasing.

Return type

bool

is_monotonic_increasing()#

Return whether values in the object are monotonically increasing.

Return type

bool
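As a rough illustration of the two monotonicity checks, the sketch below compares neighbouring values in a plain Python list. It assumes the non-strict definition used by pandas, where equal neighbours are allowed; the function names mirror the methods above but operate on lists, not on a query compiler.

```python
def is_monotonic_increasing(values):
    """True if every value is greater than or equal to its predecessor."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True if every value is less than or equal to its predecessor."""
    return all(a >= b for a, b in zip(values, values[1:]))


print(is_monotonic_increasing([1, 2, 2, 3]))  # True
print(is_monotonic_increasing([1, 3, 2]))     # False
print(is_monotonic_decreasing([3, 3, 1]))     # True
```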

is_series_like()#

Check whether this QueryCompiler can represent a modin.pandas.Series object.

Returns

True if the QueryCompiler has a single column or row, False otherwise.

Return type

bool

isin(*args, **kwargs)#

Execute Map function against passed query compiler.

isna(*args, **kwargs)#

Execute Map function against passed query compiler.

join(right, **kwargs)#

Join columns of another QueryCompiler.

Parameters
  • right (BaseQueryCompiler) – QueryCompiler of the right frame to join with.

  • on (label or list of such) –

  • how ({"left", "right", "outer", "inner"}) –

  • lsuffix (str) –

  • rsuffix (str) –

  • sort (bool) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler that contains result of the join.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.join for more information about parameters and output format.

kurt(*args, **kwargs)#

Execute Reduce function against passed query compiler.

last_valid_index()#

Return index label of last non-NaN/NULL value.

Return type

scalar
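Conceptually, finding the last valid index is a backward scan for the first entry that is not missing. The sketch below shows this on parallel lists of labels and values; treating None and float NaN as missing, and returning None when nothing is valid, are assumptions of the example.

```python
import math


def last_valid_index(labels, values):
    """Return the label of the last non-missing value, or None if all are missing."""
    for label, value in zip(reversed(labels), reversed(values)):
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if not is_missing:
            return label
    return None


labels = ["x", "y", "z"]
print(last_valid_index(labels, [1.0, 2.0, float("nan")]))  # y
print(last_valid_index(labels, [None, None, None]))        # None
```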

property lazy_execution#

Whether underlying Modin frame should be executed in a lazy mode.

If True, such a QueryCompiler will be handled differently at the front-end in order to avoid triggering computation as much as possible.

Return type

bool

le(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

lt(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

mad(*args, **kwargs)#

Execute Reduce function against passed query compiler.

max(axis, **kwargs)#

Get the maximum value for each column or row.

Parameters
  • axis ({0, 1}) –

  • level (None, default: None) – Serves the compatibility purpose. Always has to be None.

  • numeric_only (bool, optional) –

  • skipna (bool, default: True) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

One-column QueryCompiler with index labels of the specified axis, where each row contains the maximum value for the corresponding row or column.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.max for more information about parameters and output format.

mean(axis, **kwargs)#

Get the mean value for each column or row.

Parameters
  • axis ({0, 1}) –

  • level (None, default: None) – Serves the compatibility purpose. Always has to be None.

  • numeric_only (bool, optional) –

  • skipna (bool, default: True) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

One-column QueryCompiler with index labels of the specified axis, where each row contains the mean value for the corresponding row or column.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.mean for more information about parameters and output format.

median(*args, **kwargs)#

Execute Reduce function against passed query compiler.

melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)#

Unpivot QueryCompiler data from wide to long format.

Parameters
  • id_vars (list of labels, optional) –

  • value_vars (list of labels, optional) –

  • var_name (label) –

  • value_name (label) –

  • col_level (int or label) –

  • ignore_index (bool) –

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler with unpivoted data.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.melt for more information about parameters and output format.
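To make the wide-to-long reshaping concrete, here is a small sketch that unpivots a list of dict rows. It is a conceptual stand-in, not the query compiler's code: the melt helper, the row layout and the default var_name of "variable" are assumptions chosen to mirror the pandas behaviour.

```python
def melt(rows, id_vars, value_vars, var_name="variable", value_name="value"):
    """Unpivot dict rows: emit one output row per (input row, value column) pair."""
    long_rows = []
    for column in value_vars:  # columns are stacked one after another
        for row in rows:
            record = {key: row[key] for key in id_vars}
            record[var_name] = column
            record[value_name] = row[column]
            long_rows.append(record)
    return long_rows


wide = [
    {"id": 1, "x": 10, "y": 20},
    {"id": 2, "x": 30, "y": 40},
]
for record in melt(wide, id_vars=["id"], value_vars=["x", "y"]):
    print(record)
```

Two wide rows with two value columns become four long rows, each carrying the identifier, the original column name and the value.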

memory_usage(*args, **kwargs)#

Execute TreeReduce function against passed query compiler.

merge(right, **kwargs)#

Merge QueryCompiler objects using a database-style join.

Parameters
  • right (BaseQueryCompiler) – QueryCompiler of the right frame to merge with.

  • how ({"left", "right", "outer", "inner", "cross"}) –

  • on (label or list of such) –

  • left_on (label or list of such) –

  • right_on (label or list of such) –

  • left_index (bool) –

  • right_index (bool) –

  • sort (bool) –

  • suffixes (list-like) –

  • copy (bool) –

  • indicator (bool or str) –

  • validate (str) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler that contains result of the merge.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.merge for more information about parameters and output format.
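The effect of the how parameter can be illustrated with a simplified hash join over lists of dict rows. The sketch covers only "inner" and "left", joins on a single key column, and assumes the two sides share no other column names, so no suffixes are needed; the merge helper here is hypothetical and unrelated to Modin's implementation.

```python
def merge(left, right, on, how="inner"):
    """Hash join of two lists of dict rows on a single key column."""
    if how not in ("inner", "left"):
        raise ValueError("this sketch only handles 'inner' and 'left'")
    # Build a lookup from key value to the matching right-hand rows.
    lookup = {}
    for row in right:
        lookup.setdefault(row[on], []).append(row)
    right_only = [col for col in (right[0] if right else {}) if col != on]
    result = []
    for row in left:
        matches = lookup.get(row[on], [])
        if matches:
            for match in matches:
                result.append({**row, **match})
        elif how == "left":
            # Keep the unmatched left row and fill right-hand columns with None.
            result.append({**row, **{col: None for col in right_only}})
    return result


left = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right = [{"k": 1, "b": "p"}]
print(merge(left, right, on="k", how="inner"))
print(merge(left, right, on="k", how="left"))
```

The inner join drops the left row with k=2, while the left join keeps it with a missing value for b.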

min(axis, **kwargs)#

Get the minimum value for each column or row.

Parameters
  • axis ({0, 1}) –

  • level (None, default: None) – Serves the compatibility purpose. Always has to be None.

  • numeric_only (bool, optional) –

  • skipna (bool, default: True) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

One-column QueryCompiler with index labels of the specified axis, where each row contains the minimum value for the corresponding row or column.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.min for more information about parameters and output format.

mod(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

mode(**kwargs)#

Get the modes for every column or row.

Parameters
  • axis ({0, 1}) –

  • numeric_only (bool) –

  • dropna (bool) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler with modes calculated along given axis.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.mode for more information about parameters and output format.
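A column can have several modes, which is why the result is a QueryCompiler and not a single value per column. The sketch below computes all modes of one column held in a plain list; skipping None values and returning the modes sorted are assumptions modelled on the pandas behaviour with dropna=True.

```python
from collections import Counter


def modes(values):
    """Return every most frequent non-null value, in sorted order."""
    counts = Counter(value for value in values if value is not None)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 2, 3, 3]))  # [2, 3]
print(modes([5, None, 5, 7]))  # [5]
```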

mul(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

ne(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

negative(*args, **kwargs)#

Execute Map function against passed query compiler.

nlargest(*args, **kwargs)#

Return the first n rows ordered by columns in descending order.

Parameters
  • n (int, default: 5) –

  • columns (list of labels, optional) – Column labels to order by. (Note: this parameter can be omitted only for single-column query compilers representing a Series object; otherwise columns has to be specified.)

  • keep ({"first", "last", "all"}, default: "first") –

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.nlargest for more information about parameters and output format.
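The row selection can be pictured as a stable descending sort followed by taking the first n rows. In the sketch below, which works on a list of dict rows and a single ordering column, the stable sort makes ties resolve in favour of the earlier row, which corresponds to keep="first"; the helper is an illustration only and does not cover the other keep options.

```python
def nlargest(rows, n, column):
    """Return the n rows with the largest values in column; ties keep input order."""
    # sorted() is stable even with reverse=True, so equal keys stay in input order.
    return sorted(rows, key=lambda row: row[column], reverse=True)[:n]


rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 5},
    {"name": "c", "score": 5},
    {"name": "d", "score": 1},
]
print([row["name"] for row in nlargest(rows, 1, "score")])  # ['b']
print([row["name"] for row in nlargest(rows, 3, "score")])  # ['b', 'c', 'a']
```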

notna(*args, **kwargs)#

Execute Map function against passed query compiler.

nsmallest(*args, **kwargs)#

Return the first n rows ordered by columns in ascending order.

Parameters
  • n (int, default: 5) –

  • columns (list of labels, optional) – Column labels to order by. (Note: this parameter can be omitted only for single-column query compilers representing a Series object; otherwise columns has to be specified.)

  • keep ({"first", "last", "all"}, default: "first") –

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.nsmallest for more information about parameters and output format.

nunique(*args, **kwargs)#

Execute Reduce function against passed query compiler.

pivot(index, columns, values)#

Produce pivot table based on column values.

Parameters
  • index (label or list of such, pandas.Index, optional) –

  • columns (label or list of such) –

  • values (label or list of such, optional) –

Returns

New QueryCompiler containing pivot table.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.pivot for more information about parameters and output format.

pivot_table(index, values, columns, aggfunc, fill_value, margins, dropna, margins_name, observed, sort)#

Create a spreadsheet-style pivot table from underlying data.

Parameters
  • index (label, pandas.Grouper, array or list of such) –

  • values (label, optional) –

  • columns (column, pandas.Grouper, array or list of such) –

  • aggfunc (callable(pandas.Series) -> scalar, dict or list of such) –

  • fill_value (scalar, optional) –

  • margins (bool) –

  • dropna (bool) –

  • margins_name (str) –

  • observed (bool) –

  • sort (bool) –

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.pivot_table for more information about parameters and output format.

pow(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

prod(*args, **kwargs)#

Execute TreeReduce function against passed query compiler.

prod_min_count(*args, **kwargs)#

Execute Reduce function against passed query compiler.

quantile_for_list_of_values(**kwargs)#

Get the value at the given quantile for each column or row.

Parameters
  • q (list-like) –

  • axis ({0, 1}) –

  • numeric_only (bool) –

  • interpolation ({"linear", "lower", "higher", "midpoint", "nearest"}) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

One-column QueryCompiler with index labels of the specified axis, where each row contains the value at the given quantile for the corresponding row or column.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.quantile for more information about parameters and output format.
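The interpolation options decide what is returned when the requested quantile falls between two data points. The sketch below computes a quantile of a non-empty list under the usual position formula q * (n - 1); it covers "linear", "lower", "higher" and "midpoint" and deliberately leaves out "nearest", whose tie-breaking rule is not reproduced here. The function is an illustration, not the query compiler's code.

```python
import math


def quantile(values, q, interpolation="linear"):
    """Quantile of a non-empty list for q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)  # fractional index into the sorted data
    lower_value = ordered[math.floor(position)]
    upper_value = ordered[math.ceil(position)]
    fraction = position - math.floor(position)
    if interpolation == "linear":
        return lower_value + (upper_value - lower_value) * fraction
    if interpolation == "lower":
        return lower_value
    if interpolation == "higher":
        return upper_value
    if interpolation == "midpoint":
        return (lower_value + upper_value) / 2
    raise ValueError(f"unsupported interpolation in this sketch: {interpolation}")


data = [1, 2, 3, 4]
print(quantile(data, 0.5))                      # 2.5
print(quantile(data, 0.5, "lower"))             # 2
print([quantile(data, q) for q in (0.25, 1.0)])  # [1.75, 4]
```

Evaluating the function for each entry of a list of quantiles, as in the last line, corresponds to the list-of-values variant of the method.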

quantile_for_single_value(*args, **kwargs)#

Execute Reduce function against passed query compiler.

query(expr, **kwargs)#

Query columns of the QueryCompiler with a boolean expression.

Parameters
  • expr (str) –

  • **kwargs (dict) –

Returns

New QueryCompiler containing the rows where the boolean expression is satisfied.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.query for more information about parameters and output format.

radd(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, we can’t distinguish them at the query compiler level, so this parameter is a hint that is passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler
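
As a hedged illustration of what a reflected operator means at the pandas level (an assumption about how the high-level API maps onto this method, not a call into the query compiler), the sketch below uses object-dtype strings so that the operand order is visible.

```python
import pandas as pd

# Object-dtype strings make the operand order visible, because
# string concatenation is not commutative.
s = pd.Series(["a", "b"], dtype=object)

forward = s.add("x")     # computes s + "x"
reflected = s.radd("x")  # computes "x" + s
print(forward.tolist(), reflected.tolist())
```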

rank(**kwargs)#

Compute numerical rank along the specified axis.

By default, equal values are assigned a rank that is the average of the ranks of those values; this behavior can be changed via the method parameter.

Parameters
  • axis ({0, 1}) –

  • method ({"average", "min", "max", "first", "dense"}) –

  • numeric_only (bool) –

  • na_option ({"keep", "top", "bottom"}) –

  • ascending (bool) –

  • pct (bool) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler of the same shape as self, where each element is the numerical rank of the corresponding value along row or column.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.rank for more information about parameters and output format.
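
The tie-handling behavior can be sketched with plain pandas, which modin.pandas mirrors. The data below is made up for the example, and the query compiler itself is not invoked.

```python
import pandas as pd

# Hypothetical column with one tie (the two 20s).
df = pd.DataFrame({"score": [10, 20, 20, 30]})

# "average": tied values share the mean of the ranks they occupy.
avg = df.rank(axis=0, method="average")

# "dense": tied values share a rank and no gap follows the tie.
dense = df.rank(axis=0, method="dense")

print(avg["score"].tolist())
print(dense["score"].tolist())
```

The result has the same shape as the input, as stated in the Returns description.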

reindex(axis, labels, **kwargs)#

Align QueryCompiler data with a new index along specified axis.

Parameters
  • axis ({0, 1}) – Axis to align labels along. 0 is for index, 1 is for columns.

  • labels (list-like) – Index-labels to align with.

  • method ({None, "backfill"/"bfill", "pad"/"ffill", "nearest"}) – Method to use for filling holes in reindexed frame.

  • fill_value (scalar) – Value to use for missing values in the resulting frame.

  • limit (int) –

  • tolerance (int) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler with aligned axis.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.reindex for more information about parameters and output format.
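
A small sketch of how fill_value and method differ when new labels are introduced. It assumes plain pandas semantics (mirrored by modin.pandas) and uses hypothetical data.

```python
import pandas as pd

# Hypothetical frame with gaps in its integer index.
df = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=[0, 2, 4])
new_labels = [0, 1, 2, 3, 4]

# fill_value: labels absent from the original index get a constant.
filled = df.reindex(new_labels, fill_value=0.0)

# method="ffill": absent labels take the last preceding valid value.
padded = df.reindex(new_labels, method="ffill")

print(filled["v"].tolist())
print(padded["v"].tolist())
```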

replace(*args, **kwargs)#

Execute Map function against passed query compiler.

resample_agg_df(resample_kwargs, func, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and apply passed aggregation function for each group over the specified axis.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • func (str, dict, callable(pandas.Series) -> scalar, or list of such) –

  • *args (iterable) – Positional arguments to pass to the aggregation function.

  • **kwargs (dict) – Keyword arguments to pass to the aggregation function.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are a MultiIndex, where first level contains preserved labels of this axis and the second level is the function names.

  • Each element of QueryCompiler is the result of corresponding function for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.agg for more information about parameters and output format.
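
The label layout described in the Returns section can be sketched with plain pandas (assumed here to match what modin.pandas exposes); the frame, frequency and function list are illustrative choices.

```python
import pandas as pd

# Hypothetical daily data.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]}, index=idx)

# Two-day bins, two aggregation functions per column.
agg = df.resample("2D").agg(["sum", "max"])

# Rows are the bin time-stamps; columns form a MultiIndex of
# (original column label, function name).
print(agg)
```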

resample_agg_ser(resample_kwargs, func, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and apply passed aggregation function in a one-column query compiler for each group over the specified axis.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • func (str, dict, callable(pandas.Series) -> scalar, or list of such) –

  • *args (iterable) – Positional arguments to pass to the aggregation function.

  • **kwargs (dict) – Keyword arguments to pass to the aggregation function.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are a MultiIndex, where first level contains preserved labels of this axis and the second level is the function names.

  • Each element of QueryCompiler is the result of corresponding function for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.agg for more information about parameters and output format.

Warning

This method duplicates logic of resample_agg_df and will be removed soon.

resample_app_df(resample_kwargs, func, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and apply passed aggregation function for each group over the specified axis.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • func (str, dict, callable(pandas.Series) -> scalar, or list of such) –

  • *args (iterable) – Positional arguments to pass to the aggregation function.

  • **kwargs (dict) – Keyword arguments to pass to the aggregation function.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are a MultiIndex, where first level contains preserved labels of this axis and the second level is the function names.

  • Each element of QueryCompiler is the result of corresponding function for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.apply for more information about parameters and output format.

Warning

This method duplicates logic of resample_agg_df and will be removed soon.

resample_app_ser(resample_kwargs, func, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and apply passed aggregation function in a one-column query compiler for each group over the specified axis.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • func (str, dict, callable(pandas.Series) -> scalar, or list of such) –

  • *args (iterable) – Positional arguments to pass to the aggregation function.

  • **kwargs (dict) – Keyword arguments to pass to the aggregation function.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are a MultiIndex, where first level contains preserved labels of this axis and the second level is the function names.

  • Each element of QueryCompiler is the result of corresponding function for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.apply for more information about parameters and output format.

Warning

This method duplicates logic of resample_agg_df and will be removed soon.

resample_asfreq(resample_kwargs, fill_value)#

Resample time-series data and get the values at the new frequency.

Group data into intervals by time-series row/column with a specified frequency and get values at the new frequency.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • fill_value (scalar) –

Returns

New QueryCompiler containing values at the specified frequency.

Return type

BaseQueryCompiler
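
A minimal sketch of the role of fill_value, assuming plain pandas semantics and hypothetical data: slots created at the new frequency receive fill_value, while existing observations are left untouched.

```python
import pandas as pd

# Hypothetical series of observations with a one-day gap.
idx = pd.to_datetime(["2024-01-01", "2024-01-03"])
df = pd.DataFrame({"v": [1.0, 3.0]}, index=idx)

# Convert to daily frequency; the newly created 2024-01-02 slot
# gets the fill value.
daily = df.resample("D").asfreq(fill_value=0.0)
print(daily)
```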

resample_backfill(resample_kwargs, limit)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and fill missing values in each group independently using back-fill method.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • limit (int) –

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • QueryCompiler contains upsampled data with missing values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.backfill for more information about parameters and output format.

resample_bfill(resample_kwargs, limit)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and fill missing values in each group independently using back-fill method.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • limit (int) –

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • QueryCompiler contains upsampled data with missing values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.bfill for more information about parameters and output format.
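
The effect of limit on back-filling can be sketched as follows, assuming plain pandas behavior (mirrored by modin.pandas) and made-up data.

```python
import pandas as pd

# Hypothetical observations three days apart.
idx = pd.to_datetime(["2024-01-01", "2024-01-04"])
df = pd.DataFrame({"v": [1.0, 4.0]}, index=idx)

# Every new daily slot takes the next valid observation.
filled = df.resample("D").bfill()

# With limit=1, only the slot directly before an observation is
# filled; slots further away stay missing.
limited = df.resample("D").bfill(limit=1)

print(filled["v"].tolist())
print(limited["v"].tolist())
```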

resample_count(resample_kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute number of non-NA values for each group.

Parameters

resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the number of non-NA values for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.count for more information about parameters and output format.
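
A short sketch, assuming plain pandas semantics and hypothetical data, showing that count ignores missing values; size is included only as a contrast, since it counts every element in the group.

```python
import pandas as pd

# Hypothetical daily data with one missing value.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
df = pd.DataFrame({"v": [1.0, float("nan"), 3.0, 4.0]}, index=idx)

non_na = df.resample("2D").count()  # non-NA values per bin and column
total = df.resample("2D").size()    # all elements per bin

print(non_na["v"].tolist())
print(total.tolist())
```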

resample_ffill(resample_kwargs, limit)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and fill missing values in each group independently using forward-fill method.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • limit (int) –

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • QueryCompiler contains upsampled data with missing values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.ffill for more information about parameters and output format.

resample_fillna(resample_kwargs, method, limit)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and fill missing values in each group independently using specified method.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • method (str) –

  • limit (int) –

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • QueryCompiler contains upsampled data with missing values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.fillna for more information about parameters and output format.

resample_first(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute first element for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the first element for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.first for more information about parameters and output format.

resample_get_group(resample_kwargs, name, obj)#

Resample time-series data and get the specified group.

Group data into intervals by time-series row/column with a specified frequency and get the values of the specified group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • name (object) –

  • obj (modin.pandas.DataFrame, optional) –

Returns

New QueryCompiler containing the values from the specified group.

Return type

BaseQueryCompiler

resample_interpolate(resample_kwargs, method, axis, limit, inplace, limit_direction, limit_area, downcast, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and fill missing values in each group independently using specified interpolation method.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • method (str) –

  • axis ({0, 1}) –

  • limit (int) –

  • inplace ({False}) – This parameter serves the compatibility purpose. Always has to be False.

  • limit_direction ({"forward", "backward", "both"}) –

  • limit_area ({None, "inside", "outside"}) –

  • downcast (str, optional) –

  • **kwargs (dict) –

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • QueryCompiler contains upsampled data with missing values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.interpolate for more information about parameters and output format.
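
As an illustration of the default linear method, the sketch below assumes plain pandas behavior and uses made-up data; other parameters are left at their pandas defaults.

```python
import pandas as pd

# Hypothetical observations three days apart.
idx = pd.to_datetime(["2024-01-01", "2024-01-04"])
df = pd.DataFrame({"v": [1.0, 4.0]}, index=idx)

# Upsample to daily frequency and fill the new slots by linear
# interpolation between the surrounding observations.
smooth = df.resample("D").interpolate(method="linear")
print(smooth["v"].tolist())
```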

resample_last(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute last element for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the last element for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.last for more information about parameters and output format.

resample_max(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute maximum value for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the maximum value for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.max for more information about parameters and output format.

resample_mean(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute mean value for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the mean value for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.mean for more information about parameters and output format.

resample_median(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute median value for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the median value for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.median for more information about parameters and output format.

resample_min(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute minimum value for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the minimum value for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.min for more information about parameters and output format.

resample_nearest(resample_kwargs, limit)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and fill missing values in each group independently using ‘nearest’ method.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • limit (int) –

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • QueryCompiler contains upsampled data with missing values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.nearest for more information about parameters and output format.

resample_nunique(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute number of unique values for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the number of unique values for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.nunique for more information about parameters and output format.

resample_ohlc_df(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute open, high, low and close values for each group over the specified axis.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Positional arguments to pass to the aggregation function.

  • **kwargs (dict) – Keyword arguments to pass to the aggregation function.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are a MultiIndex, where first level contains preserved labels of this axis and the second level is the labels of columns containing computed values.

  • Each element of QueryCompiler is the result of corresponding function for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.ohlc for more information about parameters and output format.
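
The MultiIndex layout mentioned in the Returns section can be sketched with plain pandas (assumed to match modin.pandas); the price column and frequency are hypothetical.

```python
import pandas as pd

# Hypothetical daily prices.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
df = pd.DataFrame({"price": [1, 3, 2, 5]}, index=idx)

# For each two-day bin: first, maximum, minimum and last value.
ohlc = df.resample("2D").ohlc()

# Columns are (original column label, one of open/high/low/close).
print(ohlc)
```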

resample_ohlc_ser(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute open, high, low and close values for each group over the specified axis.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Positional arguments to pass to the aggregation function.

  • **kwargs (dict) – Keyword arguments to pass to the aggregation function.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are a MultiIndex, where first level contains preserved labels of this axis and the second level is the labels of columns containing computed values.

  • Each element of QueryCompiler is the result of corresponding function for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.ohlc for more information about parameters and output format.

resample_pad(resample_kwargs, limit)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and fill missing values in each group independently using ‘pad’ method.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • limit (int) –

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • QueryCompiler contains upsampled data with missing values filled.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.pad for more information about parameters and output format.

resample_pipe(resample_kwargs, func, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency, build equivalent pandas.Resampler object and apply passed function to it.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • func (callable(pandas.Resampler) -> object or tuple(callable, str)) –

  • *args (iterable) – Positional arguments to pass to function.

  • **kwargs (dict) – Keyword arguments to pass to function.

Returns

New QueryCompiler containing the result of passed function.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.pipe for more information about parameters and output format.
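
A minimal sketch of a function that receives the whole resampler object, assuming plain pandas semantics; the helper name value_range and the data are hypothetical.

```python
import pandas as pd

# Hypothetical daily data.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=idx)


def value_range(resampler):
    """Hypothetical helper: per-bin spread between max and min."""
    return resampler.max() - resampler.min()


# pipe passes the resampler itself to the function, so several
# aggregations can be combined in a single call.
spread = df.resample("2D").pipe(value_range)
print(spread["x"].tolist())
```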

resample_prod(resample_kwargs, min_count, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute product for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • min_count (int) –

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the product for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.prod for more information about parameters and output format.

resample_quantile(resample_kwargs, q, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute quantile for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • q (float) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the quantile for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.quantile for more information about parameters and output format.

resample_sem(resample_kwargs, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute standard error of the mean for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the standard error of the mean for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.sem for more information about parameters and output format.

resample_size(resample_kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute the number of elements in each group.

Parameters

resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the number of elements for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.size for more information about parameters and output format.

resample_std(resample_kwargs, ddof, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute standard deviation for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • ddof (int) –

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the standard deviation for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.std for more information about parameters and output format.
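
The effect of ddof (delta degrees of freedom, so the divisor is N - ddof) can be sketched as follows, assuming plain pandas behavior and hypothetical data.

```python
import pandas as pd

# Hypothetical daily data, two values per two-day bin.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
df = pd.DataFrame({"v": [1.0, 3.0, 2.0, 6.0]}, index=idx)

sample_std = df.resample("2D").std(ddof=1)      # divisor N - 1
population_std = df.resample("2D").std(ddof=0)  # divisor N

print(sample_std["v"].tolist())
print(population_std["v"].tolist())
```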

resample_sum(resample_kwargs, min_count, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute sum for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • min_count (int) –

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the sum for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.sum for more information about parameters and output format.
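
A short sketch of what min_count controls, assuming plain pandas semantics and made-up data: a bin with fewer than min_count valid values yields a missing result instead of 0.

```python
import pandas as pd

# Hypothetical daily data; the second two-day bin has no valid values.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
nan = float("nan")
df = pd.DataFrame({"v": [1.0, nan, nan, nan]}, index=idx)

default_sum = df.resample("2D").sum()            # empty bin sums to 0
strict_sum = df.resample("2D").sum(min_count=1)  # empty bin stays NaN

print(default_sum["v"].tolist())
print(strict_sum["v"].tolist())
```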

resample_transform(resample_kwargs, arg, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and call the passed function on each group. In contrast to resample_app_df, the function is applied to the whole group instead of a single axis.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • arg (callable(pandas.DataFrame) -> pandas.Series) –

  • *args (iterable) – Positional arguments to pass to function.

  • **kwargs (dict) – Keyword arguments to pass to function.

Returns

New QueryCompiler containing the result of passed function.

Return type

BaseQueryCompiler
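
The sketch below shows a group-wise transform at the pandas level. It assumes plain pandas behavior and, for simplicity, uses a Series rather than a frame; the data and the lambda are illustrative.

```python
import pandas as pd

# Hypothetical daily series.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Each two-day group is centered on its own mean; the result keeps
# the original index instead of collapsing to one row per group.
centered = s.resample("2D").transform(lambda group: group - group.mean())
print(centered.tolist())
```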

resample_var(resample_kwargs, ddof, *args, **kwargs)#

Resample time-series data and apply aggregation on it.

Group data into intervals by time-series row/column with a specified frequency and compute variance for each group.

Parameters
  • resample_kwargs (dict) – Resample parameters as expected by modin.pandas.DataFrame.resample signature.

  • ddof (int) –

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the result of resample aggregation built by the following rules:

  • Labels on the specified axis are the group names (time-stamps)

  • Labels on the opposite of specified axis are preserved.

  • Each element of QueryCompiler is the variance for the corresponding group and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.resample.Resampler.var for more information about parameters and output format.

reset_index(**kwargs)#

Reset the index, or a level of it.

Parameters
  • drop (bool) – Whether to drop the reset index or insert it at the beginning of the frame.

  • level (int or label, optional) – Level to remove from index. Removes all levels by default.

  • col_level (int or label) – If the columns have multiple levels, determines which level the labels are inserted into.

  • col_fill (label) – If the columns have multiple levels, determines how the other levels are named.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler with reset index.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.reset_index for more information about parameters and output format.

rfloordiv(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler
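
The reflected methods (rfloordiv, rmod, rmul, rpow, rsub, rtruediv) share this signature. In pandas, a reflected method such as DataFrame.rfloordiv(other) computes other // self, so the operands are swapped relative to the plain operator. A minimal sketch of that swap, using plain lists in place of a query compiler (the reflected helper is hypothetical):

```python
import operator


def reflected(func):
    """Return a version of a binary function with its operands swapped."""
    return lambda left, right: func(right, left)


rfloordiv = reflected(operator.floordiv)
rmod = reflected(operator.mod)
rsub = reflected(operator.sub)

column = [2, 3, 4]  # stands in for the data held by the query compiler
other = 10          # scalar argument passed as `other`

floor_result = [rfloordiv(value, other) for value in column]  # 10 // value
mod_result = [rmod(value, other) for value in column]         # 10 % value
sub_result = [rsub(value, other) for value in column]         # 10 - value
```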

rmod(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

rmul(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

rolling_aggregate(axis, rolling_args, func, *args, **kwargs)#

Create rolling window and apply specified functions for each window over the given axis.

Parameters
  • axis ({0, 1}) –

  • rolling_args (list) – Rolling windows arguments with the same signature as modin.pandas.DataFrame.rolling.

  • func (str, dict, callable(pandas.Series) -> scalar, or list of such) –

  • *args (iterable) –

  • **kwargs (dict) –

Returns

New QueryCompiler containing the result of passed functions for each window, built by the following rules:

  • Labels on the specified axis are preserved.

  • Labels on the opposite of specified axis are MultiIndex, where first level contains preserved labels of this axis and the second level has the function names.

  • Each element of QueryCompiler is the result of corresponding function for the corresponding window and column/row.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Rolling.aggregate for more information about parameters and output format.
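
The labelling rules above can be illustrated with a small pure-Python model. It is a sketch under assumptions rather than the Modin implementation: columns are plain lists, the window is a fixed number of rows, incomplete windows yield None (mirroring the NaN pandas produces when min_periods is left at its default for an integer window), and the two-level column labels are modelled as (column, function name) tuples.

```python
def rolling_aggregate(columns, window, funcs):
    """Apply each named function to every full trailing window of every column."""
    result = {}
    for label, values in columns.items():
        for name, func in funcs.items():
            out = []
            for end in range(1, len(values) + 1):
                chunk = values[max(0, end - window):end]
                # Windows shorter than `window` are reported as missing.
                out.append(func(chunk) if len(chunk) == window else None)
            # Second label level carries the function name.
            result[(label, name)] = out
    return result


frame = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]}
agg = rolling_aggregate(frame, window=2, funcs={"sum": sum, "max": max})
```

Row labels are unchanged (one output row per input row), while each source column fans out into one output column per function.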

rolling_apply(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler
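
The fold_axis convention used by the Fold-based methods (rolling_apply, rolling_count, rolling_mean and so on) can be sketched as follows. This is a simplified model, assuming the whole table is a single partition and that the folded function returns as many values as it receives; fold and running_total are hypothetical names, not Modin APIs.

```python
from itertools import accumulate


def fold(table, func, fold_axis=None):
    """Apply a shape-preserving function to every full column or full row."""
    if fold_axis in (None, 0):
        # Each column is handed to func as a whole, then the table is rebuilt.
        folded_columns = [func(list(column)) for column in zip(*table)]
        return [list(row) for row in zip(*folded_columns)]
    # fold_axis=1: each row is handed to func as a whole.
    return [func(list(row)) for row in table]


def running_total(values):
    """Example function that needs the whole axis: a cumulative sum."""
    return list(accumulate(values))


table = [[1, 2], [3, 4], [5, 6]]
down_columns = fold(table, running_total)
along_rows = fold(table, running_total, fold_axis=1)
```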

rolling_corr(axis, rolling_args, other, pairwise, *args, **kwargs)#

Create rolling window and compute correlation for each window over the given axis.

Parameters
  • axis ({0, 1}) –

  • rolling_args (list) – Rolling windows arguments with the same signature as modin.pandas.DataFrame.rolling.

  • other (modin.pandas.Series, modin.pandas.DataFrame, list-like, optional) –

  • pairwise (bool, optional) –

  • *args (iterable) –

  • **kwargs (dict) –

Returns

New QueryCompiler containing correlation for each window, built by the following rules:

  • Output QueryCompiler has the same shape and axes labels as the source.

  • Each element is the correlation for the corresponding window.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Rolling.corr for more information about parameters and output format.

rolling_count(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_cov(axis, rolling_args, other, pairwise, ddof, **kwargs)#

Create rolling window and compute covariance for each window over the given axis.

Parameters
  • axis ({0, 1}) –

  • rolling_args (list) – Rolling windows arguments with the same signature as modin.pandas.DataFrame.rolling.

  • other (modin.pandas.Series, modin.pandas.DataFrame, list-like, optional) –

  • pairwise (bool, optional) –

  • ddof (int, default: 1) –

  • **kwargs (dict) –

Returns

New QueryCompiler containing covariance for each window, built by the following rules:

  • Output QueryCompiler has the same shape and axes labels as the source.

  • Each element is the covariance for the corresponding window.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Rolling.cov for more information about parameters and output format.
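
As a rough illustration of what each output element holds, the following pure-Python sketch computes the covariance of two equally long lists over a fixed-size trailing window. It assumes ddof is the delta degrees of freedom subtracted from the window size in the divisor, as in pandas, and it reports None for incomplete windows. It is not how Modin computes the result.

```python
def rolling_cov(xs, ys, window, ddof=1):
    """Covariance of xs and ys over each full trailing window."""
    out = []
    for end in range(1, len(xs) + 1):
        if end < window:
            out.append(None)  # not enough observations yet
            continue
        wx, wy = xs[end - window:end], ys[end - window:end]
        mean_x, mean_y = sum(wx) / window, sum(wy) / window
        cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(wx, wy))
        # ddof is subtracted from the number of observations in the divisor.
        out.append(cross / (window - ddof))
    return out


cov = rolling_cov([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], window=3)
```

The output has one entry per input row, which matches the rule that the result keeps the shape and labels of the source.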

rolling_kurt(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_max(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_mean(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_median(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_min(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_quantile(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_skew(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_std(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_sum(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

rolling_var(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

round(*args, **kwargs)#

Execute Map function against passed query compiler.

rpow(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

rsub(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

rtruediv(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

searchsorted(**kwargs)#

Find positions in a sorted self where value should be inserted to maintain order.

Parameters
  • value (list-like) –

  • side ({"left", "right"}) –

  • sorter (list-like, optional) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

One-column QueryCompiler which contains indices to insert.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Series.searchsorted for more information about parameters and output format.

Warning

This method is supported only by one-column query compilers.
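
The effect of the side parameter can be shown with Python's standard bisect module, which follows the same left/right convention for ties. This sketch works on a plain sorted list and ignores the sorter argument; it only illustrates the semantics.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Return, for each value, an index at which it can be inserted in order."""
    locate = bisect_left if side == "left" else bisect_right
    return [locate(sorted_values, value) for value in values]


data = [1, 2, 2, 5]
left = searchsorted(data, [2, 3], side="left")    # before existing equal entries
right = searchsorted(data, [2, 3], side="right")  # after existing equal entries
```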

sem(*args, **kwargs)#

Execute Reduce function against passed query compiler.

series_update(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

series_view(*args, **kwargs)#

Execute Map function against passed query compiler.

set_index_from_columns(keys: List[Hashable], drop: bool = True, append: bool = False)#

Create new row labels from a list of columns.

Parameters
  • keys (list of hashable) – The list of column names that will become the new index.

  • drop (bool, default: True) – Whether or not to drop the columns provided in the keys argument.

  • append (bool, default: False) – Whether or not to add the columns in keys as new levels appended to the existing index.

Returns

A new QueryCompiler with updated index.

Return type

BaseQueryCompiler
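
How drop and append interact can be sketched with a toy table made of a list of row labels and a dict of columns. The representation and the function body are assumptions made for illustration; multi-level labels are modelled as tuples.

```python
def set_index_from_columns(index, columns, keys, drop=True, append=False):
    """Build new row labels from the columns named in `keys`."""
    new_index = []
    for position, old_label in enumerate(index):
        key_labels = tuple(columns[key][position] for key in keys)
        # With append=True the old label is kept as the leading level(s).
        old = old_label if isinstance(old_label, tuple) else (old_label,)
        new_index.append(old + key_labels if append else key_labels)
    kept = {
        label: values
        for label, values in columns.items()
        if not (drop and label in keys)
    }
    return new_index, kept


index = [0, 1]
columns = {"city": ["Oslo", "Rome"], "year": [2020, 2021], "pop": [0.7, 2.8]}
new_index, kept = set_index_from_columns(index, columns, ["city", "year"])
appended, everything = set_index_from_columns(
    index, columns, ["city"], drop=False, append=True
)
```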

setitem(axis, key, value)#

Set the row/column defined by key to the value provided.

Parameters
  • axis ({0, 1}) – Axis to set value along. 0 means set row, 1 means set column.

  • key (label) – Row/column label to set value in.

  • value (BaseQueryCompiler, list-like or scalar) – Define new row/column value.

Returns

New QueryCompiler with updated key value.

Return type

BaseQueryCompiler
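
The axis convention (0 sets a row, 1 sets a column) is illustrated below on a toy table of row labels plus a dict of columns. The sketch returns a new table and leaves its input untouched, in line with the "New QueryCompiler" return value; the data layout and helper are hypothetical.

```python
def setitem(index, columns, axis, key, value):
    """Return a copy of the columns with row (axis=0) or column (axis=1) `key` set."""
    new_columns = {label: list(values) for label, values in columns.items()}
    is_scalar = not isinstance(value, (list, tuple))
    if axis == 1:
        # Setting a column: broadcast a scalar over all rows.
        new_columns[key] = [value] * len(index) if is_scalar else list(value)
    else:
        # Setting a row: write one entry into every column at the row position.
        position = index.index(key)
        for offset, label in enumerate(new_columns):
            new_columns[label][position] = value if is_scalar else value[offset]
    return new_columns


index = ["x", "y"]
columns = {"a": [1, 2], "b": [3, 4]}
with_column = setitem(index, columns, axis=1, key="a", value=0)
with_row = setitem(index, columns, axis=0, key="y", value=[9, 8])
```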

skew(*args, **kwargs)#

Execute Reduce function against passed query compiler.

sort_columns_by_row_values(rows, ascending=True, **kwargs)#

Reorder the columns based on the lexicographic order of the given rows.

Parameters
  • rows (label or list of labels) – The row or rows to sort by.

  • ascending (bool, default: True) – Sort in ascending order (True) or descending order (False).

  • kind ({"quicksort", "mergesort", "heapsort"}) –

  • na_position ({"first", "last"}) –

  • ignore_index (bool) –

  • key (callable(pandas.Index) -> pandas.Index, optional) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler that contains result of the sort.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.sort_values for more information about parameters and output format.

sort_index(**kwargs)#

Sort data by index or column labels.

Parameters
  • axis ({0, 1}) –

  • level (int, label or list of such) –

  • ascending (bool) –

  • inplace (bool) –

  • kind ({"quicksort", "mergesort", "heapsort"}) –

  • na_position ({"first", "last"}) –

  • sort_remaining (bool) –

  • ignore_index (bool) –

  • key (callable(pandas.Index) -> pandas.Index, optional) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler containing the data sorted by columns or indices.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.sort_index for more information about parameters and output format.

sort_rows_by_column_values(columns, ascending=True, **kwargs)#

Reorder the rows based on the lexicographic order of the given columns.

Parameters
  • columns (label or list of labels) – The column or columns to sort by.

  • ascending (bool, default: True) – Sort in ascending order (True) or descending order (False).

  • kind ({"quicksort", "mergesort", "heapsort"}) –

  • na_position ({"first", "last"}) –

  • ignore_index (bool) –

  • key (callable(pandas.Index) -> pandas.Index, optional) –

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler that contains result of the sort.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.sort_values for more information about parameters and output format.
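
"Lexicographic order of the given columns" means that rows are compared on the first column, with ties broken by the second, and so on. A minimal sketch using Python's built-in sorted on a list of row dicts (an assumed representation; missing values and the kind, na_position and key options are not modelled):

```python
def sort_rows_by_column_values(rows, columns, ascending=True):
    """Sort row dicts by the listed columns, comparing them left to right."""
    return sorted(
        rows,
        key=lambda row: tuple(row[column] for column in columns),
        reverse=not ascending,
    )


rows = [
    {"team": "b", "score": 1},
    {"team": "a", "score": 2},
    {"team": "a", "score": 1},
]
ordered = sort_rows_by_column_values(rows, ["team", "score"])
```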

stack(level, dropna)#

Stack the prescribed level(s) from columns to index.

Parameters
  • level (int or label) –

  • dropna (bool) –

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.stack for more information about parameters and output format.
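
Since the parameters are not described here, the following sketch shows the idea for single-level columns only: each column label becomes an inner row label, and with dropna=True entries holding a missing value are omitted. None stands in for a missing value, and the dict-based layout is an assumption.

```python
def stack(index, columns, dropna=True):
    """Move column labels into the row labels of a long-format mapping."""
    stacked = {}
    for position, row_label in enumerate(index):
        for column_label, values in columns.items():
            value = values[position]
            if dropna and value is None:
                continue  # None stands in for a missing value
            stacked[(row_label, column_label)] = value
    return stacked


long_form = stack(["x", "y"], {"a": [1, None], "b": [3, 4]})
```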

std(*args, **kwargs)#

Execute Reduce function against passed query compiler.

str___getitem__(*args, **kwargs)#

Execute Map function against passed query compiler.

str_capitalize(*args, **kwargs)#

Execute Map function against passed query compiler.

str_center(*args, **kwargs)#

Execute Map function against passed query compiler.

str_contains(*args, **kwargs)#

Execute Map function against passed query compiler.

str_count(*args, **kwargs)#

Execute Map function against passed query compiler.

str_endswith(*args, **kwargs)#

Execute Map function against passed query compiler.

str_extract(pat, flags, expand)#

Apply “extract” function to each string value in QueryCompiler.

Parameters
  • pat (str) –

  • flags (int, default: 0) –

  • expand (bool, default: True) –

Returns

New QueryCompiler containing the result of execution of the “extract” function against each string element.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Series.str.extract for more information about parameters and output format.

Warning

This method is supported only by one-column query compilers.
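
In pandas, Series.str.extract returns the capture groups of the first match of pat in each string, with one output column per group. The sketch below reproduces that behaviour with the standard re module for a list of strings; it returns rows of tuples instead of a frame, uses None for strings without a match, and leaves out the expand option.

```python
import re


def str_extract(strings, pat, flags=0):
    """Return the capture groups of the first match in each string."""
    regex = re.compile(pat, flags)
    rows = []
    for text in strings:
        match = regex.search(text)
        # A string without a match contributes a row of missing values.
        rows.append(match.groups() if match else (None,) * regex.groups)
    return rows


extracted = str_extract(["a1", "b22", "c"], r"([a-z])(\d+)")
```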

str_find(*args, **kwargs)#

Execute Map function against passed query compiler.

str_findall(*args, **kwargs)#

Execute Map function against passed query compiler.

str_get(*args, **kwargs)#

Execute Map function against passed query compiler.

str_index(*args, **kwargs)#

Execute Map function against passed query compiler.

str_isalnum(*args, **kwargs)#

Execute Map function against passed query compiler.

str_isalpha(*args, **kwargs)#

Execute Map function against passed query compiler.

str_isdecimal(*args, **kwargs)#

Execute Map function against passed query compiler.

str_isdigit(*args, **kwargs)#

Execute Map function against passed query compiler.

str_islower(*args, **kwargs)#

Execute Map function against passed query compiler.

str_isnumeric(*args, **kwargs)#

Execute Map function against passed query compiler.

str_isspace(*args, **kwargs)#

Execute Map function against passed query compiler.

str_istitle(*args, **kwargs)#

Execute Map function against passed query compiler.

str_isupper(*args, **kwargs)#

Execute Map function against passed query compiler.

str_join(*args, **kwargs)#

Execute Map function against passed query compiler.

str_len(*args, **kwargs)#

Execute Map function against passed query compiler.

str_ljust(*args, **kwargs)#

Execute Map function against passed query compiler.

str_lower(*args, **kwargs)#

Execute Map function against passed query compiler.

str_lstrip(*args, **kwargs)#

Execute Map function against passed query compiler.

str_match(*args, **kwargs)#

Execute Map function against passed query compiler.

str_normalize(*args, **kwargs)#

Execute Map function against passed query compiler.

str_pad(*args, **kwargs)#

Execute Map function against passed query compiler.

str_partition(*args, **kwargs)#

Execute Map function against passed query compiler.

str_repeat(*args, **kwargs)#

Execute Map function against passed query compiler.

str_replace(*args, **kwargs)#

Execute Map function against passed query compiler.

str_rfind(*args, **kwargs)#

Execute Map function against passed query compiler.

str_rindex(*args, **kwargs)#

Execute Map function against passed query compiler.

str_rjust(*args, **kwargs)#

Execute Map function against passed query compiler.

str_rpartition(*args, **kwargs)#

Execute Map function against passed query compiler.

str_rsplit(*args, **kwargs)#

Execute Map function against passed query compiler.

str_rstrip(*args, **kwargs)#

Execute Map function against passed query compiler.

str_slice(*args, **kwargs)#

Execute Map function against passed query compiler.

str_slice_replace(*args, **kwargs)#

Execute Map function against passed query compiler.

str_split(*args, **kwargs)#

Execute Map function against passed query compiler.

str_startswith(*args, **kwargs)#

Execute Map function against passed query compiler.

str_strip(*args, **kwargs)#

Execute Map function against passed query compiler.

str_swapcase(*args, **kwargs)#

Execute Map function against passed query compiler.

str_title(*args, **kwargs)#

Execute Map function against passed query compiler.

str_translate(*args, **kwargs)#

Execute Map function against passed query compiler.

str_upper(*args, **kwargs)#

Execute Map function against passed query compiler.

str_wrap(*args, **kwargs)#

Execute Map function against passed query compiler.

str_zfill(*args, **kwargs)#

Execute Map function against passed query compiler.

sub(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

sum(*args, **kwargs)#

Execute TreeReduce function against passed query compiler.

sum_min_count(*args, **kwargs)#

Execute Reduce function against passed query compiler.

take_2d_positional(index=None, columns=None)#

Index QueryCompiler with passed keys.

Parameters
  • index (list-like of ints, optional) – Positional indices of rows to grab.

  • columns (list-like of ints, optional) – Positional indices of columns to grab.

Returns

New masked QueryCompiler.

Return type

BaseQueryCompiler
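
Positional selection along both axes can be pictured with a list-of-lists table. The sketch assumes that leaving index or columns as None selects everything along that axis, and that positions may be given in any order.

```python
def take_2d_positional(data, index=None, columns=None):
    """Select rows and columns of a list-of-lists table by integer position."""
    row_positions = range(len(data)) if index is None else index
    n_cols = len(data[0]) if data else 0
    col_positions = range(n_cols) if columns is None else columns
    return [[data[r][c] for c in col_positions] for r in row_positions]


table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
corner = take_2d_positional(table, index=[2, 0], columns=[0, 2])
first_column = take_2d_positional(table, columns=[0])
```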

to_dataframe(nan_as_null: bool = False, allow_copy: bool = True)#

Get a DataFrame exchange protocol object representing data of the Modin DataFrame.

See more about the protocol in https://data-apis.org/dataframe-protocol/latest/index.html.

Parameters
  • nan_as_null (bool, default: False) – A keyword intended for the consumer to tell the producer to overwrite null values in the data with NaN (or NaT). This currently has no effect; once support for nullable extension dtypes is added, this value should be propagated to columns.

  • allow_copy (bool, default: True) – A keyword that defines whether or not the library is allowed to make a copy of the data. For example, copying data would be necessary if a library supports strided buffers, given that this protocol specifies contiguous buffers. Currently, if the flag is set to False and a copy is needed, a RuntimeError will be raised.

Returns

A dataframe object following the DataFrame protocol specification.

Return type

ProtocolDataframe

to_datetime(*args, **kwargs)#

Convert columns of the QueryCompiler to the datetime dtype.

Parameters
  • *args (iterable) –

  • **kwargs (dict) –

Returns

QueryCompiler with all columns converted to datetime dtype.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.to_datetime for more information about parameters and output format.

to_numeric(*args, **kwargs)#

Execute Map function against passed query compiler.

to_numpy(**kwargs)#

Convert the underlying query compiler's data to a NumPy array.

Parameters
  • dtype (dtype) – The dtype of the resulting array.

  • copy (bool) – Whether to ensure that the returned value is not a view on another array.

  • na_value (object) – The value to replace missing values with.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

The QueryCompiler converted to a NumPy array.

Return type

np.ndarray

to_pandas()#

Convert the underlying query compiler's data to a pandas.DataFrame.

Returns

The QueryCompiler converted to pandas.

Return type

pandas.DataFrame

to_timedelta(*args, **kwargs)#

Execute Map function against passed query compiler.

transpose(*args, **kwargs)#

Transpose this QueryCompiler.

Parameters
  • copy (bool) – Whether to copy the data after transposing.

  • *args (iterable) – Serves the compatibility purpose. Does not affect the result.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

Transposed new QueryCompiler.

Return type

BaseQueryCompiler

truediv(other, broadcast=False, *args, dtypes=None, **kwargs)#

Apply binary func to passed operands.

Parameters
  • query_compiler (QueryCompiler) – Left operand of func.

  • other (QueryCompiler, list-like object or scalar) – Right operand of func.

  • broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently, but they cannot be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.

  • *args (args,) – Arguments that will be passed to func.

  • dtypes ("copy" or None, default: None) – Whether to keep old dtypes or infer new dtypes from data.

  • **kwargs (kwargs,) – Arguments that will be passed to func.

Returns

Result of binary function.

Return type

QueryCompiler

unique()#

Get unique values of self.

Parameters

**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

New QueryCompiler with unique values.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.Series.unique for more information about parameters and output format.

Warning

This method is supported only by one-column query compilers.

unstack(level, fill_value)#

Pivot a level of the (necessarily hierarchical) index labels.

Parameters
  • level (int or label) –

  • fill_value (scalar or dict) –

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.unstack for more information about parameters and output format.
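
A rough picture of the pivot, assuming two-level row labels stored as (outer, inner) tuples and moving the inner level to the columns: every combination that has no entry in the input is filled with fill_value. The sorted label order and the dict layout are assumptions of this sketch, and only a scalar fill_value is handled.

```python
def unstack(stacked, fill_value=None):
    """Pivot the inner level of (outer, inner) row labels into columns."""
    rows = sorted({outer for outer, _ in stacked})
    cols = sorted({inner for _, inner in stacked})
    wide = {
        col: [stacked.get((row, col), fill_value) for row in rows]
        for col in cols
    }
    return rows, wide


stacked = {("x", "a"): 1, ("x", "b"): 3, ("y", "b"): 4}
row_labels, wide = unstack(stacked, fill_value=0)
```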

var(*args, **kwargs)#

Execute Reduce function against passed query compiler.

where(cond, other, **kwargs)#

Update values of self using values from other at positions where cond is False.

Parameters
  • cond (BaseQueryCompiler) – Boolean mask. True keeps the value from self; False replaces it with the value from other.

  • other (BaseQueryCompiler or pandas.Series) – Object to grab replacement values from.

  • axis ({0, 1}) – Axis to align frames along if axes of self, cond and other are not equal. 0 is for index, while 1 is for columns.

  • level (int or label, optional) – Level of MultiIndex to align frames along if axes of self, cond and other are not equal. The level parameter is currently not implemented, so only None is accepted.

  • **kwargs (dict) – Serves the compatibility purpose. Does not affect the result.

Returns

QueryCompiler with updated data.

Return type

BaseQueryCompiler

Notes

Please refer to modin.pandas.DataFrame.where for more information about parameters and output format.
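
The masking rule can be sketched on plain lists of rows as follows. This is an illustrative model rather than Modin's implementation: where_rows is a hypothetical helper, and self, cond and other are assumed to be already aligned and of equal shape, so the axis and level alignment steps are left out.

```python
def where_rows(self_rows, cond_rows, other_rows):
    """Keep the value from `self_rows` where `cond_rows` is True, else take `other_rows`."""
    return [
        [s if c else o for s, c, o in zip(s_row, c_row, o_row)]
        for s_row, c_row, o_row in zip(self_rows, cond_rows, other_rows)
    ]


data = [[1, -2], [-3, 4]]
cond = [[value > 0 for value in row] for row in data]  # [[True, False], [False, True]]
other = [[0, 0], [0, 0]]
result = where_rows(data, cond, other)  # [[1, 0], [0, 4]]
```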

window_mean(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler
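
The reason a window function is applied across full column partitions can be seen in a small plain-Python sketch. It assumes a simple unweighted trailing window, which may differ from the window type actually used by window_mean; rolling_mean is a hypothetical helper, not a Modin function.

```python
def rolling_mean(column, window):
    """Trailing-window mean; positions without a full window yield None."""
    out = []
    for i in range(len(column)):
        if i + 1 < window:
            out.append(None)  # not enough observations yet
        else:
            chunk = column[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


column = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Applied to the full column, every window sees its preceding rows.
full = rolling_mean(column, window=3)
# [None, None, 2.0, 3.0, 4.0, 5.0]

# Applied to two row blocks independently, windows cannot cross the
# block boundary, so the results at the start of the second block are lost.
per_block = rolling_mean(column[:3], window=3) + rolling_mean(column[3:], window=3)
# [None, None, 2.0, None, None, 5.0]
```

The two results differ at the block boundary, so the function has to see each column as a whole.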

window_std(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

window_sum(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

window_var(fold_axis=None, *args, **kwargs)#

Execute Fold function against passed query compiler.

Parameters
  • query_compiler (BaseQueryCompiler) – The query compiler to execute the function on.

  • fold_axis (int, optional) – 0 or None means apply across full column partitions. 1 means apply across full row partitions.

  • *args (iterable) – Additional arguments passed to fold_function.

  • **kwargs (dict) – Additional keyword arguments passed to fold_function.

Returns

A new query compiler representing the result of executing the function.

Return type

BaseQueryCompiler

write_items(row_numeric_index, col_numeric_index, broadcasted_items)#

Update QueryCompiler elements at the specified positions by passed values.

In contrast to setitem, this method allows 2D assignments.

Parameters
  • row_numeric_index (list of ints) – Row positions to write value.

  • col_numeric_index (list of ints) – Column positions to write value.

  • broadcasted_items (2D-array) – Values to write. Must have the same shape as defined by row_numeric_index and col_numeric_index.

Returns

New QueryCompiler with updated values.

Return type

BaseQueryCompiler
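
A minimal sketch of such a 2D positional assignment on a list of rows is shown below. It only models the documented contract (positions plus a block of values of matching shape, returning a new object); write_items_rows is a hypothetical helper and says nothing about how PandasQueryCompiler performs the update on partitions.

```python
def write_items_rows(rows, row_positions, col_positions, items):
    """Return a copy of `rows` with `items` written at the given positions."""
    if len(items) != len(row_positions) or any(
        len(item_row) != len(col_positions) for item_row in items
    ):
        raise ValueError("items must have shape (len(row_positions), len(col_positions))")
    result = [list(row) for row in rows]  # copy so that the input stays untouched
    for i, r in enumerate(row_positions):
        for j, c in enumerate(col_positions):
            result[r][c] = items[i][j]
    return result


frame = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
updated = write_items_rows(frame, [0, 2], [1, 2], [[1, 2], [3, 4]])
# [[0, 1, 2], [0, 0, 0], [0, 3, 4]]
```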