DFAlgQueryCompiler
DFAlgQueryCompiler implements a query compiler for lazy frames. Each compiler instance holds an instance of HdkOnNativeDataframe, which is used to build a lazy execution tree.
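To make the forwarding idea concrete, here is a minimal sketch of a compiler that only records operations as nodes of a tree and evaluates them on demand. `LazyNode` and `ToyQueryCompiler` are hypothetical classes written for illustration; they are not part of Modin or HDK, and the real HdkOnNativeDataframe tree uses its own node types and is executed by the HDK engine.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class LazyNode:
    """One node of a toy lazy execution tree (hypothetical, for illustration)."""

    op: str
    inputs: Tuple["LazyNode", ...] = ()
    payload: Any = None

    def evaluate(self) -> Dict[str, List[Any]]:
        # Leaf node: the payload holds the materialized column data.
        if self.op == "scan":
            return {k: list(v) for k, v in self.payload.items()}
        (child,) = self.inputs
        data = child.evaluate()
        if self.op == "project":
            # Keep only the requested columns, in the requested order.
            return {k: data[k] for k in self.payload}
        if self.op == "map":
            fn: Callable[[Any], Any] = self.payload
            return {k: [fn(x) for x in col] for k, col in data.items()}
        raise ValueError(f"unknown op: {self.op}")


class ToyQueryCompiler:
    """Forwards every call to a new tree node; nothing runs until to_dict()."""

    def __init__(self, node: LazyNode):
        self._node = node

    @classmethod
    def from_dict(cls, data: Dict[str, List[Any]]) -> "ToyQueryCompiler":
        return cls(LazyNode("scan", payload=data))

    def getitem_column_array(self, key: List[str]) -> "ToyQueryCompiler":
        return ToyQueryCompiler(LazyNode("project", (self._node,), list(key)))

    def add(self, scalar: Any) -> "ToyQueryCompiler":
        return ToyQueryCompiler(LazyNode("map", (self._node,), lambda x: x + scalar))

    def tree_depth(self) -> int:
        # Count nodes along the (linear) chain down to the leaf.
        depth, node = 1, self._node
        while node.inputs:
            node = node.inputs[0]
            depth += 1
        return depth

    def to_dict(self) -> Dict[str, List[Any]]:
        # Materialization point: walk the tree and evaluate it.
        return self._node.evaluate()
```

Each call returns a new compiler wrapping a deeper tree while the source compiler stays untouched; no data is processed until `to_dict()` walks the tree.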
Public API
- class modin.experimental.core.storage_formats.hdk.query_compiler.DFAlgQueryCompiler(frame, shape_hint=None)
Query compiler for the HDK storage format.
This class doesn’t perform much processing and mostly forwards calls to HdkOnNativeDataframe to build lazy execution trees.
- Parameters:
frame (HdkOnNativeDataframe) – Modin Frame to query with the compiled queries.
shape_hint ({"row", "column", None}, default: None) – Shape hint for frames known to be a column or a row, otherwise None.
- _modin_frame
Modin Frame to query with the compiled queries.
- Type:
HdkOnNativeDataframe
- _shape_hint
Shape hint for frames known to be a column or a row, otherwise None.
- Type:
{“row”, “column”, None}
- add(other, **kwargs)
Perform element-wise addition (self + other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for 1D other (list or QueryCompiler that represents Series). 0 is for index and 1 is for columns.
fill_value (float or None) – Value to fill missing elements during frame alignment.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
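The alignment step shared by the binary operations can be pictured with one-dimensional operands. The sketch below is a hypothetical helper, `aligned_binary_op`, that loosely mirrors pandas semantics under stated assumptions: operands are dicts from label to value, `None` marks a missing value, and the union of labels is sorted for determinism. It is not how HDK evaluates the operation.

```python
import math
import operator
from typing import Callable, Dict, Hashable, Optional

Number = float


def aligned_binary_op(
    left: Dict[Hashable, Optional[Number]],
    right: Dict[Hashable, Optional[Number]],
    op: Callable[[Number, Number], Number],
    fill_value: Optional[Number] = None,
) -> Dict[Hashable, Number]:
    """Toy 1D alignment: union the labels, fill one-sided gaps, then apply op."""
    result: Dict[Hashable, Number] = {}
    # Union of labels; sorted here only to get a deterministic order.
    for label in sorted(set(left) | set(right)):
        lval = left.get(label)
        rval = right.get(label)
        # fill_value only replaces a value that is missing on exactly one side.
        if fill_value is not None and (lval is None) != (rval is None):
            lval = fill_value if lval is None else lval
            rval = fill_value if rval is None else rval
        result[label] = math.nan if lval is None or rval is None else op(lval, rval)
    return result


# Example: pandas-style addition with a fill value.
example = aligned_binary_op({"a": 1.0}, {"b": 2.0}, operator.add, fill_value=0.0)
```

Without `fill_value`, labels present on only one side produce NaN, which matches what "perform frames alignment first" implies for the unmatched positions.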
- astype(col_dtypes, errors: str = 'raise')
Convert column dtypes to the given dtypes.
- Parameters:
col_dtypes (dict or str) – Mapping of column names to new dtypes.
errors ({'raise', 'ignore'}, default: 'raise') – Control raising of exceptions on invalid data for the provided dtype.
- raise: allow exceptions to be raised.
- ignore: suppress exceptions. On error, return the original object.
- Returns:
New QueryCompiler with updated dtypes.
- Return type:
BaseQueryCompiler
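A minimal sketch of the `errors` contract for astype, assuming a frame is a dict of column lists and each target dtype is modeled as a Python callable such as `int` or `float` (both assumptions made for illustration; `astype_columns` is hypothetical).

```python
from typing import Any, Callable, Dict, List


def astype_columns(
    data: Dict[str, List[Any]],
    col_dtypes: Dict[str, Callable[[Any], Any]],
    errors: str = "raise",
) -> Dict[str, List[Any]]:
    """Toy astype: cast the listed columns, honoring errors='raise'/'ignore'."""
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    try:
        converted = {
            name: [col_dtypes[name](v) for v in col] if name in col_dtypes else list(col)
            for name, col in data.items()
        }
    except (TypeError, ValueError):
        if errors == "raise":
            raise
        # 'ignore': on error, hand back the original object unchanged.
        return data
    return converted
```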
- cat_codes()
Convert the underlying categorical data into its codes.
- Returns:
New QueryCompiler containing the integer codes of the underlying categories.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.cat.codes for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- columnarize()
Transpose this QueryCompiler if it has a single row but multiple columns.
This method should be called for QueryCompilers representing a Series object, i.e. self.is_series_like() should be True.
- Returns:
Transposed new QueryCompiler or self.
- Return type:
BaseQueryCompiler
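The single-row versus single-column distinction behind columnarize and is_series_like can be sketched on a list-of-rows table. Both helpers below are hypothetical stand-ins that follow the descriptions in this section, not Modin's implementation.

```python
from typing import Any, List


def is_series_like(table: List[List[Any]]) -> bool:
    """True if the toy table has a single row or a single column."""
    return len(table) == 1 or all(len(row) == 1 for row in table)


def columnarize(table: List[List[Any]]) -> List[List[Any]]:
    """Transpose a single-row, multi-column table into a single column."""
    if len(table) == 1 and len(table[0]) > 1:
        return [[v] for v in table[0]]
    # Already a column (or not series-like): return the same object.
    return table
```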
- property columns
Return frame’s columns.
- Return type:
pandas.Index
- concat(axis, other, **kwargs)
Concatenate self with the passed query compilers along the specified axis.
- Parameters:
axis ({0, 1}) – Axis to concatenate along. 0 is for index and 1 is for columns.
other (BaseQueryCompiler or list of such) – Objects to concatenate with self.
join ({'outer', 'inner', 'right', 'left'}, default: 'outer') – Type of join that will be used if indices on the other axis are different. (Note: if specified, has to be passed as join=value.)
ignore_index (bool, default: False) – If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. (Note: if specified, has to be passed as ignore_index=value.)
sort (bool, default: False) – Whether or not to sort the non-concatenation axis. (Note: if specified, has to be passed as sort=value.)
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Concatenated objects.
- Return type:
BaseQueryCompiler
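How the non-concatenation axis is built for the two most common join modes can be sketched as follows. `concat_other_axis` is a hypothetical helper assuming plain label lists; it keeps labels in order of first appearance unless `sort=True`, and it leaves out the 'left' and 'right' modes.

```python
from typing import Hashable, List, Sequence


def concat_other_axis(
    self_labels: Sequence[Hashable],
    others: Sequence[Sequence[Hashable]],
    join: str = "outer",
    sort: bool = False,
) -> List[Hashable]:
    """Toy resolution of the labels on the axis that is NOT concatenated."""
    if join == "outer":
        # Union of all labels, in order of first appearance.
        result: List[Hashable] = []
        seen = set()
        for labels in [self_labels, *others]:
            for label in labels:
                if label not in seen:
                    seen.add(label)
                    result.append(label)
    elif join == "inner":
        # Intersection, in the order the labels appear in self.
        common = set(self_labels).intersection(*(set(labels) for labels in others))
        result = [label for label in self_labels if label in common]
    else:
        raise ValueError(f"this sketch only handles 'outer' and 'inner', got {join!r}")
    return sorted(result) if sort else result
```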
- copy()
Make a copy of this object.
- Returns:
Copy of self.
- Return type:
BaseQueryCompiler
Notes
For copy, we don’t want a situation where modifying the metadata of one copy also modifies the other. We copy all of the metadata to prevent that.
- count(**kwargs)
Get the number of non-NaN values for each column or row.
- Parameters:
axis ({0, 1}) –
numeric_only (bool, optional) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
One-column QueryCompiler with index labels of the specified axis, where each row contains the number of non-NaN values for the corresponding row or column.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.count for more information about parameters and output format.
- default_to_pandas(pandas_op, *args, **kwargs)
Fall back to pandas for the passed function.
- Parameters:
pandas_op (callable(pandas.DataFrame) -> object) – Function to apply to the frame cast to pandas.
*args (iterable) – Positional arguments to pass to pandas_op.
**kwargs (dict) – Keyword arguments to pass to pandas_op.
- Returns:
The result of the pandas_op, converted back to BaseQueryCompiler.
- Return type:
BaseQueryCompiler
- drop(index=None, columns=None, errors: str = 'raise')
Drop specified rows or columns.
- Parameters:
index (list of labels, optional) – Labels of rows to drop.
columns (list of labels, optional) – Labels of columns to drop.
errors (str, default: "raise") – If ‘ignore’, suppress the error and drop only existing labels.
- Returns:
New QueryCompiler with removed data.
- Return type:
BaseQueryCompiler
- dropna(axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None)
Remove missing values.
- Parameters:
axis ({0, 1}) –
how ({"any", "all"}) –
thresh (int, optional) –
subset (list of labels) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
New QueryCompiler with null values dropped along the given axis.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.dropna for more information about parameters and output format.
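The interplay of how, thresh, and subset for row-wise dropna (axis=0) can be sketched on a list-of-rows table. This is a hypothetical helper under simplifying assumptions: `None` or float NaN mark missing values, `subset` holds column positions rather than labels, and `thresh` is checked before `how` (recent pandas versions reject passing both at once).

```python
import math
from typing import Any, List, Optional, Sequence


def _is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def dropna_rows(
    rows: Sequence[Sequence[Any]],
    how: str = "any",
    thresh: Optional[int] = None,
    subset: Optional[Sequence[int]] = None,
) -> List[Sequence[Any]]:
    """Toy dropna(axis=0): subset is a list of column positions to inspect."""
    kept = []
    for row in rows:
        values = [row[i] for i in subset] if subset is not None else list(row)
        present = sum(not _is_missing(v) for v in values)
        if thresh is not None:
            # thresh: keep rows with at least `thresh` non-missing values.
            keep = present >= thresh
        elif how == "any":
            # Drop the row if any inspected value is missing.
            keep = present == len(values)
        elif how == "all":
            # Drop the row only if every inspected value is missing.
            keep = present > 0
        else:
            raise ValueError(f"invalid how: {how}")
        if keep:
            kept.append(row)
    return kept
```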
- dt_day()
Get the day component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the day component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.day for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_dayofweek()
Get the integer day of week for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the integer day of week of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.dayofweek for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_dayofyear()
Get the day of year for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the day of year of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.dayofyear for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_hour()
Get the hour component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the hour component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.hour for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_microsecond()
Get the microseconds component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the microseconds component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.microsecond for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_minute()
Get the minute component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the minute component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.minute for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_month()
Get the month component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the month component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.month for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_nanosecond()
Get the nanoseconds component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the nanoseconds component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.nanosecond for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_quarter()
Get the quarter component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the quarter component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.quarter for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_second()
Get the seconds component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the seconds component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.second for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_weekday()
Get the integer day of week for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the integer day of week of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.weekday for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
- dt_year()
Get the year component for each datetime value.
- Returns:
New QueryCompiler with the same shape as self, where each element is the year component of the corresponding datetime value.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.Series.dt.year for more information about parameters and output format.
Warning
This method is supported only by one-column query compilers.
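The dt_* accessors above map onto familiar calendar fields. The sketch below derives them for a single value with Python's standard datetime module, as an illustration only; it shows that dt_dayofweek and dt_weekday describe the same Monday=0 numbering (in pandas, weekday is an alias of dayofweek) and how a quarter follows from the month. The standard datetime type has no nanosecond field, so dt_nanosecond is not covered, and `dt_components` is a hypothetical helper.

```python
from datetime import datetime
from typing import Dict


def dt_components(ts: datetime) -> Dict[str, int]:
    """Derive the dt_* components for one datetime using the standard library."""
    return {
        "year": ts.year,
        "quarter": (ts.month - 1) // 3 + 1,  # Jan-Mar -> 1, ..., Oct-Dec -> 4
        "month": ts.month,
        "day": ts.day,
        "dayofyear": ts.timetuple().tm_yday,
        # pandas numbers weekdays Monday=0 ... Sunday=6, like datetime.weekday().
        "dayofweek": ts.weekday(),
        "weekday": ts.weekday(),
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "microsecond": ts.microsecond,
    }
```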
- property dtypes
Get columns dtypes.
- Returns:
Series with dtypes of each column.
- Return type:
pandas.Series
- eq(other, **kwargs)
Perform element-wise equality comparison (self == other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for 1D other (list or QueryCompiler that represents Series). 0 is for index and 1 is for columns.
fill_value (float or None) – Value to fill missing elements during frame alignment.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- execute()
Wait for all computations to complete without materializing data.
- fillna(squeeze_self=False, squeeze_value=False, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Replace NaN values using the provided method.
- Parameters:
value (scalar or dict) –
method ({"backfill", "bfill", "pad", "ffill", None}) –
axis ({0, 1}) –
inplace ({False}) – This parameter serves the compatibility purpose. Always has to be False.
limit (int, optional) –
downcast (dict, optional) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
New QueryCompiler with all null values filled.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.fillna for more information about parameters and output format.
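A minimal sketch of fillna on one column, assuming `None` marks a missing value. `fill_1d` is hypothetical; it shows how `limit` caps the consecutive fills when a method is given and the total number of fills when a value is given, following the pandas description of the parameter.

```python
from typing import Any, List, Optional


def fill_1d(
    values: List[Any],
    value: Any = None,
    method: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[Any]:
    """Toy fillna for one column; None marks a missing value."""
    if (value is None) == (method is None):
        raise ValueError("pass exactly one of value or method")
    out = list(values)
    if value is not None:
        # With a value, limit caps the total number of filled entries.
        filled = 0
        for i, v in enumerate(out):
            if v is None and (limit is None or filled < limit):
                out[i] = value
                filled += 1
        return out
    if method in ("backfill", "bfill"):
        # Backward fill is forward fill over the reversed sequence.
        return fill_1d(values[::-1], method="ffill", limit=limit)[::-1]
    if method not in ("pad", "ffill"):
        raise ValueError(f"invalid method: {method}")
    # With a method, limit caps consecutive fills inside each gap.
    last, run = None, 0
    for i, v in enumerate(out):
        if v is not None:
            last, run = v, 0
        elif last is not None and (limit is None or run < limit):
            out[i] = last
            run += 1
    return out
```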
- finalize()
Finalize constructing the dataframe by calling all deferred functions that were used to build it.
- floordiv(other, **kwargs)
Perform element-wise integer division (self // other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for 1D other (list or QueryCompiler that represents Series). 0 is for index and 1 is for columns.
fill_value (float or None) – Value to fill missing elements during frame alignment.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- force_import()
Force table import.
- free()
Trigger a cleanup of this object.
- classmethod from_arrow(at, data_cls)
Build QueryCompiler from an Arrow Table.
- Parameters:
at (Arrow Table) – The Arrow Table to convert from.
data_cls (type) – PandasDataframe class (or its descendant) to convert to.
- Returns:
QueryCompiler containing data from the Arrow Table.
- Return type:
BaseQueryCompiler
- classmethod from_dataframe(df, data_cls)
Build QueryCompiler from a DataFrame object supporting the dataframe exchange protocol __dataframe__().
- Parameters:
df (DataFrame) – The DataFrame object supporting the dataframe exchange protocol.
data_cls (type) – PandasDataframe class (or its descendant) to convert to.
- Returns:
QueryCompiler containing data from the DataFrame.
- Return type:
BaseQueryCompiler
- classmethod from_pandas(df, data_cls)
Build QueryCompiler from a pandas DataFrame.
- Parameters:
df (pandas.DataFrame) – The pandas DataFrame to convert from.
data_cls (type) – PandasDataframe class (or its descendant) to convert to.
- Returns:
QueryCompiler containing data from the pandas DataFrame.
- Return type:
BaseQueryCompiler
- ge(other, **kwargs)
Perform element-wise greater than or equal comparison (self >= other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- get_index_name(axis=0)
Get index name of specified axis.
- Parameters:
axis ({0, 1}, default: 0) – Axis to get index name on.
- Returns:
Index name, None for MultiIndex.
- Return type:
hashable
- get_index_names(axis=0)
Get index names of specified axis.
- Parameters:
axis ({0, 1}, default: 0) – Axis to get index names on.
- Returns:
Index names.
- Return type:
list
- getitem_array(key)
Mask QueryCompiler with key.
- Parameters:
key (BaseQueryCompiler, np.ndarray or list of column labels) – Boolean mask represented by a QueryCompiler or np.ndarray of the same shape as self, or an enumerable of columns to pick.
- Returns:
New masked QueryCompiler.
- Return type:
BaseQueryCompiler
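The two meanings of `key` in getitem_array can be sketched with a dict of column lists. `toy_getitem_array` below is a hypothetical, simplified dispatcher: a sequence of booleans is treated as a row mask (a one-dimensional stand-in for a mask QueryCompiler) and anything else as column labels.

```python
from typing import Any, Dict, List, Sequence, Union


def toy_getitem_array(
    data: Dict[str, List[Any]],
    key: Union[Sequence[bool], Sequence[str]],
) -> Dict[str, List[Any]]:
    """Toy dispatch: a boolean key filters rows, a label key selects columns."""
    if all(isinstance(k, bool) for k in key):
        n_rows = len(next(iter(data.values()), []))
        if len(key) != n_rows:
            raise ValueError("boolean mask length must match the number of rows")
        return {c: [v for v, keep in zip(col, key) if keep] for c, col in data.items()}
    # Otherwise treat the key as an enumerable of column labels to pick.
    return {c: list(data[c]) for c in key}
```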
- getitem_column_array(key, numeric=False, ignore_order=False)
Get column data for target labels.
- Parameters:
key (list-like) – Target labels by which to retrieve data.
numeric (bool, default: False) – Whether or not the key passed in represents the numeric index or the named index.
ignore_order (bool, default: False) – Allow returning columns in an arbitrary order for the sake of performance.
- Returns:
New QueryCompiler that contains specified columns.
- Return type:
BaseQueryCompiler
- groupby_agg(by, agg_func, axis, groupby_kwargs, agg_args, agg_kwargs, how='axis_wise', drop=False, series_groupby=False)
Group QueryCompiler data and apply the passed aggregation function.
- Parameters:
by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.
agg_func (str, dict or callable(Series | DataFrame) -> scalar | Series | DataFrame) – Function to apply to the GroupBy object.
axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index and 1 is for columns.
groupby_kwargs (dict) – GroupBy parameters as expected by the modin.pandas.DataFrame.groupby signature.
agg_args (list-like) – Positional arguments to pass to the agg_func.
agg_kwargs (dict) – Keyword arguments to pass to the agg_func.
how ({'axis_wise', 'group_wise', 'transform'}, default: 'axis_wise') – How to apply the passed agg_func:
- 'axis_wise': apply the function against each row/column.
- 'group_wise': apply the function against every group.
- 'transform': apply the function against every group and broadcast the result to the original QueryCompiler shape.
drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not by-data came from self.
series_groupby (bool, default: False) – Whether we should treat self as Series when performing groupby.
- Returns:
QueryCompiler containing the result of groupby aggregation.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.GroupBy.aggregate for more information about parameters and output format.
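The difference between the 'group_wise' and 'transform' modes can be sketched with a single value column. `group_apply` is a hypothetical helper: 'group_wise' yields one result per group, while 'transform' broadcasts each group's result back to the original rows. The 'axis_wise' mode differs from 'group_wise' mainly in whether the function sees one column or the whole group, which a single-column sketch cannot show, so it is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence


def group_apply(
    keys: Sequence[Hashable],
    values: Sequence[float],
    func: Callable[[List[float]], float],
    how: str = "group_wise",
):
    """Toy single-column groupby for the 'group_wise' and 'transform' modes."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    per_group = {k: func(vs) for k, vs in groups.items()}
    if how == "group_wise":
        # One result per group, keyed by group name.
        return per_group
    if how == "transform":
        # Broadcast each group's result back to the original row positions.
        return [per_group[k] for k in keys]
    raise ValueError(f"unsupported how: {how}")
```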
- groupby_count(by, axis, groupby_kwargs, agg_args, agg_kwargs, drop=False)
Group QueryCompiler data and count non-null values for every group.
- Parameters:
by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.
axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index and 1 is for columns.
groupby_kwargs (dict) – GroupBy parameters as expected by the modin.pandas.DataFrame.groupby signature.
agg_args (list-like) – Positional arguments to pass to the agg_func.
agg_kwargs (dict) – Keyword arguments to pass to the agg_func.
drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not by-data came from self.
- Returns:
BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:
Labels on the axis opposite to the specified one are preserved.
If groupby_kwargs[“as_index”] is True, then labels on the specified axis are the group names; otherwise labels are the default 0, 1 … n.
If groupby_kwargs[“as_index”] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.
Each element of QueryCompiler is the number of non-null values for the corresponding group and column/row.
Warning
map_args and reduce_args parameters are deprecated. They are leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.
Notes
Please refer to modin.pandas.GroupBy.count for more information about parameters and output format.
- groupby_size(by, axis, groupby_kwargs, agg_args, agg_kwargs, drop=False)
Group QueryCompiler data and get the number of elements for every group.
- Parameters:
by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.
axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index and 1 is for columns.
groupby_kwargs (dict) – GroupBy parameters as expected by the modin.pandas.DataFrame.groupby signature.
agg_args (list-like) – Positional arguments to pass to the agg_func.
agg_kwargs (dict) – Keyword arguments to pass to the agg_func.
drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not by-data came from self.
- Returns:
BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:
Labels on the axis opposite to the specified one are preserved.
If groupby_kwargs[“as_index”] is True, then labels on the specified axis are the group names; otherwise labels are the default 0, 1 … n.
If groupby_kwargs[“as_index”] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.
Each element of QueryCompiler is the number of elements for the corresponding group and column/row.
Warning
map_args and reduce_args parameters are deprecated. They are leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.
Notes
Please refer to modin.pandas.GroupBy.size for more information about parameters and output format.
- groupby_sum(by, axis, groupby_kwargs, agg_args, agg_kwargs, drop=False)
Group QueryCompiler data and compute the sum for every group.
- Parameters:
by (BaseQueryCompiler, column or index label, Grouper or list of such) – Object that determines groups.
axis ({0, 1}) – Axis to group and apply aggregation function along. 0 is for index and 1 is for columns.
groupby_kwargs (dict) – GroupBy parameters as expected by the modin.pandas.DataFrame.groupby signature.
agg_args (list-like) – Positional arguments to pass to the agg_func.
agg_kwargs (dict) – Keyword arguments to pass to the agg_func.
drop (bool, default: False) – If by is a QueryCompiler, indicates whether or not by-data came from self.
- Returns:
BaseQueryCompiler – QueryCompiler containing the result of groupby reduce built by the following rules:
Labels on the axis opposite to the specified one are preserved.
If groupby_kwargs[“as_index”] is True, then labels on the specified axis are the group names; otherwise labels are the default 0, 1 … n.
If groupby_kwargs[“as_index”] is False, then the first N columns/rows of the frame contain group names, where N is the number of columns/rows to group on.
Each element of QueryCompiler is the sum for the corresponding group and column/row.
Warning
map_args and reduce_args parameters are deprecated. They are leaked here from PandasQueryCompiler.groupby_*: the pandas storage format implements groupby via the TreeReduce approach, but for other storage formats these parameters make no sense, so they will be removed in the future.
Notes
Please refer to modin.pandas.GroupBy.sum for more information about parameters and output format.
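The three reductions above differ only in what they accumulate, and the as_index rules decide where group names end up. The hypothetical helpers below sketch both, assuming one value column, `None` or NaN as missing values, and groups kept in first-appearance order (pandas sorts group keys by default).

```python
import math
from typing import Any, Dict, Hashable, List, Optional, Sequence, Tuple


def _missing(v: Any) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def groupby_reduce(
    keys: Sequence[Hashable],
    values: Sequence[Optional[float]],
    how: str,
) -> Dict[Hashable, float]:
    """'size' counts rows, 'count' counts non-null values, 'sum' adds non-null values."""
    out: Dict[Hashable, float] = {}
    for k, v in zip(keys, values):
        out.setdefault(k, 0)
        if how == "size":
            out[k] += 1
        elif how == "count":
            out[k] += 0 if _missing(v) else 1
        elif how == "sum":
            out[k] += 0 if _missing(v) else v
        else:
            raise ValueError(f"unsupported reduction: {how}")
    return out


def as_frame(result: Dict[Hashable, float], as_index: bool) -> Tuple[List[Any], List[List[Any]]]:
    """as_index=True: group names become row labels.
    as_index=False: labels are 0..n-1 and group names move into the first column."""
    if as_index:
        return list(result), [[v] for v in result.values()]
    return list(range(len(result))), [[k, v] for k, v in result.items()]
```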
- gt(other, **kwargs)
Perform element-wise greater than comparison (self > other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- has_multiindex(axis=0)
Check if specified axis is indexed by MultiIndex.
- Parameters:
axis ({0, 1}, default: 0) – The axis to check (0 - index, 1 - columns).
- Returns:
True if index at specified axis is MultiIndex and False otherwise.
- Return type:
bool
- property index
Return frame’s index.
- Return type:
pandas.Index
- insert(loc, column, value)
Insert a new column.
- Parameters:
loc (int) – Insertion position.
column (label) – Label of the new column.
value (One-column BaseQueryCompiler, 1D array or scalar) – Data to fill the new column with.
- Returns:
QueryCompiler with the new column inserted.
- Return type:
BaseQueryCompiler
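A minimal sketch of insert on a dict of column lists, assuming scalars are broadcast to every row and list-likes must match the row count; `insert_column` is hypothetical. Like the pandas default, it refuses to create a duplicate column label.

```python
from typing import Any, Dict, List


def insert_column(
    data: Dict[str, List[Any]], loc: int, column: str, value: Any
) -> Dict[str, List[Any]]:
    """Toy insert: place a new column at position loc; scalars are broadcast."""
    if column in data:
        raise ValueError(f"column {column!r} already exists")
    n_rows = len(next(iter(data.values()), []))
    if isinstance(value, (list, tuple)):
        if len(value) != n_rows:
            raise ValueError("length of value does not match the number of rows")
        new_col = list(value)
    else:
        new_col = [value] * n_rows  # scalar broadcast to every row
    items = list(data.items())
    items.insert(loc, (column, new_col))
    return dict(items)
```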
- invert()
Apply bitwise inversion for each element of the QueryCompiler.
- Returns:
New QueryCompiler containing bitwise inversion for each value.
- Return type:
BaseQueryCompiler
- is_series_like()
Check whether this QueryCompiler can represent a modin.pandas.Series object.
- Returns:
True if QueryCompiler has a single column or row, False otherwise.
- Return type:
bool
- isna()
Check for each element of self whether it’s NaN.
- Returns:
Boolean mask for self of whether an element at the corresponding position is NaN.
- Return type:
BaseQueryCompiler
- le(other, **kwargs)
Perform element-wise less than or equal comparison (self <= other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- lt(other, **kwargs)
Perform element-wise less than comparison (self < other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- max(**kwargs)
Get the maximum value for each column or row.
- Parameters:
axis ({0, 1}) –
numeric_only (bool, optional) –
skipna (bool, default: True) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
One-column QueryCompiler with index labels of the specified axis, where each row contains the maximum value for the corresponding row or column.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.max for more information about parameters and output format.
- mean(**kwargs)
Get the mean value for each column or row.
- Parameters:
axis ({0, 1}) –
numeric_only (bool, optional) –
skipna (bool, default: True) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
One-column QueryCompiler with index labels of the specified axis, where each row contains the mean value for the corresponding row or column.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.mean for more information about parameters and output format.
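The skipna behavior shared by max, mean, and min can be sketched as a reduction over columns (axis=0) or rows (axis=1). `reduce_axis` is hypothetical and assumes a dense table of floats with NaN marking missing values; with skipna=False any NaN makes the result NaN, and an all-NaN line yields NaN either way.

```python
import math
from typing import Callable, List, Sequence


def reduce_axis(
    table: Sequence[Sequence[float]],
    func: Callable[[List[float]], float],
    axis: int = 0,
    skipna: bool = True,
) -> List[float]:
    """Toy column-wise (axis=0) or row-wise (axis=1) reduction with skipna."""
    lines = [list(col) for col in zip(*table)] if axis == 0 else [list(r) for r in table]
    out: List[float] = []
    for line in lines:
        has_nan = any(math.isnan(x) for x in line)
        kept = [x for x in line if not math.isnan(x)]
        if (has_nan and not skipna) or not kept:
            out.append(math.nan)  # NaN poisons the result, or nothing is left
        else:
            out.append(func(kept))
    return out


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)
```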
- merge(right, **kwargs)
Merge QueryCompiler objects using a database-style join.
- Parameters:
right (BaseQueryCompiler) – QueryCompiler of the right frame to merge with.
how ({"left", "right", "outer", "inner", "cross"}) –
on (label or list of such) –
left_on (label or list of such) –
right_on (label or list of such) –
left_index (bool) –
right_index (bool) –
sort (bool) –
suffixes (list-like) –
copy (bool) –
indicator (bool or str) –
validate (str) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
QueryCompiler that contains the result of the merge.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.merge for more information about parameters and output format.
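A minimal sketch of a single-key merge supporting how='inner' and how='left', assuming rows are dicts; `merge_rows` is hypothetical. It shows how overlapping non-key columns receive the suffixes and how unmatched left rows are padded with `None` in a left join.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def merge_rows(
    left: Sequence[Row],
    right: Sequence[Row],
    on: str,
    how: str = "inner",
    suffixes: Tuple[str, str] = ("_x", "_y"),
) -> List[Row]:
    """Toy single-key merge supporting how='inner' and how='left'."""
    if how not in ("inner", "left"):
        raise ValueError("this sketch only handles 'inner' and 'left'")
    # Non-key columns present on both sides get suffixed.
    left_cols = {c for r in left for c in r} - {on}
    right_cols = {c for r in right for c in r} - {on}
    clash = left_cols & right_cols

    def rename(row: Row, suffix: str) -> Row:
        return {(c + suffix if c in clash else c): v for c, v in row.items() if c != on}

    # Hash the right side by key for lookups.
    index: Dict[Any, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[on], []).append(rrow)

    out: List[Row] = []
    for lrow in left:
        matches = index.get(lrow[on], [])
        for rrow in matches:
            out.append({on: lrow[on], **rename(lrow, suffixes[0]), **rename(rrow, suffixes[1])})
        if not matches and how == "left":
            padding = {(c + suffixes[1] if c in clash else c): None for c in right_cols}
            out.append({on: lrow[on], **rename(lrow, suffixes[0]), **padding})
    return out
```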
- min(**kwargs)
Get the minimum value for each column or row.
- Parameters:
axis ({0, 1}) –
numeric_only (bool, optional) –
skipna (bool, default: True) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
One-column QueryCompiler with index labels of the specified axis, where each row contains the minimum value for the corresponding row or column.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.min for more information about parameters and output format.
- mod(other, **kwargs)
Perform element-wise modulo (self % other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for 1D other (list or QueryCompiler that represents Series). 0 is for index and 1 is for columns.
fill_value (float or None) – Value to fill missing elements during frame alignment.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
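For the integer division and modulo operations in this section, pandas (through NumPy) follows Python's floor semantics: the quotient is rounded toward negative infinity and the remainder takes the sign of the divisor. The check below uses plain Python integers to illustrate the identity; whether the HDK backend matches these semantics for negative operands is an assumption to verify, not something this page states. `check_floor_identity` is a hypothetical helper.

```python
def check_floor_identity(a: int, b: int) -> bool:
    """Floor semantics: a == (a // b) * b + a % b, remainder signed like b."""
    q, r = a // b, a % b
    # The remainder is zero or has the same sign as the divisor.
    same_sign = r == 0 or (r > 0) == (b > 0)
    return a == q * b + r and same_sign


# Negative dividends round the quotient down, not toward zero.
pairs = [(-7, 2), (7, -2), (7, 2)]
results = [(a // b, a % b) for a, b in pairs]
```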
- mul(other, **kwargs)
Perform element-wise multiplication (self * other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for 1D other (list or QueryCompiler that represents Series). 0 is for index and 1 is for columns.
fill_value (float or None) – Value to fill missing elements during frame alignment.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- ne(other, **kwargs)
Perform element-wise not equal comparison (self != other).
If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series or not. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
Result of the binary operation.
- Return type:
BaseQueryCompiler
- notna()
Check for each element of self whether it’s an existing (non-missing) value.
- Returns:
Boolean mask for self of whether an element at the corresponding position is not NaN.
- Return type:
BaseQueryCompiler
- nunique(axis=0, dropna=True)
Get the number of unique values for each column or row.
- Parameters:
axis ({0, 1}) –
dropna (bool) –
**kwargs (dict) – Serves the compatibility purpose. Does not affect the result.
- Returns:
One-column QueryCompiler with index labels of the specified axis, where each row contains the number of unique values for the corresponding row or column.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.nunique for more information about parameters and output format.
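A sketch of nunique along axis=0 on a list-of-rows table, assuming `None` or float NaN mark missing values; `nunique_columns` is hypothetical. Because NaN never equals itself, missing values are tracked separately and counted at most once when dropna=False.

```python
import math
from typing import Any, List, Sequence


def nunique_columns(table: Sequence[Sequence[Any]], dropna: bool = True) -> List[int]:
    """Toy nunique(axis=0): distinct values per column, optionally counting NaN once."""
    result = []
    for column in zip(*table):
        is_nan = [v is None or (isinstance(v, float) and math.isnan(v)) for v in column]
        distinct = {v for v, nan in zip(column, is_nan) if not nan}
        # NaN != NaN, so missing values are counted separately, at most once.
        result.append(len(distinct) + (0 if dropna or not any(is_nan) else 1))
    return result
```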
- pow(other, **kwargs)#
Perform element-wise exponential power (
self ** other
).If axes are not equal, perform frames alignment first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of a MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for a 1D other (a list or a QueryCompiler that represents a Series). 0 is for index, 1 is for columns.
fill_value (float or None) – Value to fill missing elements with during frame alignment.
**kwargs (dict) – Serves compatibility purposes. Does not affect the result.
- Returns:
Result of binary operation.
- Return type:
BaseQueryCompiler
- reset_index(**kwargs)
Reset the index, or a level of it.
- Parameters:
drop (bool) – Whether to drop the reset index or insert it at the beginning of the frame.
level (int or label, optional) – Level to remove from index. Removes all levels by default.
col_level (int or label) – If the columns have multiple levels, determines which level the labels are inserted into.
col_fill (label) – If the columns have multiple levels, determines how the other levels are named.
**kwargs (dict) – Serves compatibility purposes. Does not affect the result.
- Returns:
QueryCompiler with reset index.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.reset_index for more information about parameters and output format.
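The effect of drop can be sketched on a toy frame held as (index labels, ordered dict of columns). Only a single-level index is modeled (level, col_level and col_fill are ignored), and reset_index here is a hypothetical stand-in. The default column name "index" for an unnamed index follows pandas; on a name clash pandas falls back to another name, while this sketch simply raises.

```python
from typing import Any, Dict, List, Optional, Tuple

# A toy frame: (index labels, ordered mapping of column name -> values).
Frame = Tuple[List[Any], Dict[str, List[Any]]]


def reset_index(frame: Frame, drop: bool = False, index_name: Optional[str] = None) -> Frame:
    """Replace the index with positions 0..n-1, optionally keeping the old one as a column."""
    index, columns = frame
    new_index = list(range(len(index)))
    if drop:
        # The old index labels are discarded.
        return new_index, dict(columns)
    # The old index is inserted as the first column; an unnamed index becomes "index".
    name = index_name if index_name is not None else "index"
    if name in columns:
        raise ValueError(f"cannot insert {name!r}, column already exists")
    new_columns = {name: list(index)}
    new_columns.update(columns)
    return new_index, new_columns
```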
- set_index_name(name, axis=0)
Set index name for the specified axis.
- Parameters:
name (hashable) – New index name.
axis ({0, 1}, default: 0) – Axis to set name along.
- set_index_names(names=None, axis=0)
Set index names for the specified axis.
- Parameters:
names (list) – New index names.
axis ({0, 1}, default: 0) – Axis to set names along.
- setitem(axis, key, value)
Set the row/column defined by key to the value provided.
- Parameters:
axis ({0, 1}) – Axis to set value along. 0 means set row, 1 means set column.
key (label) – Row/column label to set value in.
value (BaseQueryCompiler, list-like or scalar) – Define new row/column value.
- Returns:
New QueryCompiler with updated key value.
- Return type:
BaseQueryCompiler
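For the column case (axis=1), the behavior described above can be sketched in plain Python: a scalar value is broadcast to every row, a list-like value must match the number of rows, and the input is left untouched because a new object is returned. setitem_column is a hypothetical helper; the row case (axis=0) is not modeled.

```python
from typing import Any, Dict, List


def setitem_column(columns: Dict[str, List[Any]], n_rows: int, key: str, value: Any) -> Dict[str, List[Any]]:
    """Return a new column mapping in which column `key` holds `value`."""
    if isinstance(value, (list, tuple)):
        if len(value) != n_rows:
            raise ValueError(f"length of value ({len(value)}) does not match number of rows ({n_rows})")
        new_values = list(value)
    else:
        # A scalar is broadcast to every row.
        new_values = [value] * n_rows
    result = dict(columns)  # copy, so the input mapping is not modified
    # Overwrites an existing column in its current position, or appends a new one.
    result[key] = new_values
    return result
```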
- sort_rows_by_column_values(columns, ascending=True, **kwargs)
Reorder the rows based on the lexicographic order of the given columns.
- Parameters:
columns (label or list of labels) – The column or columns to sort by.
ascending (bool, default: True) – Sort in ascending order (True) or descending order (False).
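kind ({"quicksort", "mergesort", "heapsort"}) – Sorting algorithm to use.
na_position ({"first", "last"}) – Whether to put NaNs at the beginning ("first") or at the end ("last") of the result.
ignore_index (bool) – If True, the resulting rows are labeled 0 to n - 1.
key (callable(pandas.Index) -> pandas.Index, optional) – Function applied to the values before sorting.
**kwargs (dict) – Serves compatibility purposes. Does not affect the result.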
- Returns:
New QueryCompiler that contains result of the sort.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.sort_values for more information about parameters and output format.
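A lexicographic multi-column sort with per-column ascending flags and na_position can be sketched with repeated stable sorts, from the least significant column to the most significant one. This is a plain-Python illustration over rows stored as dicts, not how HDK executes the sort; kind is not modeled (list.sort is always a stable Timsort), and neither are key and ignore_index.

```python
import math
from typing import Any, Dict, List, Sequence, Union


def _is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def sort_rows(
    rows: List[Dict[str, Any]],
    columns: Sequence[str],
    ascending: Union[bool, Sequence[bool]] = True,
    na_position: str = "last",
) -> List[Dict[str, Any]]:
    """Stable lexicographic sort of rows by several columns."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    if isinstance(ascending, bool):
        ascending = [ascending] * len(columns)
    result = list(rows)
    # Sort by the least significant column first; since each pass is stable,
    # earlier columns in `columns` end up taking precedence.
    for column, asc in reversed(list(zip(columns, ascending))):
        present = [r for r in result if not _is_missing(r[column])]
        missing = [r for r in result if _is_missing(r[column])]
        # reverse=True keeps the sort stable in Python.
        present.sort(key=lambda r: r[column], reverse=not asc)
        result = missing + present if na_position == "first" else present + missing
    return result
```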
- sub(other, **kwargs)
Perform element-wise subtraction (self - other). If the axes are not equal, the frames are aligned first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of a MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for a 1D other (a list or a QueryCompiler that represents a Series). 0 is for index, 1 is for columns.
fill_value (float or None) – Value to fill missing elements with during frame alignment.
**kwargs (dict) – Serves compatibility purposes. Does not affect the result.
- Returns:
Result of binary operation.
- Return type:
BaseQueryCompiler
- sum(**kwargs)
Get the sum for each column or row.
- Parameters:
axis ({0, 1}) – 0 sums each column, 1 sums each row.
**kwargs (dict) – Serves compatibility purposes. Does not affect the result.
- Returns:
One-column QueryCompiler with index labels of the specified axis, where each row contains the sum for the corresponding row or column.
- Return type:
BaseQueryCompiler
Notes
Please refer to modin.pandas.DataFrame.sum for more information about parameters and output format.
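The shape of the result (one value per column for axis=0, one per row for axis=1) can be sketched in plain Python. The sketch assumes NaN is skipped, matching pandas' default skipna=True; that parameter is not listed in this signature, so treat that part as an assumption. frame_sum is a hypothetical name.

```python
import math
from typing import Any, Dict, List


def frame_sum(frame: Dict[str, List[float]], axis: int = 0) -> Dict[Any, float]:
    """Sum each column (axis=0) or each row (axis=1), skipping NaN."""

    def nansum(values: List[float]) -> float:
        # Assumption: NaN is skipped, like pandas' default skipna=True.
        return math.fsum(v for v in values if not math.isnan(v))

    names = list(frame)
    if axis == 0:
        return {name: nansum(frame[name]) for name in names}
    # Row positions 0..n-1 stand in for the row index labels.
    n_rows = len(frame[names[0]]) if names else 0
    return {i: nansum([frame[name][i] for name in names]) for i in range(n_rows)}
```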
- support_materialization_in_worker_process() -> bool
Whether it is possible to call to_pandas during the pickling process, at the moment the object is recreated.
- Return type:
bool
- take_2d_positional(index=None, columns=None)
Index the QueryCompiler with the passed positional keys.
- Parameters:
index (list-like of ints, optional) – Positional indices of rows to grab.
columns (list-like of ints, optional) – Positional indices of columns to grab.
- Returns:
New masked QueryCompiler.
- Return type:
BaseQueryCompiler
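Positional selection along both axes can be sketched on a row-major list of lists, where None for index or columns means "keep everything" along that axis. take_2d_positional here is a plain-Python illustration sharing the method's parameter names, not the HDK implementation.

```python
from typing import Any, List, Optional, Sequence


def take_2d_positional(
    rows: List[List[Any]],
    index: Optional[Sequence[int]] = None,
    columns: Optional[Sequence[int]] = None,
) -> List[List[Any]]:
    """Select rows and columns by position; None means keep all along that axis."""
    n_cols = len(rows[0]) if rows else 0
    row_positions = range(len(rows)) if index is None else index
    col_positions = range(n_cols) if columns is None else columns
    # Positions are applied in the order given, so they can also reorder or repeat.
    return [[rows[r][c] for c in col_positions] for r in row_positions]
```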
- to_dataframe(nan_as_null: bool = False, allow_copy: bool = True)
Get a DataFrame exchange protocol object representing data of the Modin DataFrame.
See more about the protocol in https://data-apis.org/dataframe-protocol/latest/index.html.
- Parameters:
nan_as_null (bool, default: False) – A keyword intended for the consumer to tell the producer to overwrite null values in the data with NaN (or NaT). This currently has no effect; once support for nullable extension dtypes is added, this value should be propagated to columns.
allow_copy (bool, default: True) – A keyword that defines whether or not the library is allowed to make a copy of the data. For example, copying data would be necessary if a library supports strided buffers, given that this protocol specifies contiguous buffers. Currently, if the flag is set to False and a copy is needed, a RuntimeError will be raised.
- Returns:
A dataframe object following the DataFrame protocol specification.
- Return type:
ProtocolDataframe
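The allow_copy rule can be modeled in a few lines: contiguous data is handed over as-is, strided data has to be gathered into a new contiguous buffer, and that copy is refused when allow_copy is False. export_contiguous is a hypothetical helper operating on a Python list with an explicit stride; real protocol buffers are described by pointers and sizes, which this sketch does not model.

```python
from typing import List


def export_contiguous(buffer: List[float], stride: int, allow_copy: bool = True) -> List[float]:
    """Return the logical elements of a strided buffer as a contiguous list."""
    if stride == 1:
        # Already contiguous: hand the data over without copying.
        return buffer
    if not allow_copy:
        raise RuntimeError("exporting this strided buffer requires a copy, but allow_copy=False")
    # Gather every `stride`-th element into a fresh contiguous list (a copy).
    return buffer[::stride]
```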
- to_pandas()
Convert the underlying query compiler’s data to pandas.DataFrame.
- Returns:
The QueryCompiler converted to pandas.
- Return type:
pandas.DataFrame
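This query compiler mostly forwards calls to HdkOnNativeDataframe to build a lazy execution tree, and to_pandas is the point where a concrete result is required. The toy classes below sketch that general idea: operators only record nodes, and nothing is computed until materialize (the analogue of to_pandas here) walks the tree. LazyNode and scan are hypothetical names unrelated to Modin's actual node classes, and the assumption that to_pandas is what triggers execution follows from the lazy design rather than from an explicit statement in this section.

```python
from typing import Callable, List, Sequence


class LazyNode:
    """Toy lazy-execution-tree node: operations are recorded, not run."""

    def __init__(self, compute: Callable[..., List[float]], children: Sequence["LazyNode"] = ()):
        self.compute = compute
        self.children = list(children)
        self.evaluated = False  # becomes True only when the tree is materialized

    def __add__(self, other: "LazyNode") -> "LazyNode":
        return LazyNode(lambda a, b: [x + y for x, y in zip(a, b)], (self, other))

    def __sub__(self, other: "LazyNode") -> "LazyNode":
        return LazyNode(lambda a, b: [x - y for x, y in zip(a, b)], (self, other))

    def materialize(self) -> List[float]:
        """Evaluate the children first, then this node."""
        self.evaluated = True
        return self.compute(*(child.materialize() for child in self.children))


def scan(values: List[float]) -> LazyNode:
    """Leaf node wrapping already available data."""
    return LazyNode(lambda: list(values))
```

Building (a + b) - a only creates three new nodes; the additions and subtractions run when materialize() is called.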
- truediv(other, **kwargs)
Perform element-wise division (self / other). If the axes are not equal, the frames are aligned first.
- Parameters:
other (BaseQueryCompiler, scalar or array-like) – Other operand of the binary operation.
broadcast (bool, default: False) – If other is a one-column query compiler, indicates whether it is a Series. Frames and Series have to be processed differently; however, they can’t be distinguished at the query compiler level, so this parameter is a hint passed from the high-level API.
level (int or label) – In case of a MultiIndex, match index values on the passed level.
axis ({0, 1}) – Axis to match indices along for a 1D other (a list or a QueryCompiler that represents a Series). 0 is for index, 1 is for columns.
fill_value (float or None) – Value to fill missing elements with during frame alignment.
**kwargs (dict) – Serves compatibility purposes. Does not affect the result.
- Returns:
Result of binary operation.
- Return type:
BaseQueryCompiler
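Division by zero is where element-wise division differs most from plain Python, whose / raises ZeroDivisionError. pandas follows IEEE 754 rules instead: a non-zero value divided by zero gives a signed infinity and 0 / 0 gives NaN. The scalar sketch below (truediv is a hypothetical helper) reproduces those rules; whether the HDK backend yields identical results for every dtype is an assumption.

```python
import math


def truediv(x: float, y: float) -> float:
    """Element-wise x / y with IEEE 754 semantics instead of ZeroDivisionError."""
    if y == 0:
        if x == 0 or math.isnan(x):
            return math.nan  # 0 / 0 and NaN / 0 are undefined
        # Non-zero / zero gives an infinity whose sign combines both operands
        # (copysign distinguishes +0.0 from -0.0 in the divisor).
        return math.copysign(math.inf, x) * math.copysign(1.0, y)
    return x / y
```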