Base pandas Dataset API#
The class implements functionality that is common to Modin’s pandas API for both DataFrame and Series classes.
Public API#
- class modin.pandas.base.BasePandasDataset
Implement most of the common code that exists in DataFrame/Series.
Since both objects share the same underlying representation, and the algorithms are the same, we use this object to define the general behavior of those objects and then use those objects to define the output type.
Notes
See pandas API documentation for pandas.DataFrame, pandas.Series for more.
- abs()
Return a BasePandasDataset with absolute numeric value of each element.
Notes
See pandas API documentation for pandas.DataFrame.abs, pandas.Series.abs for more.
- add(other, axis='columns', level=None, fill_value=None)
Return addition of BasePandasDataset and other, element-wise (binary operator add).
Notes
See pandas API documentation for pandas.DataFrame.add, pandas.Series.add for more.
- agg(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.
Notes
See pandas API documentation for pandas.DataFrame.aggregate, pandas.Series.aggregate for more.
- aggregate(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.
Notes
See pandas API documentation for pandas.DataFrame.aggregate, pandas.Series.aggregate for more.
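For illustration, a minimal sketch (the frame and column names are hypothetical; modin.pandas is assumed to be installed and to mirror the pandas call signature) showing the common ways to pass func to agg/aggregate:
import modin.pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
totals = df.agg("sum")                          # one reduction over axis 0
summary = df.agg(["min", "max", "mean"])        # one row per function
per_column = df.agg({"a": "sum", "b": "mean"})  # different function per column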
- align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)
Align two objects on their axes with the specified join method.
Notes
See pandas API documentation for pandas.DataFrame.align, pandas.Series.align for more.
- all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether all elements are True, potentially over an axis.
Notes
See pandas API documentation for pandas.DataFrame.all, pandas.Series.all for more.
- any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether any element is True, potentially over an axis.
Notes
See pandas API documentation for pandas.DataFrame.any, pandas.Series.any for more.
- apply(func, axis, broadcast, raw, reduce, result_type, convert_dtype, args, **kwds)
Apply a function along an axis of the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.apply, pandas.Series.apply for more.
- asfreq(freq, method=None, how=None, normalize=False, fill_value=None)
Convert time series to specified frequency.
Notes
See pandas API documentation for pandas.DataFrame.asfreq, pandas.Series.asfreq for more.
- asof(where, subset=None)
Return the last row(s) without any NaNs before where.
Notes
See pandas API documentation for pandas.DataFrame.asof, pandas.Series.asof for more.
- astype(dtype, copy=True, errors='raise')
Cast a Modin object to a specified dtype.
Notes
See pandas API documentation for pandas.DataFrame.astype, pandas.Series.astype for more.
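A hedged sketch of typical usage (hypothetical data; Modin is assumed to follow the pandas semantics): dtype may be a single dtype or a per-column mapping, and errors='ignore' returns the original values where casting fails.
import modin.pandas as pd

df = pd.DataFrame({"a": ["1", "2"], "b": [1.5, 2.5]})
as_int = df.astype({"a": "int64"})                  # cast only column 'a'
as_str = df.astype(str)                             # cast every column to str
best_effort = df.astype("int64", errors="ignore")   # keep originals on failure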
- property at
Get a single value for a row/column label pair.
Notes
See pandas API documentation for pandas.DataFrame.at, pandas.Series.at for more.
- at_time(time, asof=False, axis=None)
Select values at particular time of day (e.g., 9:30AM).
Notes
See pandas API documentation for pandas.DataFrame.at_time, pandas.Series.at_time for more.
- backfill(axis=None, inplace=False, limit=None, downcast=None)
Synonym for DataFrame.fillna with method='bfill'.
Notes
See pandas API documentation for pandas.DataFrame.backfill, pandas.Series.backfill for more.
- between_time(start_time, end_time, include_start: bool | NoDefault = _NoDefault.no_default, include_end: bool | NoDefault = _NoDefault.no_default, inclusive: str | None = None, axis=None)
Select values between particular times of the day (e.g., 9:00-9:30 AM).
By setting start_time to be later than end_time, you can get the times that are not between the two times.
- Parameters
start_time (datetime.time or str) – Initial time as a time filter limit.
end_time (datetime.time or str) – End time as a time filter limit.
include_start (bool, default True) –
Whether the start time needs to be included in the result.
Deprecated since version 1.4.0: Arguments include_start and include_end have been deprecated to standardize boundary inputs. Use inclusive instead, to set each bound as closed or open.
include_end (bool, default True) –
Whether the end time needs to be included in the result.
Deprecated since version 1.4.0: Arguments include_start and include_end have been deprecated to standardize boundary inputs. Use inclusive instead, to set each bound as closed or open.
inclusive ({"both", "neither", "left", "right"}, default "both") – Include boundaries; whether to set each bound as closed or open.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Determine range time on index or columns value. For Series this parameter is unused and defaults to 0.
- Returns
Data from the original object filtered to the specified dates range.
- Return type
Series or DataFrame
- Raises
TypeError – If the index is not a DatetimeIndex.
See also
at_time
Select values at a particular time of the day.
first
Select initial periods of time series based on a date offset.
last
Select final periods of time series based on a date offset.
DatetimeIndex.indexer_between_time
Get just the index locations for values between particular times of the day.
Examples
>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
>>> ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
You get the times that are not between two times by setting start_time later than end_time:
>>> ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
Notes
See pandas API documentation for pandas.DataFrame.between_time for more.
- bfill(axis=None, inplace=False, limit=None, downcast=None)
Synonym for DataFrame.fillna with method='bfill'.
Notes
See pandas API documentation for pandas.DataFrame.backfill, pandas.Series.backfill for more.
- bool()
Return the bool of a single element BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.bool, pandas.Series.bool for more.
- clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
Trim values at input threshold(s).
- combine(other, func, fill_value=None, **kwargs)
Perform combination of BasePandasDataset-s according to func.
Notes
See pandas API documentation for pandas.DataFrame.combine, pandas.Series.combine for more.
- combine_first(other)
Update null elements with value in the same location in other.
Notes
See pandas API documentation for pandas.DataFrame.combine_first, pandas.Series.combine_first for more.
- convert_dtypes(infer_objects: bool = True, convert_string: bool = True, convert_integer: bool = True, convert_boolean: bool = True, convert_floating: bool = True)
Convert columns to best possible dtypes using dtypes supporting pd.NA.
Notes
See pandas API documentation for pandas.DataFrame.convert_dtypes, pandas.Series.convert_dtypes for more.
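As a rough illustration (hypothetical data; the pandas-style nullable dtypes are assumed to be available through Modin), object and float columns with missing values are promoted to nullable dtypes such as Int64 and string:
import modin.pandas as pd
import numpy as np

df = pd.DataFrame({"ints": [1, 2, np.nan], "text": ["x", "y", None]})
converted = df.convert_dtypes()   # NaN/None become pd.NA in nullable columns
print(converted.dtypes)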
- copy(deep=True)
Make a copy of the object’s metadata.
Notes
See pandas API documentation for pandas.DataFrame.copy, pandas.Series.copy for more.
- count(axis=0, level=None, numeric_only=False)
Count non-NA cells for BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.count, pandas.Series.count for more.
- cummax(axis=None, skipna=True, *args, **kwargs)
Return cumulative maximum over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cummax, pandas.Series.cummax for more.
- cummin(axis=None, skipna=True, *args, **kwargs)
Return cumulative minimum over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cummin, pandas.Series.cummin for more.
- cumprod(axis=None, skipna=True, *args, **kwargs)
Return cumulative product over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cumprod, pandas.Series.cumprod for more.
- cumsum(axis=None, skipna=True, *args, **kwargs)
Return cumulative sum over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cumsum, pandas.Series.cumsum for more.
- describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
Generate descriptive statistics.
Notes
See pandas API documentation for pandas.DataFrame.describe, pandas.Series.describe for more.
- diff(periods=1, axis=0)
First discrete difference of element.
Notes
See pandas API documentation for pandas.DataFrame.diff, pandas.Series.diff for more.
- div(other, axis='columns', level=None, fill_value=None)
Get floating division of BasePandasDataset and other, element-wise (binary operator truediv).
Notes
See pandas API documentation for pandas.DataFrame.truediv, pandas.Series.truediv for more.
- divide(other, axis='columns', level=None, fill_value=None)
Get floating division of BasePandasDataset and other, element-wise (binary operator truediv).
Notes
See pandas API documentation for pandas.DataFrame.truediv, pandas.Series.truediv for more.
- drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Drop specified labels from BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.drop, pandas.Series.drop for more.
- drop_duplicates(keep='first', inplace=False, **kwargs)
Return BasePandasDataset with duplicate rows removed.
Notes
See pandas API documentation for pandas.DataFrame.drop_duplicates, pandas.Series.drop_duplicates for more.
- droplevel(level, axis=0)
Return BasePandasDataset with requested index / column level(s) removed.
Notes
See pandas API documentation for pandas.DataFrame.droplevel, pandas.Series.droplevel for more.
- dropna(axis: Axis = 0, how: str | NoDefault = _NoDefault.no_default, thresh: int | NoDefault = _NoDefault.no_default, subset: IndexLabel = None, inplace: bool = False)
Remove missing values.
Notes
See pandas API documentation for pandas.DataFrame.dropna, pandas.Series.dropna for more.
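A short sketch of the main parameters (hypothetical frame; Modin is assumed to mirror the pandas behavior):
import modin.pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})
drop_rows = df.dropna()            # drop rows with any NA
drop_cols = df.dropna(axis=1)      # drop columns with any NA
keep_two = df.dropna(thresh=2)     # keep rows with at least 2 non-NA values
only_b = df.dropna(subset=["b"])   # only look at column 'b'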
- eq(other, axis='columns', level=None)
Get equality of BasePandasDataset and other, element-wise (binary operator eq).
Notes
See pandas API documentation for pandas.DataFrame.eq, pandas.Series.eq for more.
- ewm(com: float | None = None, span: float | None = None, halflife: float | TimedeltaConvertibleTypes | None = None, alpha: float | None = None, min_periods: int | None = 0, adjust: bool = True, ignore_na: bool = False, axis: Axis = 0, times: str | np.ndarray | BasePandasDataset | None = None, method: str = 'single') ExponentialMovingWindow
Provide exponentially weighted (EW) calculations.
Notes
See pandas API documentation for pandas.DataFrame.ewm, pandas.Series.ewm for more.
- expanding(min_periods=1, center=None, axis=0, method='single')
Provide expanding window calculations.
Notes
See pandas API documentation for pandas.DataFrame.expanding, pandas.Series.expanding for more.
- explode(column, ignore_index: bool = False)
Transform each element of a list-like to a row.
Notes
See pandas API documentation for pandas.DataFrame.explode, pandas.Series.explode for more.
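For example (a minimal, hypothetical sketch assuming Modin mirrors the pandas semantics), each element of the list-like column becomes its own row while the other columns are repeated:
import modin.pandas as pd

df = pd.DataFrame({"id": [1, 2], "tags": [["x", "y"], ["z"]]})
long_form = df.explode("tags", ignore_index=True)  # one row per tag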
- ffill(axis=None, inplace=False, limit=None, downcast=None)
Synonym for DataFrame.fillna with method='ffill'.
Notes
See pandas API documentation for pandas.DataFrame.pad, pandas.Series.pad for more.
- fillna(squeeze_self, squeeze_value, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Fill NA/NaN values using the specified method.
- Parameters
squeeze_self (bool) – If True then self contains a Series object, if False then self contains a DataFrame object.
squeeze_value (bool) – If True then value contains a Series object, if False then value contains a DataFrame object.
value (scalar, dict, Series, or DataFrame, default: None) – Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
method ({'backfill', 'bfill', 'pad', 'ffill', None}, default: None) – Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use next valid observation to fill gap.
axis ({None, 0, 1}, default: None) – Axis along which to fill missing values.
inplace (bool, default: False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
limit (int, default: None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
downcast (dict, default: None) – A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
- Returns
Object with missing values filled or None if inplace=True.
- Return type
Series, DataFrame or None
Notes
See pandas API documentation for pandas.DataFrame.fillna, pandas.Series.fillna for more.
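The squeeze_self/squeeze_value flags above are supplied internally by the DataFrame and Series wrappers; user code calls fillna as in pandas. A minimal, hypothetical sketch:
import modin.pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})
zeros = df.fillna(0)                                # constant fill
forward = df.fillna(method="ffill")                 # propagate last valid value
per_col = df.fillna({"a": 0, "b": df["b"].mean()})  # per-column fill values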
- filter(items=None, like=None, regex=None, axis=None)
Subset the BasePandasDataset rows or columns according to the specified index labels.
Notes
See pandas API documentation for pandas.DataFrame.filter, pandas.Series.filter for more.
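A brief sketch of the three mutually exclusive selectors (hypothetical column names; Modin is assumed to follow the pandas rules):
import modin.pandas as pd

df = pd.DataFrame({"sales_q1": [1], "sales_q2": [2], "costs": [3]})
by_items = df.filter(items=["costs"])          # exact labels
by_like = df.filter(like="sales", axis=1)      # substring match
by_regex = df.filter(regex=r"_q\d$", axis=1)   # regular-expression match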
- first(offset)
Select initial periods of time series data based on a date offset.
Notes
See pandas API documentation for pandas.DataFrame.first, pandas.Series.first for more.
- first_valid_index()
Return index for first non-NA value or None, if no non-NA value is found.
Notes
See pandas API documentation for pandas.DataFrame.first_valid_index, pandas.Series.first_valid_index for more.
- property flags
Get the properties associated with this pandas object.
The available flags are Flags.allows_duplicate_labels.
See also
Flags
Flags that apply to pandas objects.
DataFrame.attrs
Global metadata applying to this dataset.
Notes
See pandas API documentation for pandas.DataFrame.flags, pandas.Series.flags for more. “Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.
Examples
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>
Flags can be get or set using attribute access (.):
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Or by slicing with a key:
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
- floordiv(other, axis='columns', level=None, fill_value=None)
Get integer division of BasePandasDataset and other, element-wise (binary operator floordiv).
Notes
See pandas API documentation for pandas.DataFrame.floordiv, pandas.Series.floordiv for more.
- ge(other, axis='columns', level=None)
Get greater than or equal comparison of BasePandasDataset and other, element-wise (binary operator ge).
Notes
See pandas API documentation for pandas.DataFrame.ge, pandas.Series.ge for more.
- get(key, default=None)
Get item from object for given key.
Notes
See pandas API documentation for pandas.DataFrame.get, pandas.Series.get for more.
- gt(other, axis='columns', level=None)
Get greater than comparison of BasePandasDataset and other, element-wise (binary operator gt).
Notes
See pandas API documentation for pandas.DataFrame.gt, pandas.Series.gt for more.
- head(n=5)
Return the first n rows.
Notes
See pandas API documentation for pandas.DataFrame.head, pandas.Series.head for more.
- property iat
Get a single value for a row/column pair by integer position.
Notes
See pandas API documentation for pandas.DataFrame.iat, pandas.Series.iat for more.
- idxmax(axis=0, skipna=True, numeric_only=False)
Return index of first occurrence of maximum over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.idxmax, pandas.Series.idxmax for more.
- idxmin(axis=0, skipna=True, numeric_only=False)
Return index of first occurrence of minimum over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.idxmin, pandas.Series.idxmin for more.
- property iloc
Purely integer-location based indexing for selection by position.
Notes
See pandas API documentation for pandas.DataFrame.iloc, pandas.Series.iloc for more.
- property index
Get the index for this DataFrame.
- Returns
The union of all indexes across the partitions.
- Return type
pandas.Index
- infer_objects()
Attempt to infer better dtypes for object columns.
Notes
See pandas API documentation for pandas.DataFrame.infer_objects, pandas.Series.infer_objects for more.
- isin(values, **kwargs)
Whether elements in BasePandasDataset are contained in values.
Notes
See pandas API documentation for pandas.DataFrame.isin, pandas.Series.isin for more.
- isna()
Detect missing values.
Notes
See pandas API documentation for pandas.DataFrame.isna, pandas.Series.isna for more.
- isnull()
Detect missing values.
Notes
See pandas API documentation for pandas.DataFrame.isna, pandas.Series.isna for more.
- kurt(axis=_NoDefault.no_default, skipna=True, level=None, numeric_only=None, **kwargs)
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
axis ({index (0), columns (1)}) – Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
level (int or level name, default None) –
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
Deprecated since version 1.3.0: The level keyword is deprecated. Use groupby instead.
numeric_only (bool, default None) –
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Deprecated since version 1.5.0: Specifying numeric_only=None is deprecated. The default value will be False in a future version of pandas.
**kwargs – Additional keyword arguments to be passed to the function.
- Return type
Series or DataFrame (if level specified)
Notes
See pandas API documentation for pandas.DataFrame.kurt for more.
- kurtosis(axis=_NoDefault.no_default, skipna=True, level=None, numeric_only=None, **kwargs)
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
axis ({index (0), columns (1)}) – Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
level (int or level name, default None) –
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
Deprecated since version 1.3.0: The level keyword is deprecated. Use groupby instead.
numeric_only (bool, default None) –
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Deprecated since version 1.5.0: Specifying numeric_only=None is deprecated. The default value will be False in a future version of pandas.
**kwargs – Additional keyword arguments to be passed to the function.
- Return type
Series or DataFrame (if level specified)
Notes
See pandas API documentation for pandas.DataFrame.kurt for more.
- last(offset)
Select final periods of time series data based on a date offset.
Notes
See pandas API documentation for pandas.DataFrame.last, pandas.Series.last for more.
- last_valid_index()
Return index for last non-NA value or None, if no non-NA value is found.
Notes
See pandas API documentation for pandas.DataFrame.last_valid_index, pandas.Series.last_valid_index for more.
- le(other, axis='columns', level=None)
Get less than or equal comparison of BasePandasDataset and other, element-wise (binary operator le).
Notes
See pandas API documentation for pandas.DataFrame.le, pandas.Series.le for more.
- property loc
Get a group of rows and columns by label(s) or a boolean array.
Notes
See pandas API documentation for pandas.DataFrame.loc, pandas.Series.loc for more.
- lt(other, axis='columns', level=None)
Get less than comparison of BasePandasDataset and other, element-wise (binary operator lt).
Notes
See pandas API documentation for pandas.DataFrame.lt, pandas.Series.lt for more.
- mad(axis=None, skipna=True, level=None)
Return the mean absolute deviation of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.mad, pandas.Series.mad for more.
- mask(cond, other=nan, inplace: bool = False, axis: Axis | None = None, level: Level = None, errors: IgnoreRaise | NoDefault = 'raise', try_cast=_NoDefault.no_default)
Replace values where the condition is True.
- max(axis: int | None | NoDefault = _NoDefault.no_default, skipna=True, level=None, numeric_only=None, **kwargs)
Return the maximum of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.max, pandas.Series.max for more.
- mean(axis: int | None | NoDefault = _NoDefault.no_default, skipna=True, level=None, numeric_only=None, **kwargs)
Return the mean of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.mean, pandas.Series.mean for more.
- median(axis: int | None | NoDefault = _NoDefault.no_default, skipna=True, level=None, numeric_only=None, **kwargs)
Return the median of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.median, pandas.Series.median for more.
- memory_usage(index=True, deep=False)
Return the memory usage of the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.memory_usage, pandas.Series.memory_usage for more.
- min(axis: Axis | None | NoDefault = _NoDefault.no_default, skipna: bool = True, level: Level | None = None, numeric_only=None, **kwargs)
Return the minimum of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.min, pandas.Series.min for more.
- mod(other, axis='columns', level=None, fill_value=None)
Get modulo of BasePandasDataset and other, element-wise (binary operator mod).
Notes
See pandas API documentation for pandas.DataFrame.mod, pandas.Series.mod for more.
- mode(axis=0, numeric_only=False, dropna=True)
Get the mode(s) of each element along the selected axis.
Notes
See pandas API documentation for pandas.DataFrame.mode, pandas.Series.mode for more.
- mul(other, axis='columns', level=None, fill_value=None)
Get multiplication of BasePandasDataset and other, element-wise (binary operator mul).
Notes
See pandas API documentation for pandas.DataFrame.mul, pandas.Series.mul for more.
- multiply(other, axis='columns', level=None, fill_value=None)
Get multiplication of BasePandasDataset and other, element-wise (binary operator mul).
Notes
See pandas API documentation for pandas.DataFrame.mul, pandas.Series.mul for more.
- ne(other, axis='columns', level=None)
Get not equal comparison of BasePandasDataset and other, element-wise (binary operator ne).
Notes
See pandas API documentation for pandas.DataFrame.ne, pandas.Series.ne for more.
- notna()
Detect existing (non-missing) values.
Notes
See pandas API documentation for pandas.DataFrame.notna, pandas.Series.notna for more.
- notnull()
Detect existing (non-missing) values.
Notes
See pandas API documentation for pandas.DataFrame.notna, pandas.Series.notna for more.
- nunique(axis=0, dropna=True)
Return number of unique elements in the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.nunique, pandas.Series.nunique for more.
- pad(axis=None, inplace=False, limit=None, downcast=None)
Synonym for DataFrame.fillna with method='ffill'.
Notes
See pandas API documentation for pandas.DataFrame.pad, pandas.Series.pad for more.
- pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
Percentage change between the current and a prior element.
Notes
See pandas API documentation for pandas.DataFrame.pct_change, pandas.Series.pct_change for more.
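A quick hypothetical sketch (assuming the pandas semantics carry over to Modin): each value is compared with the element periods positions earlier:
import modin.pandas as pd

s = pd.Series([100.0, 110.0, 99.0, 99.0])
step = s.pct_change()               # relative change vs. previous element
two_back = s.pct_change(periods=2)  # relative change vs. two elements back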
- pipe(func, *args, **kwargs)
Apply chainable functions that expect BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.pipe, pandas.Series.pipe for more.
- pop(item)
Return item and drop from frame. Raise KeyError if not found.
Notes
See pandas API documentation for pandas.DataFrame.pop, pandas.Series.pop for more.
- pow(other, axis='columns', level=None, fill_value=None)
Get exponential power of BasePandasDataset and other, element-wise (binary operator pow).
Notes
See pandas API documentation for pandas.DataFrame.pow, pandas.Series.pow for more.
- quantile(q, axis, numeric_only, interpolation, method)
Return values at the given quantile over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.quantile, pandas.Series.quantile for more.
- radd(other, axis='columns', level=None, fill_value=None)
Return addition of BasePandasDataset and other, element-wise (binary operator radd).
Notes
See pandas API documentation for pandas.DataFrame.radd, pandas.Series.radd for more.
- rank(axis=0, method: str = 'average', numeric_only=_NoDefault.no_default, na_option: str = 'keep', ascending: bool = True, pct: bool = False)
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
axis ({0 or 'index', 1 or 'columns'}, default 0) – Index to direct ranking. For Series this parameter is unused and defaults to 0.
method ({'average', 'min', 'max', 'first', 'dense'}, default 'average') –
How to rank the group of records that have the same value (i.e. ties):
average: average rank of the group
min: lowest rank in the group
max: highest rank in the group
first: ranks assigned in order they appear in the array
dense: like ‘min’, but rank always increases by 1 between groups.
numeric_only (bool, optional) – For DataFrame objects, rank only numeric columns if set to True.
na_option ({'keep', 'top', 'bottom'}, default 'keep') –
How to rank NaN values:
keep: assign NaN rank to NaN values
top: assign lowest rank to NaN values
bottom: assign highest rank to NaN values
ascending (bool, default True) – Whether or not the elements should be ranked in ascending order.
pct (bool, default False) – Whether or not to display the returned rankings in percentile form.
- Returns
Return a Series or DataFrame with data ranks as values.
- Return type
same type as caller
See also
core.groupby.GroupBy.rank
Rank of values within each group.
Examples
>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
...                                    'spider', 'snake'],
...                         'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
    Animal  Number_legs
0      cat          4.0
1  penguin          2.0
2      dog          4.0
3   spider          8.0
4    snake          NaN
Ties are assigned the mean of the ranks (by default) for the group.
>>> s = pd.Series(range(5), index=list("abcde"))
>>> s["d"] = s["b"]
>>> s.rank()
a    1.0
b    2.5
c    4.0
d    2.5
e    5.0
dtype: float64
The following example shows how the method behaves with the above parameters:
default_rank: this is the default behaviour obtained without using any parameter.
max_rank: setting method='max' the records that have the same values are ranked using the highest rank (e.g.: since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
NA_bottom: choosing na_option='bottom', if there are records with NaN values they are placed at the bottom of the ranking.
pct_rank: when setting pct=True, the ranking is expressed as percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
Notes
See pandas API documentation for pandas.DataFrame.rank for more.
- rdiv(other, axis='columns', level=None, fill_value=None)
Get floating division of BasePandasDataset and other, element-wise (binary operator rtruediv).
Notes
See pandas API documentation for pandas.DataFrame.rtruediv, pandas.Series.rtruediv for more.
- reindex(index=None, columns=None, copy=True, **kwargs)
Conform BasePandasDataset to new index with optional filling logic.
Notes
See pandas API documentation for pandas.DataFrame.reindex, pandas.Series.reindex for more.
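For illustration (hypothetical labels; Modin assumed to mirror pandas), labels missing from the original axis are introduced as NaN unless fill_value is given:
import modin.pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
wider = df.reindex(index=["x", "y", "z", "w"], fill_value=0)  # add row 'w'
more_cols = df.reindex(columns=["a", "b"])                    # new column 'b' is NaN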
- rename_axis(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)
Set the name of the axis for the index or columns.
Notes
See pandas API documentation for pandas.DataFrame.rename_axis, pandas.Series.rename_axis for more.
- reorder_levels(order, axis=0)
Rearrange index levels using input order.
Notes
See pandas API documentation for pandas.DataFrame.reorder_levels, pandas.Series.reorder_levels for more.
- resample(rule, axis: Axis = 0, closed: str | None = None, label: str | None = None, convention: str = 'start', kind: str | None = None, loffset=None, base: int | None = None, on: Level = None, level: Level = None, origin: str | TimestampConvertibleTypes = 'start_day', offset: TimedeltaConvertibleTypes | None = None, group_keys=_NoDefault.no_default)
Resample time-series data.
Notes
See pandas API documentation for pandas.DataFrame.resample, pandas.Series.resample for more.
- reset_index(level: IndexLabel = None, drop: bool = False, inplace: bool = False, col_level: Hashable = 0, col_fill: Hashable = '', allow_duplicates=_NoDefault.no_default, names: Hashable | Sequence[Hashable] = None)
Reset the index, or a level of it.
Notes
See pandas API documentation for pandas.DataFrame.reset_index, pandas.Series.reset_index for more.
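A minimal sketch (hypothetical data; pandas-compatible behavior assumed):
import modin.pandas as pd

df = pd.DataFrame({"value": [10, 20]}, index=["a", "b"]).rename_axis("key")
as_column = df.reset_index()           # old index becomes the 'key' column
discarded = df.reset_index(drop=True)  # old index is dropped entirely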
- rfloordiv(other, axis='columns', level=None, fill_value=None)
Get integer division of BasePandasDataset and other, element-wise (binary operator rfloordiv).
Notes
See pandas API documentation for pandas.DataFrame.rfloordiv, pandas.Series.rfloordiv for more.
- rmod(other, axis='columns', level=None, fill_value=None)
Get modulo of BasePandasDataset and other, element-wise (binary operator rmod).
Notes
See pandas API documentation for pandas.DataFrame.rmod, pandas.Series.rmod for more.
- rmul(other, axis='columns', level=None, fill_value=None)
Get multiplication of BasePandasDataset and other, element-wise (binary operator rmul).
Notes
See pandas API documentation for pandas.DataFrame.rmul, pandas.Series.rmul for more.
- rolling(window, min_periods: int | None = None, center: bool = False, win_type: str | None = None, on: str | None = None, axis: Axis = 0, closed: str | None = None, step: int | None = None, method: str = 'single')
Provide rolling window calculations.
Notes
See pandas API documentation for pandas.DataFrame.rolling, pandas.Series.rolling for more.
- round(decimals=0, *args, **kwargs)
Round a BasePandasDataset to a variable number of decimal places.
Notes
See pandas API documentation for pandas.DataFrame.round, pandas.Series.round for more.
- rpow(other, axis='columns', level=None, fill_value=None)
Get exponential power of BasePandasDataset and other, element-wise (binary operator rpow).
Notes
See pandas API documentation for pandas.DataFrame.rpow, pandas.Series.rpow for more.
- rsub(other, axis='columns', level=None, fill_value=None)
Get subtraction of BasePandasDataset and other, element-wise (binary operator rsub).
Notes
See pandas API documentation for pandas.DataFrame.rsub, pandas.Series.rsub for more.
- rtruediv(other, axis='columns', level=None, fill_value=None)
Get floating division of BasePandasDataset and other, element-wise (binary operator rtruediv).
Notes
See pandas API documentation for pandas.DataFrame.rtruediv, pandas.Series.rtruediv for more.
- sample(n: int | None = None, frac: float | None = None, replace: bool = False, weights=None, random_state: RandomState | None = None, axis: Axis | None = None, ignore_index: bool = False)
Return a random sample of items from an axis of object.
Notes
See pandas API documentation for pandas.DataFrame.sample, pandas.Series.sample for more.
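As a hedged example (hypothetical frame; Modin assumed to accept the pandas arguments), random_state makes the draw repeatable and replace=True allows sampling more rows than exist:
import modin.pandas as pd

df = pd.DataFrame({"a": range(10)})
three = df.sample(n=3, random_state=42)
half = df.sample(frac=0.5, random_state=42)
bootstrap = df.sample(n=15, replace=True, random_state=42)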
- sem(axis: Axis | None = None, skipna: bool = True, level: Level | None = None, ddof: int = 1, numeric_only=None, **kwargs)
Return unbiased standard error of the mean over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.sem, pandas.Series.sem for more.
- set_axis(labels, axis: Union[str, int] = 0, inplace=_NoDefault.no_default, *, copy=_NoDefault.no_default)
Assign desired index to given axis.
Notes
See pandas API documentation for pandas.DataFrame.set_axis, pandas.Series.set_axis for more.
- set_flags(*, copy: bool = False, allows_duplicate_labels: Optional[bool] = None)
Return a new BasePandasDataset with updated flags.
Notes
See pandas API documentation for pandas.DataFrame.set_flags, pandas.Series.set_flags for more.
- shift(periods: int = 1, freq=None, axis: Union[str, int] = 0, fill_value: Hashable = _NoDefault.no_default)
Shift index by desired number of periods with an optional time freq.
Notes
See pandas API documentation for pandas.DataFrame.shift, pandas.Series.shift for more.
- property size
Return an int representing the number of elements in this BasePandasDataset object.
Notes
See pandas API documentation for pandas.DataFrame.size, pandas.Series.size for more.
- skew(axis: Axis | None | NoDefault = _NoDefault.no_default, skipna: bool = True, level: Level | None = None, numeric_only=None, **kwargs)
Return unbiased skew over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.skew, pandas.Series.skew for more.
- sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False, key: Optional[Callable[[Index], Union[Index, ExtensionArray, ndarray, Series]]] = None)
Sort object by labels (along an axis).
Notes
See pandas API documentation for pandas.DataFrame.sort_index, pandas.Series.sort_index for more.
- sort_values(by, axis=0, ascending=True, inplace: bool = False, kind='quicksort', na_position='last', ignore_index: bool = False, key: Optional[Callable[[Index], Union[Index, ExtensionArray, ndarray, Series]]] = None)
Sort by the values along either axis.
Notes
See pandas API documentation for pandas.DataFrame.sort_values, pandas.Series.sort_values for more.
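A short, hypothetical sketch (pandas-compatible behavior assumed), including the key callable applied to the sort column before ordering:
import modin.pandas as pd

df = pd.DataFrame({"city": ["b", "A", "c"], "pop": [3, 1, 2]})
by_pop = df.sort_values("pop", ascending=False)
by_both = df.sort_values(by=["city", "pop"])
case_insensitive = df.sort_values("city", key=lambda col: col.str.lower())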
- std(axis: Axis | None = None, skipna: bool = True, level: Level | None = None, ddof: int = 1, numeric_only=None, **kwargs)
Return sample standard deviation over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.std, pandas.Series.std for more.
- sub(other, axis='columns', level=None, fill_value=None)
Get subtraction of BasePandasDataset and other, element-wise (binary operator sub).
Notes
See pandas API documentation for pandas.DataFrame.sub, pandas.Series.sub for more.
- subtract(other, axis='columns', level=None, fill_value=None)
Get subtraction of BasePandasDataset and other, element-wise (binary operator sub).
Notes
See pandas API documentation for pandas.DataFrame.sub, pandas.Series.sub for more.
- swapaxes(axis1, axis2, copy=True)
Interchange axes and swap values axes appropriately.
Notes
See pandas API documentation for pandas.DataFrame.swapaxes, pandas.Series.swapaxes for more.
- swaplevel(i=-2, j=-1, axis=0)
Swap levels i and j in a MultiIndex.
Notes
See pandas API documentation for pandas.DataFrame.swaplevel, pandas.Series.swaplevel for more.
- tail(n=5)
Return the last n rows.
Notes
See pandas API documentation for pandas.DataFrame.tail, pandas.Series.tail for more.
- take(indices, axis=0, is_copy=None, **kwargs)
Return the elements in the given positional indices along an axis.
Notes
See pandas API documentation for pandas.DataFrame.take, pandas.Series.take for more.
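For example (hypothetical data; pandas semantics assumed), indices are positional and ignore the axis labels, with negative values counting from the end:
import modin.pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])
rows = df.take([0, 3])            # first and last rows by position
last = df.take([-1])              # negative positions count from the end
first_col = df.take([0], axis=1)  # positional selection along columns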
- to_clipboard(excel=True, sep=None, **kwargs)
Copy object to the system clipboard.
Notes
See pandas API documentation for pandas.DataFrame.to_clipboard, pandas.Series.to_clipboard for more.
- to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors: str = 'strict', storage_options: Optional[Dict[str, Any]] = None)
Write object to a comma-separated values (csv) file.
- Parameters
path_or_buf (str, path object, file-like object, or None, default None) –
String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.
Changed in version 1.2.0: Support for binary file objects was introduced.
sep (str, default ',') – String of length 1. Field delimiter for the output file.
na_rep (str, default '') – Missing data representation.
float_format (str, Callable, default None) – Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.
columns (sequence, optional) – Columns to write.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index (bool, default True) – Write row names (index).
index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode (str, default 'w') – Python write mode. The available write modes are the same as open().
encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
New in version 1.5.0: Added support for .tar files.
Changed in version 1.0.0: May now be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.
Changed in version 1.1.0: Passing compression options as keys in dict is supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.
Changed in version 1.2.0: Compression is supported for binary file objects.
Changed in version 1.2.0: Previous versions forwarded dict entries for ‘gzip’ to gzip.open instead of gzip.GzipFile which prevented setting mtime.
quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotechar (str, default '"') – String of length 1. Character used to quote fields.
lineterminator (str, optional) –
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (’\n’ for linux, ‘\r\n’ for Windows, i.e.).
Changed in version 1.5.0: Previously was line_terminator, changed for consistency with read_csv and the standard library ‘csv’ module.
chunksize (int or None) – Rows to write at a time.
date_format (str, default None) – Format string for datetime objects.
doublequote (bool, default True) – Control quoting of quotechar inside a field.
escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar when appropriate.
decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for European data.
errors (str, default 'strict') –
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
New in version 1.1.0.
storage_options (dict, optional) –
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
New in version 1.2.0.
- Returns
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
- Return type
None or str
See also
read_csv
Load a CSV file into a DataFrame.
to_excel
Write DataFrame to an Excel file.
Examples
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
Create ‘out.zip’ containing ‘out.csv’
>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)
To write a csv file to a new folder or nested folder you will first need to create it using either Pathlib or os:
>>> from pathlib import Path
>>> filepath = Path('folder/subfolder/out.csv')
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)
>>> import os
>>> os.makedirs('folder/subfolder', exist_ok=True)
>>> df.to_csv('folder/subfolder/out.csv')
Notes
See pandas API documentation for pandas.DataFrame.to_csv, pandas.Series.to_csv for more.
- to_dict(orient='dict', into=<class 'dict'>)
Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters (see below).
- Parameters
orient (str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}) –
Determines the type of the values of the dictionary.
’dict’ (default) : dict like {column -> {index -> value}}
’list’ : dict like {column -> [values]}
’series’ : dict like {column -> Series(values)}
’split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
’tight’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values], ‘index_names’ -> [index.names], ‘column_names’ -> [column.names]}
’records’ : list like [{column -> value}, … , {column -> value}]
’index’ : dict like {index -> {column -> value}}
Abbreviations are allowed. s indicates series and sp indicates split.
New in version 1.4.0: ‘tight’ as an allowed value for the orient argument.
into (class, default dict) – The collections.abc.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
- Returns
Return a collections.abc.Mapping object representing the DataFrame. The resulting transformation depends on the orient parameter.
- Return type
dict, list or collections.abc.Mapping
See also
DataFrame.from_dict
Create a DataFrame from a dictionary.
DataFrame.to_json
Convert a DataFrame to JSON format.
Examples
>>> df = pd.DataFrame({'col1': [1, 2],
...                    'col2': [0.5, 0.75]},
...                   index=['row1', 'row2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can specify the return orientation.
>>> df.to_dict('series')
{'col1': row1    1
         row2    2
Name: col1, dtype: int64,
'col2': row1    0.50
        row2    0.75
Name: col2, dtype: float64}
>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]]}
>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
>>> df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}
You can also specify the mapping type.
>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
             ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
If you want a defaultdict, you need to initialize it:
>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
 defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
Notes
See pandas API documentation for pandas.DataFrame.to_dict, pandas.Series.to_dict for more.
- to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=_NoDefault.no_default, inf_rep='inf', verbose=_NoDefault.no_default, freeze_panes=None, storage_options: Optional[Dict[str, Any]] = None)
Write object to an Excel sheet.
Notes
See pandas API documentation for pandas.DataFrame.to_excel, pandas.Series.to_excel for more.
- to_hdf(path_or_buf, key, format='table', **kwargs)
Write the contained data to an HDF5 file using HDFStore.
Notes
See pandas API documentation for pandas.DataFrame.to_hdf, pandas.Series.to_hdf for more.
- to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True, indent=None, storage_options: Optional[Dict[str, Any]] = None)
Convert the object to a JSON string.
Notes
See pandas API documentation for pandas.DataFrame.to_json, pandas.Series.to_json for more.
- to_latex(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None)
Render object to a LaTeX tabular, longtable, or nested table.
Notes
See pandas API documentation for pandas.DataFrame.to_latex, pandas.Series.to_latex for more.
- to_markdown(buf=None, mode: str = 'wt', index: bool = True, storage_options: Optional[Dict[str, Any]] = None, **kwargs)
Print BasePandasDataset in Markdown-friendly format.
Notes
See pandas API documentation for pandas.DataFrame.to_markdown, pandas.Series.to_markdown for more.
- to_numpy(dtype=None, copy=False, na_value=_NoDefault.no_default)
Convert the BasePandasDataset to a NumPy array.
Notes
See pandas API documentation for pandas.DataFrame.to_numpy, pandas.Series.to_numpy for more.
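A small sketch (hypothetical frame; pandas-compatible keywords assumed): a common dtype is chosen for the whole array, and dtype/na_value control the conversion explicitly:
import modin.pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1, 2], "b": [3.5, np.nan]})
arr = df.to_numpy()                                 # upcast to a common dtype
arr32 = df.to_numpy(dtype="float32", na_value=0.0)  # fix dtype and NA replacement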
- to_period(freq=None, axis=0, copy=True)
Convert BasePandasDataset from DatetimeIndex to PeriodIndex.
Notes
See pandas API documentation for pandas.DataFrame.to_period, pandas.Series.to_period for more.
- to_pickle(path, compression: Optional[Union[Literal['infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd', 'tar'], Dict[str, Any]]] = 'infer', protocol: int = 5, storage_options: Optional[Dict[str, Any]] = None)
Pickle (serialize) object to file.
Notes
See pandas API documentation for pandas.DataFrame.to_pickle, pandas.Series.to_pickle for more.
- to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Write records stored in a BasePandasDataset to a SQL database.
Notes
See pandas API documentation for pandas.DataFrame.to_sql, pandas.Series.to_sql for more.
- to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, min_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, max_colwidth=None, encoding=None)
Render a BasePandasDataset to a console-friendly tabular output.
Notes
See pandas API documentation for pandas.DataFrame.to_string, pandas.Series.to_string for more.
- to_timestamp(freq=None, how='start', axis=0, copy=True)
Cast to DatetimeIndex of timestamps, at beginning of period.
Notes
See pandas API documentation for pandas.DataFrame.to_timestamp, pandas.Series.to_timestamp for more.
- to_xarray()
Return an xarray object from the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.to_xarray, pandas.Series.to_xarray for more.
- transform(func, axis=0, *args, **kwargs)
Call func on self producing a BasePandasDataset with the same axis shape as self.
Notes
See pandas API documentation for pandas.DataFrame.transform, pandas.Series.transform for more.
- truediv(other, axis='columns', level=None, fill_value=None)
Get floating division of BasePandasDataset and other, element-wise (binary operator truediv).
Notes
See pandas API documentation for pandas.DataFrame.truediv, pandas.Series.truediv for more.
- truncate(before=None, after=None, axis=None, copy=True)
Truncate a BasePandasDataset before and after some index value.
Notes
See pandas API documentation for pandas.DataFrame.truncate, pandas.Series.truncate for more.
- tshift(periods=1, freq=None, axis=0)
Shift the time index, using the index’s frequency if available.
Notes
See pandas API documentation for pandas.DataFrame.tshift, pandas.Series.tshift for more.
- tz_convert(tz, axis=0, level=None, copy=True)
Convert tz-aware axis to target time zone.
Notes
See pandas API documentation for pandas.DataFrame.tz_convert, pandas.Series.tz_convert for more.
- tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')
Localize tz-naive index of a BasePandasDataset to target time zone.
Notes
See pandas API documentation for pandas.DataFrame.tz_localize, pandas.Series.tz_localize for more.
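As a hypothetical sketch (assuming Modin mirrors the pandas time-zone handling), tz_localize attaches a zone to a naive index, after which tz_convert can move it:
import modin.pandas as pd

idx = pd.date_range("2024-01-01", periods=3, freq="H")  # tz-naive timestamps
s = pd.Series([1, 2, 3], index=idx)
utc = s.tz_localize("UTC")                 # make the index tz-aware
berlin = utc.tz_convert("Europe/Berlin")   # convert to another time zone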
- value_counts(subset: Sequence[Hashable] | None = None, normalize: bool = False, sort: bool = True, ascending: bool = False, dropna: bool = True)
Return a Series containing counts of unique rows in the DataFrame.
New in version 1.1.0.
- Parameters
subset (list-like, optional) – Columns to use when counting unique combinations.
normalize (bool, default False) – Return proportions rather than frequencies.
sort (bool, default True) – Sort by frequencies.
ascending (bool, default False) – Sort in ascending order.
dropna (bool, default True) –
Don’t include counts of rows that contain NA values.
New in version 1.3.0.
- Return type
Series
See also
Series.value_counts
Equivalent method on Series.
Notes
See pandas API documentation for pandas.DataFrame.value_counts for more. The returned Series will have a MultiIndex with one level per input column. By default, rows that contain any NA values are omitted from the result. By default, the resulting Series will be in descending order so that the first element is the most frequently-occurring row.
Examples
>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
...                    'num_wings': [2, 0, 0, 0]},
...                   index=['falcon', 'dog', 'cat', 'ant'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0
>>> df.value_counts()
num_legs  num_wings
4         0            2
2         2            1
6         0            1
dtype: int64
>>> df.value_counts(sort=False)
num_legs  num_wings
2         2            1
4         0            2
6         0            1
dtype: int64
>>> df.value_counts(ascending=True)
num_legs  num_wings
2         2            1
6         0            1
4         0            2
dtype: int64
>>> df.value_counts(normalize=True)
num_legs  num_wings
4         0            0.50
2         2            0.25
6         0            0.25
dtype: float64
With dropna set to False we can also count rows with NA values.
>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise
>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
dtype: int64
>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
dtype: int64
- property values
Return a NumPy representation of the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.values, pandas.Series.values for more.
- var(axis: Axis | None = None, skipna: bool = True, level: Level | None = None, ddof: int = 1, numeric_only=None, **kwargs)
Return unbiased variance over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.var, pandas.Series.var for more.
- xs(key, axis=0, level=None, drop_level: bool = True)
Return cross-section from the Series/DataFrame.
Notes
See pandas API documentation for pandas.DataFrame.xs, pandas.Series.xs for more.