Base pandas Dataset API
The class implements functionality that is common to Modin’s pandas API for both DataFrame and Series classes.
Public API
- class modin.pandas.base.BasePandasDataset
Implement most of the common code that exists in DataFrame/Series.
Since both objects share the same underlying representation, and the algorithms are the same, we use this object to define the general behavior of those objects and then use those objects to define the output type.
Notes
See pandas API documentation for pandas.DataFrame, pandas.Series for more.
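For orientation, a minimal sketch of how these shared methods surface on both DataFrame and Series (illustrative only, not from the pandas docs; assumes a working Modin engine and import modin.pandas as pd):
>>> import modin.pandas as pd
>>> df = pd.DataFrame({"a": [1, -2], "b": [-3, 4]})
>>> df.abs().sum()     # abs() and sum() are defined once on BasePandasDataset
a    3
b    7
dtype: int64
>>> df["a"].abs()      # the same methods are available on Series
0    1
1    2
Name: a, dtype: int64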
- abs() Self
Return a BasePandasDataset with absolute numeric value of each element.
Notes
See pandas API documentation for pandas.DataFrame.abs, pandas.Series.abs for more.
- add(other, axis='columns', level=None, fill_value=None) Self
Return addition of BasePandasDataset and other, element-wise (binary operator add).
Notes
See pandas API documentation for pandas.DataFrame.add, pandas.Series.add for more.
- agg(func=None, axis=0, *args, **kwargs) DataFrame | Series | Scalar
Aggregate using one or more operations over the specified axis.
Notes
See pandas API documentation for pandas.DataFrame.agg, pandas.Series.agg for more.
- aggregate(func=None, axis=0, *args, **kwargs) DataFrame | Series | Scalar
Aggregate using one or more operations over the specified axis.
Notes
See pandas API documentation for pandas.DataFrame.aggregate, pandas.Series.aggregate for more.
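As an illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd), aggregating each column with a list of reducers returns one row per function:
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df.agg(["sum", "min"])
      a   b
sum   6  15
min   1   4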
- align(other, join='outer', axis=None, level=None, copy=None, fill_value=None, method=_NoDefault.no_default, limit=_NoDefault.no_default, fill_axis=_NoDefault.no_default, broadcast_axis=_NoDefault.no_default) tuple[Self, Self]
Align two objects on their axes with the specified join method.
Notes
See pandas API documentation for pandas.DataFrame.align, pandas.Series.align for more.
- all(axis=0, bool_only=False, skipna=True, **kwargs) Self
Return whether all elements are True, potentially over an axis.
Notes
See pandas API documentation for pandas.DataFrame.all, pandas.Series.all for more.
- any(*, axis=0, bool_only=False, skipna=True, **kwargs) Self
Return whether any element is True, potentially over an axis.
Notes
See pandas API documentation for pandas.DataFrame.any, pandas.Series.any for more.
- apply(func, axis, raw, result_type, args, **kwds) BaseQueryCompiler
Apply a function along an axis of the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.apply, pandas.Series.apply for more.
- asfreq(freq, method=None, how=None, normalize=False, fill_value=None) Self
Convert time series to specified frequency.
Notes
See pandas API documentation for pandas.DataFrame.asfreq, pandas.Series.asfreq for more.
- asof(where, subset=None) Self
Return the last row(s) without any NaNs before where.
Notes
See pandas API documentation for pandas.DataFrame.asof, pandas.Series.asof for more.
- astype(dtype, copy=None, errors='raise') Self
Cast a Modin object to a specified dtype dtype.
Notes
See pandas API documentation for pandas.DataFrame.astype, pandas.Series.astype for more.
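A small illustrative example (not part of the pandas docstring; assumes import modin.pandas as pd):
>>> df = pd.DataFrame({"a": [1, 2]})
>>> df.astype("float64").dtypes   # cast every column to float64
a    float64
dtype: object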
- property at: _LocIndexer
Get a single value for a row/column label pair.
Notes
See pandas API documentation for pandas.DataFrame.at, pandas.Series.at for more.
- at_time(time, asof=False, axis=None) Self
Select values at particular time of day (e.g., 9:30AM).
Notes
See pandas API documentation for pandas.DataFrame.at_time, pandas.Series.at_time for more.
- backfill(*, axis=None, inplace=False, limit=None, downcast=_NoDefault.no_default) Self
Synonym for DataFrame.bfill.
Notes
See pandas API documentation for pandas.DataFrame.backfill, pandas.Series.backfill for more.
- between_time(start_time, end_time, inclusive='both', axis=None) Self
Select values between particular times of the day (e.g., 9:00-9:30 AM).
By setting start_time to be later than end_time, you can get the times that are not between the two times.
- Parameters:
start_time (datetime.time or str) – Initial time as a time filter limit.
end_time (datetime.time or str) – End time as a time filter limit.
inclusive ({"both", "neither", "left", "right"}, default "both") – Include boundaries; whether to set each bound as closed or open.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Determine range time on index or columns value. For Series this parameter is unused and defaults to 0.
- Returns:
Data from the original object filtered to the specified dates range.
- Return type:
Series or DataFrame
- Raises:
TypeError – If the index is not a DatetimeIndex.
See also
at_time
Select values at a particular time of the day.
first
Select initial periods of time series based on a date offset.
last
Select final periods of time series based on a date offset.
DatetimeIndex.indexer_between_time
Get just the index locations for values between particular times of the day.
Examples
>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
>>> ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
You get the times that are not between two times by setting start_time later than end_time:
>>> ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
Notes
See pandas API documentation for pandas.DataFrame.between_time for more.
- bfill(*, axis=None, inplace=False, limit=None, limit_area=None, downcast=_NoDefault.no_default) Self
Synonym for DataFrame.fillna with method='bfill'.
Notes
See pandas API documentation for pandas.DataFrame.bfill, pandas.Series.bfill for more.
- bool() bool
Return the bool of a single element BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.bool, pandas.Series.bool for more.
- clip(lower=None, upper=None, *, axis=None, inplace=False, **kwargs) Self
Trim values at input threshold(s).
Notes
See pandas API documentation for pandas.DataFrame.clip, pandas.Series.clip for more.
- combine(other, func, fill_value=None, **kwargs) Self
Perform combination of BasePandasDataset-s according to func.
Notes
See pandas API documentation for pandas.DataFrame.combine, pandas.Series.combine for more.
- combine_first(other) Self
Update null elements with value in the same location in other.
Notes
See pandas API documentation for pandas.DataFrame.combine_first, pandas.Series.combine_first for more.
- convert_dtypes(infer_objects: bool = True, convert_string: bool = True, convert_integer: bool = True, convert_boolean: bool = True, convert_floating: bool = True, dtype_backend: DtypeBackend = 'numpy_nullable') Self
Convert columns to best possible dtypes using dtypes supporting pd.NA.
Notes
See pandas API documentation for pandas.DataFrame.convert_dtypes, pandas.Series.convert_dtypes for more.
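An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd): an integer column that was stored as float64 because of a missing value is converted to the nullable Int64 dtype, with the gap kept as pd.NA:
>>> df = pd.DataFrame({"a": [1, 2, None]})
>>> df.dtypes
a    float64
dtype: object
>>> df.convert_dtypes().dtypes
a    Int64
dtype: object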
- copy(deep=True) Self
Make a copy of the object’s metadata.
Notes
See pandas API documentation for pandas.DataFrame.copy, pandas.Series.copy for more.
- count(axis=0, numeric_only=False) Series | Scalar
Count non-NA cells for BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.count, pandas.Series.count for more.
- cummax(axis=None, skipna=True, *args, **kwargs) Self
Return cumulative maximum over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cummax, pandas.Series.cummax for more.
- cummin(axis=None, skipna=True, *args, **kwargs) Self
Return cumulative minimum over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cummin, pandas.Series.cummin for more.
- cumprod(axis=None, skipna=True, *args, **kwargs) Self
Return cumulative product over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cumprod, pandas.Series.cumprod for more.
- cumsum(axis=None, skipna=True, *args, **kwargs) Self
Return cumulative sum over a BasePandasDataset axis.
Notes
See pandas API documentation for pandas.DataFrame.cumsum, pandas.Series.cumsum for more.
- describe(percentiles=None, include=None, exclude=None) Self
Generate descriptive statistics.
Notes
See pandas API documentation for pandas.DataFrame.describe, pandas.Series.describe for more.
- diff(periods=1, axis=0) Self
First discrete difference of element.
Notes
See pandas API documentation for pandas.DataFrame.diff, pandas.Series.diff for more.
- div(other, axis='columns', level=None, fill_value=None) Self
Get floating division of BasePandasDataset and other, element-wise (binary operator truediv).
Notes
See pandas API documentation for pandas.DataFrame.div, pandas.Series.div for more.
- divide(other, axis='columns', level=None, fill_value=None) Self
Get floating division of BasePandasDataset and other, element-wise (binary operator truediv).
Notes
See pandas API documentation for pandas.DataFrame.divide, pandas.Series.divide for more.
- drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') Self
Drop specified labels from BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.drop, pandas.Series.drop for more.
- drop_duplicates(keep='first', inplace=False, **kwargs) Self
Return BasePandasDataset with duplicate rows removed.
Notes
See pandas API documentation for pandas.DataFrame.drop_duplicates, pandas.Series.drop_duplicates for more.
- droplevel(level, axis=0) Self
Return BasePandasDataset with requested index / column level(s) removed.
Notes
See pandas API documentation for pandas.DataFrame.droplevel, pandas.Series.droplevel for more.
- dropna(*, axis: Axis = 0, how: str | lib.NoDefault = _NoDefault.no_default, thresh: int | lib.NoDefault = _NoDefault.no_default, subset: IndexLabel = None, inplace: bool = False, ignore_index: bool = False) Self
Remove missing values.
Notes
See pandas API documentation for pandas.DataFrame.dropna, pandas.Series.dropna for more.
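A minimal illustrative example (not part of the pandas docstring; assumes import modin.pandas as pd):
>>> df = pd.DataFrame({"a": [1.0, None], "b": [2.0, 3.0]})
>>> df.dropna()         # drop rows that contain any NA
     a    b
0  1.0  2.0
>>> df.dropna(axis=1)   # drop columns that contain any NA
     b
0  2.0
1  3.0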
- eq(other, axis='columns', level=None) Self
Get equality of BasePandasDataset and other, element-wise (binary operator eq).
Notes
See pandas API documentation for pandas.DataFrame.eq, pandas.Series.eq for more.
- ewm(com: float | None = None, span: float | None = None, halflife: float | TimedeltaConvertibleTypes | None = None, alpha: float | None = None, min_periods: int | None = 0, adjust: bool = True, ignore_na: bool = False, axis: Axis = _NoDefault.no_default, times: str | np.ndarray | BasePandasDataset | None = None, method: str = 'single') pandas.core.window.ewm.ExponentialMovingWindow
Provide exponentially weighted (EW) calculations.
Notes
See pandas API documentation for pandas.DataFrame.ewm, pandas.Series.ewm for more.
- expanding(min_periods=1, axis=_NoDefault.no_default, method='single') Expanding
Provide expanding window calculations.
Notes
See pandas API documentation for pandas.DataFrame.expanding, pandas.Series.expanding for more.
- explode(column, ignore_index: bool = False) Self
Transform each element of a list-like to a row.
Notes
See pandas API documentation for pandas.DataFrame.explode, pandas.Series.explode for more.
- ffill(*, axis=None, inplace=False, limit=None, limit_area=None, downcast=_NoDefault.no_default) Self | None
Synonym for DataFrame.fillna with method='ffill'.
Notes
See pandas API documentation for pandas.DataFrame.ffill, pandas.Series.ffill for more.
- fillna(squeeze_self, squeeze_value, value=None, method=None, axis=None, inplace=False, limit=None, downcast=_NoDefault.no_default) Self | None
Fill NA/NaN values using the specified method.
- Parameters:
squeeze_self (bool) – If True then self contains a Series object, if False then self contains a DataFrame object.
squeeze_value (bool) – If True then value contains a Series object, if False then value contains a DataFrame object.
value (scalar, dict, Series, or DataFrame, default: None) – Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
method ({'backfill', 'bfill', 'pad', 'ffill', None}, default: None) – Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use next valid observation to fill gap.
axis ({None, 0, 1}, default: None) – Axis along which to fill missing values.
inplace (bool, default: False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
limit (int, default: None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
downcast (dict, default: None) – A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
- Returns:
Object with missing values filled or None if inplace=True.
- Return type:
Series/DataFrame or None
Notes
See pandas API documentation for pandas.DataFrame.fillna, pandas.Series.fillna for more.
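The squeeze_self/squeeze_value arguments above are supplied internally by the DataFrame and Series front ends; a typical call passes only the pandas-style arguments. An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd):
>>> df = pd.DataFrame({"a": [1.0, None, 3.0]})
>>> df.fillna(0)   # replace NaN with a scalar
     a
0  1.0
1  0.0
2  3.0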
- filter(items=None, like=None, regex=None, axis=None) Self
Subset the BasePandasDataset rows or columns according to the specified index labels.
Notes
See pandas API documentation for pandas.DataFrame.filter, pandas.Series.filter for more.
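An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd):
>>> df = pd.DataFrame({"one": [1], "two": [2], "three": [3]})
>>> df.filter(items=["one"])      # keep the listed column labels
   one
0    1
>>> df.filter(like="t", axis=1)   # keep columns whose label contains "t"
   two  three
0    2      3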
- first(offset) Self | None
Select initial periods of time series data based on a date offset.
Notes
See pandas API documentation for pandas.DataFrame.first, pandas.Series.first for more.
- first_valid_index() int
Return index for first non-NA value or None, if no non-NA value is found.
Notes
See pandas API documentation for pandas.DataFrame.first_valid_index, pandas.Series.first_valid_index for more.
- property flags
Get the properties associated with this pandas object.
The available flags are
Flags.allows_duplicate_labels
See also
Flags
Flags that apply to pandas objects.
DataFrame.attrs
Global metadata applying to this dataset.
Notes
See pandas API documentation for pandas.DataFrame.flags, pandas.Series.flags for more. “Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.
Examples
>>> df = pd.DataFrame({"A": [1, 2]}) >>> df.flags <Flags(allows_duplicate_labels=True)>
Flags can be get or set using attribute access:
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Or by slicing with a key:
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
- floordiv(other, axis='columns', level=None, fill_value=None) Self
Get integer division of BasePandasDataset and other, element-wise (binary operator floordiv).
Notes
See pandas API documentation for pandas.DataFrame.floordiv, pandas.Series.floordiv for more.
- ge(other, axis='columns', level=None) Self
Get greater than or equal comparison of BasePandasDataset and other, element-wise (binary operator ge).
Notes
See pandas API documentation for pandas.DataFrame.ge, pandas.Series.ge for more.
- get(key, default=None) DataFrame | Series | Scalar
Get item from object for given key.
Notes
See pandas API documentation for pandas.DataFrame.get, pandas.Series.get for more.
- get_backend() str
Get the backend for this BasePandasDataset.
- Returns:
The name of the backend.
- Return type:
str
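An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd). The returned name depends on your Modin configuration, so the concrete value is not shown here:
>>> df = pd.DataFrame({"a": [1, 2]})
>>> backend = df.get_backend()    # name of the execution backend currently holding the data
>>> isinstance(backend, str)
True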
- gt(other, axis='columns', level=None) Self
Get greater than comparison of BasePandasDataset and other, element-wise (binary operator gt).
Notes
See pandas API documentation for pandas.DataFrame.gt, pandas.Series.gt for more.
- head(n=5) Self
Return the first n rows.
Notes
See pandas API documentation for pandas.DataFrame.head, pandas.Series.head for more.
- property iat: _iLocIndexer
Get a single value for a row/column pair by integer position.
Notes
See pandas API documentation for pandas.DataFrame.iat, pandas.Series.iat for more.
- idxmax(axis=0, skipna=True, numeric_only=False) Self
Return index of first occurrence of maximum over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.idxmax, pandas.Series.idxmax for more.
- idxmin(axis=0, skipna=True, numeric_only=False) Self
Return index of first occurrence of minimum over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.idxmin, pandas.Series.idxmin for more.
- property iloc: _iLocIndexer
Purely integer-location based indexing for selection by position.
Notes
See pandas API documentation for pandas.DataFrame.iloc, pandas.Series.iloc for more.
- property index: Index
Get the index for this DataFrame.
- Returns:
The union of all indexes across the partitions.
- Return type:
pandas.Index
- infer_objects(copy=None) Self
Attempt to infer better dtypes for object columns.
Notes
See pandas API documentation for pandas.DataFrame.infer_objects, pandas.Series.infer_objects for more.
- interpolate(method='linear', *, axis=0, limit=None, inplace=False, limit_direction: Optional[str] = None, limit_area=None, downcast=_NoDefault.no_default, **kwargs) Self
Fill NaN values using an interpolation method.
Please note that only method='linear' is supported for DataFrame/Series with a MultiIndex.
- Parameters:
method (str, default 'linear') –
Interpolation technique to use. One of:
’linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
’time’: Works on daily and higher resolution data to interpolate given length of interval.
’index’, ‘values’: use the actual numerical values of the index.
’pad’: Fill in NaNs using existing values.
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d, whereas ‘spline’ is passed to scipy.interpolate.UnivariateSpline. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5). Note that the ‘slinear’ method in pandas refers to the SciPy first-order spline rather than pandas’ first-order spline.
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
’from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives.
axis ({0 or 'index', 1 or 'columns', None}, default None) – Axis to interpolate along. For Series this parameter is unused and defaults to 0.
limit (int, optional) – Maximum number of consecutive NaNs to fill. Must be greater than 0.
inplace (bool, default False) – Update the data in place if possible.
limit_direction ({'forward', 'backward', 'both'}, Optional) –
Consecutive NaNs will be filled in this direction.
- If limit is specified:
If ‘method’ is ‘pad’ or ‘ffill’, ‘limit_direction’ must be ‘forward’.
If ‘method’ is ‘backfill’ or ‘bfill’, ‘limit_direction’ must be ‘backwards’.
- If ‘limit’ is not specified:
If ‘method’ is ‘backfill’ or ‘bfill’, the default is ‘backward’
else the default is ‘forward’
- raises ValueError if limit_direction is ‘forward’ or ‘both’ and method is ‘backfill’ or ‘bfill’.
- raises ValueError if limit_direction is ‘backward’ or ‘both’ and method is ‘pad’ or ‘ffill’.
limit_area ({None, ‘inside’, ‘outside’}, default None) –
If limit is specified, consecutive NaNs will be filled with this restriction.
None: No fill restriction.
‘inside’: Only fill NaNs surrounded by valid values (interpolate).
‘outside’: Only fill NaNs outside valid values (extrapolate).
downcast (optional, 'infer' or None, defaults to None) –
Downcast dtypes if possible.
Deprecated since version 2.1.0.
**kwargs (optional) – Keyword arguments to pass on to the interpolating function.
- Returns:
Returns the same object type as the caller, interpolated at some or all NaN values or None if inplace=True.
- Return type:
Series or DataFrame or None
See also
fillna
Fill missing values using different methods.
scipy.interpolate.Akima1DInterpolator
Piecewise cubic polynomials (Akima interpolator).
scipy.interpolate.BPoly.from_derivatives
Piecewise polynomial in the Bernstein basis.
scipy.interpolate.interp1d
Interpolate a 1-D function.
scipy.interpolate.KroghInterpolator
Interpolate polynomial (Krogh interpolator).
scipy.interpolate.PchipInterpolator
PCHIP 1-d monotonic cubic interpolation.
scipy.interpolate.CubicSpline
Cubic spline data interpolator.
Notes
See pandas API documentation for pandas.DataFrame.interpolate, pandas.Series.interpolate for more. The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index. For more information on their behavior, see the SciPy documentation.
Examples
Filling in NaN in a Series via linear interpolation.
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64
Filling in NaN in a Series via polynomial interpolation or splines: Both ‘polynomial’ and ‘spline’ methods require that you also specify an order (int).
>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64
Fill the DataFrame forward (that is, going down) along each column using linear interpolation.
Note how the last entry in column ‘a’ is interpolated differently, because there is no entry after it to use for interpolation. Note how the first entry in column ‘b’ remains NaN, because there is no entry before it to use for interpolation.
>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
...                    (np.nan, 2.0, np.nan, np.nan),
...                    (2.0, 3.0, np.nan, 9.0),
...                    (np.nan, 4.0, -4.0, 16.0)],
...                   columns=list('abcd'))
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0
Using polynomial interpolation.
>>> df['d'].interpolate(method='polynomial', order=2)
0     1.0
1     4.0
2     9.0
3    16.0
Name: d, dtype: float64
- isin(values) Self
Whether elements in BasePandasDataset are contained in values.
Notes
See pandas API documentation for pandas.DataFrame.isin, pandas.Series.isin for more.
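An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd):
>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df.isin([1, 3])   # element-wise membership test
       a
0   True
1  False
2   True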
- isna() Self
Detect missing values.
Notes
See pandas API documentation for pandas.DataFrame.isna, pandas.Series.isna for more.
- isnull() Self
Detect missing values.
Notes
See pandas API documentation for pandas.DataFrame.isnull, pandas.Series.isnull for more.
- kurt(axis=0, skipna=True, numeric_only=False, **kwargs) Series | float
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
For DataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.
- Returns:
Examples
>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat      1
dog      2
dog      2
mouse    3
dtype: int64
>>> s.kurt()
1.5
With a DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a  b
cat    1  3
dog    2  4
dog    2  4
mouse  3  4
>>> df.kurt()
a    1.5
b    4.0
dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6)
-0.988693
Using axis=1
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64
- Return type:
Series or scalar
Notes
See pandas API documentation for pandas.DataFrame.kurt for more.
- kurtosis(axis=0, skipna=True, numeric_only=False, **kwargs) Series | float
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
For DataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.
- Returns:
Examples
>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat      1
dog      2
dog      2
mouse    3
dtype: int64
>>> s.kurt()
1.5
With a DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a  b
cat    1  3
dog    2  4
dog    2  4
mouse  3  4
>>> df.kurt()
a    1.5
b    4.0
dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6)
-0.988693
Using axis=1
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64
- Return type:
Series or scalar
Notes
See pandas API documentation for pandas.DataFrame.kurt for more.
- last(offset) Self
Select final periods of time series data based on a date offset.
Notes
See pandas API documentation for pandas.DataFrame.last, pandas.Series.last for more.
- last_valid_index() int
Return index for last non-NA value or None, if no non-NA value is found.
Notes
See pandas API documentation for pandas.DataFrame.last_valid_index, pandas.Series.last_valid_index for more.
- le(other, axis='columns', level=None) Self
Get less than or equal comparison of BasePandasDataset and other, element-wise (binary operator le).
Notes
See pandas API documentation for pandas.DataFrame.le, pandas.Series.le for more.
- property loc: _LocIndexer
Get a group of rows and columns by label(s) or a boolean array.
Notes
See pandas API documentation for pandas.DataFrame.loc, pandas.Series.loc for more.
- lt(other, axis='columns', level=None) Self
Get less than comparison of BasePandasDataset and other, element-wise (binary operator lt).
Notes
See pandas API documentation for pandas.DataFrame.lt, pandas.Series.lt for more.
- mask(cond, other=_NoDefault.no_default, *, inplace: bool = False, axis: Optional[Axis] = None, level: Optional[Level] = None) Self | None
Replace values where the condition is True.
Notes
See pandas API documentation for pandas.DataFrame.mask, pandas.Series.mask for more.
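An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd): entries where the condition holds are replaced by other:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.mask(s > 2, 0)   # where the condition is True, substitute 0
0    1
1    2
2    0
3    0
dtype: int64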
- max(axis: Axis = 0, skipna=True, numeric_only=False, **kwargs) Series | None
Return the maximum of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.max, pandas.Series.max for more.
- mean(axis: Axis = 0, skipna=True, numeric_only=False, **kwargs) Series | float
Return the mean of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.mean, pandas.Series.mean for more.
- median(axis: Axis = 0, skipna=True, numeric_only=False, **kwargs) Series | float
Return the median of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.median, pandas.Series.median for more.
- memory_usage(index=True, deep=False) Series | None
Return the memory usage of the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.memory_usage, pandas.Series.memory_usage for more.
- min(axis: Axis = 0, skipna: bool = True, numeric_only=False, **kwargs) Series | None
Return the minimum of the values over the requested axis.
Notes
See pandas API documentation for pandas.DataFrame.min, pandas.Series.min for more.
- mod(other, axis='columns', level=None, fill_value=None) Self
Get modulo of BasePandasDataset and other, element-wise (binary operator mod).
Notes
See pandas API documentation for pandas.DataFrame.mod, pandas.Series.mod for more.
- mode(axis=0, numeric_only=False, dropna=True) Self
Get the mode(s) of each element along the selected axis.
Notes
See pandas API documentation for pandas.DataFrame.mode, pandas.Series.mode for more.
- modin
alias of ModinAPI
- move_to(backend: str, inplace: bool = False) Optional[Self]
Move the data in this BasePandasDataset from its current backend to the given one.
Further operations on this BasePandasDataset will use the new backend instead of the current one.
- Parameters:
backend (str) – The name of the backend to set.
inplace (bool, default: False) – Whether to modify this BasePandasDataset in place.
- Returns:
If inplace is False, returns a new instance of the BasePandasDataset with the given backend. If inplace is True, returns None.
- Return type:
BasePandasDataset or None
Notes
- This method will convert the data in this BasePandasDataset to a pandas DataFrame in this Python process, then load the data from pandas to the new backend.
Either step may be slow and/or memory-intensive, especially if this BasePandasDataset’s data is large, or one or both of the backends do not store their data locally.
- mul(other, axis='columns', level=None, fill_value=None) Self
Get multiplication of BasePandasDataset and other, element-wise (binary operator mul).
Notes
See pandas API documentation for pandas.DataFrame.mul, pandas.Series.mul for more.
- multiply(other, axis='columns', level=None, fill_value=None) Self
Get multiplication of BasePandasDataset and other, element-wise (binary operator mul).
Notes
See pandas API documentation for pandas.DataFrame.multiply, pandas.Series.multiply for more.
- ne(other, axis='columns', level=None) Self
Get Not equal comparison of BasePandasDataset and other, element-wise (binary operator ne).
Notes
See pandas API documentation for pandas.DataFrame.ne, pandas.Series.ne for more.
- notna() Self
Detect existing (non-missing) values.
Notes
See pandas API documentation for pandas.DataFrame.notna, pandas.Series.notna for more.
- notnull() Self
Detect existing (non-missing) values.
Notes
See pandas API documentation for pandas.DataFrame.notnull, pandas.Series.notnull for more.
- nunique(axis=0, dropna=True) Series | int
Return number of unique elements in the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.nunique, pandas.Series.nunique for more.
- pad(*, axis=None, inplace=False, limit=None, downcast=_NoDefault.no_default) Self | None
Synonym for DataFrame.ffill.
Notes
See pandas API documentation for pandas.DataFrame.pad, pandas.Series.pad for more.
- pct_change(periods=1, fill_method=_NoDefault.no_default, limit=_NoDefault.no_default, freq=None, **kwargs) Self
Percentage change between the current and a prior element.
Notes
See pandas API documentation for pandas.DataFrame.pct_change, pandas.Series.pct_change for more.
- pipe(func: Callable[..., T] | tuple[Callable[..., T], str], *args, **kwargs) T
Apply chainable functions that expect BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.pipe, pandas.Series.pipe for more.
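An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd): pipe lets you keep method-chaining style when applying your own functions:
>>> def add_constant(df, value):
...     return df + value
>>> pd.DataFrame({"a": [1, 2]}).pipe(add_constant, value=10)
    a
0  11
1  12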
- pop(item) Series | Scalar
Return item and drop from frame. Raise KeyError if not found.
Notes
See pandas API documentation for pandas.DataFrame.pop, pandas.Series.pop for more.
- pow(other, axis='columns', level=None, fill_value=None) Self
Get exponential power of BasePandasDataset and other, element-wise (binary operator pow).
Notes
See pandas API documentation for pandas.DataFrame.pow, pandas.Series.pow for more.
- quantile(q, axis, numeric_only, interpolation, method) DataFrame | Series | Scalar
Return values at the given quantile over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.quantile, pandas.Series.quantile for more.
- radd(other, axis='columns', level=None, fill_value=None) Self
Return addition of BasePandasDataset and other, element-wise (binary operator radd).
Notes
See pandas API documentation for pandas.DataFrame.radd, pandas.Series.radd for more.
- rank(axis=0, method: str = 'average', numeric_only=False, na_option: str = 'keep', ascending: bool = True, pct: bool = False) Self
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) – Index to direct ranking. For Series this parameter is unused and defaults to 0.
method ({'average', 'min', 'max', 'first', 'dense'}, default 'average') –
How to rank the group of records that have the same value (i.e. ties):
average: average rank of the group
min: lowest rank in the group
max: highest rank in the group
first: ranks assigned in order they appear in the array
dense: like ‘min’, but rank always increases by 1 between groups.
numeric_only (bool, default False) –
For DataFrame objects, rank only numeric columns if set to True.
Changed in version 2.0.0: The default value of numeric_only is now False.
na_option ({'keep', 'top', 'bottom'}, default 'keep') –
How to rank NaN values:
keep: assign NaN rank to NaN values
top: assign lowest rank to NaN values
bottom: assign highest rank to NaN values
ascending (bool, default True) – Whether or not the elements should be ranked in ascending order.
pct (bool, default False) – Whether or not to display the returned rankings in percentile form.
- Returns:
Return a Series or DataFrame with data ranks as values.
- Return type:
same type as caller
See also
core.groupby.DataFrameGroupBy.rank
Rank of values within each group.
core.groupby.SeriesGroupBy.rank
Rank of values within each group.
Examples
>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
...                                    'spider', 'snake'],
...                         'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
    Animal  Number_legs
0      cat          4.0
1  penguin          2.0
2      dog          4.0
3   spider          8.0
4    snake          NaN
Ties are assigned the mean of the ranks (by default) for the group.
>>> s = pd.Series(range(5), index=list("abcde"))
>>> s["d"] = s["b"]
>>> s.rank()
a    1.0
b    2.5
c    4.0
d    2.5
e    5.0
dtype: float64
The following example shows how the method behaves with the above parameters:
default_rank: this is the default behaviour obtained without using any parameter.
max_rank: setting method = 'max' the records that have the same values are ranked using the highest rank (e.g.: since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
NA_bottom: choosing na_option = 'bottom', if there are records with NaN values they are placed at the bottom of the ranking.
pct_rank: when setting pct = True, the ranking is expressed as percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
Notes
See pandas API documentation for pandas.DataFrame.rank for more.
- rdiv(other, axis='columns', level=None, fill_value=None) Self
Get floating division of BasePandasDataset and other, element-wise (binary operator rtruediv).
Notes
See pandas API documentation for pandas.DataFrame.rdiv, pandas.Series.rdiv for more.
- reindex(index=None, columns=None, copy=True, **kwargs) Self
Conform BasePandasDataset to new index with optional filling logic.
Notes
See pandas API documentation for pandas.DataFrame.reindex, pandas.Series.reindex for more.
- rename_axis(mapper=_NoDefault.no_default, *, index=_NoDefault.no_default, columns=_NoDefault.no_default, axis=0, copy=None, inplace=False) DataFrame | Series | None
Set the name of the axis for the index or columns.
Notes
See pandas API documentation for pandas.DataFrame.rename_axis, pandas.Series.rename_axis for more.
- reorder_levels(order, axis=0) Self
Rearrange index levels using input order.
Notes
See pandas API documentation for pandas.DataFrame.reorder_levels, pandas.Series.reorder_levels for more.
- resample(rule, axis: Axis = _NoDefault.no_default, closed: Optional[str] = None, label: Optional[str] = None, convention: str = _NoDefault.no_default, kind: Optional[str] = _NoDefault.no_default, on: Level = None, level: Level = None, origin: str | TimestampConvertibleTypes = 'start_day', offset: Optional[TimedeltaConvertibleTypes] = None, group_keys=False) Resampler
Resample time-series data.
Notes
See pandas API documentation for pandas.DataFrame.resample, pandas.Series.resample for more.
- reset_index(level: IndexLabel = None, *, drop: bool = False, inplace: bool = False, col_level: Hashable = 0, col_fill: Hashable = '', allow_duplicates=_NoDefault.no_default, names: Hashable | Sequence[Hashable] = None) DataFrame | Series | None
Reset the index, or a level of it.
Notes
See pandas API documentation for pandas.DataFrame.reset_index, pandas.Series.reset_index for more.
- rfloordiv(other, axis='columns', level=None, fill_value=None) Self
Get integer division of BasePandasDataset and other, element-wise (binary operator rfloordiv).
Notes
See pandas API documentation for pandas.DataFrame.rfloordiv, pandas.Series.rfloordiv for more.
- rmod(other, axis='columns', level=None, fill_value=None) Self
Get modulo of BasePandasDataset and other, element-wise (binary operator rmod).
Notes
See pandas API documentation for pandas.DataFrame.rmod, pandas.Series.rmod for more.
- rmul(other, axis='columns', level=None, fill_value=None) Self
Get multiplication of BasePandasDataset and other, element-wise (binary operator rmul).
Notes
See pandas API documentation for pandas.DataFrame.rmul, pandas.Series.rmul for more.
- rolling(window, min_periods: int | None = None, center: bool = False, win_type: str | None = None, on: str | None = None, axis: Axis = _NoDefault.no_default, closed: str | None = None, step: int | None = None, method: str = 'single') Rolling | Window
Provide rolling window calculations.
Notes
See pandas API documentation for pandas.DataFrame.rolling, pandas.Series.rolling for more.
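An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd):
>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(window=2).sum()   # sum over a sliding window of length 2
0    NaN
1    3.0
2    5.0
3    7.0
dtype: float64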
- round(decimals=0, *args, **kwargs) Self
Round a BasePandasDataset to a variable number of decimal places.
Notes
See pandas API documentation for pandas.DataFrame.round, pandas.Series.round for more.
- rpow(other, axis='columns', level=None, fill_value=None) Self
Get exponential power of BasePandasDataset and other, element-wise (binary operator rpow).
Notes
See pandas API documentation for pandas.DataFrame.rpow, pandas.Series.rpow for more.
- rsub(other, axis='columns', level=None, fill_value=None) Self
Get subtraction of BasePandasDataset and other, element-wise (binary operator rsub).
Notes
See pandas API documentation for pandas.DataFrame.rsub, pandas.Series.rsub for more.
- rtruediv(other, axis='columns', level=None, fill_value=None) Self
Get floating division of BasePandasDataset and other, element-wise (binary operator rtruediv).
Notes
See pandas API documentation for pandas.DataFrame.rtruediv, pandas.Series.rtruediv for more.
- sample(n: int | None = None, frac: float | None = None, replace: bool = False, weights=None, random_state: RandomState | None = None, axis: Axis | None = None, ignore_index: bool = False) Self
Return a random sample of items from an axis of object.
Notes
See pandas API documentation for pandas.DataFrame.sample, pandas.Series.sample for more.
- sem(axis: Axis = 0, skipna: bool = True, ddof: int = 1, numeric_only=False, **kwargs) Series | float
Return unbiased standard error of the mean over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.sem, pandas.Series.sem for more.
- set_axis(labels, *, axis: Axis = 0, copy=None) Self
Assign desired index to given axis.
Notes
See pandas API documentation for pandas.DataFrame.set_axis, pandas.Series.set_axis for more.
- set_backend(backend: str, inplace: bool = False) Optional[Self]
Move the data in this BasePandasDataset from its current backend to the given one.
Further operations on this BasePandasDataset will use the new backend instead of the current one.
- Parameters:
backend (str) – The name of the backend to set.
inplace (bool, default: False) – Whether to modify this BasePandasDataset in place.
- Returns:
If inplace is False, returns a new instance of the BasePandasDataset with the given backend. If inplace is True, returns None.
- Return type:
BasePandasDataset or None
Notes
- This method will convert the data in this BasePandasDataset to a pandas DataFrame in this Python process, then load the data from pandas to the new backend.
Either step may be slow and/or memory-intensive, especially if this BasePandasDataset’s data is large, or one or both of the backends do not store their data locally.
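An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd). Valid backend names depend on your installation and configuration, so the example simply moves the data to the backend that is already active:
>>> df = pd.DataFrame({"a": [1, 2]})
>>> moved = df.set_backend(df.get_backend())          # returns a new object on the requested backend
>>> df.set_backend(df.get_backend(), inplace=True)    # or modify this object in place; returns None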
- set_flags(*, copy: bool = False, allows_duplicate_labels: Optional[bool] = None) Self
Return a new BasePandasDataset with updated flags.
Notes
See pandas API documentation for pandas.DataFrame.set_flags, pandas.Series.set_flags for more.
- shift(periods: int = 1, freq=None, axis: Axis = 0, fill_value: Hashable = _NoDefault.no_default, suffix=None) Self | DataFrame
Shift index by desired number of periods with an optional time freq.
Notes
See pandas API documentation for pandas.DataFrame.shift, pandas.Series.shift for more.
- property size: int
Return an int representing the number of elements in this BasePandasDataset object.
Notes
See pandas API documentation for pandas.DataFrame.size, pandas.Series.size for more.
- skew(axis: Axis = 0, skipna: bool = True, numeric_only=False, **kwargs) Series | float
Return unbiased skew over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.skew, pandas.Series.skew for more.
- sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False, key: Optional[IndexKeyFunc] = None) Self | None
Sort object by labels (along an axis).
Notes
See pandas API documentation for pandas.DataFrame.sort_index, pandas.Series.sort_index for more.
- sort_values(by, *, axis=0, ascending=True, inplace: bool = False, kind='quicksort', na_position='last', ignore_index: bool = False, key: Optional[IndexKeyFunc] = None) Self | None
Sort by the values along either axis.
Notes
See pandas API documentation for pandas.DataFrame.sort_values, pandas.Series.sort_values for more.
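An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd):
>>> df = pd.DataFrame({"a": [3, 1, 2]})
>>> df.sort_values(by="a")   # ascending by default; the original index labels are kept
   a
1  1
2  2
0  3
>>> df.sort_values(by="a", ascending=False, ignore_index=True)
   a
0  3
1  2
2  1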
- std(axis: Axis = 0, skipna: bool = True, ddof: int = 1, numeric_only=False, **kwargs) Series | float
Return sample standard deviation over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.std, pandas.Series.std for more.
- sub(other, axis='columns', level=None, fill_value=None) Self
Get subtraction of BasePandasDataset and other, element-wise (binary operator sub).
Notes
See pandas API documentation for pandas.DataFrame.sub, pandas.Series.sub for more.
- subtract(other, axis='columns', level=None, fill_value=None) Self
Get subtraction of BasePandasDataset and other, element-wise (binary operator sub).
Notes
See pandas API documentation for pandas.DataFrame.subtract, pandas.Series.subtract for more.
- swapaxes(axis1, axis2, copy=None) Self
Interchange axes and swap values axes appropriately.
Notes
See pandas API documentation for pandas.DataFrame.swapaxes, pandas.Series.swapaxes for more.
- swaplevel(i=-2, j=-1, axis=0) Self
Swap levels i and j in a MultiIndex.
Notes
See pandas API documentation for pandas.DataFrame.swaplevel, pandas.Series.swaplevel for more.
- tail(n=5) Self
Return the last n rows.
Notes
See pandas API documentation for pandas.DataFrame.tail, pandas.Series.tail for more.
- take(indices, axis=0, **kwargs) Self
Return the elements in the given positional indices along an axis.
Notes
See pandas API documentation for pandas.DataFrame.take, pandas.Series.take for more.
- to_clipboard(excel=True, sep=None, **kwargs)
Copy object to the system clipboard.
Notes
See pandas API documentation for pandas.DataFrame.to_clipboard, pandas.Series.to_clipboard for more.
- to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors: str = 'strict', storage_options: StorageOptions = None) str | None
Write object to a comma-separated values (csv) file.
- Parameters:
path_or_buf (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.
sep (str, default ',') – String of length 1. Field delimiter for the output file.
na_rep (str, default '') – Missing data representation.
float_format (str, Callable, default None) – Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.
columns (sequence, optional) – Columns to write.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index (bool, default True) – Write row names (index).
index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode ({'w', 'x', 'a'}, default 'w') –
Forwarded to either open(mode=) or fsspec.open(mode=) to control the file opening. Typical values include:
’w’, truncate the file first.
’x’, exclusive creation, failing if the file already exists.
’a’, append to the end of file if it exists.
encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
New in version 1.5.0: Added support for .tar files.
May be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.
Passing compression options as keys in dict is supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.
quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotechar (str, default '"') – String of length 1. Character used to quote fields.
lineterminator (str, optional) –
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (’\n’ for linux, ‘\r\n’ for Windows, i.e.).
Changed in version 1.5.0: Previously was line_terminator, changed for consistency with read_csv and the standard library ‘csv’ module.
chunksize (int or None) – Rows to write at a time.
date_format (str, default None) – Format string for datetime objects.
doublequote (bool, default True) – Control quoting of quotechar inside a field.
escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar when appropriate.
decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for European data.
errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
- Returns:
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
- Return type:
None or str
See also
read_csv
Load a CSV file into a DataFrame.
to_excel
Write DataFrame to an Excel file.
Examples
Create ‘out.csv’ containing ‘df’ without indices
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv('out.csv', index=False)
Create ‘out.zip’ containing ‘out.csv’
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)
To write a csv file to a new folder or nested folder you will first need to create it using either Pathlib or os:
>>> from pathlib import Path
>>> filepath = Path('folder/subfolder/out.csv')
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)
>>> import os
>>> os.makedirs('folder/subfolder', exist_ok=True)
>>> df.to_csv('folder/subfolder/out.csv')
Notes
See pandas API documentation for pandas.DataFrame.to_csv, pandas.Series.to_csv for more.
- to_dict(orient='dict', into=<class 'dict'>, index=True) dict
Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters (see below).
- Parameters:
orient (str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}) –
Determines the type of the values of the dictionary.
’dict’ (default) : dict like {column -> {index -> value}}
’list’ : dict like {column -> [values]}
’series’ : dict like {column -> Series(values)}
’split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
’tight’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values], ‘index_names’ -> [index.names], ‘column_names’ -> [column.names]}
’records’ : list like [{column -> value}, … , {column -> value}]
’index’ : dict like {index -> {column -> value}}
New in version 1.4.0: ‘tight’ as an allowed value for the orient argument.
into (class, default dict) – The collections.abc.MutableMapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
index (bool, default True) –
Whether to include the index item (and index_names item if orient is ‘tight’) in the returned dictionary. Can only be False when orient is ‘split’ or ‘tight’.
New in version 2.0.0.
- Returns:
Return a collections.abc.MutableMapping object representing the DataFrame. The resulting transformation depends on the orient parameter.
- Return type:
dict, list or collections.abc.MutableMapping
See also
DataFrame.from_dict
Create a DataFrame from a dictionary.
DataFrame.to_json
Convert a DataFrame to JSON format.
Examples
>>> df = pd.DataFrame({'col1': [1, 2],
...                    'col2': [0.5, 0.75]},
...                   index=['row1', 'row2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can specify the return orientation.
>>> df.to_dict('series')
{'col1': row1    1
         row2    2
Name: col1, dtype: int64,
'col2': row1    0.50
        row2    0.75
Name: col2, dtype: float64}
>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'], 'data': [[1, 0.5], [2, 0.75]]}
>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
>>> df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'], 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}
You can also specify the mapping type.
>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
If you want a defaultdict, you need to initialize it:
>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}), defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
Notes
See pandas API documentation for pandas.DataFrame.to_dict, pandas.Series.to_dict for more.
- to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, inf_rep='inf', freeze_panes=None, storage_options: Optional[dict[str, Any]] = None, engine_kwargs=None) None
Write object to an Excel sheet.
Notes
See pandas API documentation for pandas.DataFrame.to_excel, pandas.Series.to_excel for more.
- to_hdf(path_or_buf, key: str, mode: Literal['a', 'w', 'r+'] = 'a', complevel: int | None = None, complib: Literal['zlib', 'lzo', 'bzip2', 'blosc'] | None = None, append: bool = False, format: Literal['fixed', 'table'] | None = None, index: bool = True, min_itemsize: int | dict[str, int] | None = None, nan_rep=None, dropna: bool | None = None, data_columns: Literal[True] | list[str] | None = None, errors: str = 'strict', encoding: str = 'UTF-8') None
Write the contained data to an HDF5 file using HDFStore.
Notes
See pandas API documentation for pandas.DataFrame.to_hdf, pandas.Series.to_hdf for more.
- to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=None, indent=None, storage_options: StorageOptions = None, mode='w') str | None
Convert the object to a JSON string.
Notes
See pandas API documentation for pandas.DataFrame.to_json, pandas.Series.to_json for more.
- to_latex(buf=None, columns=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None) str | None
Render object to a LaTeX tabular, longtable, or nested table.
Notes
See pandas API documentation for pandas.DataFrame.to_latex, pandas.Series.to_latex for more.
- to_markdown(buf=None, mode: str = 'wt', index: bool = True, storage_options: Optional[dict[str, Any]] = None, **kwargs) str
Print BasePandasDataset in Markdown-friendly format.
Notes
See pandas API documentation for pandas.DataFrame.to_markdown, pandas.Series.to_markdown for more.
- to_numpy(dtype=None, copy=False, na_value=_NoDefault.no_default) ndarray
Convert the BasePandasDataset to a NumPy array or a Modin wrapper for NumPy array.
Notes
See pandas API documentation for pandas.DataFrame.to_numpy, pandas.Series.to_numpy for more.
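An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd and the default configuration, under which a plain numpy.ndarray is returned):
>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> df.to_numpy()
array([[1, 3],
       [2, 4]])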
- to_period(freq=None, axis=0, copy=None) Self
Convert BasePandasDataset from DatetimeIndex to PeriodIndex.
Notes
See pandas API documentation for pandas.DataFrame.to_period, pandas.Series.to_period for more.
- to_pickle(path, compression: Optional[Union[Literal['infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd', 'tar'], dict[str, Any]]] = 'infer', protocol: int = 5, storage_options: Optional[dict[str, Any]] = None) None
Pickle (serialize) object to file.
Notes
See pandas API documentation for pandas.DataFrame.to_pickle, pandas.Series.to_pickle for more.
- to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None) int | None
Write records stored in a BasePandasDataset to a SQL database.
Notes
See pandas API documentation for pandas.DataFrame.to_sql, pandas.Series.to_sql for more.
- to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, min_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, max_colwidth=None, encoding=None) str | None
Render a BasePandasDataset to a console-friendly tabular output.
Notes
See pandas API documentation for pandas.DataFrame.to_string, pandas.Series.to_string for more.
- to_timestamp(freq=None, how='start', axis=0, copy=None) Self
Cast to DatetimeIndex of timestamps, at beginning of period.
Notes
See pandas API documentation for pandas.DataFrame.to_timestamp, pandas.Series.to_timestamp for more.
- to_xarray()
Return an xarray object from the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.to_xarray, pandas.Series.to_xarray for more.
- transform(func, axis=0, *args, **kwargs) Self
Call func on self producing a BasePandasDataset with the same axis shape as self.
Notes
See pandas API documentation for pandas.DataFrame.transform, pandas.Series.transform for more.
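An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd): the result keeps the same shape as the input:
>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df.transform(lambda x: x + 1)
   a
0  2
1  3
2  4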
- truediv(other, axis='columns', level=None, fill_value=None) Self
Get floating division of BasePandasDataset and other, element-wise (binary operator truediv).
Notes
See pandas API documentation for pandas.DataFrame.truediv, pandas.Series.truediv for more.
- truncate(before=None, after=None, axis=None, copy=None) Self
Truncate a BasePandasDataset before and after some index value.
Notes
See pandas API documentation for pandas.DataFrame.truncate, pandas.Series.truncate for more.
- tz_convert(tz, axis=0, level=None, copy=None) Self
Convert tz-aware axis to target time zone.
Notes
See pandas API documentation for pandas.DataFrame.tz_convert, pandas.Series.tz_convert for more.
- tz_localize(tz, axis=0, level=None, copy=None, ambiguous='raise', nonexistent='raise') Self
Localize tz-naive index of a BasePandasDataset to target time zone.
Notes
See pandas API documentation for pandas.DataFrame.tz_localize, pandas.Series.tz_localize for more.
- value_counts(subset: Sequence[Hashable] | None = None, normalize: bool = False, sort: bool = True, ascending: bool = False, dropna: bool = True) Series
Return a Series containing the frequency of each distinct row in the Dataframe.
- Parameters:
subset (label or list of labels, optional) – Columns to use when counting unique combinations.
normalize (bool, default False) – Return proportions rather than frequencies.
sort (bool, default True) – Sort by frequencies when True. Sort by DataFrame column values when False.
ascending (bool, default False) – Sort in ascending order.
dropna (bool, default True) –
Don’t include counts of rows that contain NA values.
New in version 1.3.0.
- Return type:
Series
See also
Series.value_counts
Equivalent method on Series.
Notes
See pandas API documentation for pandas.DataFrame.value_counts for more. The returned Series will have a MultiIndex with one level per input column but an Index (non-multi) for a single label. By default, rows that contain any NA values are omitted from the result. By default, the resulting Series will be in descending order so that the first element is the most frequently-occurring row.
Examples
>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
...                    'num_wings': [2, 0, 0, 0]},
...                   index=['falcon', 'dog', 'cat', 'ant'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0
>>> df.value_counts()
num_legs  num_wings
4         0            2
2         2            1
6         0            1
Name: count, dtype: int64
>>> df.value_counts(sort=False)
num_legs  num_wings
2         2            1
4         0            2
6         0            1
Name: count, dtype: int64
>>> df.value_counts(ascending=True)
num_legs  num_wings
2         2            1
6         0            1
4         0            2
Name: count, dtype: int64
>>> df.value_counts(normalize=True)
num_legs  num_wings
4         0            0.50
2         2            0.25
6         0            0.25
Name: proportion, dtype: float64
With dropna set to False we can also count rows with NA values.
>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise
>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
Name: count, dtype: int64
>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
Name: count, dtype: int64
>>> df.value_counts("first_name")
first_name
John    2
Anne    1
Beth    1
Name: count, dtype: int64
- property values: ndarray
Return a NumPy representation of the BasePandasDataset.
Notes
See pandas API documentation for pandas.DataFrame.values, pandas.Series.values for more.
- var(axis: Axis = 0, skipna: bool = True, ddof: int = 1, numeric_only=False, **kwargs) Series | float
Return unbiased variance over requested axis.
Notes
See pandas API documentation for pandas.DataFrame.var, pandas.Series.var for more.
- xs(key, axis=0, level=None, drop_level: bool = True) Self
Return cross-section from the Series/DataFrame.
Notes
See pandas API documentation for pandas.DataFrame.xs, pandas.Series.xs for more.
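An illustrative sketch (not part of the pandas docstring; assumes import modin.pandas as pd): selecting a cross-section of a MultiIndex-ed Series drops the matched level:
>>> idx = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
>>> s = pd.Series([10, 20, 30], index=idx)
>>> s.xs("a")
1    10
2    20
dtype: int64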