BasePandasDataset

The class implements functionality that is common to Modin’s pandas API for both DataFrame and Series classes.

Public API

class modin.pandas.base.BasePandasDataset

Implement most of the common code that exists in DataFrame/Series.

Since both objects share the same underlying representation, and the algorithms are the same, we use this object to define the general behavior of those objects and then use those objects to define the output type.

Notes

See pandas API documentation for pandas.DataFrame for more.

abs()

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns

Series/DataFrame containing the absolute value of each element.

Return type

abs

See also

numpy.absolute

Calculate the absolute value element-wise.

Notes

See pandas API documentation for pandas.DataFrame.abs for more. For complex inputs, 1.2 + 1j, the absolute value is \(\sqrt{ a^2 + b^2 }\).

Examples

Absolute numeric values in a Series.

>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64

Absolute numeric values in a Series with complex numbers.

>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0    1.56205
dtype: float64

Absolute numeric values in a Series with a Timedelta element.

>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0   1 days
dtype: timedelta64[ns]

Select rows with data closest to certain value using argsort (from StackOverflow).

>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
     a    b    c
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
     a    b    c
1    5   20   50
0    4   10  100
2    6   30  -30
3    7   40  -50
add(other, axis='columns', level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.add for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
agg(func=None, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters
  • func (function, str, list or dict) –

    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

    Accepted combinations are:

    • function

    • string function name

    • list of functions and/or function names, e.g. [np.sum, 'mean']

    • dict of axis labels -> functions, function names or list of such.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

  • *args – Positional arguments to pass to func.

  • **kwargs – Keyword arguments to pass to func.

Returns

  • scalar, Series or DataFrame – The return can be:

    • scalar : when Series.agg is called with single function

    • Series : when DataFrame.agg is called with a single function

    • DataFrame : when DataFrame.agg is called with several functions

    Return scalar, Series or DataFrame.

  • The aggregation operations are always performed over an axis, either the

  • index (default) or the column axis. This behavior is different from

  • numpy aggregation functions (mean, median, prod, sum, std,

  • var), where the default is to compute the aggregation of the flattened

  • array, e.g., numpy.mean(arr_2d) as opposed to

  • numpy.mean(arr_2d, axis=0).

  • agg is an alias for aggregate. Use the alias.

See also

DataFrame.apply

Perform any type of operations.

DataFrame.transform

Perform transformation type operations.

core.groupby.GroupBy

Perform operations over groups.

core.resample.Resampler

Perform operations over resampled bins.

core.window.Rolling

Perform operations over rolling window.

core.window.Expanding

Perform operations over expanding window.

core.window.ExponentialMovingWindow

Perform operation over exponential weighted window.

Notes

See pandas API documentation for pandas.DataFrame.aggregate for more. agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
aggregate(func=None, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters
  • func (function, str, list or dict) –

    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

    Accepted combinations are:

    • function

    • string function name

    • list of functions and/or function names, e.g. [np.sum, 'mean']

    • dict of axis labels -> functions, function names or list of such.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

  • *args – Positional arguments to pass to func.

  • **kwargs – Keyword arguments to pass to func.

Returns

  • scalar, Series or DataFrame – The return can be:

    • scalar : when Series.agg is called with single function

    • Series : when DataFrame.agg is called with a single function

    • DataFrame : when DataFrame.agg is called with several functions

    Return scalar, Series or DataFrame.

  • The aggregation operations are always performed over an axis, either the

  • index (default) or the column axis. This behavior is different from

  • numpy aggregation functions (mean, median, prod, sum, std,

  • var), where the default is to compute the aggregation of the flattened

  • array, e.g., numpy.mean(arr_2d) as opposed to

  • numpy.mean(arr_2d, axis=0).

  • agg is an alias for aggregate. Use the alias.

See also

DataFrame.apply

Perform any type of operations.

DataFrame.transform

Perform transformation type operations.

core.groupby.GroupBy

Perform operations over groups.

core.resample.Resampler

Perform operations over resampled bins.

core.window.Rolling

Perform operations over rolling window.

core.window.Expanding

Perform operations over expanding window.

core.window.ExponentialMovingWindow

Perform operation over exponential weighted window.

Notes

See pandas API documentation for pandas.DataFrame.aggregate for more. agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)

Align two objects on their axes with the specified join method.

Join method is specified for each axis Index.

Parameters
  • other (DataFrame or Series) –

  • join ({'outer', 'inner', 'left', 'right'}, default 'outer') –

  • axis (allowed axis of the other object, default None) – Align on index (0), columns (1), or both (None).

  • level (int or level name, default None) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • copy (bool, default True) – Always returns new objects. If copy=False and no reindexing is required then original objects are returned.

  • fill_value (scalar, default np.NaN) – Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

  • method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) –

    Method to use for filling holes in reindexed Series:

    • pad / ffill: propagate last valid observation forward to next valid.

    • backfill / bfill: use NEXT valid observation to fill gap.

  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

  • fill_axis ({0 or 'index', 1 or 'columns'}, default 0) – Filling axis, method and limit.

  • broadcast_axis ({0 or 'index', 1 or 'columns'}, default None) – Broadcast values along this axis, if aligning two objects of different dimensions.

Returns

(left, right) – Aligned objects.

Return type

(DataFrame, type of other)

Notes

See pandas API documentation for pandas.DataFrame.align for more.

all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True, potentially over an axis.

Returns True unless there at least one element within a series or along a Dataframe axis that is False or equivalent (e.g. zero or empty).

Parameters
  • axis ({0 or 'index', 1 or 'columns', None}, default 0) –

    Indicate which axis or axes should be reduced.

    • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

    • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

    • None : reduce all axes, return a scalar.

  • bool_only (bool, default None) – Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

  • skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • **kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Return type

Series or DataFrame

See also

Series.all

Return True if all elements are True.

DataFrame.any

Return True if one (or more) elements are True.

Examples

Series

>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([], dtype="float64").all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True

DataFrames

Create a dataframe from a dictionary.

>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
   col1   col2
0  True   True
1  True  False

Default behaviour checks if column-wise values all return True.

>>> df.all()
col1     True
col2    False
dtype: bool

Specify axis='columns' to check if row-wise values all return True.

>>> df.all(axis='columns')
0     True
1    False
dtype: bool

Or axis=None for whether every value is True.

>>> df.all(axis=None)
False

Notes

See pandas API documentation for pandas.DataFrame.all for more.

any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters
  • axis ({0 or 'index', 1 or 'columns', None}, default 0) –

    Indicate which axis or axes should be reduced.

    • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

    • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

    • None : reduce all axes, return a scalar.

  • bool_only (bool, default None) – Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

  • skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • **kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Return type

Series or DataFrame

See also

numpy.any

Numpy version of this method.

Series.any

Return whether any element is True.

Series.all

Return whether all elements are True.

DataFrame.any

Return whether any element is True over requested axis.

DataFrame.all

Return whether all elements are True over requested axis.

Examples

Series

For Series input, the output is a scalar indicating whether any element is True.

>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([], dtype="float64").any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True

DataFrame

Whether each column contains at least one True element (the default).

>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
   A  B  C
0  1  0  0
1  2  2  0
>>> df.any()
A     True
B     True
C    False
dtype: bool

Aggregating over the columns.

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2
>>> df.any(axis='columns')
0    True
1    True
dtype: bool
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
       A  B
0   True  1
1  False  0
>>> df.any(axis='columns')
0    True
1    False
dtype: bool

Aggregating over the entire DataFrame with axis=None.

>>> df.any(axis=None)
True

any for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)

Notes

See pandas API documentation for pandas.DataFrame.any for more.

apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, convert_dtype=True, args=(), **kwds)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

Parameters
  • func (function) – Function to apply to each column or row.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Axis along which the function is applied:

    • 0 or ‘index’: apply function to each column.

    • 1 or ‘columns’: apply function to each row.

  • raw (bool, default False) –

    Determines if row or column is passed as a Series or ndarray object:

    • False : passes each row or column as a Series to the function.

    • True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

  • result_type ({'expand', 'reduce', 'broadcast', None}, default None) –

    These only act when axis=1 (columns):

    • ’expand’ : list-like results will be turned into columns.

    • ’reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.

    • ’broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

    The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

  • args (tuple) – Positional arguments to pass to func in addition to the array/series.

  • **kwargs – Additional keyword arguments to pass as keywords arguments to func.

Returns

Result of applying func along the given axis of the DataFrame.

Return type

Series or DataFrame

See also

DataFrame.applymap

For elementwise operations.

DataFrame.aggregate

Only perform aggregating type operations.

DataFrame.transform

Only perform transforming type operations.

Notes

See pandas API documentation for pandas.DataFrame.apply for more. Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

Examples

>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a numpy universal function (in this case the same as np.sqrt(df)):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64
>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a Dataframe

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
asfreq(freq, method=None, how=None, normalize=False, fill_value=None)

Convert time series to specified frequency.

Returns the original data conformed to a new index with the specified frequency.

If the index of this DataFrame is a PeriodIndex, the new index is the result of transforming the original index with PeriodIndex.asfreq (so the original index will map one-to-one to the new index).

Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq) where start and end are, respectively, the first and last entries in the original index (see pandas.date_range()). The values corresponding to any timesteps in the new index which were not present in the original index will be null (NaN), unless a method for filling such unknowns is provided (see the method parameter below).

The resample() method is more appropriate if an operation on each group of timesteps (such as an aggregate) is necessary to represent the data at the new frequency.

Parameters
  • freq (DateOffset or str) – Frequency DateOffset or string.

  • method ({'backfill'/'bfill', 'pad'/'ffill'}, default None) –

    Method to use for filling holes in reindexed Series (note this does not fill NaNs that already were present):

    • ’pad’ / ‘ffill’: propagate last valid observation forward to next valid

    • ’backfill’ / ‘bfill’: use NEXT valid observation to fill.

  • how ({'start', 'end'}, default end) – For PeriodIndex only (see PeriodIndex.asfreq).

  • normalize (bool, default False) – Whether to reset output index to midnight.

  • fill_value (scalar, optional) – Value to use for missing values, applied during upsampling (note this does not fill NaNs that already were present).

Returns

DataFrame object reindexed to the specified frequency.

Return type

DataFrame

See also

reindex

Conform DataFrame to new index with optional filling logic.

Notes

See pandas API documentation for pandas.DataFrame.asfreq for more. To learn more about the frequency strings, please see this link.

Examples

Start by creating a series with 4 one minute timestamps.

>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s': series})
>>> df
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:01:00    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:03:00    3.0

Upsample the series into 30 second bins.

>>> df.asfreq(freq='30S')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    NaN
2000-01-01 00:03:00    3.0

Upsample again, providing a fill value.

>>> df.asfreq(freq='30S', fill_value=9.0)
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    9.0
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    9.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    9.0
2000-01-01 00:03:00    3.0

Upsample again, providing a method.

>>> df.asfreq(freq='30S', method='bfill')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    2.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    3.0
2000-01-01 00:03:00    3.0
asof(where, subset=None)

Return the last row(s) without any NaNs before where.

The last row (for each element in where, if list) without any NaN is taken. In case of a DataFrame, the last row without NaN considering only the subset of columns (if not None)

If there is no good value, NaN is returned for a Series or a Series of NaN values for a DataFrame

Parameters
  • where (date or array-like of dates) – Date(s) before which the last row(s) are returned.

  • subset (str or array-like of str, default None) – For DataFrame, if not None, only use these columns to check for NaNs.

Returns

The return can be:

  • scalar : when self is a Series and where is a scalar

  • Series: when self is a Series and where is an array-like, or when self is a DataFrame and where is a scalar

  • DataFrame : when self is a DataFrame and where is an array-like

Return scalar, Series, or DataFrame.

Return type

scalar, Series, or DataFrame

See also

merge_asof

Perform an asof merge. Similar to left join.

Notes

See pandas API documentation for pandas.DataFrame.asof for more. Dates are assumed to be sorted. Raises if this is not the case.

Examples

A Series and a scalar where.

>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64
>>> s.asof(20)
2.0

For a sequence where, a Series is returned. The first value is NaN, because the first element of where is before the first index value.

>>> s.asof([5, 20])
5     NaN
20    2.0
dtype: float64

Missing values are not considered. The following is 2.0, not NaN, even though NaN is at the index location for 30.

>>> s.asof(30)
2.0

Take all columns into consideration

>>> df = pd.DataFrame({'a': [10, 20, 30, 40, 50],
...                    'b': [None, None, None, None, 500]},
...                   index=pd.DatetimeIndex(['2018-02-27 09:01:00',
...                                           '2018-02-27 09:02:00',
...                                           '2018-02-27 09:03:00',
...                                           '2018-02-27 09:04:00',
...                                           '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']))
                      a   b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN

Take a single column into consideration

>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']),
...         subset=['a'])
                         a   b
2018-02-27 09:03:30   30.0 NaN
2018-02-27 09:04:30   40.0 NaN
astype(dtype, copy=True, errors='raise')

Cast a pandas object to a specified dtype dtype.

Parameters
  • dtype (data type, or dict of column name -> data type) – Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

  • copy (bool, default True) – Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).

  • errors ({'raise', 'ignore'}, default 'raise') –

    Control raising of exceptions on invalid data for provided dtype.

    • raise : allow exceptions to be raised

    • ignore : suppress exceptions. On error return original object.

Returns

casted

Return type

same type as caller

See also

to_datetime

Convert argument to datetime.

to_timedelta

Convert argument to timedelta.

to_numeric

Convert argument to a numeric type.

numpy.ndarray.astype

Cast a numpy array to a specified type.

Notes

See pandas API documentation for pandas.DataFrame.astype for more. .. deprecated:: 1.3.0

Using astype to convert from timezone-naive dtype to timezone-aware dtype is deprecated and will raise in a future version. Use Series.dt.tz_localize() instead.

Examples

Create a DataFrame:

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object

Cast all columns to int32:

>>> df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object

Cast col1 to int32 using a dictionary:

>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object

Create a series:

>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(
...     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Note that using copy=False and changing data on a new pandas object may propagate changes:

>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64

Create a series of dates:

>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]
property at

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

Raises

KeyError – If ‘label’ does not exist in DataFrame.

See also

DataFrame.iat

Access a single value for a row/column pair by integer position.

DataFrame.loc

Access a group of rows and columns by label(s).

Series.at

Access a single value using a label.

Examples

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30

Get value at specified row/column pair

>>> df.at[4, 'B']
2

Set value at specified row/column pair

>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10

Get value within a Series

>>> df.loc[5].at['B']
4

Notes

See pandas API documentation for pandas.DataFrame.at for more.

at_time(time, asof=False, axis=None)

Select values at particular time of day (e.g., 9:30AM).

Parameters
  • time (datetime.time or str) –

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

Returns

Return type

Series or DataFrame

Raises

TypeError – If the index is not a DatetimeIndex

See also

between_time

Select values between particular times of the day.

first

Select initial periods of time series based on a date offset.

last

Select final periods of time series based on a date offset.

DatetimeIndex.indexer_at_time

Get just the index locations for values at particular time of the day.

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='12H')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4
>>> ts.at_time('12:00')
                     A
2018-04-09 12:00:00  2
2018-04-10 12:00:00  4

Notes

See pandas API documentation for pandas.DataFrame.at_time for more.

backfill(axis=None, inplace=False, limit=None, downcast=None)

Synonym for DataFrame.fillna() with method='bfill'.

Returns

Object with missing values filled or None if inplace=True.

Return type

Series/DataFrame or None

Notes

See pandas API documentation for pandas.DataFrame.backfill for more.

between_time(start_time, end_time, include_start=True, include_end=True, axis=None)

Select values between particular times of the day (e.g., 9:00-9:30 AM).

By setting start_time to be later than end_time, you can get the times that are not between the two times.

Parameters
  • start_time (datetime.time or str) – Initial time as a time filter limit.

  • end_time (datetime.time or str) – End time as a time filter limit.

  • include_start (bool, default True) – Whether the start time needs to be included in the result.

  • include_end (bool, default True) – Whether the end time needs to be included in the result.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Determine range time on index or columns value.

Returns

Data from the original object filtered to the specified dates range.

Return type

Series or DataFrame

Raises

TypeError – If the index is not a DatetimeIndex

See also

at_time

Select values at a particular time of the day.

first

Select initial periods of time series based on a date offset.

last

Select final periods of time series based on a date offset.

DatetimeIndex.indexer_between_time

Get just the index locations for values between particular times of the day.

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
>>> ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3

You get the times that are not between two times by setting start_time later than end_time:

>>> ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4

Notes

See pandas API documentation for pandas.DataFrame.between_time for more.

bfill(axis=None, inplace=False, limit=None, downcast=None)

Synonym for DataFrame.fillna() with method='bfill'.

Returns

Object with missing values filled or None if inplace=True.

Return type

Series/DataFrame or None

Notes

See pandas API documentation for pandas.DataFrame.backfill for more.

bool()

Return the bool of a single element Series or DataFrame.

This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).

Returns

The value in the Series or DataFrame.

Return type

bool

See also

Series.astype

Change the data type of a Series, including to boolean.

DataFrame.astype

Change the data type of a DataFrame, including to boolean.

numpy.bool_

NumPy boolean data type, used by pandas for boolean values.

Examples

The method will only work for single element objects with a boolean value:

>>> pd.Series([True]).bool()
True
>>> pd.Series([False]).bool()
False
>>> pd.DataFrame({'col': [True]}).bool()
True
>>> pd.DataFrame({'col': [False]}).bool()
False

Notes

See pandas API documentation for pandas.DataFrame.bool for more.

combine(other, func, fill_value=None, **kwargs)

Perform column-wise combine with another DataFrame.

Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.

Parameters
  • other (DataFrame) – The DataFrame to merge column-wise.

  • func (function) – Function that takes two series as inputs and return a Series or a scalar. Used to merge the two dataframes column by columns.

  • fill_value (scalar value, default None) – The value to fill NaNs with prior to passing any column to the merge func.

  • overwrite (bool, default True) – If True, columns in self that do not exist in other will be overwritten with NaNs.

Returns

Combination of the provided DataFrames.

Return type

DataFrame

See also

DataFrame.combine_first

Combine two DataFrame objects and default to non-null values in frame calling the method.

Examples

Combine using a simple function that chooses the smaller column.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using fill_value fills Nones prior to passing the column to the merge function.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None is preserved

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
    A    B
0  0 -5.0
1  0  3.0

Example that demonstrates the use of overwrite and behavior when the axis differ between the dataframes.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0

Notes

See pandas API documentation for pandas.DataFrame.combine for more.

combine_first(other)

Update null elements with value in the same location in other.

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two.

Parameters

other (DataFrame) – Provided DataFrame to use to fill null values.

Returns

The result of combining the provided DataFrame with the other object.

Return type

DataFrame

See also

DataFrame.combine

Perform series-wise operation on two DataFrames using a given function.

Examples

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Null values still persist if the location of that null value does not exist in other

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0

Notes

See pandas API documentation for pandas.DataFrame.combine_first for more.

convert_dtypes(infer_objects: modin.pandas.base.BasePandasDataset.bool = True, convert_string: modin.pandas.base.BasePandasDataset.bool = True, convert_integer: modin.pandas.base.BasePandasDataset.bool = True, convert_boolean: modin.pandas.base.BasePandasDataset.bool = True, convert_floating: modin.pandas.base.BasePandasDataset.bool = True)

Convert columns to best possible dtypes using dtypes supporting pd.NA.

New in version 1.0.0.

Parameters
  • infer_objects (bool, default True) – Whether object dtypes should be converted to the best possible types.

  • convert_string (bool, default True) – Whether object dtypes should be converted to StringDtype().

  • convert_integer (bool, default True) – Whether, if possible, conversion can be done to integer extension types.

  • convert_boolean (bool, defaults True) – Whether object dtypes should be converted to BooleanDtypes().

  • convert_floating (bool, defaults True) –

    Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be give to integer dtypes if the floats can be faithfully casted to integers.

    New in version 1.2.0.

Returns

Copy of input object with new dtype.

Return type

Series or DataFrame

See also

infer_objects

Infer dtypes of objects.

to_datetime

Convert argument to datetime.

to_timedelta

Convert argument to timedelta.

to_numeric

Convert argument to a numeric type.

Notes

See pandas API documentation for pandas.DataFrame.convert_dtypes for more. By default, convert_dtypes will attempt to convert a Series (or each Series in a DataFrame) to dtypes that support pd.NA. By using the options convert_string, convert_integer, convert_boolean and convert_boolean, it is possible to turn off individual conversions to StringDtype, the integer extension types, BooleanDtype or floating extension types, respectively.

For object-dtyped columns, if infer_objects is True, use the inference rules as during normal Series/DataFrame construction. Then, if possible, convert to StringDtype, BooleanDtype or an appropriate integer or floating extension type, otherwise leave as object.

If the dtype is integer, convert to an appropriate integer extension type.

If the dtype is numeric, and consists of all integers, convert to an appropriate integer extension type. Otherwise, convert to an appropriate floating extension type.

Changed in version 1.2: Starting with pandas 1.2, this method also converts float columns to the nullable floating extension type.

In the future, as new dtypes are added that support pd.NA, the results of this method will change to support those new dtypes.

Examples

>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )

Start with a DataFrame with default dtypes.

>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0
>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Convert the DataFrame to use best possible dtypes.

>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
>>> dfn.dtypes
a      Int32
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Start with a Series of strings and missing data represented by np.nan.

>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: object

Obtain a Series with dtype StringDtype.

>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string
copy(deep=True)

Make a copy of this object’s indices and data.

When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).

When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Parameters

deep (bool, default True) – Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.

Returns

copy – Object type matches caller.

Return type

Series or DataFrame

Notes

See pandas API documentation for pandas.DataFrame.copy for more. When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).

While Index objects are copied when deep=True, the underlying numpy array is not copied for performance reasons. Since Index is immutable, the underlying data can be safely shared and a copy is not needed.

Examples

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64

Shallow copy versus default (deep) copy:

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)

Shallow copy shares data and index with original.

>>> s is shallow
False
>>> s.values is shallow.values and s.index is shallow.index
True

Deep copy has own copy of data and index.

>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False

Updates to the data shared by shallow copy and original is reflected in both; deep copy remains unchanged.

>>> s[0] = 3
>>> shallow[1] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64

Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.

>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0    [10, 2]
1     [3, 4]
dtype: object
>>> deep
0    [10, 2]
1     [3, 4]
dtype: object
count(axis=0, level=None, numeric_only=False)

Count non-NA cells for each column or row.

The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.

  • level (int or str, optional) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.

  • numeric_only (bool, default False) – Include only float, int or boolean data.

Returns

For each column/row the number of non-NA/null entries. If level is specified returns a DataFrame.

Return type

Series or DataFrame

See also

Series.count

Number of non-NA elements in a Series.

DataFrame.value_counts

Count unique combinations of columns.

DataFrame.shape

Number of DataFrame rows and columns (including NA elements).

DataFrame.isna

Boolean same-sized DataFrame showing places of NA elements.

Examples

Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2   Lewis  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

Counts for each row:

>>> df.count(axis='columns')
0    3
1    2
2    3
3    3
4    3
dtype: int64

Notes

See pandas API documentation for pandas.DataFrame.count for more.

cummax(axis=None, skipna=True, *args, **kwargs)

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Return cumulative maximum of Series or DataFrame.

Return type

Series or DataFrame

See also

core.window.Expanding.max

Similar functionality but ignores NaN values.

DataFrame.max

Return the maximum over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row, use axis=1

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0

Notes

See pandas API documentation for pandas.DataFrame.cummax for more.

cummin(axis=None, skipna=True, *args, **kwargs)

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Return cumulative minimum of Series or DataFrame.

Return type

Series or DataFrame

See also

core.window.Expanding.min

Similar functionality but ignores NaN values.

DataFrame.min

Return the minimum over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row, use axis=1

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

Notes

See pandas API documentation for pandas.DataFrame.cummin for more.

cumprod(axis=None, skipna=True, *args, **kwargs)

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Return cumulative product of Series or DataFrame.

Return type

Series or DataFrame

See also

core.window.Expanding.prod

Similar functionality but ignores NaN values.

DataFrame.prod

Return the product over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumprod(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row, use axis=1

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0

Notes

See pandas API documentation for pandas.DataFrame.cumprod for more.

cumsum(axis=None, skipna=True, *args, **kwargs)

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Return cumulative sum of Series or DataFrame.

Return type

Series or DataFrame

See also

core.window.Expanding.sum

Similar functionality but ignores NaN values.

DataFrame.sum

Return the sum over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row, use axis=1

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0

Notes

See pandas API documentation for pandas.DataFrame.cumsum for more.

describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)

Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Parameters
  • percentiles (list-like of numbers, optional) – The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

  • include ('all', list-like of dtypes or None (default), optional) –

    A white list of data types to include in the result. Ignored for Series. Here are the options:

    • ’all’ : All columns of the input will be included in the output.

    • A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'

    • None (default) : The result will include all numeric columns.

  • exclude (list-like of dtypes or None (default), optional,) –

    A black list of data types to omit from the result. Ignored for Series. Here are the options:

    • A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To exclude pandas categorical columns, use 'category'

    • None (default) : The result will exclude nothing.

  • datetime_is_numeric (bool, default False) –

    Whether to treat datetime dtypes as numeric. This affects statistics calculated for the column. For DataFrame input, this also controls whether datetime columns are included by default.

    New in version 1.1.0.

Returns

Summary statistics of the Series or Dataframe provided.

Return type

Series or DataFrame

See also

DataFrame.count

Count number of non-NA/null observations.

DataFrame.max

Maximum of the values in the object.

DataFrame.min

Minimum of the values in the object.

DataFrame.mean

Mean of the values.

DataFrame.std

Standard deviation of the observations.

DataFrame.select_dtypes

Subset of a DataFrame including/excluding columns based on their dtype.

Notes

See pandas API documentation for pandas.DataFrame.describe for more. For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Examples

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                   })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')  
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])  
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])  
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])  
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
diff(periods=1, axis=0)

First discrete difference of element.

Calculates the difference of a Dataframe element compared with another element in the Dataframe (default is element in previous row).

Parameters
  • periods (int, default 1) – Periods to shift for calculating difference, accepts negative values.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Take difference over rows (0) or columns (1).

Returns

First differences of the Series.

Return type

Dataframe

See also

Dataframe.pct_change

Percent change over given number of periods.

Dataframe.shift

Shift index by desired number of periods with an optional time freq.

Series.diff

First discrete difference of object.

Notes

See pandas API documentation for pandas.DataFrame.diff for more. For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in Dataframe, however dtype of the result is always float64.

Examples

Difference with previous row

>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36
>>> df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0

Difference with previous column

>>> df.diff(axis=1)
    a  b   c
0 NaN  0   0
1 NaN -1   3
2 NaN -1   7
3 NaN -1  13
4 NaN  0  20
5 NaN  2  28

Difference with 3rd previous row

>>> df.diff(periods=3)
     a    b     c
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  3.0  2.0  15.0
4  3.0  4.0  21.0
5  3.0  6.0  27.0

Difference with following row

>>> df.diff(periods=-1)
     a    b     c
0 -1.0  0.0  -3.0
1 -1.0 -1.0  -5.0
2 -1.0 -1.0  -7.0
3 -1.0 -2.0  -9.0
4 -1.0 -3.0 -11.0
5  NaN  NaN   NaN

Overflow in input dtype

>>> df = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
>>> df.diff()
       a
0    NaN
1  255.0
div(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.truediv for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
divide(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.truediv for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide <advanced.shown_levels> for more information about the now unused levels.

Parameters
  • labels (single label or list-like) – Index or column labels to drop.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

  • index (single label or list-like) – Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

  • columns (single label or list-like) – Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

  • level (int or level name, optional) – For MultiIndex, level from which the labels will be removed.

  • inplace (bool, default False) – If False, return a copy. Otherwise, do operation inplace and return None.

  • errors ({'ignore', 'raise'}, default 'raise') – If ‘ignore’, suppress error and only existing labels are dropped.

Returns

DataFrame without the removed index or column labels or None if inplace=True.

Return type

DataFrame or None

Raises

KeyError – If any of the labels is not found in the selected axis.

See also

DataFrame.loc

Label-location based indexer for selection by label.

DataFrame.dropna

Return DataFrame with labels on given axis omitted where (all or any) data are missing.

DataFrame.drop_duplicates

Return DataFrame with duplicate rows removed, optionally only considering certain columns.

Series.drop

Return Series with specified index labels removed.

Examples

>>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
...                   columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Drop columns

>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11
>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11

Drop a row by index

>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

Drop columns and/or rows of MultiIndex DataFrame

>>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
...                   data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                         [250, 150], [1.5, 0.8], [320, 250],
...                         [1, 0.8], [0.3, 0.2]])
>>> df
                big     small
lama    speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        weight  1.0     0.8
        length  0.3     0.2
>>> df.drop(index='cow', columns='small')
                big
lama    speed   45.0
        weight  200.0
        length  1.5
falcon  speed   320.0
        weight  1.0
        length  0.3
>>> df.drop(index='length', level=1)
                big     small
lama    speed   45.0    30.0
        weight  200.0   100.0
cow     speed   30.0    20.0
        weight  250.0   150.0
falcon  speed   320.0   250.0
        weight  1.0     0.8

Notes

See pandas API documentation for pandas.DataFrame.drop for more.

drop_duplicates(keep='first', inplace=False, **kwargs)

Return DataFrame with duplicate rows removed.

Considering certain columns is optional. Indexes, including time indexes are ignored.

Parameters
  • subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates, by default use all of the columns.

  • keep ({'first', 'last', False}, default 'first') – Determines which duplicates (if any) to keep. - first : Drop duplicates except for the first occurrence. - last : Drop duplicates except for the last occurrence. - False : Drop all duplicates.

  • inplace (bool, default False) – Whether to drop duplicates in place or to return a copy.

  • ignore_index (bool, default False) –

    If True, the resulting axis will be labeled 0, 1, …, n - 1.

    New in version 1.0.0.

Returns

DataFrame with duplicates removed or None if inplace=True.

Return type

DataFrame or None

See also

DataFrame.value_counts

Count unique combinations of columns.

Examples

Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0

Notes

See pandas API documentation for pandas.DataFrame.drop_duplicates for more.

droplevel(level, axis=0)

Return Series/DataFrame with requested index / column level(s) removed.

Parameters
  • level (int, str, or list-like) – If a string is given, must be the name of a level If list-like, elements must be names or positional indexes of levels.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Axis along which the level(s) is removed:

    • 0 or ‘index’: remove level(s) in column.

    • 1 or ‘columns’: remove level(s) in row.

Returns

Series/DataFrame with requested index / column level(s) removed.

Return type

Series/DataFrame

Examples

>>> df = pd.DataFrame([
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
>>> df.columns = pd.MultiIndex.from_tuples([
...     ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
>>> df
level_1   c   d
level_2   e   f
a b
1 2      3   4
5 6      7   8
9 10    11  12
>>> df.droplevel('a')
level_1   c   d
level_2   e   f
b
2        3   4
6        7   8
10      11  12
>>> df.droplevel('level_2', axis=1)
level_1   c   d
a b
1 2      3   4
5 6      7   8
9 10    11  12

Notes

See pandas API documentation for pandas.DataFrame.droplevel for more.

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Determine if rows or columns which contain missing values are removed.

    • 0, or ‘index’ : Drop rows which contain missing values.

    • 1, or ‘columns’ : Drop columns which contain missing value.

    Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

  • how ({'any', 'all'}, default 'any') –

    Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

    • ’any’ : If any NA values are present, drop that row or column.

    • ’all’ : If all values are NA, drop that row or column.

  • thresh (int, optional) – Require that many non-NA values.

  • subset (array-like, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

  • inplace (bool, default False) – If True, do operation inplace and return None.

Returns

DataFrame with NA entries dropped from it or None if inplace=True.

Return type

DataFrame or None

See also

DataFrame.isna

Indicate missing values.

DataFrame.notna

Indicate existing (non-missing) values.

DataFrame.fillna

Replace missing values.

Series.dropna

Drop missing values.

Index.dropna

Drop missing indices.

Examples

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep the DataFrame with valid entries in the same variable.

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25

Notes

See pandas API documentation for pandas.DataFrame.dropna for more.

eq(other, axis='columns', level=None)

Get Equal to of dataframe and other, element-wise (binary operator eq).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Result of the comparison.

Return type

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

See pandas API documentation for pandas.DataFrame.eq for more. Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0, times=None)

Provide exponential weighted (EW) functions.

Available EW functions: mean(), var(), std(), corr(), cov().

Exactly one parameter: com, span, halflife, or alpha must be provided.

Parameters
  • com (float, optional) – Specify decay in terms of center of mass, \(\alpha = 1 / (1 + com)\), for \(com \geq 0\).

  • span (float, optional) – Specify decay in terms of span, \(\alpha = 2 / (span + 1)\), for \(span \geq 1\).

  • halflife (float, str, timedelta, optional) –

    Specify decay in terms of half-life, \(\alpha = 1 - \exp\left(-\ln(2) / halflife\right)\), for \(halflife > 0\).

    If times is specified, the time unit (str or timedelta) over which an observation decays to half its value. Only applicable to mean() and halflife value will not apply to the other functions.

    New in version 1.1.0.

  • alpha (float, optional) – Specify smoothing factor \(\alpha\) directly, \(0 < \alpha \leq 1\).

  • min_periods (int, default 0) – Minimum number of observations in window required to have a value (otherwise result is NA).

  • adjust (bool, default True) –

    Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average).

    • When adjust=True (default), the EW function is calculated using weights \(w_i = (1 - \alpha)^i\). For example, the EW moving average of the series [\(x_0, x_1, ..., x_t\)] would be:

    \[y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}\]
    • When adjust=False, the exponentially weighted function is calculated recursively:

    \[\begin{split}\begin{split} y_0 &= x_0\\ y_t &= (1 - \alpha) y_{t-1} + \alpha x_t, \end{split}\end{split}\]

  • ignore_na (bool, default False) –

    Ignore missing values when calculating weights; specify True to reproduce pre-0.15.0 behavior.

    • When ignore_na=False (default), weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.

    • When ignore_na=True (reproducing pre-0.15.0 behavior), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.

  • axis ({0, 1}, default 0) – The axis to use. The value 0 identifies the rows, and 1 identifies the columns.

  • times (str, np.ndarray, Series, default None) –

    New in version 1.1.0.

    Times corresponding to the observations. Must be monotonically increasing and datetime64[ns] dtype.

    If str, the name of the column in the DataFrame representing the times.

    If 1-D array like, a sequence with the same shape as the observations.

    Only applicable to mean().

Returns

A Window sub-classed for the particular operation.

Return type

DataFrame

See also

rolling

Provides rolling window calculations.

expanding

Provides expanding transformations.

Notes

See pandas API documentation for pandas.DataFrame.ewm for more.

More details can be found at: Exponentially weighted windows.

Examples

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.ewm(com=0.5).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213

Specifying times with a timedelta halflife when computing mean.

>>> times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
>>> df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
          B
0  0.000000
1  0.585786
2  1.523889
3  1.523889
4  3.233686
expanding(min_periods=1, center=None, axis=0, method='single')

Provide expanding transformations.

Parameters
  • min_periods (int, default 1) – Minimum number of observations in window required to have a value (otherwise result is NA).

  • center (bool, default False) – Set the labels at the center of the window.

  • axis (int or str, default 0) –

  • method (str {'single', 'table'}, default 'single') –

    Execute the rolling operation per single column or row ('single') or over the entire object ('table').

    This argument is only implemented when specifying engine='numba' in the method call.

    New in version 1.3.0.

Returns

Return type

a Window sub-classed for the particular operation

See also

rolling

Provides rolling window calculations.

ewm

Provides exponential weighted functions.

Notes

See pandas API documentation for pandas.DataFrame.expanding for more. By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

Examples

>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.expanding(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  3.0
4  7.0
ffill(axis=None, inplace=False, limit=None, downcast=None)

Synonym for DataFrame.fillna() with method='ffill'.

Returns

Object with missing values filled or None if inplace=True.

Return type

Series/DataFrame or None

Notes

See pandas API documentation for pandas.DataFrame.pad for more.

filter(items=None, like=None, regex=None, axis=None)

Subset the dataframe rows or columns according to the specified index labels.

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

Parameters
  • items (list-like) – Keep labels from axis which are in items.

  • like (str) – Keep labels from axis for which “like in label == True”.

  • regex (str (regular expression)) – Keep labels from axis for which re.search(regex, label) == True.

  • axis ({0 or ‘index’, 1 or ‘columns’, None}, default None) – The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, ‘index’ for Series, ‘columns’ for DataFrame.

Returns

Return type

same type as input object

See also

DataFrame.loc

Access a group of rows and columns by label(s) or a boolean array.

Notes

See pandas API documentation for pandas.DataFrame.filter for more. The items, like, and regex parameters are enforced to be mutually exclusive.

axis defaults to the info axis that is used when indexing with [].

Examples

>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
         one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
         one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
         one  two  three
rabbit    4    5      6
first(offset)

Select initial periods of time series data based on a date offset.

When having a DataFrame with dates as index, this function can select the first few rows based on a date offset.

Parameters

offset (str, DateOffset or dateutil.relativedelta) – The offset length of the data that will be selected. For instance, ‘1M’ will display all the rows having their index within the first month.

Returns

A subset of the caller.

Return type

Series or DataFrame

Raises

TypeError – If the index is not a DatetimeIndex

See also

last

Select final periods of time series based on a date offset.

at_time

Select values at a particular time of the day.

between_time

Select values between particular times of the day.

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the first 3 days:

>>> ts.first('3D')
            A
2018-04-09  1
2018-04-11  2

Notice the data for 3 first calendar days were returned, not the first 3 days observed in the dataset, and therefore data for 2018-04-13 was not returned.

Notes

See pandas API documentation for pandas.DataFrame.first for more.

first_valid_index()

Return index for first non-NA value or None, if no NA value is found.

Returns

scalar

Return type

type of index

Notes

See pandas API documentation for pandas.DataFrame.first_valid_index for more. If all elements are non-NA/null, returns None. Also returns None for empty Series/DataFrame.

property flags

Get the properties associated with this pandas object.

The available flags are

  • Flags.allows_duplicate_labels

See also

Flags

Flags that apply to pandas objects.

DataFrame.attrs

Global metadata applying to this dataset.

Notes

See pandas API documentation for pandas.DataFrame.flags for more. “Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.

Examples

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>

Flags can be get or set using .

>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False

Or by slicing with a key

>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
floordiv(other, axis='columns', level=None, fill_value=None)

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.floordiv for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
ge(other, axis='columns', level=None)

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Result of the comparison.

Return type

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

See pandas API documentation for pandas.DataFrame.ge for more. Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
get(key, default=None)

Get item from object for given key (ex: DataFrame column).

Returns default value if not found.

Parameters

key (object) –

Returns

value

Return type

same type as items contained in object

Notes

See pandas API documentation for pandas.DataFrame.get for more.

gt(other, axis='columns', level=None)

Get Greater than of dataframe and other, element-wise (binary operator gt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Result of the comparison.

Return type

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

See pandas API documentation for pandas.DataFrame.gt for more. Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
head(n=5)

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].

Parameters

n (int, default 5) – Number of rows to select.

Returns

The first n rows of the caller object.

Return type

same type as caller

See also

DataFrame.tail

Returns the last n rows.

Examples

>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first 5 lines

>>> df.head()
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey

Viewing the first n lines (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon

For negative values of n

>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot

Notes

See pandas API documentation for pandas.DataFrame.head for more.

property iat

Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

Raises

IndexError – When integer position is out of bounds.

See also

DataFrame.at

Access a single value for a row/column label pair.

DataFrame.loc

Access a group of rows and columns by label(s).

DataFrame.iloc

Access a group of rows and columns by integer position(s).

Examples

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30

Get value at specified row/column pair

>>> df.iat[1, 2]
1

Set value at specified row/column pair

>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10

Get value within a series

>>> df.loc[0].iat[1]
2

Notes

See pandas API documentation for pandas.DataFrame.iat for more.

idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns

Indexes of maxima along the specified axis.

Return type

Series

Raises

ValueError

  • If the row/column is empty

See also

Series.idxmax

Return index of the maximum element.

Notes

See pandas API documentation for pandas.DataFrame.idxmax for more. This method is the DataFrame version of ndarray.argmax.

Examples

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                    'co2_emissions': [37.2, 19.66, 1712]},
...                    index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the maximum value in each column.

>>> df.idxmax()
consumption     Wheat Products
co2_emissions             Beef
dtype: object

To return the index for the maximum value in each row, use axis="columns".

>>> df.idxmax(axis="columns")
Pork              co2_emissions
Wheat Products     consumption
Beef              co2_emissions
dtype: object
idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis.

NA/null values are excluded.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns

Indexes of minima along the specified axis.

Return type

Series

Raises

ValueError

  • If the row/column is empty

See also

Series.idxmin

Return index of the minimum element.

Notes

See pandas API documentation for pandas.DataFrame.idxmin for more. This method is the DataFrame version of ndarray.argmin.

Examples

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                    'co2_emissions': [37.2, 19.66, 1712]},
...                    index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the minimum value in each column.

>>> df.idxmin()
consumption                Pork
co2_emissions    Wheat Products
dtype: object

To return the index for the minimum value in each row, use axis="columns".

>>> df.idxmin(axis="columns")
Pork                consumption
Wheat Products    co2_emissions
Beef                consumption
dtype: object
property iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

  • An integer, e.g. 5.

  • A list or array of integers, e.g. [4, 3, 0].

  • A slice object with ints, e.g. 1:7.

  • A boolean array.

  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).

See more at Selection by Position.

See also

DataFrame.iat

Fast integer location scalar accessor.

DataFrame.loc

Purely label-location based indexer for selection by label.

Series.iloc

Purely integer-location based indexing for selection by position.

Examples

>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

Indexing just the rows

With a scalar integer.

>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64

With a list of integers.

>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
>>> df.iloc[[0, 1]]
     a    b    c    d
0    1    2    3    4
1  100  200  300  400

With a slice object.

>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

With a boolean mask the same length as the index.

>>> df.iloc[[True, False, True]]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

With a callable, useful in method chains. The x passed to the lambda is the DataFrame being sliced. This selects the rows whose index label even.

>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

Indexing both axes

You can mix the indexer types for the index and columns. Use : to select the entire axis.

With scalar integers.

>>> df.iloc[0, 1]
2

With lists of integers.

>>> df.iloc[[0, 2], [1, 3]]
      b     d
0     2     4
2  2000  4000

With slice objects.

>>> df.iloc[1:3, 0:3]
      a     b     c
1   100   200   300
2  1000  2000  3000

With a boolean array whose length matches the columns.

>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000

With a callable function that expects the Series or DataFrame.

>>> df.iloc[:, lambda df: [0, 2]]
      a     c
0     1     3
1   100   300
2  1000  3000

Notes

See pandas API documentation for pandas.DataFrame.iloc for more.

property index

Get the index for this DataFrame.

Returns

The union of all indexes across the partitions.

Return type

pandas.Index

infer_objects()

Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

Returns

converted

Return type

same type as input object

See also

to_datetime

Convert argument to datetime.

to_timedelta

Convert argument to timedelta.

to_numeric

Convert argument to numeric type.

convert_dtypes

Convert argument to best possible dtype.

Examples

>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
   A
1  1
2  2
3  3
>>> df.dtypes
A    object
dtype: object
>>> df.infer_objects().dtypes
A    int64
dtype: object

Notes

See pandas API documentation for pandas.DataFrame.infer_objects for more.

isin(values)

Whether each element in the DataFrame is contained in values.

Parameters

values (iterable, Series, DataFrame or dict) – The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

Returns

DataFrame of booleans showing whether each element in the DataFrame is contained in values.

Return type

DataFrame

See also

DataFrame.eq

Equality test for DataFrame.

Series.isin

Equivalent method on Series.

Series.str.contains

Test if pattern or regex is contained within a string of a Series or Index.

Examples

>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When values is a list check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings)

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

When values is a dict, we can pass values to check for each column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When values is a Series or DataFrame the index and column must match. Note that ‘falcon’ does not match based on the number of legs in df2.

>>> other = pd.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon      True       True
dog        False      False

Notes

See pandas API documentation for pandas.DataFrame.isin for more.

isna()

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns

Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

Return type

DataFrame

See also

DataFrame.isnull

Alias of isna.

DataFrame.notna

Boolean inverse of isna.

DataFrame.dropna

Omit axes labels with missing values.

isna

Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool

Notes

See pandas API documentation for pandas.DataFrame.isna for more.

isnull()

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns

Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

Return type

DataFrame

See also

DataFrame.isnull

Alias of isna.

DataFrame.notna

Boolean inverse of isna.

DataFrame.dropna

Omit axes labels with missing values.

isna

Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool

Notes

See pandas API documentation for pandas.DataFrame.isna for more.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters
  • axis ({index (0), columns (1)}) – Axis for the function to be applied on.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Returns

Return type

Series or DataFrame (if level specified)

Notes

See pandas API documentation for pandas.DataFrame.kurt for more.

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters
  • axis ({index (0), columns (1)}) – Axis for the function to be applied on.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Returns

Return type

Series or DataFrame (if level specified)

Notes

See pandas API documentation for pandas.DataFrame.kurt for more.

last(offset)

Select final periods of time series data based on a date offset.

For a DataFrame with a sorted DatetimeIndex, this function selects the last few rows based on a date offset.

Parameters

offset (str, DateOffset, dateutil.relativedelta) – The offset length of the data that will be selected. For instance, ‘3D’ will display all the rows having their index within the last 3 days.

Returns

A subset of the caller.

Return type

Series or DataFrame

Raises

TypeError – If the index is not a DatetimeIndex

See also

first

Select initial periods of time series based on a date offset.

at_time

Select values at a particular time of the day.

between_time

Select values between particular times of the day.

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the last 3 days:

>>> ts.last('3D')
            A
2018-04-13  3
2018-04-15  4

Notice the data for 3 last calendar days were returned, not the last 3 observed days in the dataset, and therefore data for 2018-04-11 was not returned.

Notes

See pandas API documentation for pandas.DataFrame.last for more.

last_valid_index()

Return index for last non-NA value or None, if no NA value is found.

Returns

scalar

Return type

type of index

Notes

See pandas API documentation for pandas.DataFrame.last_valid_index for more. If all elements are non-NA/null, returns None. Also returns None for empty Series/DataFrame.

le(other, axis='columns', level=None)

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Result of the comparison.

Return type

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

See pandas API documentation for pandas.DataFrame.le for more. Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
property loc

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

  • A list or array of labels, e.g. ['a', 'b', 'c'].

  • A slice object with labels, e.g. 'a':'f'.

    Warning

    Note that contrary to usual python slices, both the start and the stop are included

  • A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

  • An alignable boolean Series. The index of the key will be aligned before masking.

  • An alignable Index. The Index of the returned selection will be the input.

  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)

See more at Selection by Label.

Raises
  • KeyError – If any items are not found.

  • IndexingError – If an indexed key is passed and its index is unalignable to the frame index.

See also

DataFrame.at

Access a single value for a row/column label pair.

DataFrame.iloc

Access group of rows and columns by integer position(s).

DataFrame.xs

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.

Series.loc

Access group of values using labels.

Examples

Getting values

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=['cobra', 'viper', 'sidewinder'],
...      columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note using [[]] returns a DataFrame.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for row and column

>>> df.loc['cobra', 'shield']
2

Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Boolean list with the same length as the row axis

>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8

Alignable boolean Series:

>>> df.loc[pd.Series([False, True, False],
...        index=['viper', 'sidewinder', 'cobra'])]
            max_speed  shield
sidewinder          7       8

Index (same behavior as df.reindex)

>>> df.loc[pd.Index(["cobra", "viper"], name="foo")]
       max_speed  shield
foo
cobra          1       2
viper          4       5

Conditional that returns a boolean Series

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series with column labels specified

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

Callable that returns a boolean Series

>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8

Setting values

Set value for all items matching the list of labels

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Set value for an entire row

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for rows matching callable condition

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0

Getting values on a DataFrame with an index that has integer labels

Another example using integers for the index

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8

Getting values with a MultiIndex

A number of examples using a DataFrame with a MultiIndex

>>> tuples = [
...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...    ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...         [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Single label. Note this returns a DataFrame with a single index.

>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4

Single index tuple. Note this returns a Series.

>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

Single label for row and column. Similar to passing in a tuple, this returns a Series.

>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

Single tuple. Note using [[]] returns a DataFrame.

>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4

Single tuple for the index with a single label for the column

>>> df.loc[('cobra', 'mark i'), 'shield']
2

Slice from index tuple to single label

>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Slice from index tuple to index tuple

>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1

Notes

See pandas API documentation for pandas.DataFrame.loc for more.

lt(other, axis='columns', level=None)

Get Less than of dataframe and other, element-wise (binary operator lt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Result of the comparison.

Return type

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

See pandas API documentation for pandas.DataFrame.lt for more. Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
mad(axis=None, skipna=None, level=None)

Return the mean absolute deviation of the values over the requested axis.

Parameters
  • axis ({index (0), columns (1)}) – Axis for the function to be applied on.

  • skipna (bool, default None) – Exclude NA/null values when computing the result.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

Returns

Return type

Series or DataFrame (if level specified)

Notes

See pandas API documentation for pandas.DataFrame.mad for more.

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.

Parameters
  • axis ({index (0), columns (1)}) – Axis for the function to be applied on.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Returns

Return type

Series or DataFrame (if level specified)

See also

Series.sum

Return the sum.

Series.min

Return the minimum.

Series.max

Return the maximum.

Series.idxmin

Return the index of the minimum.

Series.idxmax

Return the index of the maximum.

DataFrame.sum

Return the sum over the requested axis.

DataFrame.min

Return the minimum over the requested axis.

DataFrame.max

Return the maximum over the requested axis.

DataFrame.idxmin

Return the index of the minimum over the requested axis.

DataFrame.idxmax

Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.max()
8

Notes

See pandas API documentation for pandas.DataFrame.max for more.

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values over the requested axis.

Parameters
  • axis ({index (0), columns (1)}) – Axis for the function to be applied on.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Returns

Return type

Series or DataFrame (if level specified)

Notes

See pandas API documentation for pandas.DataFrame.mean for more.

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values over the requested axis.

Parameters
  • axis ({index (0), columns (1)}) – Axis for the function to be applied on.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Returns

Return type

Series or DataFrame (if level specified)

Notes

See pandas API documentation for pandas.DataFrame.median for more.

memory_usage(index=True, deep=False)

Return the memory usage of each column in bytes.

The memory usage can optionally include the contribution of the index and elements of object dtype.

This value is displayed in DataFrame.info by default. This can be suppressed by setting pandas.options.display.memory_usage to False.

Parameters
  • index (bool, default True) – Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.

  • deep (bool, default False) – If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.

Returns

A Series whose index is the original column names and whose values is the memory usage of each column in bytes.

Return type

Series

See also

numpy.ndarray.nbytes

Total bytes consumed by the elements of an ndarray.

Series.memory_usage

Bytes consumed by a Series.

Categorical

Memory-efficient array for string values with many repeated values.

DataFrame.info

Concise summary of a DataFrame.

Examples

>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t))
...              for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
   int64  float64            complex128  object  bool
0      1      1.0              1.0+0.0j       1  True
1      1      1.0              1.0+0.0j       1  True
2      1      1.0              1.0+0.0j       1  True
3      1      1.0              1.0+0.0j       1  True
4      1      1.0              1.0+0.0j       1  True
>>> df.memory_usage()
Index           128
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=False)
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

The memory footprint of object dtype columns is ignored by default:

>>> df.memory_usage(deep=True)
Index            128
int64          40000
float64        40000
complex128     80000
object        180000
bool            5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df['object'].astype('category').memory_usage(deep=True)
5244

Notes

See pandas API documentation for pandas.DataFrame.memory_usage for more.

min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.

Parameters
  • axis ({index (0), columns (1)}) – Axis for the function to be applied on.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Returns

Return type

Series or DataFrame (if level specified)

See also

Series.sum

Return the sum.

Series.min

Return the minimum.

Series.max

Return the maximum.

Series.idxmin

Return the index of the minimum.

Series.idxmax

Return the index of the maximum.

DataFrame.sum

Return the sum over the requested axis.

DataFrame.min

Return the minimum over the requested axis.

DataFrame.max

Return the maximum over the requested axis.

DataFrame.idxmin

Return the index of the minimum over the requested axis.

DataFrame.idxmax

Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.min()
0

Notes

See pandas API documentation for pandas.DataFrame.min for more.

mod(other, axis='columns', level=None, fill_value=None)

Get Modulo of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.mod for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
mode(axis=0, numeric_only=False, dropna=True)

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    The axis to iterate over while searching for the mode:

    • 0 or ‘index’ : get mode of each column

    • 1 or ‘columns’ : get mode of each row.

  • numeric_only (bool, default False) – If True, only apply to numeric columns.

  • dropna (bool, default True) – Don’t consider counts of NaN/NaT.

Returns

The modes of each column or row.

Return type

DataFrame

See also

Series.mode

Return the highest frequency value in a Series.

Series.value_counts

Return the counts of values in a Series.

Examples

>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN

By default, missing values are not considered, and the mode of wings are both 0 and 2. Because the resulting DataFrame has two rows, the second row of species and legs contains NaN.

>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0

Setting dropna=False NaN values are considered and they can be the mode (like for wings).

>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN

Setting numeric_only=True, only the mode of numeric columns is computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0

To compute the mode over columns and not rows, use the axis parameter:

>>> df.mode(axis='columns', numeric_only=True)
           0    1
falcon   2.0  NaN
horse    4.0  NaN
spider   0.0  8.0
ostrich  2.0  NaN

Notes

See pandas API documentation for pandas.DataFrame.mode for more.

mul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.mul for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
multiply(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.mul for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
ne(other, axis='columns', level=None)

Get Not equal to of dataframe and other, element-wise (binary operator ne).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Result of the comparison.

Return type

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

See pandas API documentation for pandas.DataFrame.ne for more. Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
notna()

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns

Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

Return type

DataFrame

See also

DataFrame.notnull

Alias of notna.

DataFrame.isna

Boolean inverse of notna.

DataFrame.dropna

Omit axes labels with missing values.

notna

Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool

Notes

See pandas API documentation for pandas.DataFrame.notna for more.

notnull()

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns

Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

Return type

DataFrame

See also

DataFrame.notnull

Alias of notna.

DataFrame.isna

Boolean inverse of notna.

DataFrame.dropna

Omit axes labels with missing values.

notna

Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool

Notes

See pandas API documentation for pandas.DataFrame.notna for more.

nunique(axis=0, dropna=True)

Count number of distinct elements in specified axis.

Return Series with number of distinct elements. Can ignore NaN values.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • dropna (bool, default True) – Don’t include NaN in the counts.

Returns

Return type

Series

See also

Series.nunique

Method nunique for Series.

DataFrame.count

Count non-NA cells for each column or row.

Examples

>>> df = pd.DataFrame({'A': [4, 5, 6], 'B': [4, 1, 1]})
>>> df.nunique()
A    3
B    2
dtype: int64
>>> df.nunique(axis=1)
0    1
1    2
2    2
dtype: int64

Notes

See pandas API documentation for pandas.DataFrame.nunique for more.

pad(axis=None, inplace=False, limit=None, downcast=None)

Synonym for DataFrame.fillna() with method='ffill'.

Returns

Object with missing values filled or None if inplace=True.

Return type

Series/DataFrame or None

Notes

See pandas API documentation for pandas.DataFrame.pad for more.

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)

Percentage change between the current and a prior element.

Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.

Parameters
  • periods (int, default 1) – Periods to shift for forming percent change.

  • fill_method (str, default 'pad') – How to handle NAs before computing percent changes.

  • limit (int, default None) – The number of consecutive NAs to fill before stopping.

  • freq (DateOffset, timedelta, or str, optional) – Increment to use from time series API (e.g. ‘M’ or BDay()).

  • **kwargs – Additional keyword arguments are passed into DataFrame.shift or Series.shift.

Returns

chg – The same type as the calling object.

Return type

Series or DataFrame

See also

Series.diff

Compute the difference of two elements in a Series.

DataFrame.diff

Compute the difference of two elements in a DataFrame.

Series.shift

Shift the index by some number of periods.

DataFrame.shift

Shift the index by some number of periods.

Examples

Series

>>> s = pd.Series([90, 91, 85])
>>> s
0    90
1    91
2    85
dtype: int64
>>> s.pct_change()
0         NaN
1    0.011111
2   -0.065934
dtype: float64
>>> s.pct_change(periods=2)
0         NaN
1         NaN
2   -0.055556
dtype: float64

See the percentage change in a Series where filling NAs with last valid observation forward to next valid.

>>> s = pd.Series([90, 91, None, 85])
>>> s
0    90.0
1    91.0
2     NaN
3    85.0
dtype: float64
>>> s.pct_change(fill_method='ffill')
0         NaN
1    0.011111
2    0.000000
3   -0.065934
dtype: float64

DataFrame

Percentage change in French franc, Deutsche Mark, and Italian lira from 1980-01-01 to 1980-03-01.

>>> df = pd.DataFrame({
...     'FR': [4.0405, 4.0963, 4.3149],
...     'GR': [1.7246, 1.7482, 1.8519],
...     'IT': [804.74, 810.01, 860.13]},
...     index=['1980-01-01', '1980-02-01', '1980-03-01'])
>>> df
                FR      GR      IT
1980-01-01  4.0405  1.7246  804.74
1980-02-01  4.0963  1.7482  810.01
1980-03-01  4.3149  1.8519  860.13
>>> df.pct_change()
                  FR        GR        IT
1980-01-01       NaN       NaN       NaN
1980-02-01  0.013810  0.013684  0.006549
1980-03-01  0.053365  0.059318  0.061876

Percentage of change in GOOG and APPL stock volume. Shows computing the percentage change between columns.

>>> df = pd.DataFrame({
...     '2016': [1769950, 30586265],
...     '2015': [1500923, 40912316],
...     '2014': [1371819, 41403351]},
...     index=['GOOG', 'APPL'])
>>> df
          2016      2015      2014
GOOG   1769950   1500923   1371819
APPL  30586265  40912316  41403351
>>> df.pct_change(axis='columns', periods=-1)
          2016      2015  2014
GOOG  0.179241  0.094112   NaN
APPL -0.252395 -0.011860   NaN

Notes

See pandas API documentation for pandas.DataFrame.pct_change for more.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
  • func (function) – Function to apply to the Series/DataFrame. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame.

  • args (iterable, optional) – Positional arguments passed into func.

  • kwargs (mapping, optional) – A dictionary of keyword arguments passed into func.

Returns

object

Return type

the return type of func.

See also

DataFrame.apply

Apply a function along input axis of DataFrame.

DataFrame.applymap

Apply a function elementwise on a whole DataFrame.

Series.map

Apply a mapping correspondence on a Series.

Notes

See pandas API documentation for pandas.DataFrame.pipe for more. Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)  

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )  

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose f takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )  
pop(item)

Return item and drop from frame. Raise KeyError if not found.

Parameters

item (label) – Label of column to be popped.

Returns

Return type

Series

Examples

>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=('name', 'class', 'max_speed'))
>>> df
     name   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
>>> df.pop('class')
0      bird
1      bird
2    mammal
3    mammal
Name: class, dtype: object
>>> df
     name  max_speed
0  falcon      389.0
1  parrot       24.0
2    lion       80.5
3  monkey        NaN

Notes

See pandas API documentation for pandas.DataFrame.pop for more.

pow(other, axis='columns', level=None, fill_value=None)

Get Exponential power of dataframe and other, element-wise (binary operator pow).

Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.pow for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')

Return values at the given quantile over requested axis.

Parameters
  • q (float or array-like, default 0.5 (50% quantile)) – Value between 0 <= q <= 1, the quantile(s) to compute.

  • axis ({0, 1, 'index', 'columns'}, default 0) – Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • numeric_only (bool, default True) – If False, the quantile of datetime and timedelta data will be computed as well.

  • interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) –

    This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

    • linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

    • lower: i.

    • higher: j.

    • nearest: i or j whichever is nearest.

    • midpoint: (i + j) / 2.

Returns

If q is an array, a DataFrame will be returned where the

index is q, the columns are the columns of self, and the values are the quantiles.

If q is a float, a Series will be returned where the

index is the columns of self and the values are the quantiles.

Return type

Series or DataFrame

See also

core.window.Rolling.quantile

Rolling quantile.

numpy.percentile

Numpy function to compute the percentile.

Examples

>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0

Specifying numeric_only=False will also compute the quantile of datetime and timedelta data.

>>> df = pd.DataFrame({'A': [1, 2],
...                    'B': [pd.Timestamp('2010'),
...                          pd.Timestamp('2011')],
...                    'C': [pd.Timedelta('1 days'),
...                          pd.Timedelta('2 days')]})
>>> df.quantile(0.5, numeric_only=False)
A                    1.5
B    2010-07-02 12:00:00
C        1 days 12:00:00
Name: 0.5, dtype: object

Notes

See pandas API documentation for pandas.DataFrame.quantile for more.

radd(other, axis='columns', level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.add for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis.

By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Index to direct ranking.

  • method ({'average', 'min', 'max', 'first', 'dense'}, default 'average') –

    How to rank the group of records that have the same value (i.e. ties):

    • average: average rank of the group

    • min: lowest rank in the group

    • max: highest rank in the group

    • first: ranks assigned in order they appear in the array

    • dense: like ‘min’, but rank always increases by 1 between groups.

  • numeric_only (bool, optional) – For DataFrame objects, rank only numeric columns if set to True.

  • na_option ({'keep', 'top', 'bottom'}, default 'keep') –

    How to rank NaN values:

    • keep: assign NaN rank to NaN values

    • top: assign lowest rank to NaN values

    • bottom: assign highest rank to NaN values

  • ascending (bool, default True) – Whether or not the elements should be ranked in ascending order.

  • pct (bool, default False) – Whether or not to display the returned rankings in percentile form.

Returns

Return a Series or DataFrame with data ranks as values.

Return type

same type as caller

See also

core.groupby.GroupBy.rank

Rank of values within each group.

Examples

>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
...                                    'spider', 'snake'],
...                         'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
    Animal  Number_legs
0      cat          4.0
1  penguin          2.0
2      dog          4.0
3   spider          8.0
4    snake          NaN

The following example shows how the method behaves with the above parameters:

  • default_rank: this is the default behaviour obtained without using any parameter.

  • max_rank: setting method = 'max' the records that have the same values are ranked using the highest rank (e.g.: since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)

  • NA_bottom: choosing na_option = 'bottom', if there are records with NaN values they are placed at the bottom of the ranking.

  • pct_rank: when setting pct = True, the ranking is expressed as percentile rank.

>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN

Notes

See pandas API documentation for pandas.DataFrame.rank for more.

rdiv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.rtruediv for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
reindex(index=None, columns=None, copy=True, **kwargs)

Conform Series/DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters
  • axes (keywords for) – New labels / index to conform to, should be specified using keywords. Preferably an Index object to avoid duplicating data.

  • method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}) –

    Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

    • None (default): don’t fill gaps

    • pad / ffill: Propagate last valid observation forward to next valid.

    • backfill / bfill: Use next valid observation to fill gap.

    • nearest: Use nearest valid observations to fill gap.

  • copy (bool, default True) – Return a new object, even if the passed indexes are the same.

  • level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (scalar, default np.NaN) – Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

  • limit (int, default None) – Maximum number of consecutive elements to forward or backward fill.

  • tolerance (optional) –

    Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations most satisfy the equation abs(index[indexer] - target) <= tolerance.

    Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns

Return type

Series/DataFrame with changed index.

See also

DataFrame.set_index

Set row labels.

DataFrame.reset_index

Remove row labels or move them to new columns.

DataFrame.reindex_like

Change to same indices as other DataFrame.

Examples

DataFrame.reindex supports two calling conventions

  • (index=index_labels, columns=column_labels, ...)

  • (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Create a dataframe with some fictional data.

>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                   index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

Create a new index and reindex the dataframe. By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.

>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02
>>> df.reindex(new_index, fill_value='missing')
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

Or we can use “axis-style” keyword arguments

>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).

>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

Suppose we decide to expand the dataframe to cover a wider date range.

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

The index entries that did not have a value in the original data frame (for example, ‘2009-12-29’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to back-propagate the last valid value to fill the NaN values, pass bfill as an argument to the method keyword.

>>> df2.reindex(date_index2, method='bfill')
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

Please note that the NaN value present in the original dataframe (at index value 2010-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.

See the user guide for more.

Notes

See pandas API documentation for pandas.DataFrame.reindex for more.

reindex_like(other, method=None, copy=True, limit=None, tolerance=None)

Return an object with matching indices as other object.

Conform the object to the same index on all axes. Optional filling logic, placing NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters
  • other (Object of the same data type) – Its row and column indices are used to define the new indices of this object.

  • method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}) –

    Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

    • None (default): don’t fill gaps

    • pad / ffill: propagate last valid observation forward to next valid

    • backfill / bfill: use next valid observation to fill gap

    • nearest: use nearest valid observations to fill gap.

  • copy (bool, default True) – Return a new object, even if the passed indexes are the same.

  • limit (int, default None) – Maximum number of consecutive labels to fill for inexact matches.

  • tolerance (optional) –

    Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

    Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns

Same type as caller, but with changed indices on each axis.

Return type

Series or DataFrame

See also

DataFrame.set_index

Set row labels.

DataFrame.reset_index

Remove row labels or move them to new columns.

DataFrame.reindex

Change to new indices or expand indices.

Notes

See pandas API documentation for pandas.DataFrame.reindex_like for more. Same as calling .reindex(index=other.index, columns=other.columns,...).

Examples

>>> df1 = pd.DataFrame([[24.3, 75.7, 'high'],
...                     [31, 87.8, 'high'],
...                     [22, 71.6, 'medium'],
...                     [35, 95, 'medium']],
...                    columns=['temp_celsius', 'temp_fahrenheit',
...                             'windspeed'],
...                    index=pd.date_range(start='2014-02-12',
...                                        end='2014-02-15', freq='D'))
>>> df1
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          24.3             75.7      high
2014-02-13          31.0             87.8      high
2014-02-14          22.0             71.6    medium
2014-02-15          35.0             95.0    medium
>>> df2 = pd.DataFrame([[28, 'low'],
...                     [30, 'low'],
...                     [35.1, 'medium']],
...                    columns=['temp_celsius', 'windspeed'],
...                    index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
...                                            '2014-02-15']))
>>> df2
            temp_celsius windspeed
2014-02-12          28.0       low
2014-02-13          30.0       low
2014-02-15          35.1    medium
>>> df2.reindex_like(df1)
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          28.0              NaN       low
2014-02-13          30.0              NaN       low
2014-02-14           NaN              NaN       NaN
2014-02-15          35.1              NaN    medium
rename_axis(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)

Set the name of the axis for the index or columns.

Parameters
  • mapper (scalar, list-like, optional) – Value to set the axis name attribute.

  • index (scalar, list-like, dict-like or function, optional) –

    A scalar, list-like, dict-like or functions transformations to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series. This parameter only apply for DataFrame type objects.

    Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.

  • columns (scalar, list-like, dict-like or function, optional) –

    A scalar, list-like, dict-like or functions transformations to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series. This parameter only apply for DataFrame type objects.

    Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to rename.

  • copy (bool, default True) – Also copy underlying data.

  • inplace (bool, default False) – Modifies the object directly, instead of creating a new Series or DataFrame.

Returns

The same type as the caller or None if inplace=True.

Return type

Series, DataFrame, or None

See also

Series.rename

Alter Series index labels or name.

DataFrame.rename

Alter DataFrame index labels or name.

Index.rename

Set new names on index.

Notes

See pandas API documentation for pandas.DataFrame.rename_axis for more. DataFrame.rename_axis supports two calling conventions

  • (index=index_mapper, columns=columns_mapper, ...)

  • (mapper, axis={'index', 'columns'}, ...)

The first calling convention will only modify the names of the index and/or the names of the Index object that is the columns. In this case, the parameter copy is ignored.

The second calling convention will modify the names of the corresponding index if mapper is a list or a scalar. However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.

We highly recommend using keyword arguments to clarify your intent.

Examples

Series

>>> s = pd.Series(["dog", "cat", "monkey"])
>>> s
0       dog
1       cat
2    monkey
dtype: object
>>> s.rename_axis("animal")
animal
0    dog
1    cat
2    monkey
dtype: object

DataFrame

>>> df = pd.DataFrame({"num_legs": [4, 4, 2],
...                    "num_arms": [0, 0, 2]},
...                   ["dog", "cat", "monkey"])
>>> df
        num_legs  num_arms
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("animal")
>>> df
        num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df
limbs   num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2

MultiIndex

>>> df.index = pd.MultiIndex.from_product([['mammal'],
...                                        ['dog', 'cat', 'monkey']],
...                                       names=['type', 'name'])
>>> df
limbs          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(index={'type': 'class'})
limbs          num_legs  num_arms
class  name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(columns=str.upper)
LIMBS          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels.

Parameters
  • order (list of int or list of str) – List representing new level order. Reference level by number (position) or by key (label).

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Where to reorder levels.

Returns

Return type

DataFrame

Notes

See pandas API documentation for pandas.DataFrame.reorder_levels for more.

resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base: Optional[int] = None, on=None, level=None, origin: Union[str, Timestamp, datetime.datetime, numpy.datetime64, int, numpy.int64, float] = 'start_day', offset: Optional[Union[Timedelta, datetime.timedelta, numpy.timedelta64, int, numpy.int64, float, str]] = None)

Resample time-series data.

Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like series/index to the on/level keyword parameter.

Parameters
  • rule (DateOffset, Timedelta or str) – The offset string or object representing target conversion.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Which axis to use for up- or down-sampling. For Series this will default to 0, i.e. along the rows. Must be DatetimeIndex, TimedeltaIndex or PeriodIndex.

  • closed ({'right', 'left'}, default None) – Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.

  • label ({'right', 'left'}, default None) – Which bin edge label to label bucket with. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.

  • convention ({'start', 'end', 's', 'e'}, default 'start') – For PeriodIndex only, controls whether to use the start or end of rule.

  • kind ({'timestamp', 'period'}, optional, default None) – Pass ‘timestamp’ to convert the resulting index to a DateTimeIndex or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained.

  • loffset (timedelta, default None) –

    Adjust the resampled time labels.

    Deprecated since version 1.1.0: You should add the loffset to the df.index after the resample. See below.

  • base (int, default 0) –

    For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0.

    Deprecated since version 1.1.0: The new arguments that you should use are ‘offset’ or ‘origin’.

  • on (str, optional) – For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.

  • level (str or int, optional) – For a MultiIndex, level (name or number) to use for resampling. level must be datetime-like.

  • origin ({'epoch', 'start', 'start_day', 'end', 'end_day'}, Timestamp) –

    or str, default ‘start_day’ The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If a timestamp is not used, these values are also supported:

    • ’epoch’: origin is 1970-01-01

    • ’start’: origin is the first value of the timeseries

    • ’start_day’: origin is the first day at midnight of the timeseries

    New in version 1.1.0.

    • ’end’: origin is the last value of the timeseries

    • ’end_day’: origin is the ceiling midnight of the last day

    New in version 1.3.0.

  • offset (Timedelta or str, default is None) –

    An offset timedelta added to the origin.

    New in version 1.1.0.

Returns

Resampler object.

Return type

pandas.core.Resampler

See also

Series.resample

Resample a Series.

DataFrame.resample

Resample a DataFrame.

groupby

Group DataFrame by mapping, function, label, or list of labels.

asfreq

Reindex a DataFrame with the given frequency without grouping.

Notes

See pandas API documentation for pandas.DataFrame.resample for more. See the user guide for more.

To learn more about the offset strings, please see this link.

Examples

Start by creating a series with 9 one minute timestamps.

>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64

Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin.

>>> series.resample('3T').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64

Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left. Please note that the value in the bucket used as the label is not included in the bucket, which it labels. For example, in the original series the bucket 2000-01-01 00:03:00 contains the value 3, but the summed value in the resampled bucket with the label 2000-01-01 00:03:00 does not include 3 (if it did, the summed value would be 6, not 3). To include this value close the right side of the bin interval as illustrated in the example below this one.

>>> series.resample('3T', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int64

Downsample the series into 3 minute bins as above, but close the right side of the bin interval.

>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3T, dtype: int64

Upsample the series into 30 second bins.

>>> series.resample('30S').asfreq()[0:5]   # Select first 5 rows
2000-01-01 00:00:00   0.0
2000-01-01 00:00:30   NaN
2000-01-01 00:01:00   1.0
2000-01-01 00:01:30   NaN
2000-01-01 00:02:00   2.0
Freq: 30S, dtype: float64

Upsample the series into 30 second bins and fill the NaN values using the pad method.

>>> series.resample('30S').pad()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64

Upsample the series into 30 second bins and fill the NaN values using the bfill method.

>>> series.resample('30S').bfill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64

Pass a custom function via apply

>>> def custom_resampler(arraylike):
...     return np.sum(arraylike) + 5
...
>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3T, dtype: int64

For a Series with a PeriodIndex, the keyword convention can be used to control whether to use the start or end of rule.

Resample a year by quarter using ‘start’ convention. Values are assigned to the first quarter of the period.

>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
...                                             freq='A',
...                                             periods=2))
>>> s
2012    1
2013    2
Freq: A-DEC, dtype: int64
>>> s.resample('Q', convention='start').asfreq()
2012Q1    1.0
2012Q2    NaN
2012Q3    NaN
2012Q4    NaN
2013Q1    2.0
2013Q2    NaN
2013Q3    NaN
2013Q4    NaN
Freq: Q-DEC, dtype: float64

Resample quarters by month using ‘end’ convention. Values are assigned to the last month of the period.

>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',
...                                                   freq='Q',
...                                                   periods=4))
>>> q
2018Q1    1
2018Q2    2
2018Q3    3
2018Q4    4
Freq: Q-DEC, dtype: int64
>>> q.resample('M', convention='end').asfreq()
2018-03    1.0
2018-04    NaN
2018-05    NaN
2018-06    2.0
2018-07    NaN
2018-08    NaN
2018-09    3.0
2018-10    NaN
2018-11    NaN
2018-12    4.0
Freq: M, dtype: float64

For DataFrame objects, the keyword on can be used to specify the column instead of the index for resampling.

>>> d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...      'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018',
...                                     periods=8,
...                                     freq='W')
>>> df
   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
>>> df.resample('M', on='week_starting').mean()
               price  volume
week_starting
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0

For a DataFrame with MultiIndex, the keyword level can be used to specify on which level the resampling needs to take place.

>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...       'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df2 = pd.DataFrame(
...     d2,
...     index=pd.MultiIndex.from_product(
...         [days, ['morning', 'afternoon']]
...     )
... )
>>> df2
                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50
>>> df2.resample('D', level=0).sum()
            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90

If you want to adjust the start of the bins based on a fixed timestamp:

>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7T, dtype: int64
>>> ts.resample('17min').sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='epoch').sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='2000-01-01').sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17T, dtype: int64

If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:

>>> ts.resample('17min', origin='start').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64
>>> ts.resample('17min', offset='23h30min').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64

If you want to take the largest Timestamp as the end of the bins:

>>> ts.resample('17min', origin='end').sum()
2000-10-01 23:35:00     0
2000-10-01 23:52:00    18
2000-10-02 00:09:00    27
2000-10-02 00:26:00    63
Freq: 17T, dtype: int64

In contrast with the start_day, you can use end_day to take the ceiling midnight of the largest Timestamp as the end of the bins and drop the bins not containing data:

>>> ts.resample('17min', origin='end_day').sum()
2000-10-01 23:38:00     3
2000-10-01 23:55:00    15
2000-10-02 00:12:00    45
2000-10-02 00:29:00    45
Freq: 17T, dtype: int64

To replace the use of the deprecated base argument, you can now use offset, in this example it is equivalent to have base=2:

>>> ts.resample('17min', offset='2min').sum()
2000-10-01 23:16:00     0
2000-10-01 23:33:00     9
2000-10-01 23:50:00    36
2000-10-02 00:07:00    39
2000-10-02 00:24:00    24
Freq: 17T, dtype: int64

To replace the use of the deprecated loffset argument:

>>> from pandas.tseries.frequencies import to_offset
>>> loffset = '19min'
>>> ts_out = ts.resample('17min').sum()
>>> ts_out.index = ts_out.index + to_offset(loffset)
>>> ts_out
2000-10-01 23:33:00     0
2000-10-01 23:50:00     9
2000-10-02 00:07:00    21
2000-10-02 00:24:00    54
2000-10-02 00:41:00    24
Freq: 17T, dtype: int64
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

Parameters
  • level (int, str, tuple, or list, default None) – Only remove the given levels from the index. Removes all levels by default.

  • drop (bool, default False) – Do not try to insert index into dataframe columns. This resets the index to the default integer index.

  • inplace (bool, default False) – Modify the DataFrame in place (do not create a new object).

  • col_level (int or str, default 0) – If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.

  • col_fill (object, default '') – If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

Returns

DataFrame with the new index or None if inplace=True.

Return type

DataFrame or None

See also

DataFrame.set_index

Opposite of reset_index.

DataFrame.reindex

Change to new indices or expand indices.

DataFrame.reindex_like

Change to same indices as other DataFrame.

Examples

>>> df = pd.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN

When we reset the index, the old index is added as a column, and a new sequential index is used:

>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN

We can use the drop parameter to avoid the old index being added as a column:

>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN

You can also use reset_index with MultiIndex.

>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
...                    ( 24.0, 'fly'),
...                    ( 80.5, 'run'),
...                    (np.nan, 'jump')],
...                   index=index,
...                   columns=columns)
>>> df
               speed species
                 max    type
class  name
bird   falcon  389.0     fly
       parrot   24.0     fly
mammal lion     80.5     run
       monkey    NaN    jump

If the index has multiple levels, we can reset a subset of them:

>>> df.reset_index(level='class')
         class  speed species
                  max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

If we are not dropping the index, by default, it is placed in the top level. We can place it in another level:

>>> df.reset_index(level='class', col_level=1)
                speed species
         class    max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

When the index is inserted under another level, we can specify under which one with the parameter col_fill:

>>> df.reset_index(level='class', col_level=1, col_fill='species')
              species  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump

If we specify a nonexistent level for col_fill, it is created:

>>> df.reset_index(level='class', col_level=1, col_fill='genus')
                genus  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump

Notes

See pandas API documentation for pandas.DataFrame.reset_index for more.

rfloordiv(other, axis='columns', level=None, fill_value=None)

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.rfloordiv for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rmod(other, axis='columns', level=None, fill_value=None)

Get Modulo of dataframe and other, element-wise (binary operator rmod).

Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.rmod for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rmul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.mul for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')

Provide rolling window calculations.

Parameters
  • window (int, offset, or BaseIndexer subclass) –

    Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.

    If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes.

    If a BaseIndexer subclass is passed, calculates the window boundaries based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, and closed will be passed to get_window_bounds.

  • min_periods (int, default None) – Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window.

  • center (bool, default False) – Set the labels at the center of the window.

  • win_type (str, default None) – Provide a window type. If None, all points are evenly weighted. See the notes below for further information.

  • on (str, optional) – For a DataFrame, a datetime-like column or Index level on which to calculate the rolling window, rather than the DataFrame’s index. Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.

  • axis (int or str, default 0) –

  • closed (str, default None) –

    Make the interval closed on the ‘right’, ‘left’, ‘both’ or ‘neither’ endpoints. Defaults to ‘right’.

    Changed in version 1.2.0: The closed parameter with fixed windows is now supported.

  • method (str {'single', 'table'}, default 'single') –

    Execute the rolling operation per single column or row ('single') or over the entire object ('table').

    This argument is only implemented when specifying engine='numba' in the method call.

    New in version 1.3.0.

Returns

Return type

a Window or Rolling sub-classed for the particular operation

See also

expanding

Provides expanding transformations.

ewm

Provides exponential weighted functions.

Notes

See pandas API documentation for pandas.DataFrame.rolling for more. By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

To learn more about the offsets & frequency strings, please see this link.

If win_type=None, all points are evenly weighted; otherwise, win_type can accept a string of any scipy.signal window function.

Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature. Please see the third example below on how to add the additional parameters.

Examples

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

Rolling sum with a window length of 2, using the ‘triang’ window type.

>>> df.rolling(2, win_type='triang').sum()
     B
0  NaN
1  0.5
2  1.5
3  NaN
4  NaN

Rolling sum with a window length of 2, using the ‘gaussian’ window type (note how we need to specify std).

>>> df.rolling(2, win_type='gaussian').sum(std=3)
          B
0       NaN
1  0.986207
2  2.958621
3       NaN
4       NaN

Rolling sum with a window length of 2, min_periods defaults to the window length.

>>> df.rolling(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN

Same as above, but explicitly set the min_periods

>>> df.rolling(2, min_periods=1).sum()
     B
0  0.0
1  1.0
2  3.0
3  2.0
4  4.0

Same as above, but with forward-looking windows

>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
     B
0  1.0
1  3.0
2  2.0
3  4.0
4  4.0

A ragged (meaning not-a-regular frequency), time-indexed DataFrame

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
...                   index = [pd.Timestamp('20130101 09:00:00'),
...                            pd.Timestamp('20130101 09:00:02'),
...                            pd.Timestamp('20130101 09:00:03'),
...                            pd.Timestamp('20130101 09:00:05'),
...                            pd.Timestamp('20130101 09:00:06')])
>>> df
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0

Contrasting to an integer rolling window, this will roll a variable length window corresponding to the time period. The default for min_periods is 1.

>>> df.rolling('2s').sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
round(decimals=0, *args, **kwargs)

Round a DataFrame to a variable number of decimal places.

Parameters
  • decimals (int, dict, Series) – Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

  • *args – Additional keywords have no effect but might be accepted for compatibility with numpy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with numpy.

Returns

A DataFrame with the affected columns rounded to the specified number of decimal places.

Return type

DataFrame

See also

numpy.around

Round a numpy array to the given number of decimals.

Series.round

Round a Series to the given number of decimals.

Examples

>>> df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...                   columns=['dogs', 'cats'])
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = pd.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Notes

See pandas API documentation for pandas.DataFrame.round for more.

rpow(other, axis='columns', level=None, fill_value=None)

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.rpow for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rsub(other, axis='columns', level=None, fill_value=None)

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.rsub for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rtruediv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

Result of the arithmetic operation.

Return type

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

See pandas API documentation for pandas.DataFrame.rtruediv for more. Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
  • n (int, optional) – Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

  • frac (float, optional) – Fraction of axis items to return. Cannot be used with n.

  • replace (bool, default False) – Allow or disallow sampling of the same row more than once.

  • weights (str or ndarray-like, optional) – Default ‘None’ results in equal probability weighting. If passed a Series, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a DataFrame, will accept the name of a column when axis = 0. Unless weights are a Series, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.

  • random_state (int, array-like, BitGenerator, np.random.RandomState, optional) –

    If int, array-like, or BitGenerator (NumPy>=1.17), seed for random number generator If np.random.RandomState, use as numpy RandomState object.

    Changed in version 1.1.0: array-like and BitGenerator (for NumPy>=1.17) object now passed to np.random.RandomState() as seed

  • axis ({0 or ‘index’, 1 or ‘columns’, None}, default None) – Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames).

  • ignore_index (bool, default False) –

    If True, the resulting index will be labeled 0, 1, …, n - 1.

    New in version 1.3.0.

Returns

A new object of same type as caller containing n items randomly sampled from the caller object.

Return type

Series or DataFrame

See also

DataFrameGroupBy.sample

Generates random samples from each group of a DataFrame object.

SeriesGroupBy.sample

Generates random samples from each group of a Series object.

numpy.random.choice

Generates a random sample from a given 1-D numpy array.

Notes

See pandas API documentation for pandas.DataFrame.sample for more. If frac > 1, replacement should be set to True.

Examples

>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
...                    'num_wings': [2, 0, 0, 0],
...                    'num_specimen_seen': [10, 2, 1, 8]},
...                   index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8

Extract 3 random elements from the Series df['num_legs']: Note that we use random_state to ensure the reproducibility of the examples.

>>> df['num_legs'].sample(n=3, random_state=1)
fish      0
spider    8
falcon    2
Name: num_legs, dtype: int64

A random 50% sample of the DataFrame with replacement:

>>> df.sample(frac=0.5, replace=True, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8

An upsample sample of the DataFrame with replacement: Note that replace parameter has to be True for frac parameter > 1.

>>> df.sample(frac=2, replace=True, random_state=1)
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2

Using a DataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.

>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

Parameters
  • axis ({index (0), columns (1)}) –

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

  • ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

  • numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Returns

Return type

Series or DataFrame (if level specified)

Notes

See pandas API documentation for pandas.DataFrame.sem for more. To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)

set_axis(labels, axis=0, inplace=False)

Assign desired index to given axis.

Indexes for column or row labels can be changed by assigning a list-like or Index.

Parameters
  • labels (list-like, Index) – The values for the new index.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to update. The value 0 identifies the rows, and 1 identifies the columns.

  • inplace (bool, default False) – Whether to return a new DataFrame instance.

Returns

renamed – An object of type DataFrame or None if inplace=True.

Return type

DataFrame or None

See also

DataFrame.rename_axis

Alter the name of the index or columns. Examples ——– >>> df = pd.DataFrame({“A”: [1, 2, 3], “B”: [4, 5, 6]}) Change the row labels. >>> df.set_axis([‘a’, ‘b’, ‘c’], axis=’index’) A B a 1 4 b 2 5 c 3 6 Change the column labels. >>> df.set_axis([‘I’, ‘II’], axis=’columns’) I II 0 1 4 1 2 5 2 3 6 Now, update the labels inplace. >>> df.set_axis([‘i’, ‘ii’], axis=’columns’, inplace=True) >>> df i ii 0 1 4 1 2 5 2 3 6

Notes

See pandas API documentation for pandas.DataFrame.set_axis for more.

set_flags(*, copy: modin.pandas.base.BasePandasDataset.bool = False, allows_duplicate_labels: Optional[modin.pandas.base.BasePandasDataset.bool] = None)

Return a new object with updated flags.

Parameters

allows_duplicate_labels (bool, optional) – Whether the returned object allows duplicate labels.

Returns

The same type as the caller.

Return type

Series or DataFrame

See also

DataFrame.attrs

Global metadata applying to this dataset.

DataFrame.flags

Global flags applying to this object.

Notes

See pandas API documentation for pandas.DataFrame.set_flags for more. This method returns a new object that’s a view on the same data as the input. Mutating the input or the output values will be reflected in the other.

This method is intended to be used in method chains.

“Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.

Examples

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
shift(periods=1, freq=None, axis=0, fill_value=NoDefault.no_default)

Shift index by desired number of periods with an optional time freq.

When freq is not passed, shift the index without realigning the data. If freq is passed (in this case, the index must be date or datetime, or it will raise a NotImplementedError), the index will be increased using the periods and the freq. freq can be inferred when specified as “infer” as long as either freq or inferred_freq attribute is set in the index.

Parameters
  • periods (int) – Number of periods to shift. Can be positive or negative.

  • freq (DateOffset, tseries.offsets, timedelta, or str, optional) – Offset to use from the tseries module or time rule (e.g. ‘EOM’). If freq is specified then the index values are shifted but the data is not realigned. That is, use freq if you would like to extend the index when shifting and preserve the original data. If freq is specified as “infer” then it will be inferred from the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown.

  • axis ({0 or 'index', 1 or 'columns', None}, default None) – Shift direction.

  • fill_value (object, optional) –

    The scalar value to use for newly introduced missing values. the default depends on the dtype of self. For numeric data, np.nan is used. For datetime, timedelta, or period data, etc. NaT is used. For extension dtypes, self.dtype.na_value is used.

    Changed in version 1.1.0.

Returns

Copy of input object, shifted.

Return type

DataFrame

See also

Index.shift

Shift values of Index.

DatetimeIndex.shift

Shift values of DatetimeIndex.

PeriodIndex.shift

Shift values of PeriodIndex.

tshift

Shift the time index, using the index’s frequency if available.

Examples

>>> df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
...                    "Col2": [13, 23, 18, 33, 48],
...                    "Col3": [17, 27, 22, 37, 52]},
...                   index=pd.date_range("2020-01-01", "2020-01-05"))
>>> df
            Col1  Col2  Col3
2020-01-01    10    13    17
2020-01-02    20    23    27
2020-01-03    15    18    22
2020-01-04    30    33    37
2020-01-05    45    48    52
>>> df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0
>>> df.shift(periods=1, axis="columns")
            Col1  Col2  Col3
2020-01-01   NaN    10    13
2020-01-02   NaN    20    23
2020-01-03   NaN    15    18
2020-01-04   NaN    30    33
2020-01-05   NaN    45    48
>>> df.shift(periods=3, fill_value=0)
            Col1  Col2  Col3
2020-01-01     0     0     0
2020-01-02     0     0     0
2020-01-03     0     0     0
2020-01-04    10    13    17
2020-01-05    20    23    27
>>> df.shift(periods=3, freq="D")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52
>>> df.shift(periods=3, freq="infer")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52

Notes

See pandas API documentation for pandas.DataFrame.shift for more.

property size

Return an int representing the number of elements in this object.

Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.

See also

ndarray.size

Number of elements in the array.

Examples

>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3