pd.DataFrame supported APIs

The following table lists both implemented and not implemented methods. If you need an operation that is listed as not implemented, feel free to open an issue on the GitHub repository, or give a thumbs up to an existing issue. Contributions are also welcome!

The table is structured as follows. The first column contains the method name. The second column contains a link to the description of the corresponding pandas method. The third column is a flag indicating whether Modin implements the method in the first column: Y stands for yes, N stands for no, P stands for partial (meaning some parameters may not be supported yet), and D stands for default to pandas. The fourth column contains notes on the current implementation.
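As a minimal sketch, the flag legend above can be written down as a small Python enum. The `Support` enum, the `SAMPLE_ROWS` dictionary, and the `describe` helper are hypothetical names used only for illustration and are not part of Modin; the four sample rows are copied from the table.

```python
from enum import Enum


class Support(Enum):
    """Flags used in the third column of the table (illustrative only)."""

    Y = "implemented in Modin"
    N = "not implemented"
    P = "partially implemented; some parameters may not be supported yet"
    D = "defaults to pandas"


# A few rows copied from the table: method name -> flag.
SAMPLE_ROWS = {"abs": "Y", "agg": "P", "align": "D", "sparse": "N"}


def describe(method: str) -> str:
    """Return a readable description of the flag recorded for a sample method."""
    flag = Support[SAMPLE_ROWS[method]]
    return f"{method}: {flag.name} ({flag.value})"


if __name__ == "__main__":
    for name in SAMPLE_ROWS:
        print(describe(name))
```

Keeping the flags in an enum rather than as bare strings makes a typo in a flag fail loudly with a `KeyError` instead of passing silently.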

Note

Currently, the third column reflects the implementation status for the Ray and Dask engines. By default, support for a method in the HDK engine can be treated as D unless the Notes column contains additional information. Similarly, the Notes column describes the Ray and Dask engines unless Hdk is explicitly mentioned.
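The rule in this note can be sketched as a small lookup: take the HDK flag from the Notes column when one is stated, and fall back to D otherwise. The `ROWS` dictionary and the `hdk_status` helper are hypothetical and not Modin APIs; the sample notes are copied from the table, and cross-references such as "See add" are deliberately not followed in this sketch.

```python
import re

# A few rows copied from the table: method -> (Ray/Dask flag, notes).
ROWS = {
    "abs": ("Y", ""),
    "add": (
        "Y",
        "Ray and Dask: Shuffles data in operations between DataFrames. "
        "Hdk: P, support binary operations on scalars and projections "
        "of the same frame, otherwise D",
    ),
    "count": ("Y", "Hdk: P, only default params supported, otherwise D"),
    "pow": ("Y", "See add; Hdk: D"),
    "shape": ("Y", "Hdk: Y"),
}

# Matches an explicit HDK flag in the notes, e.g. "Hdk: P".
_HDK_FLAG = re.compile(r"Hdk:\s*([YNPD])\b")


def hdk_status(method: str) -> str:
    """Return the HDK flag stated in the notes, or "D" when none is given."""
    _, notes = ROWS[method]
    match = _HDK_FLAG.search(notes)
    return match.group(1) if match else "D"


if __name__ == "__main__":
    for name in ROWS:
        print(name, hdk_status(name))
```

The Ray/Dask flag is ignored on purpose: under the stated rule it says nothing about HDK.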

DataFrame method

pandas Doc link

Implemented? (Y/N/P/D)

Notes for Current implementation

T

T

Y

abs

abs

Y

add

add

Y

Ray and Dask: Shuffles data in operations between DataFrames. Hdk: P, supports binary operations on scalars and projections of the same frame, otherwise D

add_prefix

add_prefix

Y

add_suffix

add_suffix

Y

agg / aggregate

agg / aggregate

P

  • Dictionary func parameter defaults to pandas

  • Numpy operations default to pandas

align

align

D

all

all

Y

any

any

Y

append

append

Y

Hdk: Y, but the sort and ignore_index parameters are ignored

apply

apply

Y

See agg

applymap

applymap

Y

asfreq

asfreq

D

asof

asof

Y

assign

assign

Y

astype

astype

Y

Hdk: P, int <-> float supported

at

at

Y

at_time

at_time

Y

axes

axes

Y

between_time

between_time

Y

bfill

bfill

Y

bool

bool

Y

boxplot

boxplot

D

clip

clip

Y

combine

combine

Y

combine_first

combine_first

Y

compare

compare

Y

copy

copy

Y

corr

corr

P

Correlation floating point precision may differ slightly from pandas. Currently only the pearson method is available; other methods and the numeric_only parameter default to pandas.

corrwith

corrwith

D

count

count

Y

Hdk: P, only default params supported, otherwise D

cov

cov

P

Covariance floating point precision may differ slightly from pandas. The numeric_only parameter defaults to pandas.

cummax

cummax

Y

cummin

cummin

Y

cumprod

cumprod

Y

cumsum

cumsum

Y

describe

describe

Y

diff

diff

Y

div

div

Y

See add

divide

divide

Y

See add

dot

dot

Y

drop

drop

Y

Hdk: P, since dropping rows is unsupported

droplevel

droplevel

Y

drop_duplicates

drop_duplicates

D

dropna

dropna

Y

Hdk: P, since the thresh and axis params are unsupported

dtypes

dtypes

Y

Hdk: Y

duplicated

duplicated

Y

empty

empty

Y

eq

eq

Y

See add

equals

equals

Y

Requires shuffle, can be further optimized

eval

eval

Y

ewm

ewm

D

expanding

expanding

D

explode

explode

Y

ffill

ffill

Y

fillna

fillna

P

A value parameter of type DataFrame defaults to pandas. Hdk: P, the limit, downcast and method params are unsupported, and only axis = 0 is supported for now

filter

filter

Y

first

first

Y

first_valid_index

first_valid_index

Y

floordiv

floordiv

Y

See add

from_dict

from_dict

D

from_records

from_records

D

ge

ge

Y

See add

get

get

Y

groupby

groupby

Y

Not yet optimized for all operations. Hdk: P. count, sum, size, mean, nunique, std, skew supported, otherwise D

gt

gt

Y

See add

head

head

Y

hist

hist

D

iat

iat

Y

idxmax

idxmax

Y

idxmin

idxmin

Y

iloc

iloc

Y

Hdk: P, read access is fully supported; write access does not support row or 2D assignments

infer_objects

infer_objects

Y

Hdk: D

info

info

Y

insert

insert

Y

interpolate

interpolate

D

isetitem

isetitem

D

isin

isin

Y

isna

isna

Y

isnull

isnull

Y

items

items

Y

iteritems

iteritems

P

Modin does not parallelize iteration in Python

iterrows

iterrows

P

Modin does not parallelize iteration in Python

itertuples

itertuples

P

Modin does not parallelize iteration in Python

join

join

P

Defaults to pandas when on is set to right or outer, or when validate is given

keys

keys

Y

kurt

kurt

Y

kurtosis

kurtosis

Y

last

last

Y

last_valid_index

last_valid_index

Y

le

le

Y

See add

loc

loc

P

Not supported: boolean arrays and callables. Hdk: P, read access is fully supported; write access does not support row or 2D assignments

lookup

lookup

D

lt

lt

Y

See add

mad

mad

Y

mask

mask

D

max

max

Y

Hdk: P, only default params supported, otherwise D

mean

mean

P

Modin defaults to pandas if given the level param. Hdk: P. D for level, axis, skipna and numeric_only params

median

median

P

Modin defaults to pandas if given the level param.

melt

melt

Y

memory_usage

memory_usage

Y

merge

merge

P

The following cases are implemented: left_index=True and right_index=True; how=left and how=inner for all values of parameters except left_index=True and right_index=False, or left_index=False and right_index=True. Defaults to pandas otherwise. Hdk: P, only non-index joins for how=left and how=inner with explicit on are supported

min

min

Y

Hdk: P, only default params supported, otherwise D

mod

mod

Y

See add

mode

mode

Y

mul

mul

Y

See add

multiply

multiply

Y

See add

ndim

ndim

Y

ne

ne

Y

See add

nlargest

nlargest

Y

notna

notna

Y

notnull

notnull

Y

nsmallest

nsmallest

Y

nunique

nunique

Y

Hdk: P, no support for axis!=0 and dropna=False

pct_change

pct_change

D

pipe

pipe

Y

pivot

pivot

Y

pivot_table

pivot_table

Y

plot

plot

D

pop

pop

Y

pow

pow

Y

See add; Hdk: D

prod

prod

Y

product

product

Y

quantile

quantile

Y

query

query

P

Local variables not yet supported

radd

radd

Y

See add

rank

rank

Y

rdiv

rdiv

Y

See add; Hdk: D

reindex

reindex

Y

Shuffles data

reindex_like

reindex_like

D

rename

rename

Y

rename_axis

rename_axis

Y

reorder_levels

reorder_levels

Y

replace

replace

Y

resample

resample

Y

reset_index

reset_index

P

Hdk: P, D for the level parameter. Ray and Dask: D when names or allow_duplicates is non-default

rfloordiv

rfloordiv

Y

See add; Hdk: D

rmod

rmod

Y

See add; Hdk: D

rmul

rmul

Y

See add

rolling

rolling

Y

round

round

Y

rpow

rpow

Y

See add; Hdk: D

rsub

rsub

Y

See add; Hdk: D

rtruediv

rtruediv

Y

See add; Hdk: D

sample

sample

Y

select_dtypes

select_dtypes

Y

sem

sem

P

Modin defaults to pandas if given the level param.

set_axis

set_axis

Y

set_index

set_index

Y

shape

shape

Y

Hdk: Y

shift

shift

Y

size

size

Y

skew

skew

P

Modin defaults to pandas if given the level param

slice_shift

slice_shift

Y

sort_index

sort_index

Y

sort_values

sort_values

Y

Shuffles data; Hdk: Y

sparse

sparse

N

squeeze

squeeze

Y

stack

stack

Y

std

std

P

Modin defaults to pandas if given the level param.

style

style

D

sub

sub

Y

See add

subtract

subtract

Y

See add; Hdk: D

sum

sum

Y

Hdk: P, only default params supported, otherwise D

swapaxes

swapaxes

Y

swaplevel

swaplevel

Y

tail

tail

Y

take

take

Y

to_clipboard

to_clipboard

D

to_csv

to_csv

Y

to_dict

to_dict

D

to_excel

to_excel

D

to_feather

to_feather

D

to_gbq

to_gbq

D

to_hdf

to_hdf

D

to_html

to_html

D

to_json

to_json

D

to_latex

to_latex

D

to_orc

to_orc

D

to_parquet

to_parquet

P

Dask: Defaults to the pandas implementation and writes a single output file. Ray: The parallel implementation is used only if the path parameter is a string, does not end with “.gz”, “.bz2”, “.zip”, or “.xz”, and the compression parameter is not None or “snappy”. In these cases, the path parameter specifies a directory where one file is written per row partition of the Modin dataframe.

to_period

to_period

D

to_pickle

to_pickle

D

Experimental implementation: to_pickle_distributed

to_records

to_records

D

to_sql

to_sql

Y

to_stata

to_stata

D

to_string

to_string

D

to_timestamp

to_timestamp

D

to_xarray

to_xarray

D

transform

transform

Y

transpose

transpose

Y

truediv

truediv

Y

See add

truncate

truncate

Y

tshift

tshift

Y

tz_convert

tz_convert

Y

tz_localize

tz_localize

Y

unstack

unstack

Y

update

update

Y

values

values

Y

value_counts

value_counts

D

var

var

P

Modin defaults to pandas if given the level param.

where

where

Y