`pd.DataFrame` supported APIs

The following table lists both implemented and not implemented methods. If you need an operation that is listed as not implemented, feel free to open an issue on the GitHub repository or give a thumbs up to existing issues. Contributions are also welcome!

The table is structured as follows: the first column contains the method name, the second column contains a link to the description of the corresponding pandas method, and the third column is a flag indicating whether Modin implements the method in the first column. Y stands for yes, N stands for no, P stands for partial (meaning some parameters may not be supported yet), and D stands for default to pandas. The fourth column contains notes on the current implementation.
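
To make the legend concrete, the following minimal sketch parses a few tab-separated rows copied from the table below and groups the methods by flag. `FLAGS`, `ROWS`, and `parse_rows` are hypothetical names used for illustration only; they are not part of Modin.

```python
from collections import defaultdict

# Meaning of each value in the "Implemented?" column, per the legend above.
FLAGS = {
    "Y": "implemented",
    "N": "not implemented",
    "P": "partial (some parameters may not be supported yet)",
    "D": "defaults to pandas",
}

# A few tab-separated rows copied from the table: method, pandas doc link,
# flag, and an optional Notes field.
ROWS = (
    "abs\tabs\tY\n"
    "align\talign\tD\n"
    "apply\tapply\tY\tSee `agg`\n"
    "sparse\tsparse\tN\n"
)


def parse_rows(text):
    """Group method names by their implementation flag (hypothetical helper)."""
    by_flag = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # The Notes column is optional, so collect any remainder in _notes.
        method, _doc_link, flag, *_notes = line.split("\t")
        if flag not in FLAGS:
            raise ValueError(f"unknown flag {flag!r} for method {method!r}")
        by_flag[flag].append(method)
    return dict(by_flag)


if __name__ == "__main__":
    for flag, methods in parse_rows(ROWS).items():
        print(f"{flag} ({FLAGS[flag]}): {', '.join(methods)}")
```

Unknown flags raise an error rather than being silently grouped, so a typo in a copied row is caught early.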

Note

Currently, the third column reflects the implementation status for the Ray and Dask engines. By default, a method's support in the HDK engine can be treated as D unless the Notes column contains additional information. Similarly, the Notes column describes the Ray and Dask engines unless HDK is explicitly mentioned.
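
The rule in this note can be written as a small function. This is a minimal sketch, assuming HDK-specific statuses appear in the Notes column in the form "Hdk: `X`", as they do in the rows below; `hdk_status` and `engine_status` are hypothetical helpers, not Modin APIs.

```python
import re

# An explicit HDK status in the Notes column looks like "Hdk: `P`".
HDK_STATUS = re.compile(r"Hdk:\s*`([YNPD])`")


def hdk_status(notes=""):
    """Return the HDK flag for a row (hypothetical helper).

    Per the note, a method is treated as "D" on HDK unless its Notes
    column states an HDK-specific status.
    """
    match = HDK_STATUS.search(notes)
    return match.group(1) if match else "D"


def engine_status(flag, notes=""):
    """Map a row's third column and Notes to per-engine flags."""
    # The third column applies to Ray and Dask; HDK is derived from Notes.
    return {"Ray": flag, "Dask": flag, "HDK": hdk_status(notes)}


if __name__ == "__main__":
    print(engine_status("Y", "Hdk: `P`, only default params supported, otherwise `D`"))
    print(engine_status("Y"))
```

Only the first "Hdk:" marker in a note is used, which matches rows such as `reset_index` where the HDK flag is followed by further parameter details.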

DataFrame method	pandas doc link	Implemented? (Y/N/P/D)	Notes for current implementation
`T`	T	Y
`abs`	abs	Y
`add`	add	Y	Ray and Dask: Shuffles data in operations between DataFrames. Hdk: `P`, supports binary operations with scalars and between projections of the same frame, otherwise `D`
`add_prefix`	add_prefix	Y
`add_suffix`	add_suffix	Y
`agg` / `aggregate`	agg / aggregate	P	Dictionary `func` parameter defaults to pandas. NumPy operations default to pandas.
`align`	align	D
`all`	all	Y
`any`	any	Y
`apply`	apply	Y	See `agg`
`applymap`	applymap	Y
`asfreq`	asfreq	D
`asof`	asof	Y
`assign`	assign	Y
`astype`	astype	Y	Hdk: `P`, `int` <-> `float` conversion supported
`at`	at	Y
`at_time`	at_time	Y
`axes`	axes	Y
`between_time`	between_time	Y
`bfill`	bfill	Y
`bool`	bool	Y
`boxplot`	boxplot	D
`clip`	clip	Y
`combine`	combine	Y
`combine_first`	combine_first	Y
`compare`	compare	Y
`copy`	copy	Y
`corr`	corr	P	Correlation floating point precision may slightly differ from pandas. Only the `pearson` method is available for now. Other methods and the `numeric_only` parameter default to pandas.
`corrwith`	corrwith	D
`count`	count	Y	Hdk: `P`, only default params supported, otherwise `D`
`cov`	cov	P	Covariance floating point precision may slightly differ from pandas. The `numeric_only` parameter defaults to pandas.
`cummax`	cummax	Y
`cummin`	cummin	Y
`cumprod`	cumprod	Y
`cumsum`	cumsum	Y
`describe`	describe	Y
`diff`	diff	Y
`div`	div	Y	See `add`
`divide`	divide	Y	See `add`
`dot`	dot	Y
`drop`	drop	Y	Hdk: `P`, since dropping rows is unsupported
`droplevel`	droplevel	Y
`drop_duplicates`	drop_duplicates	D
`dropna`	dropna	Y	Hdk: `P`, since the `thresh` and `axis` params are unsupported
`dtypes`	dtypes	Y	Hdk: `Y`
`duplicated`	duplicated	Y
`empty`	empty	Y
`eq`	eq	Y	See `add`
`equals`	equals	Y	Requires shuffle, can be further optimized
`eval`	eval	Y
`ewm`	ewm	D
`expanding`	expanding	D
`explode`	explode	Y
`ffill`	ffill	Y
`fillna`	fillna	P	A `value` parameter of type DataFrame defaults to pandas. Hdk: `P`, the `limit`, `downcast` and `method` params are unsupported, and only `axis=0` is supported for now
`filter`	filter	Y
`first`	first	Y
`first_valid_index`	first_valid_index	Y
`floordiv`	floordiv	Y	See `add`
`from_dict`	from_dict	D
`from_records`	from_records	D
`ge`	ge	Y	See `add`
`get`	get	Y
`groupby`	groupby	Y	Not yet optimized for all operations. Hdk: `P`, `count`, `sum`, `size`, `mean`, `nunique`, `std` and `skew` are supported, otherwise `D`
`gt`	gt	Y	See `add`
`head`	head	Y
`hist`	hist	D
`iat`	iat	Y
`idxmax`	idxmax	Y
`idxmin`	idxmin	Y
`iloc`	iloc	Y	Hdk: `P`, read access fully supported; write access: row and 2D assignments are not supported
`infer_objects`	infer_objects	Y	Hdk: `D`
`info`	info	Y
`insert`	insert	Y
`interpolate`	interpolate	D
`isetitem`	isetitem	D
`isin`	isin	Y
`isna`	isna	Y
`isnull`	isnull	Y
`items`	items	Y
`iterrows`	iterrows	P	Modin does not parallelize iteration in Python
`itertuples`	itertuples	P	Modin does not parallelize iteration in Python
`join`	join	P	Defaults to pandas when `how` is set to `right` or `outer`, or when `validate` is given
`keys`	keys	Y
`kurt`	kurt	Y
`kurtosis`	kurtosis	Y
`last`	last	Y
`last_valid_index`	last_valid_index	Y
`le`	le	Y	See `add`
`loc`	loc	P	Boolean arrays and callables are not supported. Hdk: `P`, read access fully supported; write access: row and 2D assignments are not supported
`lt`	lt	Y	See `add`
`mask`	mask	D
`max`	max	Y	Hdk: `P`, only default params supported, otherwise `D`
`mean`	mean	P	Modin defaults to pandas if given the `level` param. Hdk: `P`. `D` for `level`, `axis`, `skipna` and `numeric_only` params
`median`	median	P	Modin defaults to pandas if given the `level` param.
`melt`	melt	Y
`memory_usage`	memory_usage	Y
`merge`	merge	P	Implemented for the following cases: `left_index=True` and `right_index=True`; `how=left` and `how=inner` for all parameter values except `left_index=True` with `right_index=False`, or `left_index=False` with `right_index=True`. Defaults to pandas otherwise. Hdk: `P`, only non-index joins for `how=left` and `how=inner` with an explicit `on` are supported
`min`	min	Y	Hdk: `P`, only default params supported, otherwise `D`
`mod`	mod	Y	See `add`
`mode`	mode	Y
`mul`	mul	Y	See `add`
`multiply`	multiply	Y	See `add`
`ndim`	ndim	Y
`ne`	ne	Y	See `add`
`nlargest`	nlargest	Y
`notna`	notna	Y
`notnull`	notnull	Y
`nsmallest`	nsmallest	Y
`nunique`	nunique	Y	Hdk: `P`, no support for `axis!=0` and `dropna=False`
`pct_change`	pct_change	D
`pipe`	pipe	Y
`pivot`	pivot	Y
`pivot_table`	pivot_table	Y
`plot`	plot	D
`pop`	pop	Y
`pow`	pow	Y	See `add`; Hdk: `D`
`prod`	prod	Y
`product`	product	Y
`quantile`	quantile	Y
`query`	query	P	Local variables not yet supported
`radd`	radd	Y	See `add`
`rank`	rank	Y
`rdiv`	rdiv	Y	See `add`; Hdk: `D`
`reindex`	reindex	Y	Shuffles data
`reindex_like`	reindex_like	D
`rename`	rename	Y
`rename_axis`	rename_axis	Y
`reorder_levels`	reorder_levels	Y
`replace`	replace	Y
`resample`	resample	Y
`reset_index`	reset_index	P	Hdk: `P`, `D` for the `level` parameter. Ray and Dask: `D` when `names` or `allow_duplicates` is non-default
`rfloordiv`	rfloordiv	Y	See `add`; Hdk: `D`
`rmod`	rmod	Y	See `add`; Hdk: `D`
`rmul`	rmul	Y	See `add`
`rolling`	rolling	Y
`round`	round	Y
`rpow`	rpow	Y	See `add`; Hdk: `D`
`rsub`	rsub	Y	See `add`; Hdk: `D`
`rtruediv`	rtruediv	Y	See `add`; Hdk: `D`
`sample`	sample	Y
`select_dtypes`	select_dtypes	Y
`sem`	sem	P	Modin defaults to pandas if given the `level` param.
`set_axis`	set_axis	Y
`set_index`	set_index	Y
`shape`	shape	Y	Hdk: `Y`
`shift`	shift	Y
`size`	size	Y
`skew`	skew	P	Modin defaults to pandas if given the `level` param.
`sort_index`	sort_index	Y
`sort_values`	sort_values	Y	Shuffles data. The order of index labels that share the same sort key is not guaranteed to be the same across sorts. Hdk: `Y`
`sparse`	sparse	N
`squeeze`	squeeze	Y
`stack`	stack	Y
`std`	std	P	Modin defaults to pandas if given the `level` param.
`style`	style	D
`sub`	sub	Y	See `add`
`subtract`	subtract	Y	See `add`; Hdk: `D`
`sum`	sum	Y	Hdk: `P`, only default params supported, otherwise `D`
`swapaxes`	swapaxes	Y
`swaplevel`	swaplevel	Y
`tail`	tail	Y
`take`	take	Y
`to_clipboard`	to_clipboard	D
`to_csv`	to_csv	Y
`to_dict`	to_dict	D
`to_excel`	to_excel	D
`to_feather`	to_feather	D
`to_gbq`	to_gbq	D
`to_hdf`	to_hdf	D
`to_html`	to_html	D
`to_json`	to_json	D	Experimental implementation: `DataFrame.modin.to_json_glob`
`to_xml`	to_xml	D	Experimental implementation: `DataFrame.modin.to_xml_glob`
`to_latex`	to_latex	D
`to_orc`	to_orc	D
`to_parquet`	to_parquet	P	Ray/Dask/Unidist: parallel implementation only if the `path` parameter is a string. In that case, `path` specifies a directory where one file is written per row partition of the Modin DataFrame. Experimental implementation: `DataFrame.modin.to_parquet_glob`
`to_period`	to_period	D
`to_pickle`	to_pickle	D	Experimental implementation: `DataFrame.modin.to_pickle_glob`
`to_records`	to_records	D
`to_sql`	to_sql	Y
`to_stata`	to_stata	D
`to_string`	to_string	D
`to_timestamp`	to_timestamp	D
`to_xarray`	to_xarray	D
`transform`	transform	Y
`transpose`	transpose	Y
`truediv`	truediv	Y	See `add`
`truncate`	truncate	Y
`tz_convert`	tz_convert	Y
`tz_localize`	tz_localize	Y
`unstack`	unstack	Y
`update`	update	Y
`values`	values	Y
`value_counts`	value_counts	D
`var`	var	P	Modin defaults to pandas if given the `level` param.
`where`	where	Y
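
Before porting a pandas workload to Modin, it can help to check which of its DataFrame methods fall outside full support. The sketch below copies a handful of Ray/Dask flags from the table above and reports the methods that are partial (`P`), default to pandas (`D`), or are not implemented (`N`). `SUPPORT` and `audit` are hypothetical names for illustration; they are not shipped with Modin.

```python
# A handful of Ray/Dask flags copied from the table above (not the full table).
SUPPORT = {
    "agg": "P",
    "apply": "Y",
    "drop_duplicates": "D",
    "groupby": "Y",
    "merge": "P",
    "sparse": "N",
    "value_counts": "D",
}


def audit(methods, support=SUPPORT):
    """Return {method: flag} for every method that is not fully implemented.

    Methods missing from ``support`` are reported as "?" so they can be
    looked up in the table by hand (hypothetical helper, not part of Modin).
    """
    report = {}
    for name in methods:
        flag = support.get(name, "?")
        if flag != "Y":
            report[name] = flag
    return report


if __name__ == "__main__":
    # "explode" is not in SUPPORT, so it is reported as "?".
    print(audit(["groupby", "merge", "value_counts", "explode"]))
```

For the HDK engine, the flags would instead come from the Notes column, as described in the note at the top of this page.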
