pd.read_<file> and I/O APIs

Several I/O methods default to pandas. We have parallelized read_csv, read_parquet, and a few others (see the table), and many of the remaining methods could be parallelized with relatively little effort. A method that defaults to the pandas implementation reads the data serially into a single, non-distributed DataFrame and then distributes it, so it is slower than a parallelized read.
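To make the cost of the default-to-pandas path concrete, here is a toy sketch in plain Python. It is not Modin's implementation: the function name `read_serially_then_partition` and the sample data are hypothetical, and lists of rows stand in for DataFrame partitions. It only illustrates the order of operations described above, where the whole input is parsed in one serial pass and only split afterwards.

```python
import csv
import io


def read_serially_then_partition(text, n_partitions):
    """Toy model: parse everything in one serial pass, then split into partitions."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Ceiling division so that every row lands in some partition.
    size = max(1, -(-len(body) // n_partitions))
    partitions = [body[i:i + size] for i in range(0, len(body), size)]
    return header, partitions


# Hypothetical sample input: a header line followed by five data rows.
sample = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"
header, parts = read_serially_then_partition(sample, 2)
```

In this model the parse step is the bottleneck: no work is shared until the full input has been read. A parallelized reader would instead let each worker parse its own portion of the file.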

The following table is structured as follows: the first column contains the method name. The second column flags whether Modin implements that method: Y stands for yes, N for no, P for partial (some parameters may not be supported yet), and D for default to pandas. The third column holds notes on the current implementation.

Note

Currently, the second column reflects the implementation status for the Ray and Dask engines. For the Hdk engine, treat a method as D unless the Notes column says otherwise.

Note

Fully asynchronous reading is supported for the following functions: read_csv, read_fwf, read_table, read_custom_text. This mode is disabled by default; enable it by setting the MODIN_ASYNC_READ_MODE=True environment variable. Some parameter combinations are not supported, in which case the function is executed in synchronous mode.
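A minimal sketch of turning the mode on from Python rather than from the shell. Only the variable name and value come from the note above. The source does not say when Modin reads the variable, so setting it before Modin is imported is a cautious assumption, and the file name in the commented-out usage is hypothetical.

```python
import os

# Set the variable before importing Modin (assumed to be the safe ordering).
os.environ["MODIN_ASYNC_READ_MODE"] = "True"

# With Modin installed, reading would then proceed as usual:
# import modin.pandas as pd
# df = pd.read_csv("data.csv")  # "data.csv" is a hypothetical path
```

Exporting the variable in the shell before starting the Python process has the same effect.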

| IO method | Modin Implementation? (Y/N/P/D) | Notes for Current implementation |
| --- | --- | --- |
| read_csv | Y | Hdk: P, only basic cases and parameters supported: filepath_or_buffer (local file only), sep, delimiter, header (partly), names, usecols, dtype, true_values/false_values, skiprows (partly), skip_blank_lines (partly), parse_dates (partly), compression (inferred automatically, should not be specified), quotechar, escapechar, doublequote, delim_whitespace |
| read_fwf | Y | |
| read_table | Y | |
| read_parquet | P | Parameters besides filters and storage_options passed via **kwargs are not supported. use_nullable_dtypes == True is not supported. Experimental implementation: read_parquet_glob |
| read_json | P | Implemented for lines=True. Experimental implementation: read_json_glob |
| read_xml | D | Experimental implementation: read_xml_glob |
| read_html | D | |
| read_clipboard | D | |
| read_excel | D | |
| read_hdf | D | |
| read_feather | Y | |
| read_stata | D | |
| read_sas | D | |
| read_pickle | D | Experimental implementation: read_pickle_distributed |
| read_sql | Y | |
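
The implementation flags listed above can also be kept as a small lookup, for example to warn before calling a reader that defaults to pandas. This is only a sketch: the flags are copied from the table (Ray and Dask engines), while the names `IO_STATUS`, `defaults_to_pandas`, and `methods_with_flag` are hypothetical helpers, not part of Modin.

```python
# Flags copied from the table above; Y = yes, P = partial, D = default to pandas.
IO_STATUS = {
    "read_csv": "Y",
    "read_fwf": "Y",
    "read_table": "Y",
    "read_parquet": "P",
    "read_json": "P",
    "read_xml": "D",
    "read_html": "D",
    "read_clipboard": "D",
    "read_excel": "D",
    "read_hdf": "D",
    "read_feather": "Y",
    "read_stata": "D",
    "read_sas": "D",
    "read_pickle": "D",
    "read_sql": "Y",
}


def defaults_to_pandas(method):
    """Return True if the method is flagged D; unknown names raise KeyError."""
    return IO_STATUS[method] == "D"


def methods_with_flag(flag):
    """Return the sorted method names carrying the given flag."""
    return sorted(name for name, status in IO_STATUS.items() if status == flag)
```

For instance, `methods_with_flag("P")` returns the partially implemented readers, `read_json` and `read_parquet`.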