pd.read_<file> and I/O APIs
A number of I/O methods default to pandas. We have parallelized read_csv, read_parquet, and several others (see the table below), and many of the remaining methods could be parallelized with relatively little effort. A method that defaults to pandas reads the data serially into a single, non-distributed pandas DataFrame, which Modin then distributes. Expect lower performance from these methods than from the parallelized ones.
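The difference between a parallelized reader and one that defaults to pandas can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not Modin's implementation: the function names are hypothetical, rows are plain lists of strings rather than DataFrames, and a thread pool stands in for Modin's distributed execution engine.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def read_serially_then_split(text: str, n_parts: int) -> list[list[list[str]]]:
    """Hypothetical 'default to pandas' path: one reader parses everything, then rows are split."""
    body = list(csv.reader(io.StringIO(text)))[1:]  # drop the header row
    size = max(1, -(-len(body) // n_parts))  # ceiling division, at least one row per part
    return [body[i:i + size] for i in range(0, len(body), size)]


def read_in_parallel(text: str, n_parts: int) -> list[list[list[str]]]:
    """Hypothetical parallel path: split raw text at line boundaries, parse chunks concurrently.

    Assumption: no quoted field contains a newline, so splitting on lines is safe.
    Threads only illustrate the idea; under CPython's GIL they will not speed up parsing.
    """
    _header, _, rest = text.partition("\n")
    lines = rest.splitlines(keepends=True)
    size = max(1, -(-len(lines) // n_parts))
    chunks = ["".join(lines[i:i + size]) for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # Each worker parses only its own chunk, so no single reader sees the whole file.
        return list(pool.map(lambda chunk: list(csv.reader(io.StringIO(chunk))), chunks))
```

Both paths end with the same row partitions; the difference is that the serial path funnels the entire parse through one reader before distributing the result, which is why methods that default to pandas are slower.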
The table below is structured as follows. The first column contains the method name. The second column indicates whether Modin implements the method in the first column: Y stands for yes, N for no, P for partial (meaning some parameters may not be supported yet), and D for default to pandas. The third column contains notes on the current implementation.
Note
Support for fully asynchronous reading has been added for the following functions: read_csv, read_fwf, read_table, and read_custom_text.
This mode is disabled by default; enable it by setting the environment variable MODIN_ASYNC_READ_MODE=True. Some parameter combinations are not supported; in those cases the function is executed in synchronous mode.
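A minimal sketch of enabling the mode from Python, assuming Modin picks up MODIN_ASYNC_READ_MODE from the process environment; setting it before importing modin.pandas is the safe choice. Exporting the variable in the shell before starting Python should work equally well. The file name data.csv is a placeholder.

```python
import os

# Assumption: Modin reads MODIN_ASYNC_READ_MODE from the environment, so set it
# before modin.pandas is imported.
os.environ["MODIN_ASYNC_READ_MODE"] = "True"

try:
    import modin.pandas as pd
except ImportError:  # Modin is not installed in this environment
    pd = None

# With Modin installed, a supported call such as the one below may then run
# asynchronously; unsupported parameter combinations fall back to synchronous mode.
# df = pd.read_csv("data.csv")  # "data.csv" is a placeholder path
```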
| IO method | Modin implementation? (Y/N/P/D) | Notes for current implementation |
|---|---|---|
|  | Y |  |
|  | Y |  |
|  | Y |  |
|  | P | Parameters besides … Experimental implementation: read_parquet_glob |
|  | P | Implemented for … |
| read_xml | D | Experimental implementation: read_xml_glob |
|  | D |  |
|  | D |  |
|  | D |  |
|  | D |  |
|  | Y |  |
|  | D |  |
|  | D |  |
|  | D | Experimental implementation: read_pickle_glob |
|  | Y |  |
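To find out at runtime whether a particular call took the default-to-pandas path (the methods marked D in the table above), one option is to record warnings around the call. This sketch assumes that Modin signals the fallback with a warning whose message contains the phrase "defaulting to pandas"; the exact wording is an assumption and may vary between Modin versions. The helper name call_and_check_fallback is hypothetical.

```python
import warnings
from typing import Any, Callable


def call_and_check_fallback(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, bool]:
    """Call fn and report whether it emitted a 'defaulting to pandas' warning (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let repeated warnings be suppressed
        result = fn(*args, **kwargs)
    # Assumption: the fallback warning text contains this phrase.
    fell_back = any("defaulting to pandas" in str(w.message) for w in caught)
    return result, fell_back
```

For example, with Modin imported as pd, `df, fell_back = call_and_check_fallback(pd.read_xml, "data.xml")` would return the DataFrame together with a flag indicating whether the read ran serially through pandas ("data.xml" is a placeholder path).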