pd.read_<file> and I/O APIs
A number of IO methods default to pandas. We have parallelized read_csv and read_parquet, and many of the remaining methods can be parallelized relatively easily. Operations that default to the pandas implementation read the data serially into a single, non-distributed DataFrame and then distribute it, which costs performance.
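As a rough illustration of the parallelized path, the sketch below reads a small CSV file through Modin's read_csv. The sample file, its column names, and the fallback to plain pandas when Modin (or an execution engine for it) is not installed are assumptions made only so the snippet runs on its own; they are not part of the Modin documentation.

```python
import csv
import os
import tempfile

try:
    # Modin exposes the pandas API under modin.pandas.
    import modin.pandas as pd
except ImportError:
    # Assumption: fall back to plain pandas so the sketch still runs without Modin.
    import pandas as pd

# Write a tiny CSV file to read back (hypothetical sample data).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b"])
    writer.writerows([[1, 2], [3, 4], [5, 6]])

# Same call signature as pandas.read_csv.
df = pd.read_csv(csv_path)
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```

The call site is identical to pandas, so switching an existing script usually means changing only the import line.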
The following table is structured as follows: the first column contains the method name. The second column is a flag indicating whether Modin implements the method in the first column: Y stands for yes, N for no, P for partial (some parameters may not be supported yet), and D for default to pandas. The third column holds notes on the current implementation.
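The flag legend can also be written down as a small lookup, which may help when scanning the table programmatically. This is only a sketch: FLAG_MEANINGS, describe_flag, and falls_back_to_pandas are hypothetical names introduced here, not part of Modin.

```python
# Legend for the "Modin Implementation?" column, taken from the description above.
FLAG_MEANINGS = {
    "Y": "yes",
    "N": "no",
    "P": "partial (some parameters may not be supported yet)",
    "D": "default to pandas",
}


def describe_flag(flag: str) -> str:
    """Return the legend text for a flag (hypothetical helper, not a Modin API)."""
    return FLAG_MEANINGS[flag.strip().upper()]


def falls_back_to_pandas(flag: str) -> bool:
    """True when the flag means the method defaults to the pandas implementation."""
    return flag.strip().upper() == "D"


print(describe_flag("P"))
print(falls_back_to_pandas("D"))
```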
| IO method | Modin Implementation? (Y/N/P/D) | Notes for Current implementation |
| | Y | |
| | Y | |
| | Y | |
| | P | Implemented for |
| | D | |
| | D | |
| | D | |
| | Y | |
| | Y | |
| | D | |
| | D | |
| | D | |
| | D | Experimental implementation: read_pickle_distributed |
Y |