pd.read_<file> and I/O APIs
A number of I/O methods default to pandas. We have parallelized
read_parquet, and many of the remaining methods can be parallelized relatively
easily. When an operation defaults to the pandas implementation, the data is
read serially into a single, non-distributed DataFrame and only then distributed.
Such reads do not benefit from parallelism, so expect them to be slower than a
parallelized reader.
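To make the "read serially, then distribute" pattern concrete, here is a minimal conceptual sketch using only the Python standard library. It is not Modin's implementation: the function name `read_serially_then_distribute`, the `num_partitions` parameter, and the row-wise splitting scheme are all assumptions chosen for illustration.

```python
import csv
import io


def read_serially_then_distribute(text, num_partitions):
    """Parse every row in one serial pass, then split the rows into partitions.

    Hypothetical helper for illustration only; not part of Modin or pandas.
    """
    # Step 1: a single, non-parallel read of the whole input.
    rows = list(csv.DictReader(io.StringIO(text)))
    # Step 2: distribute the already-loaded rows across partitions.
    # Ceiling division ensures every row lands in some partition.
    size = max(1, -(-len(rows) // num_partitions))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


sample = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"
partitions = read_serially_then_distribute(sample, num_partitions=2)
print([len(p) for p in partitions])
```

The point of the sketch is that all parsing happens in step 1 before any partitioning, so the read itself gains nothing from the later distribution. A parallelized reader would instead split the input first and parse the pieces concurrently.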
The following table is structured as follows. The first column contains the method name.
The second column is a flag indicating whether Modin implements the method named in
the first column:
Y stands for yes,
N stands for no,
P stands for partial (meaning some parameters may not be supported yet), and
D stands for default to pandas.
The third column contains notes on the current implementation.
|IO method||Modin Implementation? (Y/N/P/D)||Notes for Current implementation|