pd.read_<file> and I/O APIs
A number of I/O methods default to pandas. We have parallelized read_csv and read_parquet, among others, and many of the remaining methods could be parallelized with relatively little effort. A method that defaults to the pandas implementation reads the data serially into a single, non-distributed DataFrame and only then distributes it. Because the read itself does not run in parallel, these methods can be noticeably slower than the parallelized ones.
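To make the "default to pandas" path concrete, here is a conceptual sketch using plain pandas. It is not Modin's internal code, and `read_then_partition` and `num_partitions` are hypothetical names. The input is parsed once in a single process, and only afterwards is the resulting DataFrame cut into row partitions, which stands in for the distribution step.

```python
import io

import pandas


def read_then_partition(source, num_partitions=4, **read_kwargs):
    """Hypothetical illustration of a reader that defaults to pandas."""
    # Serial step: one process parses the whole input into one DataFrame.
    df = pandas.read_csv(source, **read_kwargs)
    # Simulated distribution step: split the rows into contiguous,
    # near-equal chunks. Nothing here runs in parallel.
    n = len(df)
    bounds = [n * i // num_partitions for i in range(num_partitions + 1)]
    return [df.iloc[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]


csv_text = "x,y\n" + "".join(f"{i},{i * i}\n" for i in range(10))
parts = read_then_partition(io.StringIO(csv_text), num_partitions=4)
print([len(p) for p in parts])  # [2, 3, 2, 3]
```

The sketch shows where the cost lies: the parse in the first step is single-threaded no matter how many partitions are produced afterwards, which is why readers marked D do not get a parallel speedup.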
The table below is structured as follows. The first column contains the method name. The second column indicates whether Modin has an implementation of that method: Y stands for yes, N for no, P for partial (meaning some parameters may not be supported yet), and D for default to pandas. The third column contains notes on the current implementation.
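The flag legend can be captured in a small lookup. This is a hypothetical helper, not part of Modin's API: `ReadSupport` and `expected_read_path` are illustrative names. The description for P is an assumption, namely that parameter combinations that are not yet supported fall back to the serial pandas path.

```python
from enum import Enum


class ReadSupport(Enum):
    """Hypothetical encoding of the Y/N/P/D flags in the support table."""
    Y = "yes"
    N = "no"
    P = "partial"
    D = "default to pandas"


def expected_read_path(flag: str) -> str:
    """Describe how a reader with the given flag is expected to load data.

    Assumption: for P, unsupported parameters fall back to pandas;
    the Notes column says which parameters are covered.
    """
    support = ReadSupport[flag.upper()]  # raises KeyError for unknown flags
    if support is ReadSupport.Y:
        return "parallel"
    if support is ReadSupport.P:
        return "parallel for supported parameters, otherwise serial pandas"
    if support is ReadSupport.D:
        return "serial pandas read, then distributed"
    return "not available in Modin"


print(expected_read_path("d"))  # serial pandas read, then distributed
```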
| IO method | Modin implementation? (Y/N/P/D) | Notes on current implementation |
|---|---|---|
| read_csv | Y | |
| read_table | Y | |
| read_parquet | Y | |
| read_json | P | Implemented for lines=True |
| read_html | D | |
| read_clipboard | D | |
| read_excel | D | |
| read_hdf | Y | |
| read_feather | Y | |
| read_msgpack | D | |
| read_stata | D | |
| read_sas | D | |
| read_pickle | D | |
| read_sql | Y | |
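The read_json row notes that the implementation covers lines=True, that is, newline-delimited JSON (JSON Lines), where each line holds one complete JSON object and there is no enclosing array. The sketch below shows that input format with plain pandas so it runs without Modin installed. Swapping the import for `import modin.pandas as pd` is the usual way to use Modin, but whether a particular call takes the parallel path depends on the Modin version and the parameters passed.

```python
import io

import pandas as pd  # with Modin installed: import modin.pandas as pd

# JSON Lines input: one JSON object per line, no surrounding [ ... ].
data = (
    '{"id": 1, "name": "a"}\n'
    '{"id": 2, "name": "b"}\n'
    '{"id": 3, "name": "c"}\n'
)

# lines=True tells read_json to parse each line as a separate record.
df = pd.read_json(io.StringIO(data), lines=True)
print(df.shape)  # (3, 2)
```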