Ecosystem#
There is a constantly growing number of users and packages using pandas to address their specific needs in data preparation, analysis and visualization. pandas is being used ubiquitously and is a good choise to handle small-sized data. However, pandas scales poorly and is non-interactive on moderate to large datasets. Modin provides a drop-in replacement API for pandas and scales computation across nodes and CPUs available. What you need to do to switch to Modin is just replace a single line of code.
# import pandas as pd
import modin.pandas as pd
While most packages can consume a pandas DataFrame and operate it efficiently,
this is not the case with a Modin DataFrame due to its distributed nature.
Thus, some packages may lack support for handling Modin DataFrame(s) correctly and,
moreover, efficiently. Modin implements such methods as __array__
, __dataframe__
, etc.
to facilitate other libraries to consume a Modin DataFrame. If you feel that a certain library
can operate efficiently with a specific format of data, it is possible to convert a Modin DataFrame
to the format preferred.
to_pandas#
You can refer to pandas ecosystem page to get more details on where pandas can be used and what libraries it powers.
from modin.pandas.io import to_pandas
pandas_df = to_pandas(modin_df)
to_numpy#
You can refer to NumPy ecosystem section of NumPy documentation to get more details on where NumPy can be used and what libraries it powers.
from modin.pandas.io import to_numpy
numpy_arr = to_numpy(modin_df)
to_ray#
You can refer to Ray Data page to get more details on where Ray Dataset can be used and what libraries it powers.
from modin.pandas.io import to_ray
ray_dataset = to_ray(modin_df)
to_dask#
You can refer to Dask DataFrame page to get more details on where Dask DataFrame can be used and what libraries it powers.
from modin.pandas.io import to_dask
dask_df = to_dask(modin_df)