Third Party Library Integrations#
Modin is a drop-in replacement for Pandas, so we want it to interoperate with third-party libraries just as Pandas does. To see where Modin performs well and where it needs to improve, we selected a number of important machine learning, visualization, and statistics libraries and looked at examples (from their documentation, where possible) of how they work with Pandas. We then ran those same workflows with Modin and tracked what worked and what failed.
In the table below, you’ll see, for each third-party library we tested, the number of successful test calls / total test calls, and a qualitative description of how both Pandas and Modin integrate with that library.
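The "successful test calls / total test calls" metric can be illustrated with a small tally helper. This is only a sketch of the idea, not the notebook code used for these results; the names `tally_calls` and `format_score` are hypothetical, and it assumes that a call counts as failed whenever it raises an exception.

```python
from typing import Callable, Dict, Tuple


def tally_calls(
    calls: Dict[str, Callable[[], object]]
) -> Tuple[int, int, Dict[str, str]]:
    """Run each zero-argument callable and record which ones raised."""
    failures: Dict[str, str] = {}
    for name, call in calls.items():
        try:
            call()
        except Exception as exc:  # assumption: any exception is a failed call
            failures[name] = f"{type(exc).__name__}: {exc}"
    total = len(calls)
    return total - len(failures), total, failures


def format_score(successes: int, total: int) -> str:
    """Format a score as a rounded percentage plus the raw counts."""
    percent = round(100 * successes / total) if total else 0
    return f"{percent}% ({successes} / {total})"


if __name__ == "__main__":
    # Stand-in calls; in practice each entry would wrap one library API call
    # that receives a Modin DataFrame.
    demo_calls = {
        "succeeds": lambda: sum([1, 2, 3]),
        "fails": lambda: int("not a number"),
    }
    ok, total, failed = tally_calls(demo_calls)
    print(format_score(ok, total))
    print(failed)
```

Keeping the failure messages alongside the counts makes it easier to turn each failed call into a separate issue report.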
In the deeper dive, you can view the Jupyter notebook we used to test API calls and the corresponding GitHub issues filed. If you come across other issues or examples in your own workflows, we encourage you to file an issue or contribute a PR!
Note
These interoperability metrics are preliminary and not all APIs for each library have been tested. Feel free to add more!
Modin Interoperability by Library#
Library | API successes / calls | Interoperability |
---|---|---|
seaborn | 73% (11 / 15) | Pandas: Accepts Pandas DataFrames as inputs for producing plots |
plotly | 78% (7 / 9) | Pandas: Accepts Pandas DataFrames as inputs for producing plots, including specifying X and Y parameters as df columns |
matplotlib | 100% (5 / 5) | Pandas: Accepts Pandas DataFrames as inputs for producing plots like scatter, barh, etc. |
altair | 0% (0 / 1) | Pandas: Accepts Pandas DataFrames as inputs for producing charts through Chart |
bokeh | 0% (0 / 1) | Pandas: Loads Pandas DataFrames through ColumnDataSource |
sklearn | 100% (6 / 6) | Pandas: Many functions take Pandas DataFrames as inputs |
Hugging Face (Transformers, Datasets) | 100% (2 / 2) | Pandas: Loads Pandas DataFrames into Datasets, and processes Pandas DataFrame rows as inputs using Transformers.InputExample (deprecated) |
Tensorflow | 75% (3 / 4) | Pandas: Converts Pandas DataFrames to tensors |
NLTK | 100% (1 / 1) | Pandas: Performs transformations like tokenization on Pandas DataFrames |
XGBoost | 100% (1 / 1) | Pandas: Loads Pandas DataFrames through the DMatrix function |
statsmodels | 50% (1 / 2) | Pandas: Can accept Pandas DataFrames when fitting models |
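For libraries where a call fails with a Modin DataFrame, one possible workaround is to convert to Pandas just before the call. The sketch below is an assumption rather than a recommendation from this page: `as_pandas` is a hypothetical helper, and it relies on Modin objects exposing the private `_to_pandas()` method, whose availability may vary between Modin versions.

```python
from typing import Any


def as_pandas(obj: Any) -> Any:
    """Return a Pandas object if `obj` offers `_to_pandas()`, else `obj` itself.

    Assumption: Modin DataFrames and Series provide a private `_to_pandas()`
    method. Plain Pandas objects do not, so they pass through unchanged.
    """
    convert = getattr(obj, "_to_pandas", None)
    return convert() if callable(convert) else obj
```

A call site would then wrap only the argument handed to the third-party library, for example `some_library_call(as_pandas(df))`, where `some_library_call` is a placeholder. Note that the conversion materializes the whole frame as a single in-memory Pandas object, so it may be impractical for data that does not fit comfortably in memory.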
A Deeper Dive#
seaborn#
- GitHub Issues
plotly#
- GitHub Issues
matplotlib#
altair#
- GitHub Issues
bokeh#
- GitHub Issues
sklearn#
Hugging Face#
Tensorflow#
- GitHub Issues
NLTK#
XGBoost#
statsmodels#
- GitHub Issues
Appendix: System Information#
The example scripts here were run on the following system:
OS Platform and Distribution: macOS Big Sur 11.5.2
Modin version: 0.18.0+3.g4114183f
Ray version: 2.0.1
Python version: 3.9.7.final.0
Machine: MacBook Pro (16-inch, 2019)
Processor: 2.3 GHz 8-core Intel Core i9 processor
Memory: 16 GB 2667 MHz DDR4
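When filing an issue, comparable details for your own machine can be gathered with the standard library. This is a minimal sketch using a hypothetical `system_info` helper; it does not reproduce how the listing above was generated, and it leaves out the Modin and Ray versions, which are typically read from each package's `__version__` attribute.

```python
import platform
import sys


def system_info() -> dict:
    """Collect basic OS, machine, and Python details using only the stdlib."""
    return {
        "OS": f"{platform.system()} {platform.release()}",
        "Machine": platform.machine(),
        # Joining sys.version_info yields the dotted "major.minor.micro.level.serial" form.
        "Python version": ".".join(str(part) for part in sys.version_info),
    }


if __name__ == "__main__":
    for key, value in system_info().items():
        print(f"{key}: {value}")
```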