Third Party Library Integrations#

Modin is a drop-in replacement for Pandas, so we want it to interoperate with third-party libraries just as Pandas does. To see where Modin performs well and where it needs to improve, we selected a number of important machine learning, visualization, and statistics libraries and looked at examples (from their documentation, where possible) of how they work with Pandas. We then ran those same workflows with Modin and tracked what worked and what failed.

For each third-party library we tested, the table below shows the number of successful test calls out of the total test calls, along with a qualitative description of how Pandas and Modin each integrate with that library.

In the deeper dive, you can view the Jupyter notebooks we used to test API calls and the corresponding GitHub issues we filed. If you come across other issues or examples in your own workflows, we encourage you to file an issue or contribute a PR!

Note

These interoperability metrics are preliminary and not all APIs for each library have been tested. Feel free to add more!
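The approach described above (run each library call against a Modin DataFrame and record whether it succeeded) can be sketched as a small harness. This is a minimal illustration, not the code used in the notebooks: `run_compat_checks` and the two stand-in check functions are hypothetical names, and the stand-ins replace real library calls so the sketch runs without any third-party packages installed.

```python
from typing import Callable, Dict, Tuple


def run_compat_checks(
    checks: Dict[str, Callable[[], object]]
) -> Tuple[int, int, Dict[str, str]]:
    """Run each check once and return (successes, total, failures by name)."""
    failures: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # any exception counts as a failed call
            failures[name] = f"{type(exc).__name__}: {exc}"
    total = len(checks)
    return total - len(failures), total, failures


# Hypothetical stand-ins for real library calls on a Modin DataFrame.
def call_that_works():
    return "plot"


def call_that_fails():
    raise TypeError("unsupported input type")


successes, total, failures = run_compat_checks(
    {"scatter": call_that_works, "pairplot": call_that_fails}
)
print(f"{successes} / {total} calls succeeded")
for name, reason in failures.items():
    print(f"failed: {name} ({reason})")
```

In a real run, each check would wrap one library call from the corresponding notebook, and the recorded failure messages would feed into the filed issues.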

Modin Interoperability by Library#

| Library | API successes / calls | Pandas interoperability | Modin interoperability |
| --- | --- | --- | --- |
| seaborn | 73% (11 / 15) | Accepts Pandas DataFrames as inputs for producing plots | Mostly accepts Modin DataFrames as inputs for producing plots, but fails completely in some cases (pairplot, lmplot), and in others (catplot, objects.Plot) works only for some parameter combinations |
| plotly | 78% (7 / 9) | Accepts Pandas DataFrames as inputs for producing plots, including specifying X and Y parameters as df columns | Mostly accepts Modin DataFrames as inputs for producing plots (the exception is choropleth), but fails when specifying X and Y parameters as df columns |
| matplotlib | 100% (5 / 5) | Accepts Pandas DataFrames as inputs for producing plots like scatter, barh, etc. | Accepts Modin DataFrames as inputs for producing plots like scatter, barh, etc. |
| altair | 0% (0 / 1) | Accepts Pandas DataFrames as inputs for producing charts through Chart | Does not accept Modin DataFrames as inputs for producing charts through Chart |
| bokeh | 0% (0 / 1) | Loads Pandas DataFrames through ColumnDataSource | Does not load Modin DataFrames through ColumnDataSource |
| sklearn | 100% (6 / 6) | Many functions take Pandas DataFrames as inputs | Many functions take Modin DataFrames as inputs |
| Hugging Face (Transformers, Datasets) | 100% (2 / 2) | Loads Pandas DataFrames into Datasets, and processes Pandas DataFrame rows as inputs using Transformers.InputExample (deprecated) | Loads Modin DataFrames into Datasets (though slowly), and processes Modin DataFrame rows as inputs through Transformers.InputExample (deprecated) |
| TensorFlow | 75% (3 / 4) | Converts Pandas DataFrames to tensors | Converts Modin DataFrames to tensors, but specialized APIs like Keras might not work yet |
| NLTK | 100% (1 / 1) | Performs transformations like tokenization on Pandas DataFrames | Performs transformations like tokenization on Modin DataFrames |
| XGBoost | 100% (1 / 1) | Loads Pandas DataFrames through the DMatrix function | Loads Modin DataFrames through the DMatrix function |
| statsmodels | 50% (1 / 2) | Can accept Pandas DataFrames when fitting models | Accepts Modin DataFrames when fitting models in some cases (e.g., formula.api.ols), but not in others (e.g., api.OLS) |
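As a quick consistency check, the percentages in the table above can be recomputed from the success and call counts. The sketch below copies the counts from the table; the `success_rate` helper is a name introduced here for illustration, and rounding to the nearest whole percent is an assumption about how the listed values were derived.

```python
# (successful calls, total calls) per library, copied from the table above.
results = {
    "seaborn": (11, 15),
    "plotly": (7, 9),
    "matplotlib": (5, 5),
    "altair": (0, 1),
    "bokeh": (0, 1),
    "sklearn": (6, 6),
    "Hugging Face": (2, 2),
    "Tensorflow": (3, 4),
    "NLTK": (1, 1),
    "XGBoost": (1, 1),
    "statsmodels": (1, 2),
}


def success_rate(successes: int, total: int) -> int:
    """Return the success rate as a whole-number percentage."""
    return round(100 * successes / total)


for library, (ok, total) in results.items():
    print(f"{library}: {success_rate(ok, total)}% ({ok} / {total})")
```

Because several libraries were tested with only one or two calls, a single failure moves the percentage a long way, which is one reason these metrics are described as preliminary.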

A Deeper Dive#

seaborn#

Jupyter Notebook

GitHub Issues

plotly#

Jupyter Notebook

GitHub Issues

matplotlib#

Jupyter Notebook

altair#

Jupyter Notebook

GitHub Issues

bokeh#

Jupyter Notebook

GitHub Issues

sklearn#

Jupyter Notebook

Hugging Face#

Jupyter Notebook

TensorFlow#

Jupyter Notebook

GitHub Issues

NLTK#

Jupyter Notebook

XGBoost#

Jupyter Notebook

statsmodels#

Jupyter Notebook

GitHub Issues
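For the calls listed above that do not yet accept Modin DataFrames, one possible workaround is to convert the data to Pandas before handing it to the library. The sketch below assumes Modin objects expose an underscore-prefixed `_to_pandas()` method; because the leading underscore marks it as non-public, check it against your installed Modin version. The `as_pandas` helper and the `FakeModinFrame` class are hypothetical, and the fake class exists only so the sketch runs without Modin installed.

```python
def as_pandas(obj):
    """Return obj._to_pandas() if that method exists, else obj unchanged."""
    to_pandas = getattr(obj, "_to_pandas", None)
    return to_pandas() if callable(to_pandas) else obj


class FakeModinFrame:
    """Hypothetical stand-in for a Modin DataFrame (assumption, not Modin code)."""

    def _to_pandas(self):
        return "pandas-frame"


# An object with _to_pandas() is converted; anything else passes through.
print(as_pandas(FakeModinFrame()))
print(as_pandas([1, 2, 3]))
```

Note that such a conversion gathers the whole dataset into a single in-memory Pandas object, so it gives up Modin's distributed execution for that step and may not fit in memory for large data.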

Appendix: System Information#

The example scripts here were run on the following system:

  • OS Platform and Distribution: macOS Big Sur 11.5.2

  • Modin version: 0.18.0+3.g4114183f

  • Ray version: 2.0.1

  • Python version: 3.9.7.final.0

  • Machine: MacBook Pro (16-inch, 2019)

  • Processor: 2.3 GHz 8-core Intel Core i9

  • Memory: 16 GB 2667 MHz DDR4