Advanced Usage

Modin aims not only to optimize pandas but also to provide a comprehensive, integrated toolkit for data scientists. We are actively developing data science tools such as DataFrame spreadsheet integration, DataFrame algebra, progress bars, SQL queries on DataFrames, and more. Join us on Slack and Discourse for the latest updates!

Experimental APIs

Modin also supports the following experimental APIs on top of pandas, all of which are under active development.

DataFrame partitioning API

The Modin DataFrame provides an API for accessing partitions directly: you can extract physical partitions from a DataFrame, modify their structure by reshuffling them or applying functions to them, and create a new DataFrame from the modified partitions. Visit the pandas partitioning API documentation to learn more.
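
The extract, modify, and rebuild workflow described above can be illustrated with a small toy model. This is a conceptual sketch in plain Python, not Modin's API: the helpers `split_rows`, `apply_to_partitions`, and `concat_partitions` are hypothetical names invented for this example, and a "DataFrame" is modeled as a list of row dictionaries. In Modin itself, the corresponding entry points are, to our understanding, `unwrap_partitions` and `from_partitions` in `modin.distributed.dataframe.pandas`; consult the partitioning API documentation for their exact behavior.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Partition = List[Row]


def split_rows(rows: List[Row], num_partitions: int) -> List[Partition]:
    """Hypothetical helper: split rows into contiguous row partitions."""
    size = max(1, -(-len(rows) // num_partitions))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def apply_to_partitions(
    parts: List[Partition], func: Callable[[Row], Row]
) -> List[Partition]:
    """Hypothetical helper: apply a row-wise function inside each partition."""
    return [[func(row) for row in part] for part in parts]


def concat_partitions(parts: List[Partition]) -> List[Row]:
    """Hypothetical helper: rebuild one table from a list of partitions."""
    return [row for part in parts for row in part]


# A toy "DataFrame" with a single column "a" holding 0..5.
rows = [{"a": i} for i in range(6)]

# 1. Extract partitions (three partitions of two rows each).
parts = split_rows(rows, 3)

# 2. Modify them: double every value, then reshuffle by reversing the order.
doubled = apply_to_partitions(parts, lambda r: {"a": r["a"] * 2})
reshuffled = list(reversed(doubled))

# 3. Create a new table from the modified partitions.
rebuilt = concat_partitions(reshuffled)
print([r["a"] for r in rebuilt])
```

The key point the sketch makes is that partitions are ordinary units of data: once extracted, they can be transformed or reordered independently, and the final table is defined by how they are reassembled.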

Modin Spreadsheet API

The Spreadsheet API for Modin lets you render a DataFrame as a spreadsheet so that you can explore your data and perform operations through a graphical user interface. The API also includes features for recording the changes made to the DataFrame and exporting them as reproducible code. Built on top of Modin and SlickGrid, the spreadsheet interface is able to provide interactive response times even at a scale of billions of rows. See our Modin Spreadsheet API documentation for more details.

../../_images/modin_spreadsheet_mini_demo.gif

Progress Bar

Modin can display a visual progress bar for DataFrame operations such as groupby and fillna, as well as for file reading operations such as read_csv. The progress bar is built using the tqdm library and the Ray execution engine. See the Progress Bar documentation for more details.

../../_images/progress_bar_example.png
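
As a minimal sketch of how the progress bar might be switched on, the snippet below uses the `ProgressBar` setting from `modin.config`. The wrapper function `enable_modin_progress_bar` is a hypothetical helper added here so that the example degrades gracefully when Modin is not installed; it is not part of Modin. Displaying the bar additionally assumes that tqdm and the Ray engine are available, as noted above.

```python
def enable_modin_progress_bar() -> bool:
    """Enable Modin's progress bar; return False if Modin is not installed.

    Hypothetical convenience wrapper, not part of Modin itself.
    """
    try:
        # Imported lazily so the example still runs without Modin.
        from modin.config import ProgressBar
    except ImportError:
        return False
    ProgressBar.enable()
    return True


enabled = enable_modin_progress_bar()
print("Modin progress bar enabled:", enabled)
```

Once the setting is enabled, subsequent Modin operations such as read_csv or groupby should show a progress bar in a notebook environment.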

Dataframe Algebra

The Dataframe Algebra is a minimal set of operators that can be composed to express any dataframe query, for use in query planning and optimization. See our paper for more information. Full documentation is coming soon!
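
To make the idea of composable operators concrete, here is a small illustrative sketch in plain Python. It is not the algebra from the paper and not Modin code: the operator names (`selection`, `projection`, `map_rows`) and the `compose` helper are assumptions chosen for illustration, and a dataframe is modeled as a list of row dictionaries. The sketch only shows how a query can be built by chaining a few small operators, which is the property that makes planning and optimization tractable.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Frame = List[Row]
Operator = Callable[[Frame], Frame]


def selection(predicate: Callable[[Row], bool]) -> Operator:
    """Keep only the rows that satisfy the predicate."""
    return lambda frame: [row for row in frame if predicate(row)]


def projection(columns: List[str]) -> Operator:
    """Keep only the named columns."""
    return lambda frame: [{c: row[c] for c in columns} for row in frame]


def map_rows(func: Callable[[Row], Row]) -> Operator:
    """Apply a function to every row."""
    return lambda frame: [func(row) for row in frame]


def compose(*operators: Operator) -> Operator:
    """Chain operators left to right into a single query."""
    return lambda frame: reduce(lambda acc, op: op(acc), operators, frame)


frame = [
    {"name": "a", "x": 1},
    {"name": "b", "x": 5},
    {"name": "c", "x": 9},
]

# A query expressed purely as a composition of the operators above.
query = compose(
    selection(lambda row: row["x"] > 2),
    map_rows(lambda row: {**row, "x2": row["x"] * 2}),
    projection(["name", "x2"]),
)

result = query(frame)
print(result)
```

Because the query is just a sequence of operator values, a planner could in principle inspect and reorder it (for example, pushing the selection ahead of more expensive steps) before any data is touched.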

SQL on Modin Dataframes

Read about Modin Dataframe support for SQL queries in this recent blog post. Check out the Modin SQL documentation as well!

../../_images/modin_sql_example.png

Distributed XGBoost on Modin

Modin provides an implementation of the distributed XGBoost machine learning algorithm on Modin DataFrames. See our Distributed XGBoost on Modin documentation for details about installation and usage, and the Modin XGBoost architecture documentation for information about the implementation and internal execution flow.