In this section, we explain the design and motivation behind Modin and why you should use Modin to scale up your pandas workflows. We first describe the architectural differences between pandas and Modin. Then we describe how Modin can also help resolve out-of-memory issues common to pandas. Finally, we look at the key differences between Modin and other distributed dataframe libraries.
- How does Modin differ from pandas?
- Out-of-memory data with Modin
- Modin vs. Dask DataFrame vs. Koalas
Modin is built on many years of research and development at UC Berkeley. For more information on how Modin works under the hood, check out our publications in this space:
- Flexible Rule-Based Decomposition and Metadata Independence in Modin (VLDB 2021)
- Enhancing the Interactivity of Dataframe Queries by Leveraging Think Time (IEEE Data Eng 2021)
- Dataframe Systems: Theory, Architecture, and Implementation (PhD Dissertation 2021)
- Scaling Data Science does not mean Scaling Machines (CIDR 2021)
- Towards Scalable Dataframe Systems (VLDB 2020)