Modin 0.12.0

Installation

  • Installation
    • Installing with pip
      • Stable version
      • Release candidates
      • Installing specific dependency sets
      • Installing on Google Colab
    • Installing with conda
      • Using conda-forge channel
      • Using Intel® Distribution of Modin
    • Installing from the GitHub master branch
    • Windows
    • Building Modin from Source

Getting Started

  • Using Modin
    • Quickstart
    • Using Modin on a Single Node
    • APIs Supported
    • Connecting to a database for read_sql
    • Using Modin on a Cluster (experimental)
    • Advanced usage (experimental)
      • Exceeding memory (Out of core pandas)
      • Reducing or limiting the resources Modin can use
    • Examples
  • Out of Core in Modin
    • Starting Modin with out of core enabled
    • Disabling Out of Core
    • Running an example with out of core

Examples

  • Examples
    • scikit-learn with LinearRegression

Experimental Features

  • Overview
    • Modin Spreadsheet API: Render Dataframes as Spreadsheets
    • Progress Bar
    • Dataframe Algebra
    • SQL on Modin Dataframes
    • Distributed XGBoost on Modin
  • SQL on Modin Dataframes
    • A Short Example Using the Google Play Store
    • Writing Complex Queries
    • Further Examples and Full Documentation
  • Modin Spreadsheets API
    • Getting started
    • Basic Manipulations through User Interface
    • Virtual Rendering
    • Transformation History and Exporting Code
    • Customizable Interface
    • Converting Spreadsheets To and From Dataframes
    • Further API Documentation
  • Progress Bar
    • Quickstart
  • Distributed XGBoost on Modin
    • Install XGBoost on Modin
    • XGBoost Train and Predict
    • ModinDMatrix
    • A Single Node / Cluster setup
    • Usage example
  • Modin in the Cloud
    • Prerequisites
    • Setup environment
    • Architecture
    • Public interface
    • Usage examples

How is Modin different from ...?

  • Modin vs. pandas
    • Scalability of implementation
    • Memory usage and immutability
    • API vs implementation
  • Modin vs. Dask Dataframe
    • API
      • Dask DataFrame
      • Modin
    • Architecture
      • Dask DataFrame
      • Modin
  • Modin vs. Koalas and Spark

Supported APIs

  • Supported APIs and Defaulting to pandas
    • Questions on implementation details
    • Defaulting to pandas
  • pd.DataFrame supported APIs
  • pd.Series supported APIs
  • pandas Utilities Supported
    • Other objects & structures
  • pd.read_<file> and I/O APIs

Developer Documentation

  • Contributing
    • Getting Started
    • Certificate of Origin
      • CERTIFICATE OF ORIGIN V 1.1
    • Commit Message formatting
    • General Rules for committers
    • Development Dependencies
    • Code Formatting and Lint
    • Adding a test
    • Running the tests
    • Performance measurement
    • Building documentation
    • Contributing a new execution framework or in-memory format
  • System Architecture
    • High-Level Architectural View
    • System View
    • Subsystem/Container View
    • Component View
      • Modin PandasDataframe Objects
        • PandasDataframe
        • PandasDataframePartition
        • PandasDataframeAxisPartition
        • PandasDataframePartitionManager
      • Generic Ray-based members
      • PandasOnRay Dataframe implementation
        • PandasOnRayDataframe
        • PandasOnRayDataframePartition
        • PandasOnRayDataframeAxisPartition
        • PandasOnRayDataframeColumnPartition
        • PandasOnRayDataframeRowPartition
        • PandasOnRayDataframePartitionManager
      • cuDFOnRay Dataframe Implementation
        • cuDFOnRayDataframe
        • cuDFOnRayDataframePartition
        • cuDFOnRayDataframeAxisPartition
        • cuDFOnRayDataframeColumnPartition
        • cuDFOnRayDataframeRowPartition
        • cuDFOnRayDataframePartitionManager
        • GPUManager
      • PandasOnDask Dataframe implementation
        • PandasOnDaskDataframe
        • PandasOnDaskDataframePartition
        • PandasOnDaskDataframeAxisPartition
        • PandasOnDaskDataframeColumnPartition
        • PandasOnDaskDataframeRowPartition
        • PandasOnDaskDataframePartitionManager
      • Experimental
        • Scikit-learn module description
        • Modin XGBoost module description
      • Storage Formats
      • Query Compiler
        • BaseQueryCompiler
        • Pandas storage format
        • PyArrow storage format
        • High-level module overview
      • PandasOnPython Dataframe implementation
        • PandasOnPythonDataframe
        • PandasOnPythonDataframePartition
        • PandasOnPythonDataframeAxisPartition
        • PandasOnPythonFrameColumnPartition
        • PandasOnPythonFrameRowPartition
        • PandasOnPythonDataframePartition
    • DataFrame Partitioning
      • Index
      • API
        • BasePandasDataset
        • DataFrame Module Overview
        • Series Module Overview
      • Query Compiler
      • Core Modin Dataframe
        • Core Modin Dataframe API
      • Execution Engine/Framework
      • Internal abstractions
        • Partition Manager
        • Partition
      • Supported Execution Frameworks and Memory Formats
    • Module/Class View
  • Partition API in Modin
    • Partition IPs
    • Partition API implementations
      • Pandas Partition API
        • unwrap_partitions
        • from_partitions
        • Example
    • Ray engine
    • Dask engine
    • How to handle Ray objects that are smaller than 100 kB

Engines, Storage formats, and APIs

  • pandas on Ray
  • pandas on Dask
  • OmniSci
  • PyArrow on Ray

Help

  • Troubleshooting
    • Frequently encountered issues
      • Error during execution: ArrowIOError: Broken Pipe
      • Error during execution: ArrowInvalid: Maximum size exceeded (2GB)
      • Hanging on import modin.pandas as pd
      • Importing heterogeneous data by read_csv
  • Contact
    • Mailing List
    • Issues


© Copyright 2018-2021, Modin Revision 054e7fb9.