PyArrow storage format

In general, PyArrow storage formats follow the flow of the pandas ones: query compiler contains an instance of Modin Dataframe, which is internally split into partitions. The main difference is that partitions contain PyArrow tables, instead of pandas.DataFrame-s like with pandas storage format. To learn more about this approach please visit PyArrowOnRay execution section.

High-Level Module Overview

This module houses submodules which are responsible for communication between the query compiler level and execution implementation level for PyArrow storage format:

Note

Currently the only one available PyArrow storage format factory is PyarrowOnRay which works in experimental mode only.