PyArrow storage format¶

In general, PyArrow storage formats follow the flow of the pandas ones: query compiler contains an instance of Modin Dataframe, which is internally split into partitions. The main difference is that partitions contain PyArrow tables, instead of pandas.DataFrame-s like with pandas storage format. To learn more about this approach please visit PyArrowOnRay execution section.

High-Level Module Overview¶

This module houses submodules which are responsible for communication between the query compiler level and execution implementation level for PyArrow storage format:

Query compiler is responsible for compiling efficient queries for PyarrowOnRayDataframe.
Parsers are responsible for parsing data on workers during IO operations.

Note

Currently the only one available PyArrow storage format factory is PyarrowOnRay which works in experimental mode only.