PyArrow storage format#
In general, PyArrow storage formats follow the flow of the pandas ones: query compiler contains an instance of Modin Dataframe,
which is internally split into partitions. The main difference is that partitions contain PyArrow tables,
instead of pandas.DataFrame
-s like with pandas storage format. To learn more about this approach please
visit PyArrowOnRay execution section.
High-Level Module Overview#
This module houses submodules which are responsible for communication between the query compiler level and execution implementation level for PyArrow storage format:
Query compiler is responsible for compiling efficient queries for PyarrowOnRayDataframe.
Parsers are responsible for parsing data on workers during IO operations.
Note
Currently the only one available PyArrow storage format factory is PyarrowOnRay
which works
in experimental mode only.