PyArrow storage format
In general, the PyArrow storage format follows the same flow as the pandas one: the query compiler contains an instance of Modin Dataframe,
which is internally split into partitions. The main difference is that partitions contain PyArrow tables
instead of pandas.DataFrame-s, as in the pandas storage format. To learn more about this approach, please
see the PyArrowOnRay execution section.
High-Level Module Overview
This module houses submodules that are responsible for communication between the query compiler level and the execution implementation level for the PyArrow storage format:
Parsers are responsible for parsing data on workers during IO operations.
Currently, the only available PyArrow storage format factory is
PyarrowOnRay, which works
in experimental mode only.