In general, PyArrow backend follows the flow of the pandas backend: query compiler contains an instance of Modin Frame, which is internally split into partitions. The main difference is that partitions contain PyArrow tables, instead of DataFrames like in pandas backend. To learn more about this approach please visit PyArrow execution engine section.
High-Level Module Overview¶
This module houses submodules which are responsible for communication between the query compiler level and execution engine level for PyArrow backend:
Parsers are responsible for parsing data on workers during IO operations.
Currently the only one available PyArrow backend factory is
PyarrowOnRay which works
in experimental mode only.