:orphan: PyArrow-on-Ray Module Description """"""""""""""""""""""""""""""""" High-Level Module Overview '''''''''''''''''''''''''' This module houses experimental functionality with PyArrow storage format and Ray engine. The biggest difference from core engines is that internally each partition is represented as ``pyarrow.Table`` put in the ``Ray`` Plasma store. Why to Use PyArrow Tables ''''''''''''''''''''''''' As it was `mentioned `_ by the pandas creator, pandas internal architecture is not optimal and sometimes needs up to ten times more memory than the original dataset size (note, that pandas rule of thumb: `have 5 to 10 times as much RAM as the size of your dataset`). In order to fix this issue (or at least to reduce needed memory amount and needed data copying), ``PyArrow-on-Ray`` module was added. Due to the optimized architecture of PyArrow Tables, `no additional copies are needed `_ in some corner cases, which can significantly improve Modin performance. The downside of this approach is that PyArrow and pandas do not support the same APIs and some functions/parameters may have different signatures or output different results, so for now the ``PyArrow-on-Ray`` engine is under development and marked as experimental.