Pandas Parsers Module Description

High-Level Module Overview

This module houses parser classes (classes used for data parsing on the workers) and utility functions for handling parsing results. PandasParser is the base class for parser classes with the pandas storage format and contains the methods common to all child classes. The other module classes implement a parse function that parses data of a specific format based on the chunk information computed in the module. After the chunk data is parsed, the resulting DataFrames are split into smaller DataFrames according to the num_splits parameter, the data type, and the number of rows/columns in the parsed chunk, and these frames are then returned along with some additional metadata.
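
The splitting step described above can be pictured with a small sketch. It is not Modin's implementation: the helper name split_columns is hypothetical, it splits along columns only, and the metadata Modin actually returns may differ.

```python
import pandas as pd


def split_columns(df, num_splits):
    """Split ``df`` column-wise into ``num_splits`` frames (hypothetical helper)."""
    # Ceiling division, with at least one column per chunk.
    chunk = max(1, -(-len(df.columns) // num_splits))
    return [df.iloc[:, i * chunk : (i + 1) * chunk] for i in range(num_splits)]


parsed = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
parts = split_columns(parsed, num_splits=2)

# The smaller frames are followed by metadata about the parsed chunk
# (here: the number of rows and the dtypes, chosen for illustration).
result = parts + [len(parsed.index), parsed.dtypes]
```

Here the three columns are divided into one frame with two columns and one frame with a single column, and the metadata is appended to the same list.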


If you are interested in the data parsing mechanism implementation details, please refer to the source code documentation.

Public API

This module houses the Modin parser classes that are used for data parsing on the workers.


The data parsing mechanism differs depending on the data format type:

  • text format type (CSV, EXCEL, FWF, JSON): File parsing begins by retrieving the start and end parameters from the parse kwargs. These parameters define the start and end bytes of the data file that should be read in the given partition. Using these offsets and the file handle obtained from fname, the binary data is read with the Python read function. The resulting data is then passed to the pandas.read_* function as an io.BytesIO object to get the corresponding pandas.DataFrame (this is necessary because Modin partitions internally store data as pandas.DataFrame).

  • columnar store type (FEATHER, HDF, PARQUET): In this case the data chunk to be read is defined by the column names passed as the columns parameter in the parse kwargs, so no additional action is needed, and fname and kwargs are passed directly to the pandas.read_* function (in some corner cases a pyarrow.read_* function can be used).

  • SQL type: Chunking is incorporated into the sql parameter as part of the query, so the parse parameters are passed to the pandas.read_sql function without modification.
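
The text format case can be illustrated with a minimal sketch. It only mirrors the described steps (seek to start, read up to end, wrap the bytes in io.BytesIO, call pandas.read_csv); the temporary file, the chosen offsets, and the way the header is prepended are assumptions made for this example and are not taken from Modin's source.

```python
import io
import os
import tempfile

import pandas as pd

# Write a small CSV file so the sketch is self-contained.
content = b"a,b\n1,2\n3,4\n5,6\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(content)
    fname = tmp.name

header = b"a,b\n"
# Hypothetical chunk boundaries covering the rows "3,4" and "5,6".
start = content.index(b"3,4")
end = len(content)

try:
    with open(fname, "rb") as f:
        f.seek(start)
        raw = f.read(end - start)
finally:
    os.remove(fname)

# Prepend the header so pandas can name the columns, then parse from memory.
chunk_df = pd.read_csv(io.BytesIO(header + raw))
```

Only the bytes between start and end are read from disk, so each worker can parse its own slice of the file independently.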

class modin.core.storage_formats.pandas.parsers.PandasCSVGlobParser

Class for handling multiple CSV files simultaneously on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(chunks, **kwargs)

Parse data on the workers.

  • chunks (list) – List, where each element of the list is a list of tuples. The inner lists of tuples contain the data file name of the chunk, the chunk start offset, and the chunk end offset for the corresponding file.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).
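
To make the shape of chunks concrete, here is a hedged sketch of a list of lists of (file name, start offset, end offset) tuples and one way such a structure could be consumed. The helper read_chunks, the file names, and the names argument are hypothetical and only serve the illustration.

```python
import io
import os
import tempfile

import pandas as pd


def read_chunks(chunks, names):
    """Read every (fname, start, end) tuple and concatenate the rows (hypothetical helper)."""
    frames = []
    for file_chunks in chunks:
        for fname, start, end in file_chunks:
            with open(fname, "rb") as f:
                f.seek(start)
                raw = f.read(end - start)
            frames.append(pd.read_csv(io.BytesIO(raw), header=None, names=names))
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for i, body in enumerate([b"1,2\n3,4\n", b"5,6\n"]):
        path = os.path.join(tmpdir, f"part{i}.csv")
        with open(path, "wb") as f:
            f.write(body)
        paths.append(path)

    # One inner list per file; each tuple is (file name, start byte, end byte).
    chunks = [[(paths[0], 0, 8)], [(paths[1], 0, 4)]]
    combined = read_chunks(chunks, names=["a", "b"])
```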

Return type


class modin.core.storage_formats.pandas.parsers.PandasCSVParser

Class for handling CSV files on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str or path object) – Name of the file or path to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasExcelParser

Class for handling Excel files on the workers using pandas storage format.

Inherits common functions from PandasParser class.

classmethod get_sheet_data(sheet, convert_float)

Get raw data from the Excel sheet.

  • sheet (openpyxl.worksheet.worksheet.Worksheet) – Sheet to get data from.

  • convert_float (bool) – Whether to convert floats to ints or not.


List with sheet data.

Return type


static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str or path object) – Name of the file or path to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasFWFParser

Class for handling tables with fixed-width formatted lines on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str or path object) – Name of the file or path to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasFeatherParser

Class for handling FEATHER files on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str, path object or file-like object) – Name of the file, path or file-like object to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasHDFParser

Class for handling HDF data on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str, path object, pandas.HDFStore or file-like object) – Name of the file, path, pandas.HDFStore or file-like object to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasJSONParser

Class for handling JSON files on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str or path object) – Name of the file or path to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasParquetParser

Class for handling PARQUET data on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str or path object) – Name of the file or path to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasParser

Base class for parser classes with pandas storage format.

static generic_parse(fname, **kwargs)

Parse data on the workers.

  • fname (str or path object) – Name of the file or path to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


classmethod get_dtypes(dtypes_ids)

Get the dtype that is common to all partitions for each of the columns.


dtypes_ids (list) – Array with references to the partitions' dtypes objects.


frame_dtypes – Resulting dtype or pandas.Series where the column names are used as the index and the column types are used as the values for the full resulting frame.

Return type

pandas.Series or dtype
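
As a rough illustration of combining per-partition dtypes into one dtype per column, the sketch below uses numpy.result_type on already materialized dtypes Series. This is an assumption chosen for the example: the helper common_dtypes is hypothetical, it handles NumPy dtypes only, and Modin's own logic (which works on references to dtypes objects) may differ.

```python
import numpy as np
import pandas as pd


def common_dtypes(partition_dtypes):
    """Return one dtype per column that can hold every partition's values (hypothetical helper)."""
    columns = partition_dtypes[0].index
    return pd.Series(
        {
            # np.result_type applies NumPy's promotion rules to the given dtypes.
            col: np.result_type(*(dtypes[col] for dtypes in partition_dtypes))
            for col in columns
        },
        dtype=object,
    )


first = pd.DataFrame({"a": [1, 2], "b": [True, False]}).dtypes
second = pd.DataFrame({"a": [1.5, 2.5], "b": [True, True]}).dtypes
frame_dtypes = common_dtypes([first, second])
```

Column a is integer in one partition and float in the other, so the promoted dtype is a float, while column b stays boolean.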

infer_compression(compression: str | None) → str | None

Get the compression method for filepath_or_buffer. If compression='infer', the inferred compression method is returned. Otherwise, the input compression method is returned unchanged, unless it is invalid, in which case an error is raised.

  • filepath_or_buffer (str or file handle) – File path or object.

  • compression (str or dict, default 'infer') –

    For on-the-fly compression of the output data. If ‘infer’ and ‘filepath_or_buffer’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, or ‘.zst’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, or zstandard.ZstdDecompressor, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

    Changed in version 1.4.0: Zstandard support.

Return type

string or None


Raises ValueError if an invalid compression method is specified.
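
The behaviour described above can be tried directly against pandas. The sketch assumes that the function documented here behaves like pandas.io.common.infer_compression, which is an internal pandas helper rather than a stable public API; the file names are made up and no files are opened.

```python
from pandas.io.common import infer_compression

# With "infer", the method is derived from the file extension.
gz = infer_compression("data.csv.gz", compression="infer")
plain = infer_compression("data.csv", compression="infer")

# An explicit, valid method is returned unchanged.
explicit = infer_compression("data.csv", compression="bz2")

# An unrecognized method raises ValueError.
try:
    infer_compression("data.csv", compression="not-a-method")
    error = None
except ValueError as exc:
    error = exc
```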

classmethod single_worker_read(fname, **kwargs)

Perform reading by a single worker (default-to-pandas implementation).

  • fname (str, path object or file-like object) – Name of the file or file-like object to read.

  • **kwargs (dict) – Keyword arguments to be passed into the read_* function.


  • BaseQueryCompiler or

  • dict or

  • – Object with imported data (or with a reference to data) for further processing; the object type depends on the result type of the child class parse function.

class modin.core.storage_formats.pandas.parsers.PandasPickleExperimentalParser

Class for handling pickled pandas objects on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(fname, **kwargs)

Parse data on the workers.

  • fname (str or path object) – Name of the file or path to read.

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).

Return type


class modin.core.storage_formats.pandas.parsers.PandasSQLParser

Class for handling SQL queries or tables on the workers using pandas storage format.

Inherits common functions from PandasParser class.

static parse(sql, con, index_col, **kwargs)

Parse data on the workers.

  • sql (str or SQLAlchemy Selectable (select or text object)) – SQL query to be executed or a table name.

  • con (SQLAlchemy connectable, str, or sqlite3 connection) – Connection object to database.

  • index_col (str or list of str) – Column(s) to set as index (MultiIndex).

  • **kwargs (dict) – Keyword arguments to be used by the parse function or passed into the read_* function.


List with the split parse results and their metadata (index, dtypes, etc.).
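
Because the chunking is already part of the query, a worker only needs to run the query it is given. The sketch below shows that idea with an in-memory SQLite database; the table, the data, and the use of LIMIT/OFFSET to express a chunk are assumptions of this example and are not taken from Modin's source.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, value TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"v{i}") for i in range(6)])
con.commit()

# Hypothetical per-partition queries: each one already selects its own rows.
queries = [
    "SELECT * FROM t ORDER BY id LIMIT 3 OFFSET 0",
    "SELECT * FROM t ORDER BY id LIMIT 3 OFFSET 3",
]

# Each query is passed to pandas.read_sql without modification.
frames = [pd.read_sql(query, con, index_col="id") for query in queries]
con.close()
```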

Return type



Find a common data type among the given dtypes.


types (array-like) – Array of dtypes.


  • pandas.core.dtypes.dtypes.ExtensionDtype or

  • np.dtype or

  • None – dtype that is common for all passed types.