IO module Description For ExperimentalPandasOnUnidist Execution

High-Level Module Overview

This module houses experimental functionality for the pandas storage format and the Unidist engine. This functionality is concentrated in the ExperimentalPandasOnUnidistIO class, which contains methods that extend the typical pandas API to give the user more flexibility with IO operations.

Usage Guide

To use the experimental features, modify the standard Modin import statement as follows:

# import modin.pandas as pd
import modin.experimental.pandas as pd

Submodules Description

The modin.experimental.core.execution.unidist.implementations.pandas_on_unidist module primarily houses utils and functions for the experimental IO class:

  • io.py - submodule containing the IO class and the parse functions responsible for data processing on the workers.

Public API

class modin.experimental.core.execution.unidist.implementations.pandas_on_unidist.io.io.ExperimentalPandasOnUnidistIO

Class for handling experimental IO functionality with the pandas storage format and the unidist engine.

ExperimentalPandasOnUnidistIO inherits some util functions and unmodified IO functions from the PandasOnUnidistIO class.
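A minimal sketch of this structure, assuming a simplified model in which the experimental class inherits the unmodified readers from its base class and adds new classmethods that delegate to a format-specific dispatcher; that dispatcher's read method calls _read and then postprocesses the result, which is the split the Notes of each method below describe. The classes Dispatcher, CSVGlobDispatcher, BaseIO and ExperimentalIO are hypothetical stand-ins, and a plain dict plays the role of the query compiler. This is not Modin's actual code.

```python
# Hypothetical stand-ins for illustration only; a dict plays the role of the query compiler.


class Dispatcher:
    """Hypothetical base dispatcher: `read` wraps a format-specific `_read`."""

    @classmethod
    def read(cls, *args, **kwargs):
        query_compiler = cls._read(*args, **kwargs)
        # Postprocessing step shared by every format (here it only tags the result).
        query_compiler["postprocessed"] = True
        return query_compiler

    @classmethod
    def _read(cls, *args, **kwargs):
        raise NotImplementedError("subclasses implement the actual reading")


class CSVGlobDispatcher(Dispatcher):
    @classmethod
    def _read(cls, *args, **kwargs):
        # A real dispatcher would schedule parsing on the workers here.
        return {"format": "csv_glob", "args": args, "kwargs": kwargs}


class BaseIO:
    """Plays the role of PandasOnUnidistIO: unmodified IO functions live here."""

    @classmethod
    def read_csv(cls, *args, **kwargs):
        return {"format": "csv", "args": args, "kwargs": kwargs}


class ExperimentalIO(BaseIO):
    """Plays the role of ExperimentalPandasOnUnidistIO: adds experimental readers."""

    @classmethod
    def read_csv_glob(cls, *args, **kwargs):
        return CSVGlobDispatcher.read(*args, **kwargs)


# An inherited, unmodified reader and a new experimental reader side by side.
plain = ExperimentalIO.read_csv("data.csv")
globbed = ExperimentalIO.read_csv_glob("data*.csv", sep=",")
```

Keeping the postprocessing in the shared read method means that, in this model, each new format only has to implement its own _read.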

classmethod read_csv_glob(*args, **kwargs)

Read data according to the passed args and kwargs.

Parameters
  • *args (iterable) – Positional arguments to be passed into the _read function.

  • **kwargs (dict) – Keyword arguments to be passed into the _read function.

Returns

query_compiler – Query compiler with imported data for further processing.

Return type

BaseQueryCompiler

Notes

read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.
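As its name suggests, read_csv_glob reads several CSV files matching a glob pattern into a single frame. The sketch below illustrates only that glob-then-concatenate idea, sequentially and with the standard library; read_csv_glob_sketch is a hypothetical helper, and it assumes all matched files share the same header row. The real method parses data on the workers and returns a query compiler rather than a list of rows.

```python
import csv
import glob
import os
import tempfile


def read_csv_glob_sketch(pattern):
    """Hypothetical helper: read every CSV file matching `pattern` into one table.

    Returns (header, rows). Files are visited in sorted order and must share a header.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    header, rows = None, []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path} has a different header")
            rows.extend(reader)  # remaining lines are data rows
    return header, rows


# Usage: two files matching the same pattern are combined into one table.
with tempfile.TemporaryDirectory() as tmp:
    for i, values in enumerate([[("1", "a"), ("2", "b")], [("3", "c")]]):
        with open(os.path.join(tmp, f"part{i}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "name"])
            writer.writerows(values)
    header, rows = read_csv_glob_sketch(os.path.join(tmp, "part*.csv"))
```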

classmethod read_custom_text(*args, **kwargs)

Read data according to the passed args and kwargs.

Parameters
  • *args (iterable) – Positional arguments to be passed into the _read function.

  • **kwargs (dict) – Keyword arguments to be passed into the _read function.

Returns

query_compiler – Query compiler with imported data for further processing.

Return type

BaseQueryCompiler

Notes

read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.
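A minimal sketch of chunked custom-text parsing, assuming that read_custom_text applies a caller-supplied parser to the file contents and labels the result with caller-supplied column names (this page documents only *args and **kwargs, so those inputs are assumptions). The text is split at line boundaries into chunks, as it might be distributed across workers; each chunk is parsed independently, and the rows are combined. split_into_chunks, read_custom_text_sketch and key_value_parser are hypothetical names, and the sketch runs sequentially.

```python
def split_into_chunks(text, num_chunks):
    """Split text into roughly equal chunks without breaking any line."""
    lines = text.splitlines(keepends=True)
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return ["".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def read_custom_text_sketch(text, columns, custom_parser, num_chunks=2):
    """Hypothetical: parse each chunk with a user-supplied parser, then combine the rows."""
    rows = []
    for chunk in split_into_chunks(text, num_chunks):
        rows.extend(custom_parser(chunk))
    return [dict(zip(columns, row)) for row in rows]


def key_value_parser(chunk):
    """Example custom format: one "key=value" pair per line."""
    return [line.split("=", 1) for line in chunk.splitlines() if line]


data = "a=1\nb=2\nc=3\n"
result = read_custom_text_sketch(data, ["key", "value"], key_value_parser)
```

Splitting only at line boundaries keeps every record intact, so each chunk can be parsed without knowing about its neighbours.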

classmethod read_pickle_distributed(*args, **kwargs)

Read data according to the passed args and kwargs.

Parameters
  • *args (iterable) – Positional arguments to be passed into the _read function.

  • **kwargs (dict) – Keyword arguments to be passed into the _read function.

Returns

query_compiler – Query compiler with imported data for further processing.

Return type

BaseQueryCompiler

Notes

read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.

classmethod read_sql(*args, **kwargs)

Read data according to the passed args and kwargs.

Parameters
  • *args (iterable) – Positional arguments to be passed into the _read function.

  • **kwargs (dict) – Keyword arguments to be passed into the _read function.

Returns

query_compiler – Query compiler with imported data for further processing.

Return type

BaseQueryCompiler

Notes

read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.
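One common way to read a SQL table in parallel is to split it by a numeric column into contiguous ranges and issue one query per range. The sketch below shows that technique with the standard sqlite3 module. It assumes, without confirmation from this page, that the experimental read_sql partitions work in a similar manner; Modin's user-facing experimental read_sql is understood to take parameters such as partition_column, lower_bound and upper_bound, but treat those names as an assumption here. partition_bounds and read_sql_partitioned_sketch are hypothetical, and the queries run sequentially rather than on workers.

```python
import sqlite3


def partition_bounds(lower, upper, num_partitions):
    """Split the inclusive integer range [lower, upper] into contiguous sub-ranges."""
    total = upper - lower + 1
    step = -(-total // num_partitions)  # ceiling division
    return [(start, min(start + step - 1, upper)) for start in range(lower, upper + 1, step)]


def read_sql_partitioned_sketch(con, table, partition_column, lower, upper, num_partitions):
    """Hypothetical: run one range query per partition (as separate workers might) and combine."""
    rows = []
    for lo, hi in partition_bounds(lower, upper, num_partitions):
        # Identifiers cannot be bound as parameters; table/column names here are trusted constants.
        query = (
            f"SELECT * FROM {table} WHERE {partition_column} BETWEEN ? AND ? "
            f"ORDER BY {partition_column}"
        )
        rows.extend(con.execute(query, (lo, hi)).fetchall())
    return rows


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER, name TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item{i}") for i in range(1, 11)])
rows = read_sql_partitioned_sketch(con, "items", "id", 1, 10, 3)
```

Because the ranges are disjoint and together cover [lower, upper], every row in that range is returned exactly once, whichever worker reads it.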

classmethod to_pickle_distributed(qc, **kwargs)

When * is in the filename, each partition is written to its own separate file.

The filenames are determined as follows:

  • if * is in the filename, it is replaced by the ascending sequence 0, 1, 2, …

  • if * is not in the filename, the default implementation is used.

Example: with 4 partitions and the input filename="partition*.pkl.gz", the filenames will be: partition0.pkl.gz, partition1.pkl.gz, partition2.pkl.gz, partition3.pkl.gz.
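The filename rule above can be expressed as a small helper. partition_filenames is a hypothetical name; the sketch assumes every * in the name is replaced by the partition index, and it only computes names without writing any files.

```python
def partition_filenames(filename, num_partitions):
    """Hypothetical helper: expand '*' in `filename` into one name per partition (0, 1, 2, ...)."""
    if "*" not in filename:
        # Without '*', the default implementation is used; assumed here to write one file.
        return [filename]
    return [filename.replace("*", str(i)) for i in range(num_partitions)]


names = partition_filenames("partition*.pkl.gz", 4)
```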

Parameters
  • qc (BaseQueryCompiler) – The query compiler of the Modin dataframe that we want to run to_pickle_distributed on.

  • **kwargs (dict) – Parameters for pandas.to_pickle(**kwargs).