IO module Description For ExperimentalPandasOnUnidist Execution
High-Level Module Overview
This module houses experimental functionality with the pandas storage format and the Unidist engine. This functionality is concentrated in the ExperimentalPandasOnUnidistIO class, which contains methods that extend the typical pandas API to give the user more flexibility with IO operations.
Usage Guide
To use the experimental features, modify the standard Modin import statement as follows:
# import modin.pandas as pd
import modin.experimental.pandas as pd
Submodules Description
The modin.experimental.core.execution.unidist.implementations.pandas_on_unidist module primarily houses utilities and functions for the experimental IO class:
- io.py: submodule containing the IO class and the parse functions responsible for data processing on the workers.
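The division of labor between the IO class and the worker-side parse functions can be illustrated with a small, self-contained sketch. It assumes the input is cut into byte ranges aligned to line boundaries and that each range is parsed independently, with a thread pool standing in for remote workers. `split_into_line_aligned_chunks`, `parse_chunk` and `read_rows_in_parallel` are hypothetical names, not Modin or unidist APIs.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def split_into_line_aligned_chunks(data: bytes, num_chunks: int) -> list:
    """Cut `data` into roughly equal byte ranges that never split a row."""
    chunk_size = max(1, len(data) // num_chunks)
    chunks = []
    start = 0
    while start < len(data):
        end = start + chunk_size
        if end >= len(data):
            end = len(data)
        else:
            # Push the boundary to just past the next newline so each row stays whole.
            newline = data.find(b"\n", end - 1)
            end = len(data) if newline == -1 else newline + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def parse_chunk(chunk: bytes) -> list:
    """Worker-side step (hypothetical): turn one raw chunk into parsed CSV rows."""
    return list(csv.reader(io.StringIO(chunk.decode("utf-8"))))


def read_rows_in_parallel(data: bytes, num_workers: int = 2) -> list:
    """Split on the driver, parse on the "workers"; returns one list of rows per partition."""
    chunks = split_into_line_aligned_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(parse_chunk, chunks))


partitions = read_rows_in_parallel(b"10,a\n2,b\n3,c\n", num_workers=2)
```

Splitting only at newlines keeps every chunk independently parseable; quoted fields that themselves contain newlines would need extra handling that this sketch omits.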
Public API
- class modin.experimental.core.execution.unidist.implementations.pandas_on_unidist.io.io.ExperimentalPandasOnUnidistIO
Class for handling experimental IO functionality with the pandas storage format and the unidist engine.
ExperimentalPandasOnUnidistIO inherits some utility functions and unmodified IO functions from the PandasOnUnidistIO class.
- classmethod read_csv_glob(*args, **kwargs)
Read data according to the passed args and kwargs.
- Parameters
*args (iterable) – Positional arguments to be passed into the _read function.
**kwargs (dict) – Keyword arguments to be passed into the _read function.
- Returns
query_compiler – Query compiler with imported data for further processing.
- Return type
BaseQueryCompiler
Notes
read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.
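The note describes a two-level pattern: a generic `read` entry point delegates to a `_read` implementation chosen by the concrete storage format, engine and dispatcher, then post-processes the result. The sketch below mimics that shape with plain Python classes. `FileDispatcherSketch` and `CSVGlobDispatcherSketch` are hypothetical names; the glob-based `_read` only assumes that `read_csv_glob` reads every CSV file matching a glob pattern, and a list of row dictionaries stands in for a query compiler.

```python
import csv
import glob
import os
import tempfile


class FileDispatcherSketch:
    """Hypothetical base class: `read` is the generic entry point."""

    @classmethod
    def read(cls, *args, **kwargs):
        # Delegate the format-specific work to `_read` ...
        rows = cls._read(*args, **kwargs)
        # ... then apply post-processing shared by every reader.
        return cls._postprocess(rows)

    @classmethod
    def _read(cls, *args, **kwargs):
        raise NotImplementedError("subclasses implement the format-specific _read")

    @classmethod
    def _postprocess(cls, rows):
        # Example post-processing: give rows one global index across all files.
        return [{"index": i, **row} for i, row in enumerate(rows)]


class CSVGlobDispatcherSketch(FileDispatcherSketch):
    """Hypothetical subclass whose `_read` loads every CSV file matching a glob."""

    @classmethod
    def _read(cls, pattern):
        rows = []
        # Sort the matches so the row order does not depend on the filesystem.
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                rows.extend(csv.DictReader(f))
        return rows


with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        with open(os.path.join(tmp, f"part{i}.csv"), "w", newline="") as f:
            f.write(f"x\n{i}\n")
    data = CSVGlobDispatcherSketch.read(os.path.join(tmp, "part*.csv"))
```

Keeping post-processing in the base class means each new reader only has to supply its own `_read`.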
- classmethod read_custom_text(*args, **kwargs)
Read data according to the passed args and kwargs.
- Parameters
*args (iterable) – Positional arguments to be passed into the _read function.
**kwargs (dict) – Keyword arguments to be passed into the _read function.
- Returns
query_compiler – Query compiler with imported data for further processing.
- Return type
BaseQueryCompiler
Notes
read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.
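As its name suggests, `read_custom_text` targets text formats that the standard readers cannot parse directly. Assuming the caller supplies a parsing callback (the hypothetical `parser` argument below) and the reader applies it to independent chunks of the input, the idea can be sketched as follows. `read_custom_text_sketch` and `key_value_parser` are illustrative names only, not the Modin signature.

```python
import io
from typing import Callable, List


def read_custom_text_sketch(
    text: str,
    parser: Callable[[io.StringIO], list],
    lines_per_chunk: int = 2,
) -> list:
    """Apply a user-supplied `parser` to fixed-size line chunks and concatenate the results."""
    lines = text.splitlines(keepends=True)
    records: List = []
    for start in range(0, len(lines), lines_per_chunk):
        # Each chunk is handed to the parser as a file-like object,
        # the way a worker would see only its own slice of the file.
        chunk = io.StringIO("".join(lines[start:start + lines_per_chunk]))
        records.extend(parser(chunk))
    return records


def key_value_parser(buf: io.StringIO) -> list:
    """Hypothetical custom format: one "key=value" pair per line, blank lines ignored."""
    return [tuple(line.rstrip("\n").split("=", 1)) for line in buf if line.strip()]


records = read_custom_text_sketch("a=1\nb=2\n\nc=3\n", key_value_parser)
```

Because the parser only ever sees a file-like object, the same callback works whether it receives the whole file or one worker's chunk.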
- classmethod read_pickle_distributed(*args, **kwargs)
Read data according to the passed args and kwargs.
- Parameters
*args (iterable) – Positional arguments to be passed into the _read function.
**kwargs (dict) – Keyword arguments to be passed into the _read function.
- Returns
query_compiler – Query compiler with imported data for further processing.
- Return type
BaseQueryCompiler
Notes
read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.
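`read_pickle_distributed` is the reading counterpart of `to_pickle_distributed`, which is documented further down. Assuming each file matching a single-`*` pattern holds one pickled partition, reading them back means expanding the pattern, loading each file, and concatenating in partition order. The sketch below uses plain lists instead of pandas objects; `read_pickle_parts` is a hypothetical helper.

```python
import glob
import os
import pickle
import tempfile


def read_pickle_parts(pattern: str) -> list:
    """Load every file matching `pattern` (exactly one `*`) and concatenate in partition order."""
    prefix, suffix = pattern.split("*", 1)

    def partition_number(path: str) -> str:
        # The part of the path that `*` matched, e.g. "10" in "partition10.pkl".
        return path[len(prefix):len(path) - len(suffix)]

    # Escape everything except the wildcard so only `*` is treated specially.
    matches = glob.glob(glob.escape(prefix) + "*" + glob.escape(suffix))
    paths = [p for p in matches if partition_number(p).isdigit()]
    rows = []
    # Sort numerically: a plain string sort would put "partition10" before "partition2".
    for path in sorted(paths, key=lambda p: int(partition_number(p))):
        with open(path, "rb") as f:
            rows.extend(pickle.load(f))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    for i in range(12):
        with open(os.path.join(tmp, f"partition{i}.pkl"), "wb") as f:
            pickle.dump([i], f)
    rows = read_pickle_parts(os.path.join(tmp, "partition*.pkl"))
```

Sorting by the number that replaced `*` restores the original partition order even past nine partitions.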
- classmethod read_sql(*args, **kwargs)
Read data according to the passed args and kwargs.
- Parameters
*args (iterable) – Positional arguments to be passed into the _read function.
**kwargs (dict) – Keyword arguments to be passed into the _read function.
- Returns
query_compiler – Query compiler with imported data for further processing.
- Return type
BaseQueryCompiler
Notes
read is a high-level function that calls the _read function specific to the defined storage format, engine and dispatcher class with the passed parameters, and performs some postprocessing on the resulting query_compiler object.
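A distributed `read_sql` needs some way to split one query into independent pieces. One common approach, assumed here rather than taken from this page, is to partition on a numeric column between a lower and an upper bound and issue one bounded query per partition. The sketch uses the standard-library `sqlite3` module; `partitioned_queries`, `read_sql_partitioned`, the `items` table and the `id` column are all hypothetical.

```python
import sqlite3


def partitioned_queries(table, column, lower, upper, num_partitions):
    """Split the inclusive range [lower, upper] into at most `num_partitions` bounded queries."""
    step = -(-(upper - lower + 1) // num_partitions)  # ceiling division
    # `table` and `column` are interpolated directly, so they must be trusted identifiers.
    sql = f"SELECT * FROM {table} WHERE {column} BETWEEN ? AND ? ORDER BY {column}"
    return [
        (sql, (start, min(start + step - 1, upper)))
        for start in range(lower, upper + 1, step)
    ]


def read_sql_partitioned(con, table, column, lower, upper, num_partitions):
    # In a distributed reader each query would run on its own worker and connection;
    # here they run one after another on a single connection.
    return [
        con.execute(sql, params).fetchall()
        for sql, params in partitioned_queries(table, column, lower, upper, num_partitions)
    ]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER, name TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item{i}") for i in range(1, 8)])
parts = read_sql_partitioned(con, "items", "id", 1, 7, 3)
```

Rows whose partition column falls outside `[lower, upper]` are not read at all, so the bounds have to cover the full data range.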
- classmethod to_pickle_distributed(qc, **kwargs)
When * is in the filename, each partition is written to its own separate file.
The filenames are determined as follows:
- if * is in the filename, it is replaced by the ascending sequence 0, 1, 2, …
- if * is not in the filename, the default implementation is used.
Example: with 4 partitions and filename="partition*.pkl.gz", the filenames will be partition0.pkl.gz, partition1.pkl.gz, partition2.pkl.gz, partition3.pkl.gz.
- Parameters
qc (BaseQueryCompiler) – The query compiler of the Modin dataframe that we want to run to_pickle_distributed on.
**kwargs (dict) – Parameters for pandas.to_pickle(**kwargs).
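The filename rule can be sketched in plain Python. This assumes every `*` in the name is replaced (the text does not say what happens with several), pickles plain lists instead of pandas partitions, and does not compress the output; `partition_filenames` and `write_partitions` are hypothetical helpers, not Modin functions.

```python
import os
import pickle
import tempfile


def partition_filenames(filename: str, num_partitions: int) -> list:
    """Replace `*` with 0, 1, 2, ... to get one filename per partition."""
    return [filename.replace("*", str(i)) for i in range(num_partitions)]


def write_partitions(partitions: list, filename: str) -> list:
    """Pickle each partition to its own file, or everything to one file if there is no `*`."""
    if "*" not in filename:
        # Stand-in for the default implementation: one file holding all rows.
        with open(filename, "wb") as f:
            pickle.dump([row for part in partitions for row in part], f)
        return [filename]
    names = partition_filenames(filename, len(partitions))
    for part, name in zip(partitions, names):
        with open(name, "wb") as f:
            pickle.dump(part, f)
    return names


# Reproduces the example from the text: 4 partitions, filename="partition*.pkl.gz".
example_names = partition_filenames("partition*.pkl.gz", 4)

with tempfile.TemporaryDirectory() as tmp:
    written = write_partitions([[1, 2], [3], [4, 5]], os.path.join(tmp, "part*.pkl"))
    written_basenames = [os.path.basename(name) for name in written]
    loaded = []
    for name in written:
        with open(name, "rb") as f:
            loaded.append(pickle.load(f))
```

Writing one file per partition lets each worker serialize its own data without first gathering the whole dataframe in one place.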