Config Module Overview

This module lets the user tune Modin’s behavior. To see all available configs, run python -m modin.config; the command prints every Modin config with its description.

Public API

In principle, configs could come from any source, but for now only environment variables are implemented. Every environment-variable config derives from EnvironmentVariable, which contains most of the config API implementation.

class modin.config.envvars.EnvironmentVariable

Base class for environment variable-based configuration.

classmethod get()

Get config value.

Returns

Decoded and verified config value.

Return type

Any
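The phrase "decoded and verified" can be illustrated with a minimal sketch. This is not Modin's implementation: the class SketchEnvVar, its attributes, and the SKETCH_* variable names are hypothetical, chosen only to show one way a raw environment string could be converted to a typed value and checked against allowed options.

```python
import os


class SketchEnvVar:
    """Hypothetical stand-in for an environment-variable-backed config."""

    varname = None               # name of the environment variable to read
    default = None               # value returned when the variable is unset
    choices = None               # optional tuple of allowed values
    decode = staticmethod(str)   # converts the raw string to the config type

    @classmethod
    def get(cls):
        raw = os.environ.get(cls.varname)
        if raw is None:
            return cls.default
        # Decode: turn the raw string into a typed value.
        value = cls.decode(raw.strip())
        # Verify: reject values outside the allowed options, if any.
        if cls.choices is not None and value not in cls.choices:
            raise ValueError(f"{cls.varname}: {value!r} not in {cls.choices}")
        return value


class SketchNPartitions(SketchEnvVar):
    varname = "SKETCH_NPARTITIONS"   # hypothetical variable name
    default = 2
    decode = staticmethod(int)


class SketchEngine(SketchEnvVar):
    varname = "SKETCH_ENGINE"        # hypothetical variable name
    default = "Ray"
    choices = ("Ray", "Dask", "Python", "Native")
```

With SKETCH_NPARTITIONS unset, SketchNPartitions.get() returns the default; with it set to "16", the call returns the integer 16. Setting SKETCH_ENGINE to a value outside the listed options raises ValueError in this sketch.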

classmethod get_help() → str

Generate user-presentable help for the config.

Return type

str

classmethod get_value_source()

Get value source of the config.

Return type

int
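An integer return type suggests that the source is encoded as one of a small set of constants. The sketch below shows how such tracking might work. The constant names, their numeric values, the class SketchConfig, and the SKETCH_CPUS variable are all illustrative assumptions, not Modin's actual definitions.

```python
import os

# Illustrative constants only; names and numbers are assumptions.
SOURCE_DEFAULT = 0       # value came from the built-in default
SOURCE_SET_BY_USER = 1   # value was set through put()
SOURCE_ENVIRONMENT = 2   # value was read from the environment


class SketchConfig:
    """Hypothetical config that remembers where its value came from."""

    varname = "SKETCH_CPUS"  # hypothetical variable name
    default = 2
    _value = None
    _source = None

    @classmethod
    def get(cls):
        # Resolve the value lazily on first access.
        if cls._source is None:
            raw = os.environ.get(cls.varname)
            if raw is None:
                cls._value, cls._source = cls.default, SOURCE_DEFAULT
            else:
                cls._value, cls._source = int(raw), SOURCE_ENVIRONMENT
        return cls._value

    @classmethod
    def put(cls, value):
        cls._value, cls._source = value, SOURCE_SET_BY_USER

    @classmethod
    def get_value_source(cls):
        cls.get()  # make sure the value has been resolved
        return cls._source
```

In this sketch, a value read from SKETCH_CPUS reports SOURCE_ENVIRONMENT, and a later put() call switches the reported source to SOURCE_SET_BY_USER.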

classmethod once(onvalue, callback)

Execute callback if config value matches onvalue value.

Otherwise accumulate callbacks associated with the given onvalue in the _once container.

Parameters
  • onvalue (Any) – Config value that triggers the callback.

  • callback (callable) – Callable that should be executed if config value matches onvalue.
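The behavior described for once can be sketched as follows. The class SketchFlag is hypothetical, and two details are assumptions rather than documented facts: that the callback receives the config class as its argument, and that deferred callbacks are fired by put and then discarded.

```python
class SketchFlag:
    """Hypothetical config with one value and deferred one-shot callbacks."""

    _value = "Ray"
    _once = {}  # maps onvalue -> list of callbacks waiting for that value

    @classmethod
    def get(cls):
        return cls._value

    @classmethod
    def once(cls, onvalue, callback):
        if cls.get() == onvalue:
            # The value already matches: run the callback immediately.
            callback(cls)
        else:
            # Otherwise keep the callback until the value matches.
            cls._once.setdefault(onvalue, []).append(callback)

    @classmethod
    def put(cls, value):
        cls._value = value
        # Assumption: pending callbacks fire once, then are discarded.
        for callback in cls._once.pop(value, []):
            callback(cls)
```

A callback registered for the current value runs right away, while one registered for a different value waits in _once until put sets that value. Setting the same value a second time does not run it again in this sketch.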

classmethod put(value)

Set config value.

Parameters

value (Any) – Config value to set.

classmethod subscribe(callback)

Add callback to the _subs list and then execute it.

Parameters

callback (callable) – Callable to execute.
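A minimal sketch of the subscribe behavior, using a hypothetical SketchSetting class. The immediate call on subscription follows the description above. Passing the config class to the callback and re-running subscribers on every put are assumptions made for illustration.

```python
class SketchSetting:
    """Hypothetical config that notifies subscribers."""

    _value = 2
    _subs = []  # callbacks registered through subscribe()

    @classmethod
    def get(cls):
        return cls._value

    @classmethod
    def subscribe(cls, callback):
        # Register the callback, then execute it right away.
        cls._subs.append(callback)
        callback(cls)

    @classmethod
    def put(cls, value):
        cls._value = value
        # Assumption: subscribers are notified again on every change.
        for callback in cls._subs:
            callback(cls)
```

Under these assumptions, a subscriber sees the current value at registration time and every value set afterwards, which makes subscribe suited to code that must stay in sync with a config.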

Modin Configs List

| Config Name | Env. Variable Name | Default Value | Description | Options |
|---|---|---|---|---|
| AsvDataSizeConfig | MODIN_ASV_DATASIZE_CONFIG | | Allows overriding the default size of data (shapes). | |
| AsvImplementation | MODIN_ASV_USE_IMPL | modin | Selects the library used for performance testing. | ('modin', 'pandas') |
| Backend | MODIN_BACKEND | Pandas | Engine to run on a single node of distribution. | ('Pandas', 'OmniSci', 'Pyarrow', 'Cudf') |
| BenchmarkMode | MODIN_BENCHMARK_MODE | False | Whether or not to perform computations synchronously. | |
| CpuCount | MODIN_CPUS | 2 | How many CPU cores to use during initialization of the Modin engine. | |
| DoLogRpyc | MODIN_LOG_RPYC | | Whether to gather RPyC logs (applicable for remote context). | |
| DoTraceRpyc | MODIN_TRACE_RPYC | | Whether to trace RPyC calls (applicable for remote context). | |
| DoUseCalcite | MODIN_USE_CALCITE | True | Whether to use Calcite for OmniSci query execution. | |
| Engine | MODIN_ENGINE | Ray | Distribution engine to run queries by. | ('Ray', 'Dask', 'Python', 'Native') |
| GpuCount | MODIN_GPUS | | How many GPU devices to utilize across the whole distribution. | |
| IsDebug | MODIN_DEBUG | | Force Modin engine to be "Python" unless specified by $MODIN_ENGINE. | |
| IsExperimental | MODIN_EXPERIMENTAL | | Whether to turn on experimental features. | |
| IsRayCluster | MODIN_RAY_CLUSTER | | Whether Modin is running on a pre-initialized Ray cluster. | |
| Memory | MODIN_MEMORY | | How much memory (in bytes) to give to an execution engine. In the Ray case: the amount of memory to start the Plasma object store with. In the Dask case: the amount of memory that is given to each worker depending on CPUs used. | |
| NPartitions | MODIN_NPARTITIONS | 2 | How many partitions to use for a Modin DataFrame (along each axis). | |
| OmnisciFragmentSize | MODIN_OMNISCI_FRAGMENT_SIZE | | How big a fragment in OmniSci should be when creating a table (in rows). | |
| OmnisciLaunchParameters | MODIN_OMNISCI_LAUNCH_PARAMETERS | {'enable_union': 1, 'enable_columnar_output': 1, 'enable_lazy_fetch': 0, 'null_div_by_zero': 1, 'enable_watchdog': 0} | Additional command line options for the OmniSci engine. Please visit the OmniSci documentation for a description of available parameters: https://docs.omnisci.com/installation-and-configuration/config-parameters#configuration-parameters-for-omniscidb | |
| PersistentPickle | MODIN_PERSISTENT_PICKLE | False | Whether serialization should be persistent. | |
| ProgressBar | MODIN_PROGRESS_BAR | False | Whether or not to show the progress bar. | |
| RayRedisAddress | MODIN_REDIS_ADDRESS | | Redis address to connect to when running in a Ray cluster. | |
| RayRedisPassword | MODIN_REDIS_PASSWORD | random string | What password to use for connecting to Redis. | |
| SocksProxy | MODIN_SOCKS_PROXY | | SOCKS proxy address if it is needed for SSH to work. | |
| TestDatasetSize | MODIN_TEST_DATASET_SIZE | | Dataset size for running some tests. | ('Small', 'Normal', 'Big') |
| TestRayClient | MODIN_TEST_RAY_CLIENT | False | Set to true to start and connect the Ray client before a testing session starts. | |
| TrackFileLeaks | MODIN_TEST_TRACK_FILE_LEAKS | True | Whether to track open file handle leaks during testing. | |

Usage Guide

The example below shows how to interact with Modin configs. A config value can be set either through its environment variable or through the config API.

import os

# Setting `MODIN_BACKEND` environment variable.
# Also can be set outside the script.
os.environ["MODIN_BACKEND"] = "OmniSci"

import modin.config
import modin.pandas as pd

# Checking initially set `Backend` config,
# which corresponds to `MODIN_BACKEND` environment
# variable
print(modin.config.Backend.get()) # prints 'Omnisci'

# Checking default value of `NPartitions`
print(modin.config.NPartitions.get()) # prints '8' here; the default can differ between machines

# Changing value of `NPartitions`
modin.config.NPartitions.put(16)
print(modin.config.NPartitions.get()) # prints '16'