Modin Configuration Settings

To adjust Modin’s default behavior, set the value of a Modin config either through an environment variable or through the modin.config API. To list all available configs with their descriptions, run python -m modin.config.
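python -m modin.config lists every available config. To see which Modin environment variables are actually set in the current process, a small standard-library helper is enough. This is a sketch: modin_env_overrides is a hypothetical name, and it assumes every Modin variable uses the MODIN_ prefix, as all entries in the configs list below do.

```python
import os


def modin_env_overrides(environ=None):
    """Return the MODIN_* variables set in `environ` (default: os.environ).

    Hypothetical helper. Assumes every Modin variable starts with the
    MODIN_ prefix; names are not checked against Modin's actual configs.
    """
    if environ is None:
        environ = os.environ
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith("MODIN_")
    }


# Print the Modin settings overridden in this process, if any.
for name, value in modin_env_overrides().items():
    print(f"{name}={value}")
```

Only environment variables show up here; values set later through the modin.config API are not visible in os.environ.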

Public API

In principle, configs could come from any source, but currently only environment variables are implemented. Every environment-variable config derives from EnvironmentVariable, which contains most of the config API implementation.
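The idea of an environment-variable-backed config can be illustrated with a toy class: it reads its variable on first access, decodes it, falls back to a default when the variable is unset, and lets an explicit put() override both. This is a minimal sketch, not Modin's implementation; the classes EnvConfig and NPartitionsSketch, the _ensure_loaded helper, the variable MODIN_NPARTITIONS_SKETCH, and the string labels returned by get_value_source() (Modin documents an int return type) are all hypothetical.

```python
import os


class EnvConfig:
    """Toy environment-variable-backed config (hypothetical, not Modin's code)."""

    varname = None  # environment variable to read
    default = None  # value used when the variable is unset
    decode = str    # turns the raw environment string into a typed value

    @classmethod
    def _ensure_loaded(cls):
        # Read the environment once, on first access, and cache the result on
        # the subclass itself so that each config keeps its own state.
        if "_value" not in cls.__dict__:
            raw = os.environ.get(cls.varname)
            if raw is None:
                cls._value, cls._source = cls.default, "default"
            else:
                cls._value, cls._source = cls.decode(raw), "environment"

    @classmethod
    def get(cls):
        cls._ensure_loaded()
        return cls._value

    @classmethod
    def get_value_source(cls):
        # Hypothetical string labels; Modin documents an int return type.
        cls._ensure_loaded()
        return cls._source

    @classmethod
    def put(cls, value):
        # An explicit put() overrides both the default and the environment.
        cls._value, cls._source = value, "user"


class NPartitionsSketch(EnvConfig):
    varname = "MODIN_NPARTITIONS_SKETCH"  # hypothetical variable name
    default = 4
    decode = int
```

For example, with MODIN_NPARTITIONS_SKETCH=16 in the environment, NPartitionsSketch.get() returns 16 and get_value_source() returns "environment"; after NPartitionsSketch.put(32), get() returns 32 and the source becomes "user".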

class modin.config.envvars.EnvironmentVariable

Base class for configuration based on environment variables.

classmethod get()

Get config value.

Returns

Decoded and verified config value.

Return type

Any

classmethod get_help() → str

Generate user-presentable help for the config.

Return type

str

classmethod get_value_source()

Get value source of the config.

Return type

int

classmethod once(onvalue, callback)

Execute callback if the current config value matches onvalue.

Otherwise, store callback in the _once container under onvalue, so that it can be executed later, when the config value is set to onvalue.

Parameters
  • onvalue (Any) – Config value that triggers the callback.

  • callback (callable) – Callable that should be executed if the config value matches onvalue.
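The once() contract can be sketched with a toy class: a callback either runs immediately, because the config already holds onvalue, or waits in _once until a later put() sets that value, and then runs a single time. This is an illustration under assumptions, not Modin's code: OnceConfig is hypothetical, and passing the config class to the callback is assumed rather than stated above.

```python
from collections import defaultdict


class OnceConfig:
    """Toy config illustrating once() (hypothetical, not Modin's code)."""

    _value = "Pandas"
    _once = defaultdict(list)  # onvalue -> callbacks waiting for that value

    @classmethod
    def get(cls):
        return cls._value

    @classmethod
    def put(cls, value):
        cls._value = value
        # Run every callback that was waiting for this value, then drop them.
        for callback in cls._once.pop(value, []):
            callback(cls)  # assumption: callbacks receive the config class

    @classmethod
    def once(cls, onvalue, callback):
        if cls.get() == onvalue:
            callback(cls)  # the value already matches: run right away
        else:
            cls._once[onvalue].append(callback)  # wait for a later put()


calls = []
OnceConfig.once("Pandas", lambda cfg: calls.append(("Pandas", cfg.get())))
OnceConfig.once("OmniSci", lambda cfg: calls.append(("OmniSci", cfg.get())))
OnceConfig.put("OmniSci")  # fires the waiting "OmniSci" callback
OnceConfig.put("Pandas")   # nothing is waiting for "Pandas"
OnceConfig.put("OmniSci")  # the "OmniSci" callback has already been used
```

After these calls, calls holds exactly two entries: the "Pandas" callback, which ran at registration time, and the "OmniSci" callback, which ran on the first matching put().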

classmethod put(value)

Set config value.

Parameters

value (Any) – Config value to set.

classmethod subscribe(callback)

Add callback to the _subs list and then execute it.

Parameters

callback (callable) – Callable to execute.
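subscribe() is meant for reacting to config values. The toy class below sketches the documented behavior (the callback is added to _subs and executed at once) together with an assumption that is not stated above: subscribers run again each time put() changes the value. SubscribableConfig is hypothetical and is not Modin's implementation.

```python
class SubscribableConfig:
    """Toy config illustrating subscribe() (hypothetical, not Modin's code)."""

    _value = 4
    _subs = []  # callbacks notified whenever the value changes

    @classmethod
    def get(cls):
        return cls._value

    @classmethod
    def put(cls, value):
        old_value = cls._value
        cls._value = value
        # Assumption: subscribers are notified only when the value changes.
        if value != old_value:
            for callback in cls._subs:
                callback(cls)

    @classmethod
    def subscribe(cls, callback):
        cls._subs.append(callback)
        callback(cls)  # executed immediately, as documented above


seen = []
SubscribableConfig.subscribe(lambda cfg: seen.append(cfg.get()))
SubscribableConfig.put(16)
SubscribableConfig.put(16)  # same value: no notification in this sketch
SubscribableConfig.put(8)
```

Here seen records the initial value from the immediate call, followed by each distinct value passed to put().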

Modin Configs List

Each entry below gives the config name, followed by its environment variable, default value, description, and accepted options. Fields that are empty for a config are omitted.

AsvDataSizeConfig
  • Env. variable: MODIN_ASV_DATASIZE_CONFIG
  • Description: Allows overriding the default size of data (shapes).

AsvImplementation
  • Env. variable: MODIN_ASV_USE_IMPL
  • Default: modin
  • Description: Selects the library used for performance testing.
  • Options: ('modin', 'pandas')

BenchmarkMode
  • Env. variable: MODIN_BENCHMARK_MODE
  • Default: False
  • Description: Whether to perform computations synchronously.

CpuCount
  • Env. variable: MODIN_CPUS
  • Default: machine-dependent (2 on the machine where this list was generated)
  • Description: How many CPU cores to use during initialization of the Modin engine.

DoLogRpyc
  • Env. variable: MODIN_LOG_RPYC
  • Description: Whether to gather RPyC logs (applicable for remote context).

DoTraceRpyc
  • Env. variable: MODIN_TRACE_RPYC
  • Description: Whether to trace RPyC calls (applicable for remote context).

DoUseCalcite
  • Env. variable: MODIN_USE_CALCITE
  • Default: True
  • Description: Whether to use Calcite for OmniSci query execution.

Engine
  • Env. variable: MODIN_ENGINE
  • Default: Ray
  • Description: Distributed engine to run queries with.
  • Options: ('Ray', 'Dask', 'Python', 'Native')

GpuCount
  • Env. variable: MODIN_GPUS
  • Description: How many GPU devices to use across the whole distribution.

IsDebug
  • Env. variable: MODIN_DEBUG
  • Description: Forces the Modin engine to be "Python" unless $MODIN_ENGINE specifies otherwise.

IsExperimental
  • Env. variable: MODIN_EXPERIMENTAL
  • Description: Whether to turn on experimental features.

IsRayCluster
  • Env. variable: MODIN_RAY_CLUSTER
  • Description: Whether Modin is running on a pre-initialized Ray cluster.

Memory
  • Env. variable: MODIN_MEMORY
  • Description: How much memory (in bytes) to give to an execution engine. In the Ray case, this is the amount of memory to start the Plasma object store with; in the Dask case, it is the amount of memory given to each worker, depending on the number of CPUs used.

MinPartitionSize
  • Env. variable: MODIN_MIN_PARTITION_SIZE
  • Default: 32
  • Description: Minimum number of rows/columns in a single pandas partition split. Once a partition for a pandas dataframe has more than this many elements, Modin adds another partition.

NPartitions
  • Env. variable: MODIN_NPARTITIONS
  • Default: machine-dependent (2 on the machine where this list was generated)
  • Description: How many partitions to use for a Modin DataFrame (along each axis).

OmnisciFragmentSize
  • Env. variable: MODIN_OMNISCI_FRAGMENT_SIZE
  • Description: Size of a fragment (in rows) to use when creating a table in OmniSci.

OmnisciLaunchParameters
  • Env. variable: MODIN_OMNISCI_LAUNCH_PARAMETERS
  • Default: {'enable_union': 1, 'enable_columnar_output': 1, 'enable_lazy_fetch': 0, 'null_div_by_zero': 1, 'enable_watchdog': 0, 'enable_thrift_logs': 0}
  • Description: Additional command-line options for the OmniSci engine. See the OmniSci documentation for the description of available parameters: https://docs.omnisci.com/installation-and-configuration/config-parameters#configuration-parameters-for-omniscidb

PersistentPickle
  • Env. variable: MODIN_PERSISTENT_PICKLE
  • Default: False
  • Description: Whether serialization should be persistent.

ProgressBar
  • Env. variable: MODIN_PROGRESS_BAR
  • Default: False
  • Description: Whether to show the progress bar.

RayRedisAddress
  • Env. variable: MODIN_REDIS_ADDRESS
  • Description: Redis address to connect to when running in a Ray cluster.

RayRedisPassword
  • Env. variable: MODIN_REDIS_PASSWORD
  • Default: a random string
  • Description: Password to use for connecting to Redis.

SocksProxy
  • Env. variable: MODIN_SOCKS_PROXY
  • Description: SOCKS proxy address, if one is needed for SSH to work.

StorageFormat
  • Env. variable: MODIN_STORAGE_FORMAT
  • Default: Pandas
  • Description: Engine to run on a single node of the distribution.
  • Options: ('Pandas', 'OmniSci', 'Pyarrow', 'Cudf')

TestDatasetSize
  • Env. variable: MODIN_TEST_DATASET_SIZE
  • Description: Dataset size for running some tests.
  • Options: ('Small', 'Normal', 'Big')

TestRayClient
  • Env. variable: MODIN_TEST_RAY_CLIENT
  • Default: False
  • Description: Set to True to start and connect a Ray client before a testing session starts.

TrackFileLeaks
  • Env. variable: MODIN_TEST_TRACK_FILE_LEAKS
  • Default: True
  • Description: Whether to track leaks of open file handles during testing.
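The interplay between NPartitions and MinPartitionSize can be illustrated with simple arithmetic. The helper below is hypothetical and assumes an axis is cut into chunks of ceil(length / NPartitions) elements, raised to MinPartitionSize when smaller; this follows the descriptions above but is not necessarily how Modin splits data internally.

```python
import math


def partition_lengths(axis_len, npartitions, min_partition_size=32):
    """Hypothetical helper: split an axis of `axis_len` rows (or columns).

    Assumes a chunk size of ceil(axis_len / npartitions), raised to
    min_partition_size when smaller, so small frames get fewer partitions.
    """
    chunk = max(math.ceil(axis_len / npartitions), min_partition_size)
    lengths = []
    remaining = axis_len
    while remaining > 0:
        lengths.append(min(chunk, remaining))
        remaining -= chunk
    return lengths


print(partition_lengths(100, 8))   # [32, 32, 32, 4]
print(partition_lengths(1000, 8))  # [125, 125, 125, 125, 125, 125, 125, 125]
```

Under this assumption, with NPartitions set to 8 and the default MinPartitionSize of 32, a 100-row frame gets only four row partitions, while a 1000-row frame gets the full eight.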

Usage Guide

The example below shows how to interact with Modin configs. A config value can be set either through its environment variable or through the config API.

import os

# Setting `MODIN_STORAGE_FORMAT` environment variable.
# Also can be set outside the script.
os.environ["MODIN_STORAGE_FORMAT"] = "OmniSci"

import modin.config
import modin.pandas as pd

# Checking initially set `StorageFormat` config,
# which corresponds to `MODIN_STORAGE_FORMAT` environment
# variable
print(modin.config.StorageFormat.get()) # prints 'Omnisci'

# Checking the default value of `NPartitions`, which depends on the machine
print(modin.config.NPartitions.get()) # e.g. prints '8'

# Changing value of `NPartitions`
modin.config.NPartitions.put(16)
print(modin.config.NPartitions.get()) # prints '16'
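The comment "Also can be set outside the script" can be made concrete: environment variables can be passed to a child Python process without touching the parent's environment. This is a minimal standard-library sketch; the child here only echoes the variables it receives, so the example runs without Modin installed, whereas in practice it would be your own script that imports modin.pandas.

```python
import os
import subprocess
import sys

# Copy the current environment and add Modin settings for the child only.
child_env = dict(os.environ, MODIN_ENGINE="Dask", MODIN_NPARTITIONS="16")

# Stand-in child script that echoes the variables it received.
child_code = (
    "import os; "
    "print(os.environ['MODIN_ENGINE'], os.environ['MODIN_NPARTITIONS'])"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # Dask 16
```

Because child_env is a copy, os.environ in the parent process is left unchanged.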