Modin Configuration Settings
To adjust Modin's default behavior, you can set the value of Modin configs either by setting an environment variable or by using the `modin.config` API. To list all available configs in Modin with their descriptions, run `python -m modin.config`.
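Every config is backed by an environment variable. One quick way to see which configs have been overridden in the current process is therefore to scan `os.environ` for the `MODIN_` prefix. This is a stdlib-only sketch and not part of Modin's API. The helper name `modin_env_overrides` is hypothetical. A few configs, such as the CI AWS credentials in the list below, use variable names without the `MODIN_` prefix and are not picked up here.

```python
import os
from typing import Dict, Mapping, Optional


def modin_env_overrides(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the MODIN_* environment variables that are currently set.

    Hypothetical helper: it only inspects the environment and does not
    import Modin, so it reports raw strings, not decoded config values.
    """
    if environ is None:
        environ = os.environ
    # Sort by name so the output is stable and easy to scan.
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith("MODIN_")
    }


if __name__ == "__main__":
    # Simulate a shell session in which two configs were overridden.
    fake_env = {"MODIN_ENGINE": "Dask", "MODIN_CPUS": "4", "HOME": "/home/user"}
    print(modin_env_overrides(fake_env))
```

Calling `modin_env_overrides()` with no argument inspects the real process environment instead of the simulated one.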
Public API
In principle, configs could come from any source, but currently only environment variables are implemented. Every environment-variable config derives from `EnvironmentVariable`, which contains most of the config API implementation.
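The paragraph above describes configs as classes that derive from a common environment-variable base. The sketch below models that idea with the standard library only; it is not Modin's code. The following are all hypothetical:

- `EnvParameter`
- the `_DECODERS` table and the boolean spellings it accepts
- `ProgressBarSketch`

The `ValueSource` member names are assumptions modeled on the `get_value_source()` method documented below.

```python
import enum
import os
from typing import Any, Callable, Dict


class ValueSource(enum.Enum):
    """Where a config value came from (member names are assumptions)."""

    DEFAULT = "default"
    GOT_FROM_CFG_SOURCE = "environment variable"
    SET_BY_USER = "put() call"


# Hypothetical decoders that turn raw environment strings into Python values.
_DECODERS: Dict[type, Callable[[str], Any]] = {
    str: str,
    int: int,
    bool: lambda raw: raw.strip().lower() in ("1", "true", "yes", "on"),
}


class EnvParameter:
    """Sketch of a config whose value is read from an environment variable."""

    varname: str = ""  # name of the backing environment variable
    type_: type = str  # type used to decode the raw string
    default: Any = None  # value used when the variable is unset

    _value: Any = None
    _source: ValueSource = ValueSource.DEFAULT
    _initialized: bool = False

    @classmethod
    def _read(cls) -> None:
        # Read the environment lazily, on first access, and cache the result.
        raw = os.environ.get(cls.varname)
        if raw is None:
            cls._value, cls._source = cls.default, ValueSource.DEFAULT
        else:
            cls._value = _DECODERS[cls.type_](raw)
            cls._source = ValueSource.GOT_FROM_CFG_SOURCE
        cls._initialized = True

    @classmethod
    def get(cls) -> Any:
        if not cls._initialized:
            cls._read()
        return cls._value

    @classmethod
    def get_value_source(cls) -> ValueSource:
        if not cls._initialized:
            cls._read()
        return cls._source

    @classmethod
    def put(cls, value: Any) -> None:
        # An explicit put() overrides whatever the environment said.
        cls._value, cls._source = value, ValueSource.SET_BY_USER
        cls._initialized = True


class ProgressBarSketch(EnvParameter):
    """Hypothetical stand-in for the ProgressBar config."""

    varname = "MODIN_PROGRESS_BAR"
    type_ = bool
    default = False
```

With `MODIN_PROGRESS_BAR` unset, `ProgressBarSketch.get()` returns `False` and the source is `DEFAULT`. If `MODIN_PROGRESS_BAR=True` is exported before the first `get()`, it returns `True` with source `GOT_FROM_CFG_SOURCE`. Because the sketch caches the first read, later changes to the environment are ignored.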
- class modin.config.envvars.EnvironmentVariable
  Base class for environment-variable-based configuration.
  - classmethod get() -> Any
    Get the config value.
    - Returns: Decoded and verified config value.
    - Return type: Any
  - classmethod get_help() -> str
    Generate user-presentable help for the config.
    - Return type: str
  - classmethod get_value_source() -> ValueSource
    Get the value source of the config.
    - Return type: ValueSource
  - classmethod once(onvalue: Any, callback: Callable) -> None
    Execute callback if the config value matches onvalue. Otherwise, accumulate callbacks associated with the given onvalue in the _once container.
    - Parameters:
      - onvalue (Any) – Config value to match against.
      - callback (callable) – Callable that should be executed if the config value matches onvalue.
  - classmethod put(value: Any) -> None
    Set the config value.
    - Parameters:
      - value (Any) – Config value to set.
  - classmethod subscribe(callback: Callable) -> None
    Add callback to the _subs list and then execute it.
    - Parameters:
      - callback (callable) – Callable to execute.
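The `put`, `subscribe` and `once` methods above form a small publish/subscribe mechanism. The sketch below models their semantics with the standard library; it is an illustration, not Modin's implementation. `PubSubParameter` is a hypothetical class. Two details are assumptions of this sketch:

- callbacks receive the config class as their only argument;
- `put()` notifies callbacks only when the value actually changes.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class PubSubParameter:
    """Sketch of put/subscribe/once semantics (hypothetical class)."""

    _value: Any = None
    _subs: List[Callable[..., Any]] = []
    _once: DefaultDict[Any, List[Callable[..., Any]]] = defaultdict(list)

    def __init_subclass__(cls, **kwargs: Any) -> None:
        # Give every config subclass its own callback containers.
        super().__init_subclass__(**kwargs)
        cls._subs = []
        cls._once = defaultdict(list)

    @classmethod
    def get(cls) -> Any:
        return cls._value

    @classmethod
    def put(cls, value: Any) -> None:
        old_value = cls._value
        cls._value = value
        if old_value == value:
            return  # Nothing changed, so no callbacks fire.
        # Notify every subscriber about the new value.
        for callback in cls._subs:
            callback(cls)
        # Fire, then discard, callbacks that were waiting for this exact value.
        for callback in cls._once.pop(value, []):
            callback(cls)

    @classmethod
    def subscribe(cls, callback: Callable[..., Any]) -> None:
        # Register the callback, then run it immediately with the current value.
        cls._subs.append(callback)
        callback(cls)

    @classmethod
    def once(cls, onvalue: Any, callback: Callable[..., Any]) -> None:
        if cls.get() == onvalue:
            callback(cls)
        else:
            cls._once[onvalue].append(callback)
```

The `once` container suits one-time initialization tied to a particular value, for example starting an engine only when `Engine` switches to it. `subscribe` suits code that must react to every change.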
Modin Configs List
Config Name | Env. Variable Name | Default Value | Description | Options |
---|---|---|---|---|
AsvDataSizeConfig | MODIN_ASV_DATASIZE_CONFIG | | Allows overriding the default size of data (shapes). | |
AsvImplementation | MODIN_ASV_USE_IMPL | modin | Selects the library used for performance testing. | ('modin', 'pandas') |
AsyncReadMode | MODIN_ASYNC_READ_MODE | False | Do not wait for reading from the source to finish. This can break cases where reading occurs inside a context that deletes the source on exit. | |
BenchmarkMode | MODIN_BENCHMARK_MODE | False | Whether or not to perform computations synchronously. | |
CIAWSAccessKeyID | AWS_ACCESS_KEY_ID | foobar_key | Set to AWS_ACCESS_KEY_ID when running mock S3 tests for Modin in GitHub CI. | |
CIAWSSecretAccessKey | AWS_SECRET_ACCESS_KEY | foobar_secret | Set to AWS_SECRET_ACCESS_KEY when running mock S3 tests for Modin in GitHub CI. | |
CpuCount | MODIN_CPUS | multiprocessing.cpu_count() | How many CPU cores to use during initialization of the Modin engine. | |
DoLogRpyc | MODIN_LOG_RPYC | | Whether to gather RPyC logs (applicable for remote context). | |
DoTraceRpyc | MODIN_TRACE_RPYC | | Whether to trace RPyC calls (applicable for remote context). | |
DoUseCalcite | MODIN_USE_CALCITE | True | Whether to use Calcite for HDK query execution. | |
Engine | MODIN_ENGINE | Ray | Distribution engine to run queries with. | ('Ray', 'Dask', 'Python', 'Native', 'Unidist') |
ExperimentalGroupbyImpl | MODIN_EXPERIMENTAL_GROUPBY | False | Set to true to use Modin's experimental groupby implementation. The experimental groupby uses a range-partitioning technique; note that it may not always perform better than Modin's original TreeReduce and FullAxis implementations. For more information, see the corresponding section of Modin's documentation (TODO: add a link to the section once it's written). | |
ExperimentalNumPyAPI | MODIN_EXPERIMENTAL_NUMPY_API | False | Set to true to use Modin's experimental NumPy API. | |
GithubCI | MODIN_GITHUB_CI | False | Set to true when running Modin in GitHub CI. | |
GpuCount | MODIN_GPUS | | How many GPU devices to utilize across the whole distribution. | |
HdkFragmentSize | MODIN_HDK_FRAGMENT_SIZE | | How big a fragment in HDK should be when creating a table (in rows). | |
HdkLaunchParameters | MODIN_HDK_LAUNCH_PARAMETERS | {'enable_union': 1, 'enable_columnar_output': 1, 'enable_lazy_fetch': 0, 'null_div_by_zero': 1, 'enable_watchdog': 0, 'enable_thrift_logs': 0, 'enable_multifrag_execution_result': 1, 'cpu_only': 1, 'enable_lazy_dict_materialization': 0, 'log_dir': 'pyhdk_log'} | Additional command line options for the HDK engine. See the OmniSci documentation for a description of the available parameters: https://docs.omnisci.com/installation-and-configuration/config-parameters#configuration-parameters-for-omniscidb | |
IsDebug | MODIN_DEBUG | | Force the Modin engine to be "Python" unless specified by $MODIN_ENGINE. | |
IsExperimental | MODIN_EXPERIMENTAL | | Whether to turn on experimental features. | |
IsRayCluster | MODIN_RAY_CLUSTER | | Whether Modin is running on a pre-initialized Ray cluster. | |
LogFileSize | MODIN_LOG_FILE_SIZE | 10 | Max size of logs (in MB) to store per Modin job. | |
LogMemoryInterval | MODIN_LOG_MEMORY_INTERVAL | 5 | Interval (in seconds) to profile memory utilization for logging. | |
LogMode | MODIN_LOG_MODE | disable | Set | ('enable', 'disable', 'enable_api_only') |
Memory | MODIN_MEMORY | | How much memory (in bytes) to give to an execution engine. Notes: | |
MinPartitionSize | MODIN_MIN_PARTITION_SIZE | 32 | Minimum number of rows/columns in a single pandas partition split. Once a partition for a pandas dataframe has more than this many elements, Modin adds another partition. | |
NPartitions | MODIN_NPARTITIONS | Equal to the value of MODIN_CPUS | How many partitions to use for a Modin DataFrame (along each axis). | |
PersistentPickle | MODIN_PERSISTENT_PICKLE | False | Whether serialization should be persistent. | |
ProgressBar | MODIN_PROGRESS_BAR | False | Whether or not to show the progress bar. | |
RayRedisAddress | MODIN_REDIS_ADDRESS | | Redis address to connect to when running in a Ray cluster. | |
RayRedisPassword | MODIN_REDIS_PASSWORD | random string | Password to use for connecting to Redis. | |
ReadSqlEngine | MODIN_READ_SQL_ENGINE | Pandas | Engine to run read_sql with. | ('Pandas', 'Connectorx') |
SocksProxy | MODIN_SOCKS_PROXY | | SOCKS proxy address, if one is needed for SSH to work. | |
StorageFormat | MODIN_STORAGE_FORMAT | Pandas | Engine to run on a single node of distribution. | ('Pandas', 'Hdk', 'Pyarrow', 'Cudf') |
TestDatasetSize | MODIN_TEST_DATASET_SIZE | | Dataset size for running some tests. | ('Small', 'Normal', 'Big') |
TestRayClient | MODIN_TEST_RAY_CLIENT | False | Set to true to start and connect a Ray client before a testing session starts. | |
TestReadFromPostgres | MODIN_TEST_READ_FROM_POSTGRES | False | Set to true to test reading from Postgres. | |
TestReadFromSqlServer | MODIN_TEST_READ_FROM_SQL_SERVER | False | Set to true to test reading from SQL Server. | |
TrackFileLeaks | MODIN_TEST_TRACK_FILE_LEAKS | True | Whether to track open file handle leaks during testing. | |
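Some entries in the Default Value and Options columns depend on each other:

- `NPartitions` falls back to `MODIN_CPUS`.
- `MODIN_CPUS` in turn falls back to `multiprocessing.cpu_count()`.
- Configs with an Options column accept only the listed values.

The sketch below applies these rules to a plain mapping of environment variables. It is not Modin's implementation. The function names are hypothetical, and matching options case-insensitively is an assumption of this sketch.

```python
import multiprocessing
import os
from typing import Mapping, Sequence

# Allowed values of the Engine config, copied from the table above.
ENGINE_CHOICES = ("Ray", "Dask", "Python", "Native", "Unidist")


def resolve_choice(raw: str, choices: Sequence[str]) -> str:
    """Match a raw value against the allowed options, ignoring case (assumption)."""
    for choice in choices:
        if raw.lower() == choice.lower():
            return choice
    raise ValueError(f"Unsupported value {raw!r}; expected one of {tuple(choices)}")


def resolve_cpus(environ: Mapping[str, str]) -> int:
    # CpuCount: MODIN_CPUS if set, otherwise multiprocessing.cpu_count().
    raw = environ.get("MODIN_CPUS")
    return int(raw) if raw is not None else multiprocessing.cpu_count()


def resolve_npartitions(environ: Mapping[str, str]) -> int:
    # NPartitions: MODIN_NPARTITIONS if set, otherwise the CpuCount value.
    raw = environ.get("MODIN_NPARTITIONS")
    return int(raw) if raw is not None else resolve_cpus(environ)


if __name__ == "__main__":
    print(resolve_npartitions(os.environ))
    print(resolve_choice(os.environ.get("MODIN_ENGINE", "Ray"), ENGINE_CHOICES))
```

For example, with only `MODIN_CPUS=4` set, `resolve_npartitions` yields 4. Setting `MODIN_NPARTITIONS` as well takes precedence.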
Usage Guide
The example below shows how to interact with Modin configs. As it demonstrates, a config value can be set either through the corresponding environment variable or through the config API.
```python
import os

# Set the `MODIN_STORAGE_FORMAT` environment variable.
# It can also be set outside the script, e.g. in the shell.
os.environ["MODIN_STORAGE_FORMAT"] = "Hdk"

import modin.config
import modin.pandas as pd

# Check the initially set `StorageFormat` config,
# which corresponds to the `MODIN_STORAGE_FORMAT` environment variable.
print(modin.config.StorageFormat.get())  # prints 'Hdk'

# Check the default value of `NPartitions`, which equals the CPU count by default.
print(modin.config.NPartitions.get())  # prints 8 on an 8-core machine

# Change the value of `NPartitions` through the config API.
modin.config.NPartitions.put(16)
print(modin.config.NPartitions.get())  # prints 16
```
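Because `put()` changes a config for the rest of the session, it can help to wrap it so the previous value is restored afterwards. This is a minimal sketch that assumes only the `get()`/`put()` classmethods documented in the Public API section. `temporary_config` and `NPartitionsStub` are hypothetical names. With Modin installed, you would pass a real config class such as `modin.config.NPartitions` instead of the stub. Restoring through `put()` also marks the value as user-set rather than returning it to its original value source.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_config(param: Any, value: Any) -> Iterator[None]:
    """Temporarily set a config that has get()/put() classmethods, then restore it."""
    old_value = param.get()
    param.put(value)
    try:
        yield
    finally:
        # Restore the previous value even if the body raised an exception.
        param.put(old_value)


class NPartitionsStub:
    """Stand-in for a Modin config class so the sketch runs without Modin."""

    _value = 8

    @classmethod
    def get(cls) -> int:
        return cls._value

    @classmethod
    def put(cls, value: int) -> None:
        cls._value = value


if __name__ == "__main__":
    with temporary_config(NPartitionsStub, 16):
        print(NPartitionsStub.get())
    print(NPartitionsStub.get())
```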