Modin Logging

Modin logging gives users greater insight into their queries by recording internal Modin API calls and partition metadata and by profiling system memory. Logging is disabled by default. When it is enabled, log files are written to a local .modin directory at the same directory level as the notebook or script used to run Modin.

The logs generated by Modin logging are written to a .modin/logs/job_<uuid> directory, named after the job's UUID. Modin API stack traces are written to trace.log, and memory utilization metrics are written to memory.log. By default, when a log file exceeds 10MB (configurable with LogFileSize), that file is saved and a new log file is created. For instance, users with 20MB of Modin API logs can expect to find trace.log.1 and trace.log.2 in the .modin/logs/job_<uuid> directory. After 10 * LogFileSize MB of logs (100MB by default), the logs roll over: the original log files, beginning with trace.log.1, are overwritten with the new log lines.
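The rollover behavior described above resembles size-based log rotation in Python's standard library. The sketch below uses logging.handlers.RotatingFileHandler with deliberately tiny limits so that the effect is easy to see. It illustrates the general mechanism only: whether Modin uses this handler internally is an assumption, and the logger name rotation_demo and the helper write_rotating_logs are hypothetical.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def write_rotating_logs(log_dir, max_bytes, backup_count, n_lines):
    """Write n_lines to trace.log in log_dir, rotating once max_bytes is reached."""
    logger = logging.getLogger("rotation_demo")  # hypothetical name, not Modin's logger
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo lines out of the root logger
    handler = RotatingFileHandler(
        os.path.join(log_dir, "trace.log"),
        maxBytes=max_bytes,        # plays the role of LogFileSize (here in bytes)
        backupCount=backup_count,  # number of rotated files kept before overwriting
    )
    logger.addHandler(handler)
    try:
        for i in range(n_lines):
            logger.info("line %04d", i)  # each record is 10 bytes including newline
    finally:
        logger.removeHandler(handler)
        handler.close()
    return sorted(os.listdir(log_dir))


with tempfile.TemporaryDirectory() as tmp:
    files = write_rotating_logs(tmp, max_bytes=200, backup_count=2, n_lines=100)
    print(files)  # ['trace.log', 'trace.log.1', 'trace.log.2']
```

Although 1000 bytes are written in total, only three files remain, because the oldest rotated file is overwritten once backup_count is reached.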

Developer Warning: In some cases, running services such as JupyterLab from the modin/modin directory may cause circular dependency issues. This is due to a naming conflict between the modin/logging directory and Python's logging module, which such environments may import by default. To resolve this, run JupyterLab or similar services from a directory other than modin/modin.

Usage examples

In the example below, we enable logging for internal Modin API calls, partition metadata, and memory profiling. We can set the interval (in seconds) at which system memory utilization is logged using LogMemoryInterval. We can also set the maximum size of each log file (in MB) using LogFileSize.

import modin.pandas as pd
from modin.config import LogMode, LogMemoryInterval, LogFileSize
LogMode.enable()
LogMemoryInterval.put(2) # Defaults to 5 seconds, new interval is 2 seconds
LogFileSize.put(5) # Defaults to 10 MB per log file, new size is 5 MB

# User code goes here

Disable Modin logging like so:

import modin.pandas as pd
from modin.config import LogMode
LogMode.disable()

# User code goes here

Modin logs lower-level functionality at the debug level and higher-level functionality at the info level. By default, when logging is enabled, both higher-level and lower-level functionality are logged. The example script below switches between logging all functions and logging only higher-level functions: setting the logger level to logging.INFO logs only higher-level functions.

import modin.pandas as pd
from modin.logging.config import get_logger
from modin.config import LogMode
import logging
LogMode.enable()
logger = get_logger()
logger.setLevel(logging.INFO)  # Use logging.DEBUG instead to also log lower-level functions
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df = pd.concat([df, df])
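The effect of the logger level can be seen in isolation with Python's standard logging module. The sketch below does not use Modin: the logger name level_demo, the helper capture, and the two messages are hypothetical stand-ins for a higher-level (info) record and a lower-level (debug) record.

```python
import io
import logging


def capture(level):
    """Emit one info and one debug record; return the lines that pass the level."""
    stream = io.StringIO()
    logger = logging.getLogger("level_demo")  # hypothetical name, not Modin's logger
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        logger.info("high-level call")   # stands in for a higher-level function
        logger.debug("low-level call")   # stands in for a lower-level function
    finally:
        logger.removeHandler(handler)
    return stream.getvalue().splitlines()


print(capture(logging.INFO))   # ['high-level call']
print(capture(logging.DEBUG))  # ['high-level call', 'low-level call']
```

With the level set to logging.INFO, the debug record is dropped before it reaches any handler, which is why only the higher-level entry appears.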

Debugging from user-defined functions:

Warning

When Modin logging is enabled inside a user-defined function (UDF) that executes on workers, in order to log lower-level operators as in the example below, multiple log directories (.modin/logs/job_**) are created, one for each worker executing the UDF.

import modin.pandas as pd

def udf(x):
    from modin.config import LogMode

    LogMode.enable()

    return x + 1

modin_df = pd.DataFrame([0, 1, 2, 3])
print(modin_df.map(udf))

The recommended approach is therefore to use a different logger, as in the snippet below, to log from user-defined functions that execute on workers. The logger configuration has to be specified inside the UDF, because the UDF executes on a remote worker.

import logging
import modin.pandas as pd

def udf(x):
    logging.basicConfig(filename='modin_udf.log', level=logging.INFO)
    logging.info("This log message will be written to modin_udf.log")

    # User code goes here
    return x + 1

modin_df = pd.DataFrame([0, 1, 2, 3])
print(modin_df.map(udf))
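One caveat with logging.basicConfig is that it has no effect when the root logger of the process already has handlers. A possible variation, sketched below, attaches a file handler to a named logger once per process and includes the process ID in the file name, so that records from different workers are easy to tell apart. The helper get_udf_logger, the log_dir parameter, and the file naming scheme are all hypothetical and are not part of Modin.

```python
import logging
import os


def get_udf_logger(log_dir="."):
    """Return a per-process logger that writes to modin_udf_<pid>.log in log_dir."""
    pid = os.getpid()
    logger = logging.getLogger(f"modin_udf.{pid}")  # hypothetical logger name
    if not logger.handlers:  # attach the file handler only once per process
        path = os.path.join(log_dir, f"modin_udf_{pid}.log")
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # do not duplicate records in the root logger
    return logger


def udf(x, log_dir="."):
    # The logger is obtained inside the UDF so that it is configured
    # in whichever process ends up executing the function.
    get_udf_logger(log_dir).info("udf called with %r", x)
    return x + 1
```

The function can then be passed to modin_df.map(udf) in the same way as in the example above. Which process writes which file depends on the execution engine.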