Modin Logging#
Modin logging offers users greater insight into their queries by logging internal Modin API calls, partition metadata, and profiling system memory. When Modin logging is enabled (it is disabled by default), log files are written to a local .modin directory at the same directory level as the notebook/script used to run Modin.
The logs generated by Modin Logging will be written to a .modin/logs/job_<uuid> directory, uniquely named after the job uuid. The logs that contain the Modin API stack traces are named trace.log. The logs that contain the memory utilization metrics are named memory.log. By default, if any log file exceeds 10MB (configurable with LogFileSize), that file will be saved and a separate log file will be created. For instance, if users have 20MB worth of Modin API logs, they can expect to find trace.log.1 and trace.log.2 in the .modin/logs/job_<uuid> directory. After 10 * LogFileSize MB (100MB by default) of logs, the logs will roll over and the original log files, beginning with trace.log.1, will be overwritten with the new log lines.
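The size-based rollover scheme described above (numbered backups, a fixed cap, then the oldest lines are overwritten) resembles Python's standard RotatingFileHandler. A minimal standalone sketch of that naming pattern, purely for illustration — Modin's own rotation is internal, and the small sizes here are only to make the rollover easy to trigger:

```python
import logging
import logging.handlers
import os
import tempfile

# Stand-alone demo of size-based log rollover producing trace.log,
# trace.log.1, trace.log.2, ... (tiny maxBytes just to force rollovers).
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "trace.log")

handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1024, backupCount=10  # cf. Modin's 10MB x 10 files
)
logger = logging.getLogger("rollover_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

for i in range(200):
    logger.info("log line %d: %s", i, "x" * 64)  # forces several rollovers

print(sorted(os.listdir(log_dir)))
```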
Developer Warning: In some cases, running services like JupyterLab from the modin/modin directory may result in circular dependency issues. This is due to a naming conflict between the modin/logging directory and Python's standard logging module, which such environments may import by default. To resolve this, run JupyterLab or similar services from a directory other than modin/modin.
Usage examples#
In the example below, we enable logging for internal Modin API calls, partition metadata, and memory profiling. We can set the granularity (in seconds) at which the system memory utilization is logged using LogMemoryInterval. We can also set the maximum size of the logs (in MBs) using LogFileSize.
import modin.pandas as pd
from modin.config import LogMode, LogMemoryInterval, LogFileSize
LogMode.enable()
LogMemoryInterval.put(2) # Defaults to 5 seconds, new interval is 2 seconds
LogFileSize.put(5) # Defaults to 10 MB per log file, new size is 5 MB
# User code goes here
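To confirm where the logs ended up after a run, a small sketch (assuming it is executed from the same directory as the workload, so the local .modin directory is in scope) that lists the generated log files:

```python
from pathlib import Path

# After a Modin workload runs with LogMode enabled, log files land in
# .modin/logs/job_<uuid> next to the script; this lists whatever is there.
logs_root = Path(".modin/logs")
job_dirs = sorted(logs_root.glob("job_*")) if logs_root.exists() else []
for job_dir in job_dirs:
    for log_file in sorted(job_dir.iterdir()):
        print(log_file)
```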
Disable Modin logging like so:
import modin.pandas as pd
from modin.config import LogMode
LogMode.disable()
# User code goes here
In Modin, lower-level functionality is logged at the DEBUG level and higher-level functionality at the INFO level. By default, when logging is enabled in Modin, both high-level and low-level functionality are logged. The example script below can be used to switch between logging all functions and logging only higher-level functions. Setting the logger level to logging.INFO logs only higher-level functions.
import modin.pandas as pd
from modin.logging.config import get_logger
from modin.config import LogMode
import logging
LogMode.enable()
logger = get_logger()
logger.setLevel(logging.INFO) # Replace with logger.setLevel(logging.DEBUG) for lower level logs
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df = pd.concat([df, df])
Debugging from user-defined functions:
Warning
When attempting to use Modin logging in user-defined functions (UDFs) that execute on workers, as in the example below, a separate .modin/logs/job_** log directory is created for each worker executing the UDF.
import modin.pandas as pd
def udf(x):
    from modin.config import LogMode
    LogMode.enable()
    return x + 1
modin_df = pd.DataFrame([0, 1, 2, 3])
print(modin_df.map(udf))
The recommended approach is therefore to use a separate logger, as in the snippet below, to log from user-defined functions that execute on workers. Note that the logger configuration has to be specified inside the UDF, since it executes on a remote worker.
import logging
import modin.pandas as pd
def udf(x):
    logging.basicConfig(filename='modin_udf.log', level=logging.INFO)
    logging.info("This log message will be written to modin_udf.log")
    # User code goes here
    return x + 1
modin_df = pd.DataFrame([0, 1, 2, 3])
print(modin_df.map(udf))