HDK#

This section describes usage related documents for the HDK-based engine of Modin.

This engine uses the HDK library to obtain high single-node scalability for specific set of dataframe operations. To enable this engine you can set the following environment variable:

export MODIN_STORAGE_FORMAT=hdk

or use it in your code:

import modin.config as cfg
cfg.StorageFormat.put('hdk')

Since HDK is run through its native engine, Modin automatically sets MODIN_ENGINE=Native and you might not specify it explicitly. If for some reasons Native engine is explicitly set using modin.config or MODIN_ENGINE environment variable, make sure you also tell Modin that Experimental mode is turned on (export MODIN_EXPERIMENTAL=true or cfg.IsExperimental.put(True)) otherwise the following error occurs:

FactoryNotFoundError: HDK on Native is only accessible through the experimental API.
Run `import modin.experimental.pandas as pd` to use HDK on Native.

Note

If you encounter LLVM ERROR: inconsistency in registered CommandLine options error when using HDK, please refer to the respective section in Troubleshooting page to avoid the issue.

Running on a GPU#

Prerequisites:

  • HDK’s GPU mode is currently supported on Linux and Intel GPU only.

  • HDK supports Gen9 architecture and higher (including Xe & Arc).

  • HDK’s GPU mode requires proper driver installation. Follow this guide to set up your system. Make sure to install the compute runtime packages: intel-opencl-icd, intel-level-zero-gpu, level-zero.

  • Make sure your GPU is visible and accessible.

Note

You can use hwinfo and clinfo utilities to verify the driver installation and device accessibility.

HDK supports a heterogeneous execution mode (experimental) that is disabled by default in Modin. Starting with pyHDK version 0.7 Modin can run the workload on Intel GPU. Run on a GPU via MODIN_HDK_LAUNCH_PARAMETERS="cpu_only=0" python <your-script.py>.