Run time compilation (RTC) is a feature of the GPUReconstruction library, which can recompile the GPU code for HIP and for CUDA at runtime, and apply some optimizations and changes. It is planned to add support for CPU code and OpenCL code in the future.
The changes that can be applied are:
- `constexpr` optimization: configuration values that are constant during the processing are replaced by `constexpr` expressions, which allows the compiler to optimize the code better. Benchmarks in 2024 have shown a 5% performance improvement with CUDA and a 2% improvement with HIP.

Generally, RTC is enabled via the `--RTCenable` flag for the standalone benchmark, or via the `GPU_proc_rtc.enable=1` `configKeyValue` for O2. For a list of RTC options, please see GPUSettingsList.h.
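
The effect of the `constexpr` optimization can be illustrated with a small, self-contained C++ sketch. It is not taken from the GPUReconstruction sources: `RuntimeSettings`, `ConstexprSettings`, `nRows`, and both functions are hypothetical names chosen for illustration. The first function reads its loop bound from a settings object at run time; the second uses the same value as a `constexpr`, which is roughly the form a recompiled kernel could see.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxRows = 8;

// Hypothetical runtime settings object; the field name is illustrative only.
struct RuntimeSettings {
  std::size_t nRows;
};

// Generic build: nRows is only known at run time, so the compiler must keep
// the loop general and reload the bound from the settings object.
inline int sumRowsRuntime(const std::array<int, kMaxRows>& rows, const RuntimeSettings& s)
{
  assert(s.nRows <= kMaxRows); // guard against reading past the array
  int sum = 0;
  for (std::size_t i = 0; i < s.nRows; i++) {
    sum += rows[i];
  }
  return sum;
}

// Sketch of a recompiled build: the same setting is now a compile-time
// constant, so the compiler is free to unroll the loop and fold constants.
struct ConstexprSettings {
  static constexpr std::size_t nRows = 4;
};

inline int sumRowsConstexpr(const std::array<int, kMaxRows>& rows)
{
  static_assert(ConstexprSettings::nRows <= kMaxRows, "row count exceeds array size");
  int sum = 0;
  for (std::size_t i = 0; i < ConstexprSettings::nRows; i++) {
    sum += rows[i];
  }
  return sum;
}
```

Both functions return the same result when `RuntimeSettings::nRows` is 4; only the amount of information available to the compiler differs.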
Caching the output: the compiled output can be cached via the `--RTCcacheOutput` setting. The folder to store the cache files can be selected via `--RTCTECHcacheFolder`, and with `--RTCTECHcacheMutex` (default: enabled), a file-lock mutex can be used to synchronize access to the cache folder. The cached code is checked against the to-be-compiled source code with SHA1 hashes; the cache is used only if the code has not changed, otherwise the code is recompiled and the cache is updated. It is possible to force using outdated cache files via the `--RTCTECHignoreCacheValid` option.

For changing the launch bounds and other parameters, please consider `--RTCTECHloadLaunchBoundsFromFile` (and `--RTCTECHprintLaunchBounds`), which can load a parameter set that can be created via dumpGPUDefParam.C. A set of default parameters is stored in `[INSTALL_FOLDER]/share/GPU`.
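
The hash-based cache check described above can be sketched as follows. This is a simplified in-memory model, not the library's implementation: `CacheEntry`, `hashSource`, `compileSource`, and `getOrCompile` are hypothetical names, a 64-bit FNV-1a hash stands in for SHA1, and the file-lock mutex and on-disk cache folder are left out. The `ignoreCacheValid` parameter mimics the behaviour of forcing the use of an outdated cache.

```cpp
#include <cstdint>
#include <string>

// Hypothetical in-memory stand-in for one cache file.
struct CacheEntry {
  bool filled = false;
  std::uint64_t sourceHash = 0;
  std::string binary;
};

// Stand-in for SHA1: 64-bit FNV-1a. It is NOT cryptographic and is only used
// to keep the sketch free of external dependencies.
inline std::uint64_t hashSource(const std::string& source)
{
  std::uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : source) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Stand-in for the actual compiler invocation.
inline std::string compileSource(const std::string& source)
{
  return "compiled(" + source + ")";
}

struct RtcResult {
  std::string binary;
  bool recompiled;
};

// Reuse the cache if the hash matches (or if validation is ignored),
// otherwise recompile and update the cache.
inline RtcResult getOrCompile(CacheEntry& cache, const std::string& source, bool ignoreCacheValid = false)
{
  const std::uint64_t h = hashSource(source);
  const bool hashMatches = cache.filled && cache.sourceHash == h;
  if (cache.filled && (hashMatches || ignoreCacheValid)) {
    return {cache.binary, false};
  }
  cache.binary = compileSource(source);
  cache.sourceHash = h;
  cache.filled = true;
  return {cache.binary, true};
}
```

In this model, an empty cache always triggers a compilation, even when validation is ignored, since there is nothing to reuse.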

It is possible to select a different target architecture for the compilation via `--RTCTECHoverrideArchitecture`, and a command can be prepended to the compilation command with `--RTCTECHprependCommand`, e.g. for CPU pinning. See for example dpl-workflow.sh.

`--RTCdeterministic` enables the Deterministic Mode (a compile-time setting) for RTC. Usually you do not need to set it explicitly, since it is enabled automatically by `--PROCdeterministicGPUReconstruction`, but the explicit `--RTCdeterministic` flag is available for tests.

Finally, `--RTCoptConstexpr` and `--RTCoptSpecialCode` enable the constexpr and code removal optimizations. For an example of how the TPC V/M shape corrections are removed, see TPCFastTransform.h.
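
As a rough illustration of the code removal idea, the following C++ sketch guards an optional correction with a compile-time flag. It does not reproduce the code in TPCFastTransform.h: `WithCorrection`, `WithoutCorrection`, `applyShapeCorrection`, `shapeCorrection`, and `transform` are hypothetical names, and the arithmetic is a placeholder. When the flag is a compile-time `false`, the branch condition is constant and the compiler can drop the correction code entirely.

```cpp
// Hypothetical compile-time switches, one per build variant.
struct WithCorrection {
  static constexpr bool applyShapeCorrection = true;
};
struct WithoutCorrection {
  static constexpr bool applyShapeCorrection = false;
};

// Placeholder for a costly correction term.
inline float shapeCorrection(float x)
{
  return 0.5f * x * x;
}

// The condition is a compile-time constant, so for WithoutCorrection the
// call to shapeCorrection is dead code that the compiler can remove.
template <class Flags>
inline float transform(float x)
{
  float y = 2.f * x;
  if (Flags::applyShapeCorrection) {
    y += shapeCorrection(x);
  }
  return y;
}
```

A build that never needs the correction would instantiate `transform<WithoutCorrection>`, while a generic build would use `transform<WithCorrection>`.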