![]() |
Project
|
The TPC tracking code is not fully deterministic, i.e. running multiple times on the same data set might yield a slightly different number of tracks on the O(per mille) level.
For debugging, testing, and validation, a deterministic mode is implemented, which should yield 100% reproducible results, on CPU and on GPU and when running multiple times. It uses a combination of
This is steered by 3 options:
-DGPUCA_DETERMINISTIC_MODE
Cmake setting : Compile-time setting.--PROCdeterministicGPUReconstruction
command line option / GPU_proc.deterministicGPUReconstruction
--configKeyValue
setting : Run time setting.--RTCdeterministic
command line option / GPU_proc_rtc.deterministic
--configKeyValue
setting. (Auto-enabled by the deterministicGPUReconstruction
setting.) : Compile-time setting for RTC code.Note that enabling a single setting will not result in fully deterministic behavior! Each setting enables different deterministic aspects! In order to be fully deterministic, all settings must be enabled, where the RTC setting is automatically enabled if not explicitly disabled.
GPUCA_DETERMINISTIC_MODE
has multiple levels, which are described here: FindO2GPU.cmake.
GPUCA_DETERMINISTIC_MODE=GPU
is needed.GPUCA_DETERMINISTIC_MODE=WHOLEO2
Enabling the options is a bit different for O2 and for the standalone benchmark:
--PROCdeterministicGPUReconstruction 1
.set(GPUCA_DETERMINISTIC_MODE GPU)
to the beginning of the GPU CMakeLists.txt or add set(GPUCA_DETERMINISTIC_MODE WHOLEO2)
to the beginning of the Global CMakeLists.txt, and use the configKeyValue
GPU_proc.deterministicGPUReconstruction
. In order to enable this for the Full-System-Test or with dpl-workflow.sh, please export CONFIG_EXTRA_PROCESS_o2_gpu_reco_workflow=GPU_proc.deterministicGPUReconstruction=1;
.With these settings, if one runs multiple times, the number of clusters and number of tracks should be always fully identical. Note that this yields a significant performance penalty during the processing, therefore the deterministic mode is not compiled in by default, but it must be enabled explicitly and code must be recompiled.
Beyond comparing only the number of clusters and number of tracks, it is also possible to compare intermediate results. To do so, please use the standalone benchmark (either ./ca
or o2-gpu-standalone-benchmark
binary) with the --debug 6
option. It will create a dump container all (most) intermediate results in text form, which can be compared. The output files is called CPU.out
if using the CPU backend, and GPU.out
for the GPU backend. Note that the dump files will be huge and the processing will be slow and consume much more memory than normal with --debug 6 . It has been tested with datasets containing up to 50 Pb-Pb collisions, and might fail for larger data. The dump files (if the deterministic mode is used with both compile- and runtime-activation), the files should be 100% identical and can just be compared with
diff`.