Notes on binding, placement, etc. for hybrid MPI + OpenMP jobs. Using Open MPI 4.1.2 and OpenMP 4.5 here (GCC 9.4 on Ubuntu 20.04 LTS). Also trying to incorporate NUMA/SMT concerns.
OpenMP binding/placement options (OMP_PLACES / OMP_PROC_BIND):
cores/close will pack threads onto consecutive cores (socket 0 fills up first, then socket 1)
cores/spread will distribute threads as evenly as possible across all cores, so they end up split across both sockets
- former might be useful if there’s a lot of shared-memory traffic between threads (keeps it within one socket’s caches)
- latter might be useful if there’s competition for memory bandwidth (each socket’s memory controllers serve fewer threads)
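The two combinations above can be sketched as environment settings at launch time; `./app` is a placeholder binary, and this is a launch-fragment sketch rather than a complete job script:

```shell
# "close": pack threads onto consecutive cores (socket 0 first)
OMP_PLACES=cores OMP_PROC_BIND=close ./app

# "spread": distribute threads as evenly as possible across all cores
OMP_PLACES=cores OMP_PROC_BIND=spread ./app
```

With GCC's libgomp, `OMP_DISPLAY_ENV=true` will echo the settings actually in effect, which is a cheap way to confirm they reached the runtime.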
Linux core numbering in a NUMA/SMT node
(Empirical observation; not a rule guaranteed across vendors, BIOSes, etc.)
Cores 0–7: socket 0
Cores 8–15: socket 1
Cores 16–23: socket 0
Cores 24–31: socket 1
So if you want to avoid hyperthreading, just use core #’s 0–15; cores 16–31 are the SMT siblings (16 is hyperthreaded with 0, 17 with 1, 18 with 2, and so on…).
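Since this numbering is only an empirical pattern, it's worth verifying on each box rather than assuming it. Sysfs exposes the SMT pairing directly (under the numbering above, cpu0's siblings would read "0,16"):

```shell
# List each hardware thread and its SMT siblings as the kernel sees them.
for c in /sys/devices/system/cpu/cpu[0-9]*; do
  printf '%s: ' "${c##*/}"
  cat "$c/topology/thread_siblings_list"
done
```

`lscpu --extended` presents the same CPU/core/socket mapping in table form if util-linux is installed.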
If you just want to use the 16 physical cores and bind to them, pass `--bind-to none` to mpirun so Open MPI imposes no binding of its own, and let the OpenMP runtime do the pinning via OMP_PLACES/OMP_PROC_BIND. Note that `--bind-to none` by itself binds nothing; it is the OpenMP settings that actually bind the threads to distinct physical cores, preventing the OS from moving them subsequently.
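A sketch of that launch, assuming the core numbering above and a placeholder binary `./app` (a launch fragment, not a tested recipe):

```shell
# Open MPI stays out of the way (--bind-to none); OpenMP pins one thread
# per physical core. OMP_PLACES="{0}:16" expands to {0},{1},...,{15},
# i.e. sixteen single-core places covering the physical cores only.
OMP_NUM_THREADS=16 OMP_PLACES="{0}:16" OMP_PROC_BIND=close \
  mpirun -np 1 --bind-to none ./app
```

For a non-MPI run, `taskset -c 0-15 ./app` similarly restricts the whole process to the physical cores, though it constrains the affinity mask rather than pinning individual threads.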