Profiling MPI with Perf

Ankush Jain
Sep 5, 2023

The goal here is to profile MPI-based applications, and, unlike the usual workflow, to profile MPI itself as well. Most high-level HPC tracing and profiling tools stop at the level of MPI calls and do not look inside the library.

So we use perf. perf does not care whether MPI is involved: it profiles the entire call path, all the way down into the kernel. These are notes on how to get it working for MPI.

  1. Compile the application with -g -fno-omit-frame-pointer
  2. Run with mpirun -n 4 bash -c 'perf record -g --call-graph=dwarf -o perf.out.$PMI_RANK ./a.out' so that each rank writes its own profile (a wrapper-script variant is sketched below)
  3. Inspect a single rank's profile with perf report -n -i perf.out.0
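
Note that the per-rank output name in step 2 relies on the launcher exporting a rank variable: MVAPICH2's mpirun sets PMI_RANK, while Open MPI sets OMPI_COMM_WORLD_RANK and Slurm's srun sets SLURM_PROCID. A small wrapper script keeps that logic off the command line. This is a minimal sketch; the script name and the fallback order are just illustrative.

    #!/usr/bin/env bash
    # perf-wrap.sh (illustrative name): start one perf record per MPI rank.
    # Usage: mpirun -n 4 ./perf-wrap.sh ./a.out [args...]

    # Pick up the rank from whichever variable the launcher happens to set.
    rank=${PMI_RANK:-${OMPI_COMM_WORLD_RANK:-${SLURM_PROCID:-0}}}

    # Record with DWARF call graphs, one output file per rank.
    exec perf record -g --call-graph=dwarf -o perf.out.${rank} "$@"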

Even though we used -fno-omit-frame-pointer, I could not get the binary's symbols unless I used --call-graph=dwarf. I have no explanation for this. Maybe mpicc forces some optimization flags upon us? (I'm using MVAPICH2 2.3.6.)
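
One way to check whether the wrapper is injecting flags is to ask it what it actually runs. MPICH-derived wrappers such as MVAPICH2's mpicc accept -show, which prints the underlying compiler command line without executing it. The check below is a sketch; the printed command will differ per installation.

    # Print the real compiler invocation (flags, include paths, link line)
    # that mpicc would use, without compiling anything.
    mpicc -show

    # Verify that your own flags survive into the final command line.
    mpicc -show -g -fno-omit-frame-pointer -O2 app.c -o a.out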

Now ideally, we'd like to profile only certain regions to keep overhead and output sizes under control. Apparently the way to do that is perf's control file descriptors, which are managed programmatically (a sketch follows the link below).

https://stackoverflow.com/questions/74340680/perf-record-after-my-code-reaches-a-certain-point
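
The shape of the mechanism, in reasonably recent perf versions, is roughly this: start perf record with events disabled via --delay=-1, hand it a control FIFO through --control, and write the strings enable/disable to that FIFO around the region of interest, either from a shell or from inside the application itself. The FIFO path and the script below are an untested sketch, not a recipe.

    #!/usr/bin/env bash
    # Sketch: limit recording to a region of interest via perf's control fd.
    ctl_fifo=/tmp/perf_ctl.fifo          # illustrative path
    test -p ${ctl_fifo} && unlink ${ctl_fifo}
    mkfifo ${ctl_fifo}
    exec {ctl_fd}<>${ctl_fifo}           # keep the FIFO open for read/write

    # -D -1 (--delay=-1) starts with events disabled until 'enable' arrives.
    perf record -g --call-graph=dwarf -D -1 --control=fd:${ctl_fd} \
                -o perf.out ./a.out &

    echo enable  >&${ctl_fd}             # start recording
    sleep 5                              # stand-in for the region of interest
    echo disable >&${ctl_fd}             # stop recording

    wait
    exec {ctl_fd}>&-
    unlink ${ctl_fifo}

The same enable/disable strings can instead be written to the FIFO from inside the application (a plain open() and write()), which is the approach discussed in the linked question.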
