HPCToolkit User Manual
HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to the world’s largest GPU-accelerated supercomputers. HPCToolkit can measure a program’s work, resource consumption, and inefficiency on both CPUs and GPUs (Adhianto et al. 2010; Zhou et al. 2020; Adhianto et al. 2024). HPCToolkit correlates such metrics with the program’s source code, works with multilingual, fully optimized binaries, has very low measurement overhead, and scales to large parallel systems. HPCToolkit’s measurements support analysis of a program execution’s cost, inefficiency, and scaling characteristics both within and across nodes of a parallel system.
- Introduction
- HPCToolkit Overview
- Quick Start
- Installing HPCToolkit with Spack
- Building from Source with Meson
- Effective Strategies for Analyzing Program Performance
- Monitoring Dynamically-linked Applications with hpcrun
- Monitoring MPI Applications
- Measurement and Analysis of GPU-accelerated Applications
- Measurement and Analysis of OpenMP Multithreading
- Hpcviewer
- Overview
- Profile View
- Trace view
- Accessing Remote Databases
- Known Issues
- No support for CUDA 13
- Using Level Zero, time may be observed as non-monotonic
- When monitoring applications that use ROCm, using LD_AUDIT in hpcrun may cause it to fail to elide OpenMP runtime frames
- When using Intel GPUs, hpcrun may report that substantial time is spent in a partial call path consisting of only an unknown procedure
- hpcrun reports partial call paths for code executed by a constructor prior to entering main
- hpcrun may fail to measure a program execution on a CPU with hardware performance counters
- hpcrun may associate several profiles and traces with rank 0, thread 0
- hpcrun sometimes enables writing of read-only data
- A confusing label for GPU theoretical occupancy
- FAQ and Troubleshooting
- Environment Variables
- Getting Help