CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications

SC19 Proceedings

Workshop: CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications

Abstract: GPUs are powerful, massively parallel processors that require a vast amount of thread parallelism to keep their thousands of execution units busy and to tolerate latency when accessing their high-throughput memory systems. Understanding the behavior of massively threaded GPU programs can be difficult, even though recent GPUs provide an abundance of hardware performance counters that collect statistics about certain events. Profiling tools that assist users in such analysis, like NVIDIA's nvprof and CUPTI, are state-of-the-art. However, instrumentation based on reading hardware performance counters can be slow, in particular when the number of metrics is large. Furthermore, the results can be inaccurate, as instructions are grouped to match the available set of hardware counters.

Back to The 10th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems (PMBS19) Archive Listing
