SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Poster 32: OSU INAM: A Profiling and Visualization Tool for Scalable and In-Depth Analysis of High-Performance GPU-Enabled HPC Clusters

Student: Pouya Kousha (Ohio State University)
Supervisor: Dhabaleswar Panda (Ohio State University)

Abstract: The lack of low-overhead, scalable monitoring tools has prevented a comprehensive study of the efficiency and utilization of emerging NVLink-enabled GPU clusters. We address this by proposing and designing an in-depth, real-time analysis, profiling, and visualization tool for high-performance GPU-enabled clusters with NVLinks, built on top of OSU INAM. The proposed tool presents a unified, holistic view of MPI-level and fabric-level information for emerging NVLink-enabled high-performance GPU clusters. It also provides insights into the efficiency and utilization of the underlying interconnects for different communication patterns. We also designed low-overhead, scalable modules that discover the fabric topology and gather fabric metrics by using different levels of threading, bulk insertions and deletions for storage, and parallel components for fabric discovery and port metric inquiry.
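The abstract credits part of the tool's low overhead to threaded port metric queries and to bulk insertions and deletions in storage. The Python sketch below illustrates those two ideas in general form only. It is not OSU INAM's implementation, and every name in it is an assumption: `query_port_metrics` is a hypothetical stand-in that returns fake counter values, SQLite stands in for whatever storage backend the tool actually uses, and the `port_metrics` table and its columns are invented for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def query_port_metrics(port, sample_time):
    """Hypothetical per-port counter query; returns fake values for illustration."""
    lid, port_num = port
    return (sample_time, lid, port_num, lid * 1000 + port_num)


def collect_and_store(conn, ports, sample_time, workers=8):
    # Query every port concurrently; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda p: query_port_metrics(p, sample_time), ports))
    # Bulk insertion: one executemany call inside a single transaction,
    # instead of one INSERT and commit per port.
    with conn:
        conn.executemany(
            "INSERT INTO port_metrics (sample_time, lid, port, xmit_data) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def purge_before(conn, cutoff):
    # Bulk deletion: drop every sample older than the cutoff in one statement.
    with conn:
        cur = conn.execute(
            "DELETE FROM port_metrics WHERE sample_time < ?", (cutoff,)
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE port_metrics "
    "(sample_time INTEGER, lid INTEGER, port INTEGER, xmit_data INTEGER)"
)
# Eight hypothetical (LID, port) pairs.
ports = [(lid, p) for lid in range(1, 5) for p in range(1, 3)]
n1 = collect_and_store(conn, ports, sample_time=1)
n2 = collect_and_store(conn, ports, sample_time=2)
removed = purge_before(conn, cutoff=2)
remaining = conn.execute("SELECT COUNT(*) FROM port_metrics").fetchone()[0]
print(n1, n2, removed, remaining)
```

Only the main thread touches the database connection; the worker threads just gather values. This keeps the sketch safe with SQLite's default per-thread connection check while still overlapping the (presumably I/O-bound) counter queries.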

ACM-SRC Semi-Finalist: no
