SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

GPCNeT: Designing a Benchmark Suite for Inducing and Measuring Contention in HPC Networks

Authors: Sudheer Chunduri (Argonne National Laboratory), Taylor Groves (Lawrence Berkeley National Laboratory), Peter Mendygral (Cray Inc), Brian Austin (Lawrence Berkeley National Laboratory), Jacob Balma (Cray Inc), Krishna Kandalla (Cray Inc), Kalyan Kumaran (Argonne National Laboratory), Glenn Lockwood (Lawrence Berkeley National Laboratory), Scott Parker (Argonne National Laboratory), Steven Warren (Cray Inc), Nathan Wichmann (Cray Inc), Nicholas Wright (Lawrence Berkeley National Laboratory)

Abstract: Network congestion is one of the biggest problems facing HPC systems today, affecting system throughput, performance, user experience, and reproducibility. Congestion manifests as run-to-run variability due to contention for shared resources (like filesystems) or routes between compute endpoints. Despite its significance, current network benchmarks fail to proxy the real-world network utilization seen on congested systems. We propose a new open-source benchmark suite called the Global Performance and Congestion Network Tests (GPCNeT) to advance the state of the practice in this area. The guiding principles used in designing GPCNeT are described, and the methodology employed to maximize its utility is presented. The capabilities of GPCNeT are evaluated by analyzing results from several of the world’s largest HPC systems, including an evaluation of congestion management on a next-generation network. The results show that systems of all technologies and scales are susceptible to congestion, and this work motivates the need for congestion control in next-generation networks.

