Authors:
Abstract: In 2018, the Department of Energy deployed the Summit and Sierra supercomputers, both employing the latest interconnect technology. In this paper, we provide an in-depth assessment of the systems' interconnects, which are based on Enhanced Data Rate (EDR) 100 Gb/s Mellanox InfiniBand. Both systems use second-generation EDR Host Channel Adapters (HCAs) and switches that add several new features, such as Adaptive Routing (AR), switch-based collectives, HCA-based tag matching, and NVMe-over-Fabrics offload. Although based on the same components, Summit's network is "non-blocking" (i.e., fully provisioned), whereas Sierra's network has a 2:1 taper. We evaluate the two systems' interconnects using traditional communication benchmarks as well as real applications. We find that the new Adaptive Routing dramatically improves performance, but the other new features still need improvement.