Preparation and Optimization of a Diverse Workload for a Large-Scale Heterogeneous System
Authors: Ian Karlin (Lawrence Livermore National Laboratory), Yoonho Park (IBM Research), Bronis R. de Supinski (Lawrence Livermore National Laboratory), Peng Wang (Nvidia Corporation), Bert Still (Lawrence Livermore National Laboratory), David Beckingsale (Lawrence Livermore National Laboratory), Robert Blake (Lawrence Livermore National Laboratory), Tong Chen (IBM Research), Guojing Cong (IBM Research), Carlos Costa (IBM Research), Johann Dahm (IBM Research), Giacomo Domeniconi (IBM Research), Thomas Epperly (Lawrence Livermore National Laboratory), Aaron Fisher (Lawrence Livermore National Laboratory), Sara Kokkila Schumacher (IBM Research), Steven Langer (Lawrence Livermore National Laboratory), Hai Le (Lawrence Livermore National Laboratory), Eun Kyung Lee (IBM Research), Naoya Maruyama (Lawrence Livermore National Laboratory), Xinyu Que (IBM Research), David Richards (Lawrence Livermore National Laboratory), Bjorn Sjogreen (Lawrence Livermore National Laboratory), Jonathan Wong (Lawrence Livermore National Laboratory), Carol Woodward (Lawrence Livermore National Laboratory), Ulrike Yang (Lawrence Livermore National Laboratory), Xiaohua Zhang (Lawrence Livermore National Laboratory), Bob Anderson (Lawrence Livermore National Laboratory), David Appelhans (IBM Research), Levi Barnes (Nvidia Corporation), Peter Barnes (Lawrence Livermore National Laboratory), Sorin Bastea (Lawrence Livermore National Laboratory), David Boehme (Lawrence Livermore National Laboratory), Jamie A. Bramwell (Lawrence Livermore National Laboratory), Jim Brase (Lawrence Livermore National Laboratory), Jose Brunheroto (IBM Research), Barry Chen (Lawrence Livermore National Laboratory), Charway R. Cooper (Lawrence Livermore National Laboratory), Tony DeGroot (Lawrence Livermore National Laboratory), Rob Falgout (Lawrence Livermore National Laboratory), Todd Gamblin (Lawrence Livermore National Laboratory), David Gardner (Lawrence Livermore National Laboratory), James Gosli (Lawrence Livermore National Laboratory), John Gunnels (IBM Research), Max Katz (Nvidia Corporation), Tzanio Kolev (Lawrence Livermore National Laboratory), I-Feng W. Kuo (Lawrence Livermore National Laboratory), Matthew P. Legendre (Lawrence Livermore National Laboratory), Ruipeng Li (Lawrence Livermore National Laboratory), Pei-Hung Lin (Lawrence Livermore National Laboratory), Shelby Lockhart (University of Illinois), Kathleen McCandless (Lawrence Livermore National Laboratory), Claudia Misale (IBM Research), Jaime Moreno (IBM Research), Rob Neely (Lawrence Livermore National Laboratory), Jarom Nelson (Lawrence Livermore National Laboratory), Rao Nimmakayala (Lawrence Livermore National Laboratory), Kathryn O'Brien (IBM Research), Kevin O'Brien (IBM Research), Ramesh Pankajakshan (Lawrence Livermore National Laboratory), Roger Pearce (Lawrence Livermore National Laboratory), Slaven Peles (Pacific Northwest National Laboratory (PNNL)), Phil Regier (Lawrence Livermore National Laboratory), Steve Rennich (Nvidia Corporation), Martin Schulz (Technical University Munich), Howard Scott (Lawrence Livermore National Laboratory), James Sexton (IBM Research), Kathleen Shoga (Lawrence Livermore National Laboratory), Shiv Sundram (Lawrence Livermore National Laboratory), Guillaume Thomas-Collignon (Nvidia Corporation), Brian van Essen (Lawrence Livermore National Laboratory), Alexey Voronin (Lawrence Livermore National Laboratory), Bob Walkup (IBM Research), Lu Wang (Lawrence Livermore National Laboratory), Chris Ward (IBM Research, UK), Hui-Fang Wen (IBM Research), Dan White (Lawrence Livermore National Laboratory), Christopher Young (Lawrence Livermore National Laboratory), Cyril Zeller (Nvidia Corporation), Ed Zywicz (Lawrence Livermore National Laboratory)
Abstract: Productivity from day one on supercomputers that leverage new technologies requires significant preparation. The institution procuring a novel system architecture often does not have enough people with all the requisite knowledge and skills to prepare for it. Thus, institutions have recently employed the "Center of Excellence" (CoE) concept to prepare for systems such as Summit and Sierra, currently the top two systems in the Top 500.
This paper documents CoE experiences preparing a workload of diverse applications and math libraries for a heterogeneous system. We describe our approach to this preparation, including our management and execution strategies, and detail our experiences with and reasons for employing different programming approaches. Our early science and performance results show the project enabled significant early seismic science with up to a 14x throughput increase over Cori. In addition to our successes, we discuss our challenges and failures so others may benefit from our experience.