SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Enabling Machine Learning-Based HPC Performance Diagnostics in Production Environments

Moderator: Ann Gentile (Sandia National Laboratories)

Panelists: William Kramer (Pittsburgh Supercomputing Center), Richard Gerber (Lawrence Berkeley National Laboratory), Nick Brown (Edinburgh Parallel Computing Centre), Aaron Saxton (University of Illinois)

Abstract: With the vast increases in system scale and complexity, a broad range of data is being collected at a volume that is impractical for direct human consumption. Recent advances in Machine Learning techniques and tools show great promise for developing system and application behavioral models that can be used to improve operational efficiency and application performance. The panelists will discuss their differing perspectives on the potential outcomes of this technology, the current state of the art, and paths forward. We have assembled a diverse set of leading experts in the fields of systems management, performance analysis, and real-world Machine Learning techniques applied to systems and application data.

