SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Afternoon Keynote - Running large models in minutes: an engineering journey through high performance for AI


Workshop: Afternoon Keynote - Running large models in minutes: an engineering journey through high performance for AI

Abstract: From climate modelling to drug design, AI models are now fully part of scientific modelling, and they are getting larger and more complex every year. The adoption of challenging workloads like the BERT language model and the popularity of deep learning performance blogs or benchmarks such as MLPerf highlight the importance of being able to train and tune such models quickly. Until recently, system design for HPC and for AI was often done in isolation because the requirements for the platforms were different, which made large scientific experiments difficult. To close these gaps, systems are now designed with AI software in mind, and scale is built into the software design from the ground up so that each model running at the edge can be trained in minutes at scale. In this talk we will cover how software leverages the inherent scaling nature of large models and how HPC infrastructures can be built and leveraged as the ideal platforms for fast experimentation and large problems.





