SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer

Authors: Francesco Di Natale (Lawrence Livermore National Laboratory), Harsh Bhatia (Lawrence Livermore National Laboratory), Timothy S. Carpenter (Lawrence Livermore National Laboratory), Chris Neale (Los Alamos National Laboratory), Sara Kokkila Schumacher (IBM Research), Tomas Oppelstrup (Lawrence Livermore National Laboratory), Liam Stanton (San Jose State University), Xiaohua Zhang (Lawrence Livermore National Laboratory), Shiv Sundram (Lawrence Livermore National Laboratory), Thomas R. W. Scogland (Lawrence Livermore National Laboratory), Gautham Dharuman (Lawrence Livermore National Laboratory), Michael P. Surh (Lawrence Livermore National Laboratory), Yue Yang (Lawrence Livermore National Laboratory), Claudia Misale (IBM Research), Lars Schneidenbach (IBM Corporation), Carlos Costa (IBM Corporation), Changhoan Kim (IBM Corporation), Bruce D'Amora (IBM Corporation), Sandrasegaram Gnanakaran (Los Alamos National Laboratory), Dwight V. Nissley (Frederick National Laboratory for Cancer Research), Fred Streitz (Lawrence Livermore National Laboratory), Felice C. Lightstone (Lawrence Livermore National Laboratory), Peer-Timo Bremer (Lawrence Livermore National Laboratory), James N. Glosli (Lawrence Livermore National Laboratory), Helgi I. Ingolfsson (Lawrence Livermore National Laboratory)

Abstract: Most biological phenomena have microscopic foundations yet span macroscopic length- and time-scales, necessitating multiscale computational models. Efficient simulation of these complex multiscale models on modern heterogeneous architectures poses significant challenges in scheduling and co-managing resources such as computational power, communication bottlenecks, and filesystem bandwidth. To address these challenges, we present a novel massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), which combines a large length- and time-scale macro model with a high-fidelity molecular dynamics (MD) micro model using machine learning. We describe our infrastructure, which is designed for high scalability, efficiency, robustness, portability, and fault tolerance on heterogeneous resources. We demonstrate MuMMI conducting the largest-of-its-kind simulation to investigate the dynamics of KRAS proteins in cancer initiation. Concurrently running up to 36,000 jobs on 16,000 GPUs and 176,000 CPU cores, we executed 120,000 MD simulations, surpassing an aggregate simulation time of 200 milliseconds, orders of magnitude greater than comparable studies.
