Scalability and Data Security: Deep Learning with Health Data on Future HPC Platforms
Clouds and Distributed Computing
Time: Monday, 18 November 2019, 9:05am - 9:40am
Description: Performing health data analytics at scale presents several challenges for classic HPC environments. Datasets contain personal health information (PHI) and are updated regularly, which complicates data access on publicly accessible HPC systems. Moreover, the diverse tasks and models involved (ranging from neural networks for information extraction to knowledge bases for predictive modeling) vary widely in scale, hardware preferences, and software requirements. Both exascale systems and cloud-based environments can play important roles in addressing data security and performance portability. Cloud platforms provide out-of-the-box solutions for maintaining data security, while recent work has extended secure computing environments to systems like OLCF Summit. In this talk I will discuss how we are reconciling the need for scalable HPC resources with the data security requirements inherent in working with personal health information, in the context of the interagency partnership between the Department of Energy and the National Cancer Institute. As part of this partnership, we are developing state-of-the-art deep learning models that extract information from cancer pathology reports for near real-time cancer incidence reporting. Our approach to the patient privacy complexities gives integral roles to both traditional HPC resources and cloud-like platforms, playing to the relative strengths of each.