SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Poster 152: Deep Domain Adaptation for Runtime Prediction in Dynamic Workload Scheduler

Authors: Hoang H. Nguyen (National Center for Atmospheric Research (NCAR); University of Illinois, Chicago), Ben Matthews (National Center for Atmospheric Research (NCAR)), Irfan Elahi (National Center for Atmospheric Research (NCAR))

Abstract: In HPC systems, the runtime that users request for submitted jobs plays a crucial role in scheduling efficiency. Underestimating a job's runtime can cause the job to be terminated before completion, while overestimating it can lead to long queue waits for other jobs. In practice, runtime prediction in HPC is challenging because running workloads are complex and dynamic. Most current runtime prediction models are trained on static workloads, which risks overfitting the predictions to the learned workload distribution. In this work, we propose an adaptation of the Correlation Alignment (CORAL) method within our deep neural network architecture (DCORAL) to alleviate the domain shift between workloads and improve runtime predictions. Experiments on both standard benchmark workloads and NCAR real-time production workloads show that the proposed method yields a more stable model across different workloads, with lower accuracy variance than other state-of-the-art methods.
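
As a rough illustration of the Correlation Alignment idea named in the abstract, the sketch below computes the commonly published CORAL loss: the squared Frobenius distance between the feature covariance matrices of a source workload and a target workload, scaled by 1/(4d^2) for d features. It is only a minimal sketch under stated assumptions. The abstract does not specify the exact loss, features, or network used in DCORAL, and the function names `covariance` and `coral_loss` and the toy feature values are hypothetical.

```python
def covariance(features):
    """Sample covariance matrix (d x d) of a list of n feature rows, n >= 2."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in features) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def coral_loss(source_features, target_features):
    """Squared Frobenius distance between the two covariances, scaled by 1/(4 d^2)."""
    d = len(source_features[0])
    cov_source = covariance(source_features)
    cov_target = covariance(target_features)
    squared_distance = sum(
        (cov_source[i][j] - cov_target[i][j]) ** 2
        for i in range(d)
        for j in range(d)
    )
    return squared_distance / (4 * d * d)


# Hypothetical toy features (e.g. two per-job descriptors) for two workloads.
source_workload = [[0.0, 0.0], [2.0, 2.0]]
target_workload = [[0.0, 0.0], [0.0, 0.0]]
loss = coral_loss(source_workload, target_workload)
```

In a deep network, a term of this kind would typically be computed on a hidden layer's activations and added to the prediction loss, so that training pulls the source and target feature statistics together; how the poster's model weights or places this term is not stated in the abstract.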

Best Poster Finalist (BP): no

