Poster 152: Deep Domain Adaptation for Runtime Prediction in Dynamic Workload Scheduler
Time: Thursday, 21 November 2019, 8:30am - 5pm
Description: In HPC systems, the runtime that users request for submitted jobs plays a crucial role in efficiency. Underestimating a job's runtime can cause the job to be terminated before completion, while overestimating it can lead to long queuing times for other jobs. In practice, runtime prediction in HPC is challenging because running workloads are complex and dynamic. Most current runtime prediction models are trained on static workloads, which risks over-fitting: predictions become biased toward the learned workload distribution. In this work, we propose an adaptation of the Correlation Alignment method in our deep neural network architecture (DCORAL) to alleviate the domain shift between workloads and improve runtime predictions. Experiments on both standard benchmark workloads and NCAR real-time production workloads show that our proposed method yields a model that trains more stably across different workloads, with lower accuracy variance than other state-of-the-art methods.
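As a rough illustration of the Correlation Alignment idea mentioned above, the sketch below computes the commonly used CORAL loss: the squared Frobenius distance between the feature covariance matrices of a source batch and a target batch, scaled by 1/(4d^2) for d features. This is an assumption about the general technique, not the poster's exact DCORAL formulation; the function names and the toy feature rows are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def covariance(features):
    """Unbiased covariance matrix of a batch given as a list of feature rows."""
    n = len(features)          # number of samples (needs n >= 2)
    d = len(features[0])       # number of features per sample
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs = covariance(source)
    ct = covariance(target)
    squared_diff = sum(
        (cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared_diff / (4 * d * d)


# Hypothetical toy features standing in for two workloads (not real data).
source_batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
target_batch = [[2.0, 4.0], [6.0, 8.0], [10.0, 12.0]]  # source scaled by 2

print(coral_loss(source_batch, source_batch))  # identical statistics -> 0.0
print(coral_loss(source_batch, target_batch))  # covariances differ -> 36.0
```

In a deep network, a term like this would typically be added to the prediction loss on an intermediate layer's activations, so that the learned features of one workload have second-order statistics similar to those of another.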