Poster 8: Mitigating Communication Bottlenecks in MPI-Based Distributed Learning
Event Type
ACM Student Research Competition: Graduate Posters
ACM Student Research Competition: Undergraduate Posters
Posters
Registration Categories: TP, EX, EXH
Tags: Student Program
Time: Thursday, 21 November 2019, 8:30am - 5pm
Location: E Concourse
Description: Current commercial and scientific facilities generate and maintain vast amounts of complex data. While machine learning (ML) techniques can provide crucial insight, training these models is often impractical on a single process. Distributed learning techniques mitigate this problem; however, current implementations contain significant performance bottlenecks. Here, we conduct a detailed performance analysis of MPI_Learn, a widely used distributed ML framework for high-energy physics (HEP) applications, on the Summit supercomputer, by training a network to classify simulated collision events from high-energy particle detectors at the CERN Large Hadron Collider (LHC).
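For context, the communication pattern under study is synchronous data-parallel SGD: each MPI rank computes a gradient on its local data shard, the gradients are averaged across all ranks (an allreduce in real MPI), and every rank applies the same update. The sketch below simulates this pattern in plain NumPy rather than with actual MPI ranks; the worker count, model, and data are illustrative and are not taken from MPI_Learn's code.

```python
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient on one worker's shard: grad of ||Xw - y||^2 / n."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def synchronous_sgd_step(w, shards, lr):
    """One synchronous data-parallel step: every worker computes its local
    gradient, then the gradients are averaged (the allreduce stage, where
    communication cost grows with the number of ranks) before a single
    shared update is applied on all ranks."""
    grads = [local_gradient(w, X, y) for X, y in shards]  # parallel in real MPI
    avg_grad = np.mean(grads, axis=0)                     # allreduce + divide
    return w - lr * avg_grad

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])

# Four simulated "workers", each holding a shard of the training data.
shards = []
for _ in range(4):
    X = rng.normal(size=(64, 2))
    shards.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):
    w = synchronous_sgd_step(w, shards, lr=0.05)
print(np.round(w, 2))  # w converges toward true_w
```

Because every update requires a full gradient exchange, one allreduce per step, the fraction of time spent communicating rises as more ranks are added, which is the bottleneck the analysis above measures.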

We conclude that these bottlenecks stem from communication time between processes, which grows with scale. To mitigate them, we propose a new distributed algorithm for stochastic gradient descent (SGD). As a proof of concept, we demonstrate improved scalability on 250 GPUs and, combined with hyperparameter optimization, a ten-fold reduction in training time.
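The abstract does not specify the proposed algorithm, so as a generic illustration of how communication frequency can be reduced, the sketch below shows local SGD with periodic averaging, a well-known technique in which each worker takes several local steps between synchronizations. This is not the authors' algorithm; the worker count, model, and hyperparameters are illustrative.

```python
import numpy as np

def grad(w, X, y):
    """Least-squares gradient on a worker's local shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def local_sgd(shards, lr=0.05, rounds=40, local_steps=5):
    """Local SGD with periodic averaging: each worker runs `local_steps`
    updates on its own shard, then all workers average their weights once.
    Versus fully synchronous SGD, this needs one allreduce per `local_steps`
    updates instead of one per update, cutting communication frequency."""
    w = np.zeros(shards[0][0].shape[1])
    comm_ops = 0
    for _ in range(rounds):
        local_models = []
        for X, y in shards:                 # each iteration runs on one rank
            w_k = w.copy()
            for _ in range(local_steps):
                w_k -= lr * grad(w_k, X, y)
            local_models.append(w_k)
        w = np.mean(local_models, axis=0)   # the only allreduce this round
        comm_ops += 1
    return w, comm_ops

rng = np.random.default_rng(1)
true_w = np.array([0.5, 1.5, -1.0])
shards = []
for _ in range(4):
    X = rng.normal(size=(128, 3))
    shards.append((X, X @ true_w))

w, comms = local_sgd(shards)
print(np.round(w, 2), comms)  # w near true_w after only 40 averaging rounds
```

Here 200 total gradient updates cost only 40 synchronizations, a 5x reduction in allreduce operations, which illustrates the general trade-off any communication-reducing SGD variant exploits.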