SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Poster 147: Extremely Accelerated Deep Learning: ResNet-50 Training in 70.4 Seconds


Authors: Akihiro Tabuchi (Fujitsu Laboratories Ltd), Akihiko Kasagi (Fujitsu Laboratories Ltd), Masafumi Yamazaki (Fujitsu Laboratories Ltd), Takumi Honda (Fujitsu Laboratories Ltd), Masahiro Miwa (Fujitsu Laboratories Ltd), Takashi Shiraishi (Fujitsu Laboratories Ltd), Motohiro Kosaki (Fujitsu Laboratories Ltd), Naoto Fukumoto (Fujitsu Laboratories Ltd), Tsuguchika Tabaru (Fujitsu Laboratories Ltd), Atsushi Ike (Fujitsu Laboratories Ltd), Kohta Nakashima (Fujitsu Laboratories Ltd)

Abstract: Distributed deep learning with a large mini-batch is a key technique for accelerating training. However, it is difficult to achieve high scalability while maintaining validation accuracy when training on large clusters. We introduce two optimizations: reducing the computation time and overlapping communication with computation. By applying these techniques on 2,048 GPUs, we achieved the world's fastest ResNet-50 training in MLPerf, the de facto standard DNN benchmark (as of July 2019).
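The second optimization, overlapping communication with computation, follows a widely used pattern in distributed training: the all-reduce of a layer's gradients is launched as soon as the backward pass produces them, rather than after the whole pass finishes, so network transfer proceeds while earlier layers' gradients are still being computed. The sketch below illustrates that pattern in PyTorch; it is not the authors' implementation (the poster does not specify a framework), the function names and hook-based structure are assumptions, and a production version would additionally bucket small gradients and synchronize CUDA streams, as frameworks such as PyTorch's DistributedDataParallel do.

```python
# Minimal sketch of overlapping gradient all-reduce with the backward pass.
# NOT the poster authors' implementation; names here are illustrative.
import torch
import torch.distributed as dist

def attach_overlapped_allreduce(model: torch.nn.Module):
    """Register per-parameter hooks so the all-reduce of each gradient is
    launched asynchronously as soon as autograd produces it, letting the
    remaining backward computation overlap with communication."""
    pending = []  # (async work handle, gradient tensor) pairs

    def hook(grad):
        # async_op=True returns immediately with a work handle;
        # the reduction runs while the backward pass continues.
        work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
        pending.append((work, grad))
        return grad

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(hook)
    return pending

def synchronize_gradients(pending):
    """Wait for all outstanding all-reduces, then average the gradients."""
    world_size = dist.get_world_size()
    for work, grad in pending:
        work.wait()
        grad.div_(world_size)
    pending.clear()
```

A training step would call loss.backward(), then synchronize_gradients(pending) before optimizer.step(). For brevity this sketch reduces one tensor per parameter; at the 2,048-GPU scale reported here, practical implementations fuse many small gradients into larger buckets to amortize per-message communication latency.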

Best Poster Finalist (BP): no


