SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Large-Batch Training for LSTM and Beyond


Authors: Yang You (University of California, Berkeley; Google LLC), Jonathan Hseu (Google LLC), Chris Ying (Google LLC), James Demmel (University of California, Berkeley), Kurt Keutzer (University of California, Berkeley), Cho-Jui Hsieh (University of California, Los Angeles (UCLA); Google LLC)

Abstract: Large-batch training approaches have enabled researchers to utilize distributed processing and greatly accelerate deep neural network training. However, current large-batch research has three problems:

(1) Although RNN approaches like LSTM have been widely used in many applications, current large-batch research is principally focused on CNNs.

(2) Even for CNNs, there is no automated technique for extending the batch size beyond 8K.

(3) To keep the variance of the gradient estimate constant, theory suggests that a Sqrt Scaling scheme should be used for the learning rate in large-batch training.

Unfortunately, Sqrt Scaling has seen few successful applications in practice. In this paper, we propose the Dynamic Adaptive-Tuning Engine (DATE) for better large-batch training. DATE achieves a 5.3x average speedup over the baselines for four LSTM-based applications on the same hardware. We finish ImageNet training with ResNet-50 in two minutes on 1024 v3 TPUs (76.7% top-1 accuracy), which was the fastest ImageNet training time as of June 2019.
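To illustrate the Sqrt Scaling rule referenced in point (3), the following minimal Python sketch shows how a learning rate could be rescaled when the batch size grows so that the variance contributed by each update stays roughly constant. This is a generic illustration of the scaling rule, not the paper's DATE method; the function name sqrt_scaled_lr and the example batch sizes and learning rate are hypothetical.

    import math

    def sqrt_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
        """Sqrt Scaling: multiply the learning rate by sqrt(new_batch / base_batch).

        With batch size B the variance of the stochastic gradient estimate is
        roughly sigma^2 / B, so the variance of the scaled update eta_B * g_B is
        eta_B^2 * sigma^2 / B. Choosing eta_B = base_lr * sqrt(B / base_batch)
        keeps eta_B^2 / B equal to base_lr^2 / base_batch, i.e. constant.
        """
        return base_lr * math.sqrt(new_batch / base_batch)

    # Example: a baseline tuned at batch size 256 with lr = 0.1, scaled up to 8192.
    print(sqrt_scaled_lr(0.1, 256, 8192))  # 0.1 * sqrt(32) ~= 0.566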
