Presentation
Large-Batch Training for LSTM and Beyond
Session
Machine Learning Training
Event Type
Paper
TP
Algorithms
Benchmarks
Deep Learning
Machine Learning
Parallel Programming Languages, Libraries, and Models
Scalable Computing
Sparse Computation
BSP Finalist
Time
Tuesday, 19 November 2019, 10:30am - 11am
Location
401-402-403-404
Description
Large-batch training approaches have enabled researchers to utilize distributed processing and greatly accelerate the training of deep neural networks. However, there are three problems in current large-batch research:
(1) Although RNN approaches like LSTM have been widely used in many applications, current large-batch research is principally focused on CNNs.
(2) Even for CNNs, there is no automated technique for extending the batch size beyond 8K.
(3) To keep the variance of the gradient estimate constant, theory suggests that a Sqrt Scaling scheme should be used in large-batch training (see the sketch after this description).
Unfortunately, there have been few successful applications of this scheme. In this paper, we propose the Dynamic Adaptive-Tuning Engine (DATE) for better large-batch training. DATE achieves a 5.3x average speedup over the baselines for four LSTM-based applications on the same hardware. We finish ImageNet training with ResNet-50 in two minutes on 1024 TPU v3 chips (76.7% top-1 accuracy), the fastest reported as of June 2019.
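To make point (3) concrete, here is a minimal sketch of the Sqrt Scaling rule: when the batch size grows by a factor k, the learning rate is multiplied by sqrt(k) so the variance of the gradient estimate stays roughly constant. The function and parameter names (sqrt_scaled_lr, base_lr, base_batch_size) are illustrative assumptions, not identifiers from the paper.

    # Illustrative sketch of the Sqrt Scaling learning-rate rule (not the paper's code).
    import math

    def sqrt_scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
        """Scale the learning rate by sqrt(batch growth) so the variance of the
        stochastic gradient estimate stays roughly constant as the batch grows."""
        return base_lr * math.sqrt(batch_size / base_batch_size)

    # Example: a baseline tuned at batch size 256 with lr 0.1, scaled up to batch 8192.
    print(sqrt_scaled_lr(0.1, 256, 8192))  # 0.1 * sqrt(32) ≈ 0.566

This contrasts with the more common Linear Scaling rule (multiply the learning rate by k), which tends to require careful warmup at very large batch sizes.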