SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Poster 102: Fast Training of an AI Radiologist: Leveraging Data Pipelining to Efficiently Utilize GPUs

Authors: Rakshith Vasudev (Dell EMC), John A. Lockman III (Dell EMC), Lucas A. Wilson (Dell EMC), Srinivas Varadharajan (Dell EMC), Frank Han (Dell EMC), Rengan Xu (Dell EMC), Quy Ta (Dell EMC)

Abstract: In a distributed deep learning training setting, developing a high-throughput model can be challenging even with accelerators such as GPUs. If the accelerators are not utilized effectively, time to solution increases and the model's throughput stays low. To use accelerators effectively across multiple nodes, we need a data pipelining mechanism that scales gracefully, so that the GPUs' parallelism can be fully exploited. We study the effect of using the optimized pipelining mechanism followed by the TensorFlow official models versus a naive pipelining mechanism that does not scale well, on two image classification models. Both models using the optimized data pipeline demonstrate effective linear scaling as GPUs are added. We also show that converting to TFRecords is not always necessary.
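The core idea behind such optimized input pipelines is to overlap data loading and preprocessing with GPU compute, so the accelerator is never idle waiting for the next batch. As an illustrative sketch only (not the authors' code, and not the actual tf.data API), the prefetching pattern can be mimicked in plain Python with a background producer thread and a bounded queue:

```python
import queue
import threading
import time

def producer(data, q):
    # Load/preprocess samples on a background thread (the "data pipeline").
    for item in data:
        time.sleep(0.01)  # simulated I/O + preprocessing cost
        q.put(item)
    q.put(None)  # sentinel: no more data

def train_with_prefetch(data, buffer_size=4):
    # The bounded queue plays the role of a prefetch buffer: the producer
    # stays at most `buffer_size` items ahead of the consumer, so loading
    # overlaps with the (simulated) training step instead of serializing.
    q = queue.Queue(maxsize=buffer_size)
    t = threading.Thread(target=producer, args=(data, q))
    t.start()
    results = []
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.01)        # simulated GPU compute step
        results.append(item * 2)  # stand-in for one training step
    t.join()
    return results
```

With this overlap, total wall time approaches max(load time, compute time) per item rather than their sum, which is why a well-designed pipeline lets throughput scale roughly linearly as GPUs are added while a naive, serialized loader does not.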

Best Poster Finalist (BP): no

