SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Poster 83: ETL: Elastic Training Layer for Deep Learning


Authors: Lei Xie (Tsinghua University, China), Jidong Zhai (Tsinghua University, China)

Abstract: With the rise of deep learning, clusters for deep learning training are now widely deployed in production. However, static task configuration and resource fragmentation in existing clusters lead to low efficiency and poor quality of service. We propose ETL, an elastic training layer for deep learning, to help address both problems once and for all. ETL adopts several novel mechanisms, such as a lightweight, configurable report primitive and asynchronous, parallel, IO-free state replication, to achieve both high elasticity and high efficiency. The evaluation demonstrates the low overhead and high efficiency of these mechanisms and reveals the advantages of elastic deep learning supported by ETL.

Best Poster Finalist (BP): no


