SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Abstract: Deep learning applications impose heavy I/O loads on computer systems. Their inherently long-running, highly concurrent, and random file accesses can easily saturate traditional shared file systems and negatively impact other users. We investigate a solution to these problems that leverages local storage and the interconnect to serve training datasets at scale. We present FanStore, a user-level transient object store that provides low-latency, scalable, POSIX-compliant file access by combining function interception with various metadata and data placement strategies. On a single node, FanStore provides performance similar to that of the XFS journaling file system. On many nodes, our experiments with real applications show that FanStore achieves over 90% scaling efficiency.
