Understanding Data Motion in the Modern HPC Data Center
Time: Monday, 18 November 2019, 4:45pm - 5:10pm
Description: The utilization and performance of storage, compute, and network resources within HPC data centers have been studied extensively, but much less work has gone towards characterizing how these resources are used together to solve larger scientific challenges. To address this gap, we present our work towards characterizing workloads and workflows at a data center-wide level by examining all data transfers that occurred between storage, compute, and the external network at the National Energy Research Scientific Computing Center (NERSC) over a three-month period. Using a simple abstract representation of data transfers, we analyze over 100 million transfer logs from Darshan, HPSS user interfaces, and Globus to quantify the load on data paths between compute, storage, and the wide-area network based on transfer direction, user, transfer tool, source, destination, and time. We show that parallel I/O from user jobs, while undeniably important, is only one of several major I/O workloads that occur throughout the execution of scientific workflows. We also show that this approach can be used to connect anomalous data traffic to specific users and file access patterns, and we construct time-resolved user transfer traces to demonstrate that it is possible to systematically identify coupled data motion for individual workflows.
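The abstract describes reducing every log record to a simple abstract transfer representation and then aggregating load by source, destination, user, and time. The exact representation is not given here, so the following Python sketch assumes a hypothetical `Transfer` record (the class, its field names, and the location labels such as "compute", "storage", and "wan" are illustrative, not the authors' schema) and shows two aggregations: total bytes per (source, destination) data path, and a time-binned trace of one user's transfers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    # Hypothetical abstract transfer record; field names are illustrative assumptions.
    source: str       # where the data came from, e.g. "compute", "storage", "wan"
    destination: str  # where the data went
    user: str         # user who initiated the transfer
    tool: str         # transfer tool or log origin, e.g. "darshan", "hpss", "globus"
    start: float      # transfer start time in seconds since an arbitrary epoch
    nbytes: int       # bytes moved


def load_by_path(transfers):
    """Sum the bytes moved along each (source, destination) data path."""
    totals = defaultdict(int)
    for t in transfers:
        totals[(t.source, t.destination)] += t.nbytes
    return dict(totals)


def user_trace(transfers, user, bin_seconds):
    """Bytes moved by one user, keyed by (time bin, source, destination), in sorted order."""
    trace = defaultdict(int)
    for t in transfers:
        if t.user == user:
            time_bin = int(t.start // bin_seconds)
            trace[(time_bin, t.source, t.destination)] += t.nbytes
    return dict(sorted(trace.items()))


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        Transfer("wan", "storage", "alice", "globus", 0.0, 500),
        Transfer("storage", "compute", "alice", "darshan", 3600.0, 2000),
        Transfer("compute", "storage", "alice", "darshan", 3700.0, 1000),
        Transfer("storage", "archive", "bob", "hpss", 100.0, 4000),
    ]
    print(load_by_path(sample))
    print(user_trace(sample, "alice", 3600))
```

Binning one user's transfers by time and path makes a stage-in over the wide-area network followed by job reads and writes show up as a sequence of coupled entries, which is the kind of pattern a time-resolved trace is meant to expose. A real analysis would also need to normalize the differing record formats of each log source into this shape, which the sketch does not attempt.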