DescriptionMost high-performance computing (HPC) clusters today use a global parallel file system to enable high data throughput. The parallel file system is typically centralized, and its storage media are physically separated from the compute cluster. Compute nodes as clients of the parallel file system are often additionally equipped with SSDs. The node internal storage media are rarely well-integrated into the I/O and compute workflows.
In this paper, we propose a hierarchical Persistent Client Caching (LPCC) mechanism for the Lustre file system. LPCC integrates with the Lustre HSM solution and the Lustre layout lock mechanism to provide consistent persistent caching services for I/O applications running on client nodes, meanwhile maintaining a global unified namespace of the entire Lustre file system for distributed persistent client caching. The evaluation results presented in this paper show LPCC's advantages for various workloads, enabling even speed-ups linear in the number of clients for several real-world scenarios.