Presentation

Description

Sparse data management is becoming more important as large-scale experimental and observational facilities produce datasets in which only a small fraction of the values are non-zero. In this document, we explore different design options for supporting sparse data in HDF5, one of the most popular high-performance I/O libraries and file formats for scientific data. We discuss several use cases and requirements. Our main hard design constraint is that no changes to the HDF5 dataset API are acceptable, because any such change would place a burden on users. The remaining options are discussed below. We provide a rationale for what we believe is the strongest candidate and describe how its potential benefits can be simulated with the existing HDF5 library. We have conducted a set of computational experiments, the results of which are reported here. They show that our candidate meets all relevant requirements, and they give us a degree of confidence in pursuing a native implementation in the HDF5 library.