SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats

Authors: Wei Zhang (Texas Tech University), Suren Byna (Lawrence Berkeley National Laboratory), Houjun Tang (Lawrence Berkeley National Laboratory), Brody Williams (Texas Tech University), Yong Chen (Texas Tech University)

Abstract: Scientific applications often store datasets in self-describing data file formats, such as HDF5 and netCDF. However, efficiently searching the metadata within these files remains challenging due to the sheer size of the datasets. Existing solutions extract the metadata and store it in an external database management system (DBMS) in order to locate desired data. This practice introduces significant overhead and complexity in both extraction and querying. In this research, we propose a novel Metadata Indexing and Querying Service (MIQS), which removes the external DBMS and uses an in-memory index to achieve efficient metadata searching. MIQS follows the self-contained data management paradigm and provides portable, schema-free metadata indexing and querying functionality for self-describing file formats. We evaluated MIQS against a state-of-the-art MongoDB-based metadata indexing solution. MIQS achieved up to a 99% reduction in index construction time and up to a 172k× improvement in search performance, with up to a 75% reduction in memory footprint.
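
To make the idea of a schema-free, in-memory metadata index concrete, the following is a minimal Python sketch. It is not the MIQS implementation: the abstract does not describe the index's data structures, so the class `MetadataIndex`, its methods, and the sample file names, object paths, and attributes are all hypothetical. The sketch assumes metadata has already been extracted from the files as attribute name/value pairs and simply maps each pair to the objects that carry it.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy in-memory index (hypothetical, not the MIQS design).

    Maps attribute name -> attribute value -> set of (file path, object path).
    No schema is declared up front; any attribute name can be added.
    """

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, file_path, object_path, attributes):
        """Index every attribute name/value pair of one object in one file."""
        for name, value in attributes.items():
            self._index[name][value].add((file_path, object_path))

    def query(self, name, value):
        """Return a sorted list of (file path, object path) with name == value."""
        # .get() avoids creating empty entries for unknown names or values.
        return sorted(self._index.get(name, {}).get(value, set()))


# Hypothetical sample metadata, standing in for attributes read from files.
index = MetadataIndex()
index.add("run1.h5", "/sim/density", {"units": "g/cm3", "timestep": 10})
index.add("run2.h5", "/sim/density", {"units": "g/cm3", "timestep": 20})
index.add("run2.h5", "/sim/pressure", {"units": "Pa", "timestep": 20})

print(index.query("units", "g/cm3"))
print(index.query("timestep", 20))
```

Because lookups go through nested dictionaries held in memory, an exact-match query needs no external database round trip, which is the general trade-off the abstract points to.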

