Abstract: Ontologies have become an increasingly popular semantic layer for integrating multiple heterogeneous datasets. However, significant challenges remain with supporting efficient and scalable processing of queries with data linked with ontologies (ontological queries). Ontological query processing queries requires explicitly defined query patterns be expanded to capture implicit ones, based on available ontology inference axioms. However, in practice such as in the biomedical domain, the complexity of the ontological axioms results in significantly large query expansions which present day query processing infrastructure cannot support. In particular, it remains unclear how to effectively parallelize such queries.
In this paper, we propose data and query transformations that enable inter-operator parallelism of ontological queries on Hadoop platforms. Our transformation techniques exploit ontological axioms, second order data types, and operator rewritings to eliminate expensive query substructures for increased parallelizability. Comprehensive experiments conducted on benchmark datasets show up to 25x performance improvement over existing approaches.
Back to Technical Papers Archive Listing