SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Poster 48: Runtime System for GPU-Based Hierarchical LU Factorization


Authors: Qianxiang Ma (Tokyo Institute of Technology), Rio Yokota (Tokyo Institute of Technology, Global Scientific Information and Computing Center)

Abstract: Hierarchical low-rank approximation can reduce both the storage and computation costs of dense matrices, but it is challenging to implement. In this research, we tackle one of the most difficult problems: GPU parallelization of the factorization of these hierarchical matrices. To this end, we are developing a new runtime system for GPUs that can schedule all tasks into a single GPU kernel. Existing runtime systems, such as cuGraph and Stanford Legion, can only manage streams and kernel-level parallelism. Even without extensive tuning, we achieved 4x better performance in H-LU factorization on a single GPU compared with HLIBpro, a well-tuned CPU-based hierarchical matrix library, on moderately sized matrices. Additionally, our system exposes significantly less runtime overhead when processing smaller matrices.
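
To illustrate the idea of driving a whole factorization from one long-running loop instead of one launch per task, here is a minimal CPU-side sketch in Python. It is not the authors' runtime: it assumes a plain dense tiled LU without pivoting (not a hierarchical matrix), and the function names `build_tiled_lu_tasks` and `run_single_loop` are hypothetical. The single `while` loop only stands in, by analogy, for a persistent kernel that pulls ready tasks from a queue; the abstract does not describe the actual mechanism.

```python
from collections import deque


def build_tiled_lu_tasks(n):
    """Build the task list and dependencies of a right-looking tiled LU
    (no pivoting) on an n x n grid of blocks.

    Each task is (name, block_written, blocks_read). This is an assumed,
    simplified dense formulation used only for illustration.
    """
    tasks = []
    for k in range(n):
        tasks.append(("GETRF", (k, k), []))              # factor diagonal block
        for j in range(k + 1, n):
            tasks.append(("TRSM", (k, j), [(k, k)]))     # update block row k
        for i in range(k + 1, n):
            tasks.append(("TRSM", (i, k), [(k, k)]))     # update block column k
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # trailing update: A[i][j] -= A[i][k] * A[k][j]
                tasks.append(("GEMM", (i, j), [(i, k), (k, j)]))

    # A task depends on the most recent writer of every block it touches.
    last_writer = {}
    deps = []
    for tid, (_, out, reads) in enumerate(tasks):
        deps.append({last_writer[b] for b in reads + [out] if b in last_writer})
        last_writer[out] = tid
    return tasks, deps


def run_single_loop(tasks, deps):
    """Execute all tasks from one loop, in dependency order.

    Returns the order in which task ids became ready and were 'run'.
    """
    remaining = [len(d) for d in deps]
    children = [[] for _ in tasks]
    for tid, d in enumerate(deps):
        for parent in d:
            children[parent].append(tid)

    ready = deque(t for t, r in enumerate(remaining) if r == 0)
    order = []
    while ready:
        tid = ready.popleft()
        order.append(tid)          # a real runtime would run the block kernel here
        for child in children[tid]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return order


tasks, deps = build_tiled_lu_tasks(3)
order = run_single_loop(tasks, deps)
print(len(tasks))  # 14 tasks for a 3 x 3 block grid
```

In this sketch the task graph is built once and then consumed without returning to an outer launcher, which is the property the abstract contrasts with stream- and kernel-level scheduling.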

Best Poster Finalist (BP): no


