Poster 48: Runtime System for GPU-Based Hierarchical LU Factorization

SC19 Proceedings

Authors: Qianxiang Ma (Tokyo Institute of Technology), Rio Yokota (Tokyo Institute of Technology, Global Scientific Information and Computing Center)

Abstract: Hierarchical low-rank approximation can reduce both the storage and computation costs of dense matrices, but it is challenging to implement. In this research, we tackle one of its most difficult problems: GPU parallelization of the factorization of these hierarchical matrices. To this end, we are developing a new runtime system for GPUs that can schedule all tasks into a single GPU kernel. Existing runtime systems, such as cuGraph and Stanford Legion, can only manage streams and kernel-level parallelism. Even without extensive tuning, we achieved 4x better performance in H-LU factorization on a single GPU compared with a well-tuned CPU-based hierarchical matrix library, HLIBpro, on moderately sized matrices. Additionally, our runtime exposes significantly less overhead when processing smaller matrices.
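
To make the storage and computation claim concrete, the following is a minimal sketch of the general low-rank idea, not the authors' implementation. It assumes an m x n block of rank k is stored as two factors U (m x k) and V (k x n), so that the block needs k(m + n) entries instead of mn, and a matrix-vector product can be applied as U(Vx). The function names are hypothetical and chosen only for illustration.

```python
def dense_storage(m, n):
    """Number of entries needed to store an m x n block densely."""
    return m * n


def low_rank_storage(m, n, k):
    """Entries needed for factors U (m x k) and V (k x n) of a rank-k block."""
    return k * (m + n)


def low_rank_matvec(U, V, x):
    """Compute y = U (V x) without ever forming the dense product U V.

    U is a list of m rows of length k, V is a list of k rows of length n.
    The cost is about k * (m + n) multiply-adds instead of m * n.
    """
    # First apply V: t has length k.
    t = [sum(v_ij * x_j for v_ij, x_j in zip(row, x)) for row in V]
    # Then apply U: y has length m.
    return [sum(u_ri * t_i for u_ri, t_i in zip(row, t)) for row in U]


if __name__ == "__main__":
    m = n = 1024
    k = 16
    print("dense entries:   ", dense_storage(m, n))
    print("low-rank entries:", low_rank_storage(m, n, k))
```

In a hierarchical matrix, this kind of compression is typically applied to off-diagonal blocks at several levels of a recursive partition, while blocks near the diagonal stay dense. The mix of many small dense and low-rank operations is one reason the factorization is hard to schedule efficiently.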

Best Poster Finalist (BP): no

Poster: PDF
Poster summary: PDF
