Workshop: Stretching Jacobi: A Two-Stage Pivoting Approach for Block-Based Factorization
Abstract: Given enough parallel compute units, the processing time for sparse (in-)complete factorization is determined by the number of level sets found in the pattern of the matrix. For positive definite or diagonal dominant matrices, the iterative Jacobi-method presents itself as a scalable alternative on massively-parallel systems. In this work, we present modifications to the Jacobi algorithm, extending its applicability to a wider class of matrices. Our approach cuts the matrix into blocks to allow register-based pivoting in batched CUDA factorization kernels and, for the first time on GPUs, also flexible permutations on the block level in an a posteriori threshold full pivoting scheme. Experiments show our batched kernels to be on par with partially pivoted approaches and the resulting sparse factorizations compare favorable in quality and speed with their traditionally scheduled counterparts.