Workshop: Optimization of a Solver for Computational Materials and Structures Problems on NVIDIA Volta and AMD Instinct GPUs
Abstract: The Scalable Implementation of Finite Elements by NASA (ScIFEN) is a software package developed to solve complex computational materials and structures problems using the finite element method (FEM). In this paper, we describe optimization techniques that speed up the linear solver computation within the ScIFEN application. We target GPUs from two vendors, NVIDIA and AMD, and highlight the differences in performance and optimization techniques between them. The NVIDIA Volta V100 GPU is used in the Summit system deployed at Oak Ridge National Laboratory, and the new exascale system, Frontier, will use AMD Radeon Instinct GPUs. We evaluated the performance of various optimization techniques on test matrices, ranging in size from 100K to 4M degrees of freedom, that are representative of ScIFEN applications. The linear solver computation is memory-bound on both GPUs. Our experiments show that we obtained up to 79% of the theoretical peak memory bandwidth on the NVIDIA GPU, while the AMD GPU achieved 59%. Overall, the NVIDIA V100 GPU outperforms the AMD MI25 GPU. We observed an overall speedup of up to 37X on an NVIDIA V100 compared to a 12-core Intel Skylake machine. The solver for a 4M-degree-of-freedom system took under 2.5 seconds.
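For readers unfamiliar with how a "percentage of theoretical peak bandwidth" figure is typically derived for a memory-bound kernel, the following is a minimal sketch. It assumes a sparse matrix-vector product in CSR format with double-precision values and 32-bit indices; the abstract does not state which kernel or storage format was measured, so the function names, the byte-count model, and the sample numbers are all illustrative assumptions rather than the authors' method or data.

```python
def csr_spmv_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Lower bound on bytes moved by one CSR product y = A @ x.

    Assumes (hypothetically) that x is read from memory exactly once,
    i.e. perfect cache reuse, so this is an optimistic traffic model.
    """
    matrix = nnz * (value_bytes + index_bytes)  # nonzero values + column indices
    row_ptr = (n_rows + 1) * index_bytes        # CSR row offsets
    vectors = 2 * n_rows * value_bytes          # read x once, write y once
    return matrix + row_ptr + vectors


def fraction_of_peak(bytes_moved, seconds, peak_bytes_per_s):
    """Achieved bandwidth divided by the device's theoretical peak."""
    achieved = bytes_moved / seconds
    return achieved / peak_bytes_per_s


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the paper.
    n_rows, nnz = 1_000_000, 50_000_000
    kernel_seconds = 1.0e-3
    peak = 900e9  # nominal V100 HBM2 peak in bytes/s (assumed, not from the abstract)

    moved = csr_spmv_bytes(n_rows, nnz)
    print(f"bytes moved: {moved}")
    print(f"fraction of peak: {fraction_of_peak(moved, kernel_seconds, peak):.3f}")
```

Because the byte count is a lower bound, a fraction computed this way tends to understate the traffic the hardware actually served; profiler-reported memory throughput would be the more direct measurement where available.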