Poster 73: Accelerating Large-Scale GW Calculations on Hybrid CPU-GPU Architectures

SC19 Proceedings


Authors: Mauro Del Ben (Lawrence Berkeley National Laboratory), Charlene Yang (National Energy Research Scientific Computing Center (NERSC)), Felipe Jornada (University of California, Berkeley; Lawrence Berkeley National Laboratory), Steven G. Louie (University of California, Berkeley; Lawrence Berkeley National Laboratory), Jack Deslippe (National Energy Research Scientific Computing Center (NERSC))

Abstract: In this poster, we present the strategy, progress, and performance of the GPU port of epsilon, one of the major modules of the electronic structure code BerkeleyGW. Epsilon contains the most time-consuming routines in the BerkeleyGW workflow for large-scale materials science simulations. Porting and optimization strategies include changing our original data layout to use libraries such as cuBLAS and cuFFT efficiently; implementing specific CUDA kernels that keep data on the device and minimize data copies between host and device; using streams to exploit high concurrency on the device; and using asynchronous memory copies and overlapping (MPI) communication on the host with computation on the device. Preliminary results are presented in terms of speedup compared to the CPU-only implementation, strong/weak scaling, and power efficiency. Excellent acceleration is demonstrated: up to 30x for specific kernels. Our port also exhibits good scalability and about 16x higher FLOPs/watt efficiency compared to the CPU-only implementation.
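
The overlap of communication and computation mentioned in the abstract can be illustrated with a small host-only sketch. This is not BerkeleyGW code and uses no CUDA or MPI: `load_chunk` and `compute` are hypothetical stand-ins for a data transfer (a host-to-device copy or an MPI receive) and for a device kernel, and a background thread plays the role of an asynchronous copy. It only shows the general prefetch pattern, in which the next chunk is fetched while the current one is processed.

```python
from concurrent.futures import ThreadPoolExecutor


def load_chunk(index, size=4):
    # Hypothetical stand-in for a data transfer (copy or MPI receive).
    start = index * size
    return list(range(start, start + size))


def compute(chunk):
    # Hypothetical stand-in for a device kernel: sum of squares.
    return sum(x * x for x in chunk)


def pipelined_sum(num_chunks):
    """Process chunks while the next one is fetched in the background."""
    if num_chunks == 0:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_chunk, 0)
        for i in range(num_chunks):
            chunk = pending.result()  # wait for the current chunk
            if i + 1 < num_chunks:
                # Start fetching the next chunk before computing.
                pending = pool.submit(load_chunk, i + 1)
            total += compute(chunk)  # runs while the prefetch is in flight
    return total
```

In a GPU setting the same structure would typically be expressed with asynchronous copies and kernels issued on separate streams; the thread pool here is only an analogy for that mechanism.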

Best Poster Finalist (BP): no

Poster: PDF
Poster summary: PDF
