Presentation
Poster 73: Accelerating Large-Scale GW Calculations on Hybrid CPU-GPU Architectures
Session: Research Posters Display
Event Type: Posters, Research Posters
Time: Thursday, 21 November 2019, 8:30am - 5pm
Location: E Concourse
Description: In this poster, we present the strategy, progress, and performance of porting epsilon, one of the major modules of the electronic structure code BerkeleyGW, to GPUs. Epsilon contains the most time-consuming routines in the BerkeleyGW workflow for large-scale materials science simulations. The porting and optimization strategies include changing our original data layout to make efficient use of libraries such as cuBLAS and cuFFT; implementing specific CUDA kernels that keep data on the device and minimize copies between host and device; using streams efficiently to exploit the high concurrency available on the device; using asynchronous memory copies; and overlapping (MPI) communication on the host with computation on the device. Preliminary results are presented in terms of speedup compared to the CPU-only implementation, strong and weak scaling, and power efficiency. Excellent acceleration is demonstrated: up to 30x for specific kernels. Our port also exhibits good scalability and about 16x higher FLOPs/watt efficiency compared to the CPU-only implementation.
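The metrics named in the description (speedup, strong and weak scaling, and FLOPs/watt) can be made concrete with a short sketch. The definitions below are the conventional ones and are an assumption here, since the description does not state exactly how the authors computed them. All function names are hypothetical, and the numbers in the demo are placeholders, not measurements from the poster.

```python
def speedup(t_cpu_s, t_gpu_s):
    """Ratio of CPU-only wall time to GPU wall time for the same problem."""
    return t_cpu_s / t_gpu_s


def strong_scaling_efficiency(t_base_s, n_base, t_n_s, n):
    """Fixed total problem size: ideal wall time shrinks as n_base / n."""
    return (t_base_s * n_base) / (t_n_s * n)


def weak_scaling_efficiency(t_base_s, t_n_s):
    """Problem size grows with node count: ideal wall time stays constant."""
    return t_base_s / t_n_s


def flops_per_watt(flop_count, wall_time_s, avg_power_w):
    """Sustained FLOP/s divided by average power draw in watts."""
    return flop_count / wall_time_s / avg_power_w


if __name__ == "__main__":
    # Placeholder inputs for illustration only (assumed, not from the poster).
    print("speedup:", speedup(t_cpu_s=120.0, t_gpu_s=8.0))
    print("strong-scaling efficiency:",
          strong_scaling_efficiency(t_base_s=100.0, n_base=1, t_n_s=30.0, n=4))
    print("weak-scaling efficiency:",
          weak_scaling_efficiency(t_base_s=50.0, t_n_s=62.5))
    cpu_eff = flops_per_watt(flop_count=1e15, wall_time_s=1000.0, avg_power_w=400.0)
    gpu_eff = flops_per_watt(flop_count=1e15, wall_time_s=100.0, avg_power_w=800.0)
    print("FLOPs/watt ratio (GPU over CPU):", gpu_eff / cpu_eff)
```

Note that a FLOPs/watt ratio depends on both the time saved and the extra power drawn: in the placeholder demo the GPU run is 10 times faster but draws twice the power, so its efficiency gain is smaller than its speedup.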