Workshop: Accelerating the Global Arrays ComEx Runtime Using Multiple Progress Ranks
Abstract: Partitioned Global Address Space (PGAS) models are part of the system software being designed to support communication runtimes for exascale applications. MPI has been shown to be a viable foundation for a scalable PGAS communication subsystem, with the advantages of standardization and high performance. We used MPI two-sided semantics, combined with automatic and user-defined splitting of MPI communicators, to achieve asynchronous progress. Our implementation can use multiple asynchronous progress ranks (PRs) per node, which can be mapped to the computing architecture of each node in a distributed cluster. We demonstrate a speedup of over 2.0X and improved scaling for a communication-bound computational chemistry application distributed over 1024 nodes of state-of-the-art HPC clusters. Our results show that, for a communication-bound workload running on a given number of cluster nodes, there is an optimal number of ranks to dedicate to communication that achieves asynchronous progress and yields the highest performance.
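The abstract describes splitting MPI communicators so that some ranks on each node act as dedicated progress ranks. The following is a minimal sketch, not the ComEx implementation, of the bookkeeping such a split might need. It assumes a hypothetical layout in which world ranks fill nodes in consecutive blocks of `ranks_per_node`, the last `progress_per_node` ranks on each node are PRs, and each PR serves a contiguous, balanced slice of that node's compute ranks. The names `layout_t`, `split_color`, and `serving_progress_rank` are illustrative only. The value returned by `split_color` is the kind of value that could be passed as the `color` argument of `MPI_Comm_split`.

```c
#include <assert.h>

/* Hypothetical node layout (an assumption for illustration): world ranks fill
   nodes in consecutive blocks of ranks_per_node, and the last
   progress_per_node ranks on each node are dedicated progress ranks (PRs). */
typedef struct {
    int ranks_per_node;    /* MPI ranks launched on each node */
    int progress_per_node; /* PRs per node, 1 <= progress_per_node < ranks_per_node */
} layout_t;

/* Returns 1 if world_rank is a PR and 0 if it is a compute rank. This value
   could serve as the color in MPI_Comm_split(MPI_COMM_WORLD, color, key, &comm)
   so that compute ranks and PRs end up in separate communicators. */
int split_color(const layout_t *l, int world_rank) {
    assert(l->progress_per_node >= 1 && l->progress_per_node < l->ranks_per_node);
    int local = world_rank % l->ranks_per_node;
    int compute_per_node = l->ranks_per_node - l->progress_per_node;
    return local >= compute_per_node ? 1 : 0;
}

/* Returns the world rank of the PR that services a compute rank. The compute
   ranks of a node are divided into progress_per_node contiguous slices whose
   sizes differ by at most one, so every PR gets at least one client when
   there are at least as many compute ranks as PRs. */
int serving_progress_rank(const layout_t *l, int world_rank) {
    assert(split_color(l, world_rank) == 0); /* only compute ranks have a PR */
    int node = world_rank / l->ranks_per_node;
    int local = world_rank % l->ranks_per_node;
    int compute_per_node = l->ranks_per_node - l->progress_per_node;
    /* local * P / C maps 0..C-1 onto 0..P-1 in balanced contiguous slices */
    int pr_index = local * l->progress_per_node / compute_per_node;
    return node * l->ranks_per_node + compute_per_node + pr_index;
}
```

For example, with 16 ranks per node and 2 PRs, world ranks 0 to 6 would be served by rank 14 and ranks 7 to 13 by rank 15. Giving each PR a contiguous slice keeps its clients adjacent, which can line up with sockets or NUMA domains when ranks are pinned in order; this is one possible policy, not necessarily the one used in the paper.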