SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Unified Communication X (UCX) Community


Authors: Gilad Shainer (Mellanox Technologies), Jeff Kuehn (Los Alamos National Laboratory), Pavel Shamis (ARM Ltd), Dhabaleswar Panda (Ohio State University), Brad Benton (Advanced Micro Devices (AMD) Inc), Duncan Poole (Nvidia Corporation), Steve Poole (Los Alamos National Laboratory)

Abstract: In order to exploit the capabilities of new HPC systems and to meet their demands in scalability, communication software needs to scale to millions of cores and support applications with adequate functionality. UCX is a collaboration between industry, national labs and academia that consolidates multiple technologies to provide a unified open-source framework.

The UCX project is managed by the UCF consortium (http://www.ucfconsortium.org/) and includes members from LANL, ANL, Ohio State University, AMD, ARM, IBM, Mellanox, NVIDIA and more. The session will serve as the UCX community meeting and will introduce the latest developments to HPC developers and the broader user community.


Long Description: In order to exploit the capabilities of new HPC systems and to meet their demands in scalability, communication software needs to scale to millions of cores and support applications with adequate functionality to express their parallelism. UCX is a collaboration between industry, national labs and academia that consolidates multiple technologies to provide a unified open-source framework. The UCX project is managed by the UCF consortium (http://www.ucfconsortium.org/) and includes members from LANL, ANL, Ohio State University, AMD, ARM, IBM, Mellanox, NVIDIA and more. The session will serve as the UCX community meeting and will introduce the latest developments and specifications to HPC developers and the broader user community.

Modern HPC systems include extreme numbers of compute elements and extremely low-latency interconnection networks. In order to exploit the capabilities of these architectures and to meet their demands in scalability, communication software needs to scale and support applications with adequate functionality to express their parallelism. Moreover, communication software should add as little overhead as possible in order to avoid compromising the native performance of the interconnection network. These requirements make the design of high-performance communication software extremely intricate, since they demand a minimal memory footprint, low instruction counts, and low cache activity while meeting stringent performance targets.

High-level programming models for communication (e.g., MPI, SHMEM) can be built on top of middleware such as Portals, GASNet, UCCS, and ARMCI, or can use lower-level network-specific interfaces, often provided by the vendor. While the former offer high-level communication abstractions and portability across different systems, the latter offer proximity to the hardware and minimize overheads related to multiple software layers. An effort to combine the advantages of both is UCX, a communication framework for high-performance computing systems.

UCX has already been integrated upstream into the Open MPI project and OpenSHMEM, and is being used with MPICH and more. UCX is now being deployed on several large-scale supercomputers around the world. The session will enable a dialog on future plans for UCX and review the operations of the UCF consortium. It will include UCX performance results and an update on the integration with Python, Charm++ and more.


URL: http://www.ucfconsortium.org/

