SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Network-Accelerated Non-Contiguous Memory Transfers

Authors: Salvatore Di Girolamo (ETH Zurich, Cray Inc), Konstantin Taranov (ETH Zurich), Andreas Kurth (ETH Zurich), Michael Schaffner (ETH Zurich), Timo Schneider (ETH Zurich), Jakub Beranek (IT4Innovations, Czech Republic), Maciej Besta (ETH Zurich), Luca Benini (ETH Zurich), Duncan Roweth (Cray Inc), Torsten Hoefler (ETH Zurich)

Abstract: Applications often communicate data that is non-contiguous in the send or receive buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., through MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate which tasks to offload to the NIC. In this work, we argue that non-contiguous memory transfers can be transparently network-accelerated, truly achieving zero-copy communication. We implement and extend sPIN, a packet streaming processor, within a Portals 4 NIC SST model, and evaluate strategies for NIC-offloaded processing of MPI datatypes, ranging from datatype-specific handlers to general solutions for any MPI datatype. We demonstrate up to 8x speedup in the unpack throughput of real applications, showing that non-contiguous memory transfers are a first-class candidate for network acceleration.
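
The matrix-column example in the abstract can be made concrete with a small sketch. The Python below is only an illustration and not the paper's implementation: the function names `pack_column` and `unpack_column` are hypothetical, and the matrix is modeled as a flat list in row-major order. It shows the sender-side pack and receiver-side unpack copies that a host-based approach may perform for a strided layout, which is the kind of work the paper proposes to offload to the NIC. In MPI terms, the same layout would typically be described with `MPI_Type_vector` using count = rows, blocklength = 1, and stride = cols.

```python
def pack_column(matrix, rows, cols, col):
    """Gather one column of a row-major matrix (flat list) into a contiguous buffer."""
    # Element (r, col) lives at offset r * cols + col, so consecutive column
    # elements are `cols` entries apart: count=rows, blocklength=1, stride=cols.
    return [matrix[r * cols + col] for r in range(rows)]


def unpack_column(packed, matrix, rows, cols, col):
    """Scatter a contiguous buffer back into one column of a row-major matrix."""
    for r in range(rows):
        matrix[r * cols + col] = packed[r]


# Hypothetical 3x4 example: send column 1 and place it in the same column
# of the receive matrix.
rows, cols = 3, 4
send = list(range(rows * cols))           # row-major values 0..11
recv = [0] * (rows * cols)

wire = pack_column(send, rows, cols, 1)   # contiguous data that would go on the wire
unpack_column(wire, recv, rows, cols, 1)  # strided writes into the receive buffer
print(wire)                               # prints [1, 5, 9]
```

Each transferred element costs one extra copy on each side in this sketch; a zero-copy scheme would instead write incoming data directly to its strided destination.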
