Paper: Near-Memory Data Transformation for Efficient Sparse Matrix Multi-Vector Multiplication
Event Type
Paper
Registration Categories
TP
Tags
Data Management
GPUs
Memory
Networks
Performance
Software-defined networking
Sparse Computation
Time: Wednesday, 20 November 2019, 4pm - 4:30pm
Location: 301-302-303
Description: Efficient manipulation of sparse matrices is critical to a wide range of HPC applications. We study one common operation, Sparse Matrix Multi-Vector Multiplication (SpMM), and evaluate the impact of sparsity, the distribution of non-zero elements, and tile-traversal strategies on GPU implementations. Using these insights, we determine that operating on sparse matrices in the tiled-DCSR format is well-suited to the parallel warp-synchronous execution model of GPUs.
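To make the operation concrete, here is a minimal sketch of SpMM over a matrix stored in plain CSR (not the paper's tiled-DCSR format): each sparse row is multiplied against several dense vectors at once. The function name `spmm_csr` and the example matrix are illustrative, not from the paper.

```python
def spmm_csr(indptr, indices, data, X):
    """Multiply a CSR sparse matrix by a dense multi-vector X (n_cols x k).

    indptr  -- row-pointer array, length n_rows + 1
    indices -- column index of each non-zero
    data    -- value of each non-zero
    """
    n_rows = len(indptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        # Walk the non-zeros of row i and accumulate into all k output columns.
        for p in range(indptr[i], indptr[i + 1]):
            j, v = indices[p], data[p]
            for c in range(k):
                Y[i][c] += v * X[j][c]
    return Y


# Example: the 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form,
# multiplied by two dense vectors packed as columns of X.
Y = spmm_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0],
             [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(Y)  # [[7.0, 7.0], [6.0, 6.0]]
```

On a GPU, the inner loop over the k output columns maps naturally to the lanes of a warp, which is why multi-vector (rather than single-vector) multiplication suits the warp-synchronous model the abstract refers to.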

Preprocessing and storing the sparse matrix in the tiled-DCSR format, however, often requires significantly more memory than the conventional CSR or CSC formats. Since SpMM kernels are often bottlenecked on DRAM bandwidth, the resulting increase in DRAM traffic can cause a slowdown for many matrices.
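For reference, conventional CSR needs only three arrays, so its footprint is roughly 2·nnz values/indices plus one row pointer per row; a tiled format adds per-tile pointer metadata on top of that, which is the storage overhead the abstract describes. The helper below, a sketch with an illustrative name (`csr_arrays`), builds the CSR arrays from a dense matrix so the baseline cost is visible.

```python
def csr_arrays(dense):
    """Build the three CSR arrays (indptr, indices, data) from a dense matrix."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)  # column of this non-zero
                data.append(v)     # its value
        indptr.append(len(indices))  # running count of non-zeros closes the row
    return indptr, indices, data


indptr, indices, data = csr_arrays([[1, 0, 2], [0, 3, 0]])
print(indptr, indices, data)  # [0, 2, 3] [0, 2, 1] [1, 2, 3]
# Storage: len(data) + len(indices) + len(indptr) = nnz + nnz + (n_rows + 1).
# A tiled variant repeats pointer arrays per tile, inflating this total.
```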

This work enhances a GPU's last-level cache/memory controller unit to act as a dynamic translator between the compute-optimized representation of the data (tiled-DCSR) and its storage- and bandwidth-optimized format (CSC).
Our approach achieves 2.26x better performance on average than cuSPARSE.