Authors:
Abstract: The NUMA architecture is the design choice for modern multi-CPU shared memory systems. In many ways, a NUMA system resembles a shared-nothing distributed system: memory accesses to remote NUMA domains are more expensive than local accesses.
In this work, we explore how improved data locality and reduced expensive remote communication can be achieved by exploiting "distributed" shared-memory of NUMA machines to develop shared-memory graph processing solutions optimized for NUMA systems. We introduce a novel hybrid design for memory accesses that handles the burst mode in traversal based algorithms, like BFS and SSSP, and reduces the number of remote accesses and updates. We demonstrate that our designs offer up to 84% speedup over our NUMA-oblivious framework Totem and 2.86x over shared-nothing distributed design, for BFS and SSSP algorithms.
Best Poster Finalist (BP): no
Poster: PDF
Poster summary: PDF
Back to Poster Archive Listing