Abstract: The memory topology of high-performance computing platforms is becoming more
complex. Future exascale platforms in particular are expected to
feature multiple memory technologies and multiple accelerator
devices per compute node.
In this paper, we discuss explicitly managing the layout of data
across memory nodes and devices as a tool for performance exploration.
Indeed, many classic optimization techniques rely on reshaping or tiling input
data in specific ways to achieve peak efficiency on a given architecture.
With autotuning of a linear algebra code as the end goal, we present AML, a framework
that treats three memory management abstractions as first-class citizens: data
layout in memory, tiling of data for parallelism, and data movement across
memory types. By providing access to these abstractions as part
of the performance exploration design space, our framework eases the design and
validation of complex, efficient algorithms for heterogeneous platforms.
Using the Intel Knights Landing architecture in one of its most strongly NUMA
configurations as a proxy platform, we showcase our framework by
exploring tiling and prefetching schemes for a DGEMM algorithm.
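
To make the tiling idea concrete, here is a minimal sketch of a tiled matrix multiplication in plain Python. It is not the AML API and not the DGEMM implementation evaluated in the paper. The names `tiled_dgemm` and `naive_dgemm`, the flat row-major layout, and the `tile` parameter are assumptions chosen for illustration only. The tile size is the kind of parameter an autotuner would explore.

```python
def naive_dgemm(a, b, m, n, k):
    """Reference C = A * B; A is m x k, B is k x n, both flat row-major lists."""
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def tiled_dgemm(a, b, m, n, k, tile):
    """Same product, computed one (tile x tile) block at a time.

    Hypothetical illustration: each block of C is updated from one block
    of A and one block of B, so the working set of the inner loops is
    bounded by the tile size rather than by the full matrix sizes.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    c = [0.0] * (m * n)
    for i0 in range(0, m, tile):            # block row of C
        i_end = min(i0 + tile, m)
        for j0 in range(0, n, tile):        # block column of C
            j_end = min(j0 + tile, n)
            for p0 in range(0, k, tile):    # block along the shared dimension
                p_end = min(p0 + tile, k)
                for i in range(i0, i_end):
                    for p in range(p0, p_end):
                        a_ip = a[i * k + p]
                        for j in range(j0, j_end):
                            c[i * n + j] += a_ip * b[p * n + j]
    return c


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # 2 x 3
    b = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]     # 3 x 2
    print(tiled_dgemm(a, b, 2, 2, 3, tile=2))
```

In a real heterogeneous setting, the block of A and B used by the innermost loops is what would be moved or prefetched into a faster memory before the computation on that block starts. This sketch only shows the loop restructuring, not the data movement.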
Venue: MCHPC’19: Workshop on Memory Centric High Performance Computing