Presentation
Explicit Data Layout Management for Autotuning Exploration on Complex Memory Topologies
Event Type
Workshop
HPC
Memory
OS and Runtime Systems
Runtime Systems
Time
Monday, 18 November 2019, 2:22pm - 2:41pm
Location
501
Description
The memory topology of high-performance computing platforms is becoming more
complex. Future exascale platforms in particular are expected to
feature multiple types of memory technologies, and multiple accelerator
devices per compute node.
In this paper, we discuss the use of explicit management of the layout of data
in memory across memory nodes and devices for performance exploration purposes.
Indeed, many classic optimization techniques rely on reshaping or tiling input
data in specific ways to achieve peak efficiency on a given architecture.
With autotuning of a linear algebra code as the end goal, we present AML, a framework
that treats three memory management abstractions as first-class citizens: data
layout in memory, tiling of data for parallelism, and data movement across
memory types. By providing access to these abstractions as part
of the performance exploration design space, our framework eases the design and
validation of complex, efficient algorithms for heterogeneous platforms.
Using the Intel Knights Landing architecture in one of its most NUMA-heavy
configurations as a proxy platform, we showcase our framework by
exploring tiling and prefetching schemes for a DGEMM algorithm.
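
To make the idea of treating tiling as a tunable parameter concrete, here is a minimal sketch of a tiled matrix multiplication with a small tile-size sweep. It does not use AML or reproduce the authors' implementation: the function names (`naive_matmul`, `tiled_matmul`, `explore_tile_sizes`) and the pure-Python list-of-lists representation are assumptions chosen for illustration only, and prefetching and placement across memory types are not modeled.

```python
import time


def naive_matmul(a, b, n):
    """Reference n x n matrix product C = A * B using plain triple loops."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def tiled_matmul(a, b, n, tile):
    """Same product, computed tile by tile.

    `tile` is the (hypothetical) tuning parameter: each block of the loop
    nest touches at most tile x tile elements of A, B and C, which is the
    kind of reshaping an autotuner would explore.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Multiply one tile of A by one tile of B into a tile of C.
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def explore_tile_sizes(a, b, n, tile_sizes):
    """Time tiled_matmul for each candidate tile size (illustrative sweep).

    Returns a dict mapping tile size to elapsed seconds.
    """
    timings = {}
    for tile in tile_sizes:
        start = time.perf_counter()
        tiled_matmul(a, b, n, tile)
        timings[tile] = time.perf_counter() - start
    return timings
```

A caller could run `explore_tile_sizes(a, b, n, [8, 16, 32])` and keep the tile size with the smallest time. On a real platform the design space would also include which memory each tile is placed in and when it is moved, which this sketch leaves out.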