Abstract: As we rapidly approach exascale and look towards the next challenge, we are exploring a wider range of technologies and architectures. The further out the timeframe considered, the less likely it is that prototype hardware is available. A popular way of exploring new architectural extensions is to emulate them on existing platforms. The Arm Instruction Emulator (ArmIE) is one such tool; we use it on existing Armv8 platforms to run code targeting Arm's latest vector architecture, the Scalable Vector Extension (SVE).
To aid in porting applications to SVE, we developed an application optimization methodology based on ArmIE that uses timing-agnostic metrics to assess application quality. We show how we used this methodology to optimize the High Performance Computing (HPC) benchmark High Performance Conjugate Gradient (HPCG) for SVE, producing a hand-optimized, intrinsics-based version.