SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

AutoFFT: A Template-Based FFT Codes Auto-Generation Framework for ARM and X86 CPUs


Authors: Zhihao Li (Institute of Computing Technology, Chinese Academy of Sciences), Haipeng Jia (Institute of Computing Technology, Chinese Academy of Sciences), Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Tun Chen (Institute of Computing Technology, Chinese Academy of Sciences), Liang Yuan (Institute of Computing Technology, Chinese Academy of Sciences), Luning Cao (Institute of Computing Technology, Chinese Academy of Sciences), Xiao Wang (Institute of Computing Technology, Chinese Academy of Sciences)

Abstract: The discrete Fourier transform (DFT) is widely used in scientific and engineering computation. This paper proposes AutoFFT, a template-based code generation framework that automatically generates high-performance fast Fourier transform (FFT) codes. AutoFFT uses the Cooley-Tukey FFT algorithm, which exploits the symmetric and periodic properties of the DFT matrix, as its outer parallelization framework. To further reduce the number of floating-point operations in butterflies, we exploit additional symmetric and periodic properties of the DFT matrix and formulate two optimized calculation templates, one for prime radices and one for power-of-two radices. To fully exploit hardware resources, we encapsulate a series of optimizations in an assembly template optimizer. Given any DFT problem, AutoFFT automatically generates C FFT kernels from these two templates and translates them into efficient assembly code with the template optimizer. Experiments show that, on average across all FFT types, AutoFFT outperforms FFTW, ARMPL, and Intel MKL on ARMv8 and Intel x86-64 processors.
