AutoFFT: A Template-Based FFT Codes Auto-Generation Framework for ARM and X86 CPUs
Parallel Programming Languages, Libraries, and Models
TimeTuesday, 19 November 20194pm - 4:30pm
DescriptionThe discrete Fourier transform (DFT) is widely used in scientific and engineering computation. This paper proposes a template-based code generation framework named AutoFFT that can automatically generate high-performance fast Fourier transform (FFT) codes. AutoFFT employs the Cooley-Tukey FFT algorithm, which exploits the symmetric and periodic properties of the DFT matrix as the outer parallelization framework. To further reduce the number of floating-point operations of butterflies, we explore more symmetric and periodic properties of the DFT matrix and formulate two optimized calculation templates for prime and power-of-two radices. To fully exploit hardware resources, we encapsulate a series of optimizations in an assembly template optimizer. Given any DFT problem, AutoFFT automatically generates C FFT kernels using these two templates and transfers them to efficient assembly codes using the template optimizer. Experiments show that AutoFFT outperforms FFTW, ARMPL, and Intel MKL on average across all FFT types on ARMv8 and Intel x86-64 processors.