Abstract
In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT algorithms. In our parallel FFT algorithms, since we use cyclic distribution, all-to-all communication takes place only once. Moreover, the input data and output data are both in natural order.
We also show that the suitability of a parallel FFT algorithm is machine-dependent because of the differences in the architecture of the processor elements in distributed-memory parallel computers. Experimental results of 2p3q5r point FFTs on distributed-memory parallel computers, HITACHI SR2201 and IBM SP2 are reported. We succeeded to get performances of about 130 GFLOPS on a 1024PE HITACHI SR2201 and about 1.25 GFLOPS on a 32PE IBM SP2.
Similar content being viewed by others
References
R. C. Agarwal and J. W. Cooley, Vectorized mixed radix discrete Fourier transform algorithms. Proc. IEEE, 75: 1283-1292, 1987.
R. C. Agarwal and F. G. Gustavson, and M. Zubair, M., A high performance parallel algorithm for 1-D FFT. '94, pp. 34-40, 1994.
A. Averbuch, E. Gabber, and B. Gordissky, and Y. Medan, A parallel FFT on a MIMD machine. Parallel Computing, 15: 61-74, 1990.
D. H. Bailey, FFTs in external or hierarchical memory. The Journal of Supercomputing, 4: 23-35, 1990.
G. D. Bergland, A fast Fourier transform algorithm using base 8 iterations. Math. Comp., 22: 275-279, 1968.
T. Boku, K. Itakura, H. Nakamura, and K. Nakazawa, CP-PACS: A massively parallel processor for large scale scientific calculations, Proceedings of the 1997 International Conference on Supercomputing, pp. 108-115, 1997.
J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19: 297-301, 1965.
M. Hegland, Real and complex fast Fourier transforms on the Fujitsu VPP 500. Parallel Computing, 22: 539-553, 1996.
S. L. Johnsson and R. L. Krawitz, Cooley-Tukey FFT on the connection machine. Parallel Computing, 18, 1201-1221, 1992.
Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, Version 1.1, 1995.
K. Nakazawa, H. Nakamura, H. Imori, and S. Kawabe, Pseudo vector processor based on registerwindowed superscalar pipeline, '92, pp. 642-651, 1992.
C. M. Rader, Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE, 56: 1107-1108, 1968.
R. C. Singleton, An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. Audio Electroacoust., 17: 93-103, 1969.
P. N. Swarztrauber, FFT algorithms for vector computers. Parallel Computing, 1: 45-63, 1984.
P. N. Swarztrauber, Multiprocessor FFTs, Parallel Computing, 5: 197-210, 1987.
C. Temperton, A note on prime factor FFT algorithms, J. Comput. Phys., 52: 198-204, 1983.
C. Temperton, Self-sorting mixed-radix fast Fourier transforms, J. Comput. Phys., 52: 1-23, 1983.
C. Temperton, A generalized prime factor FFT algorithm for any N = 2p3q5r, SIAM J. Sci. Stat. Comput., 13: 676-686, 1992.
C. Van Loan, Computational frameworks for the Fast Fourier Transform, SIAM Press, Philadelphia, 1992.
S. Winograd, On computing the discrete Fourier transform, Math. Comp., 32, 175-199, 1978.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Takahashi, D., Kanada, Y. High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers. The Journal of Supercomputing 15, 207–228 (2000). https://doi.org/10.1023/A:1008160021085
Issue Date:
DOI: https://doi.org/10.1023/A:1008160021085