The Fast Fourier Transform (FFT) is widely used in modern digital signal processing systems. In order to improve the efficiency of FFT, parallel methods are often used to speed up FFT in high-speed real-time application environments. However, due to the restrictions of size, energy consumption, heat dissipation etc., traditional electronic methods are becoming unsuitable for FFT applications in certain areas, such as aviation, aerospace, etc. Due to its characteristics of a massive number of data bits and low energy consumption, the Ternary Optical Computer (TOC) is a potential solution for these special cases. To verify this possibility, we studied the design and implementation of a high-speed parallel FFT on a TOC. Through analysis of the traditional radix-2, radix-4, and radix-8 DIT FFT operation processes, several FFT algorithms with higher parallelism, implemented on TOC, are designed by taking advantage of the characteristics of TOC. The implementation processes of these algorithms and the differences between them are presented. Meanwhile, the clock cycles and hardware resources required for each algorithm are also discussed. Simulation results reveal that the FFT implementation method is accurate. The algorithms require less power and fewer clock cycles when implemented on a TOC compared to the traditional methods implemented on an FPGA. This provides a new possible solution for high-speed low-power FFT implementation.
国家自然科学基金(61572305,61103054)
中国航天科工集团二院“自主"创新项目
感谢所有三值光学计算机团队成员, 尤其是徐群博士研究生对本文所做出的贡献.
Figure 1
Operating procedures of plural vector addition/subtraction in Radix-4 FFT algorithm
Figure 2
The process of continuous multiplication and addition ($\alpha_k^j$ stands for the $j$-th intermediate result vector in the $k$-th step)
Figure 3
Sequence diagram of pipeline scheme to compute $X(i)$
Figure 4
Radix-4 DIT-FFT algorithm implementation in TOC (one adders module stands for $N/4$ MSD adders)
Figure 5
(Color online) Comparison of the clock cycles required for different FFT algorithms
a | b | T | W | T$'$ | W$'$ | M |
$-$1 | $-$1 | $-$1 | 0 | $-$1 | 0 | 1 |
$-$1 | 0 | $-$1 | 1 | 0 | $-$1 | 0 |
$-$1 | 1 | 0 | 0 | 0 | 0 | $-$1 |
0 | $-$1 | $-$1 | 1 | 0 | $-$1 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 1 | $-$1 | 0 | 1 | 0 |
1 | $-$1 | 0 | 0 | 0 | 0 | $-$1 |
1 | 0 | 1 | $-$1 | 0 | 1 | 0 |
1 | 1 | 1 | 0 | 1 | 0 | 1 |
Computation | ||
Algorithm | Multiplication of a $N$/2-dimensional complex | Addition/Subtraction of |
vector and a complex matrix of order $N$/2 | $N$/2-dimensional complex vectors | |
Parallel Radix-2 FFT | 2 times | 2 times : 1 layer |
Parallel Radix-4 FFT | 4 times | 8 times : 2 layers; number of addition |
or subtraction in each layer are 4, 4 | ||
Parallel Radix-8 FFT | 16 times | 28 times : 3 layers; number of addition |
or subtraction in each layer are 12, 8, 8 |
Algorithm | Clock cycles | Data bits |
Parallel DFT | 49 | 3103774720 |
Parallel Radix-2 FFT | 53 | 776553472 |
Parallel Radix-4 FFT | 61 | 195188224 |
Parallel Radix-8 FFT | 109 | 50556672 |
Input data | MSD binary system | Decimal system | |
1 | Real part | $10u10uuu01u0uu1u0u1100uuuu1uu10u$ | 105.0817 |
Imaginary part | $1110111101u0u11000110u1u01u0u100$ | 239.1178 | |
2 | Real part | $u1100uu010u0u0u1100uuu011u011000$ | $-$37.6586 |
Imaginary part | $u0u000uu0u0u10010u110u0uu1uu01uu$ | $-$163.2776 | |
3 | Real part | $00u01u1u00110uu1uu11u0u0u01uu0uu$ | $-$26.8343 |
Imaginary part | $110uuu0uu0u0u10u1uu111u011u1u111$ | 162.3563 | |
4 | Real part | $0u00u1uu100uu0u11uu0010uuu11u011$ | $-$70.5971 |
Imaginary part | $1u0011u110000uuu0u1uu01u00u1110u$ | 75.4718 | |
5 | Real part | $010u0u00u0u00111u01uuu10001u10u0$ | 43.4004 |
Imaginary part | $u0u1u0u0u1uu1u0u1001u0u0100u10uu$ | $-$154.4237 | |
6 | Real part | $011u01uuu001u01u1u1uu0u00uu001u1$ | 80.5362 |
Imaginary part | $0011u1u0u11u1u0u01uuuu0111uu1111$ | 41.8243 |
Output data | Ternary optical computer | Electronic | Result | ||
MSD binary system | Decimal system | computer | |||
1 | Real part | $u100u01u001u0u10u101uu10000u10u01u$ | $-$283.7847 | $-$283.7847 | Correct |
Imaginary part | $1u01100uu01u000001u00100u110u0$ | 21.6407 | 21.6407 | ||
2 | Real part | $uu10000100u0010u011u00000001u00u00$ | $-$636.4482 | $-$636.4482 | Correct |
Imaginary part | $1u0u01u10u00uu1u0011u00u1u00100100u1$ | 811.3220 | 811.3220 | ||
3 | Real part | $1uu10u1u00010u0u10uu01000u0101u11u1$ | 720.7079 | 720.7079 | Correct |
Imaginary part | $1u0u11u10u01u0u10u0000u0u0100000010$ | 470.2106 | 470.2106 | ||
4 | Real part | $1uu1u01u000100u100u0010uu101u00u1$ | 82.1162 | 82.1162 | Correct |
Imaginary part | $u10u00100uu10u1u0u010u1000u1001u1u0$ | $-$626.5998 | $-$626.5998 | ||
5 | Real part | $u011u0u100000uu1000u00u001u00u1u0$ | $-$178.0396 | $-$178.0396 | Correct |
Imaginary part | $1uu1u10u11u00001u10uu110u0000001u00u$ | 698.0423 | 698.0423 | ||
6 | Real part | $1u0000u0010u01000100u1000u10u010u$ | 124.4080 | 124.4080 | Correct |
Imaginary part | $u10000u100010u01u001u010u1u011uu100$ | $-$519.6082 | $-$519.6082 |
Output data | Ternary optical computer | Electronic | Result | ||
MSD binary system | Decimal system | computer | |||
1 | Real part | $u10u100100010u100u001u10000u10u01u$ | $-$283.7847 | $-$283.7847 | Correct |
Imaginary part | $1u10u00uu0010000001001000u10u0$ | 21.6407 | 21.6407 | ||
2 | Real part | $u1u1000010u100100uu010000000010u100$ | $-$636.4482 | $-$636.4482 | Correct |
Imaginary part | $10u010uu10u011u01u1u00u1u0011uu00u1$ | 811.3220 | 811.3220 | ||
3 | Real part | $110u1u00010u0u1u0101000u01010u1u01$ | 720.7079 | 720.7079 | Correct |
Imaginary part | $1u0u11u10u01u00u0u0000u0u01000001u0$ | 470.2106 | 470.2106 | ||
4 | Real part | $1u0101u000100u10u100011u101u000u$ | 82.1162 | 82.1162 | Correct |
Imaginary part | $u10u1uu00uu10u0100uu00u000u1001u010$ | $-$626.5998 | $-$626.5998 | ||
5 | Real part | $u1u010u10000u10u0u1100u001u00u1u0$ | $-$178.0396 | $-$178.0396 | Correct |
Imaginary part | $1u10u00u0100001u1u10u10u000000010u1$ | 698.0423 | 698.0423 | ||
6 | Real part | $10000u01u0u01000100u1000u10u01u1$ | 124.4080 | 124.4080 | Correct |
Imaginary part | $u100000u001uu1001001u1uu11u1u1uu100$ | $-$519.6082 | $-$519.6082 |
Transform time (cycles) | |||||
FFT design | FFT size | ||||
16-bit | 16-bit | 16-bit | 16-bit | Device | |
16-point | 64-point | 256-point | 1024-point | ||
Electronic method 1 | 37 | 137 | 525 | 2065 | Spartan-3 |
Electronic method 2 | – | – | – | 1283 | XCV2P30 |
Electronic method 3 | 361 | – | 929 | 448805 | EP2C35 |
Electronic method 4 | – | – | 1072 | – | Vertex-II |
Electronic method 5 | – | – | 255 | – | EP3SL70 |
Electronic method 6 | – | – | 255 | – | 5VSX35T |
Minimum clock cycle in electronic methods | 36 | 137 | 255 | 1283 | – |
Our parallel Radix-4 FFT method | 43 | 49 | 55 | 61 | TOC |
Our parallel Radix-8 FFT method | 91 | 97 | 103 | 109 | TOC |
Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有
京ICP备18024590号-1