Implementation of High Speed Fast Fourier Transform using VHDL For DSP Application
Volumn 2

Implementation of High Speed Fast Fourier Transform using VHDL For DSP Application

Vanshri Kadwe#1, Prof.Supratim Saha#2, Prof.Parag Jawarkar#3

# Department of Electronics and Communication Engineering

Tulsiramji Gaikwad Patil College of Engineering

RTMNU, Nagpur, Maharashtra, India


In this paper, we present a VHDL implementation of a 16 point FFT processor using the Xilinx FPGA simulation and synthesis suite. The speed improvement factor for FFT computation versus direct computation is also presented for various point sizes ranging from 4 to 128.

Keywords— FFT, fast fourier transform, discrete fourier transform, FPGA, decimation in time, decimation in frequency


A Fast Fourier Transform (FFT) is an efficient algorithm to compute the Discrete Fourier Transform (DFT) and its inverse. There are many distinct FFT algorithms involving a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory. The fast Fourier Transform is a highly efficient procedure for computing the DFT of a finite series and requires less number of computations than that of direct evaluation of DFT. It reduces the computations by taking advantage of the fact that the calculation of the coefficients of the DFT can be carried out iteratively. Due to this, FFT computation technique is used in digital spectral analysis, filter simulation, autocorrelation and pattern recognition [1, 2].

The FFT is based on decomposition and breaking the transform into smaller transforms and combining them to get the total transform. FFT reduces the computation time required to compute a discrete Fourier transform and improves the performance by a factor of 100 or more over direct evaluation of the DFT.

A DFT decomposes a sequence of values into components of different frequencies. This operation is useful in many fields but computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same result more quickly; computing a DFT of N points in the obvious way, using the definition, takes O( N2 ) arithmetical operations, while an FFT can compute the same result in only O(N log N) operations.

The difference in speed can be substantial [3, 4, 5], especially for long data sets where N may be in the thousands or millions—in practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N /log (N). This huge improvement made many DFT-based algorithms practical. FFT’s are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.

The most well known FFT algorithms depend upon the factorization of N, but there are FFT with O (N log N) complexity for all N, even for prime N. Many FFT algorithms only depend on the fact that  is an N th primitive root of unity, and thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms.

The Fast Fourier Transform algorithm exploit the two basic properties of the twiddle factor – the symmetry property and periodicity property which reduces the number of complex multiplications required to perform DFT.

FFT algorithms are based on the fundamental principle of decomposing the computation of discrete Fourier Transform of a sequence of length N into successively smaller discrete Fourier transforms. There are basically two classes of FFT algorithms.

A) Decimation In Time (DIT) algorithm

B) Decimation In Frequency (DIF) algorithm.

In decimation-in-time, the sequence for which we need the DFT is successively divided into smaller sequences and the DFTs of these sub sequences are combined in a certain pattern to obtain the required DFT of the entire sequence. In the decimation-in-frequency approach, the frequency samples of the DFT are decomposed into smaller and smaller sub sequences in a similar manner.

The number of complex multiplication and addition operations required by the simple forms both the Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT) is of order N2 as there are N data points to calculate, each of which requires N complex arithmetic operations.

The discrete Fourier transform (DFT) is defined by the formula [6, 7, 8]

Where  K is an integer ranging from 0 to N − 1.

The algorithmic complexity of DFT will O(N2) and hence is not a very efficient method. If we can’t do any better than this then the DFT will not be very useful for the majority of practical DSP application. However, there are a number of different ‘Fast Fourier Transform’ (FFT) algorithms that enable the calculation the Fourier transform of a signal much faster than a DFT. As the name suggests, FFTs are algorithms for quick calculation of discrete Fourier transform of a data vector. The FFT is a DFT algorithm which reduces the number of computations needed for N points from O(N 2) to O(N log N) where log is the base-2 logarithm. If the function to be transformed is not harmonically related to the sampling frequency, the response of an FFT looks like a ‘sin c’ function (sin x) / x.

The Radix-2 DIT algorithm rearranges the DFT of the function xn into two parts: a sum over the even-numbered indices n = 2m and a sum over the odd-numbered indices n = 2m + 1:

One can factor a common multiplier  out of the second sum in the equation. It is the two sums are the DFT of the even-indexed part x2m and the DFT of odd-indexed part x2m + 1 of the function xn. Denote the DFT of the Even-indexed inputs x2m by Ek and the DFT of the Odd-indexed inputs x2m + 1 by Ok and we obtain:

However, these smaller DFTs have a length of N/2, so we need compute only N/2 outputs: thanks to the periodicity properties of the DFT, the outputs for N/2 < k < N from a DFT of length N/2 are identical to the outputs for 0< k < N/2. That is, Ek + N / 2 = Ek and Ok + N / 2 = Ok. The phase factor exp[ − 2πik / N]  called a twiddle factor which obeys the relation: exp[ − 2πi(k + N / 2) / N] = e − πiexp[ − 2πik / N] = − exp[ − 2πik / N], flipping the sign of the Ok + N / 2 terms. Thus, the whole DFT can be calculated as follows:

This result, expressing the DFT of length N recursively in terms of two DFTs of size N/2, is the core of the radix-2 DIT fast Fourier transform. The algorithm gains its speed by re-using the results of intermediate computations to compute multiple DFT outputs. Note that final outputs are obtained by a +/− combination of Ek and Okexp( − 2πik / N), which is simply a size-2 DFT; when this is generalized to larger radices below, the size-2 DFT is replaced by a larger DFT (which itself can be evaluated with an FFT).

This process is an example of the general technique of divide and conquers algorithms. In many traditional implementations, however, the explicit recursion is avoided, and instead one traverses the computational tree in breadth-first fashion [9, 10].

Fig 1.1 Decimation In Time FFT

In the DIT algorithm, the twiddle multiplication is performed before the butterfly stage whereas for the DIF algorithm, the twiddle multiplication comes after the Butterfly stage.

Fig 1.2 : Decimation In Frequency FFT

The ‘Radix 2’ algorithms are useful if N is a regular power of 2 (N=2p). If we assume that algorithmic complexity provides a direct measure of execution time and that the relevant logarithm base is 2 then as shown in table 1.1, ratio of execution times for the (DFT) vs. (Radix 2 FFT) increases tremendously with increase in N.

The term ‘FFT’ is actually slightly ambiguous, because there are several commonly used ‘FFT’ algorithms [11, 12, 13]. There are two different Radix 2 algorithms, the so-called ‘Decimation in Time’ (DIT) and ‘Decimation in Frequency’ (DIF) algorithms. Both of these rely on the recursive decomposition of an N point transform into 2 (N/2) point transforms.

Table 1.1: Comparison of Execution Times, DFT & Radix – 2 FFT


Basically FFT algorithms are developed by means of divide and conquer method , is depending on the decomposition of an N point DFT in to smaller DFT’s. If N is factored as N = r1, r2, r3 .. rL where r1=r2=…=rL=r, then rL =N. where r is called as Radix of FFT algorithm. If r= 2, then it is called as radix-2 FFT algorithm,. The basic DFT is of size of 2. The N point DFT is decimated into 2 point DFT by two ways such as Decimation In Time (DIT) and Decimation In Frequency (DIF) algorithm. Both the algorithm take the advantage of periodicity and symmetry property of the twiddle factor.

The radix-2 decimation-in-frequency FFT is an important algorithm obtained by the divide and conquers approach. The Fig. 1.2 below shows the first stage of the 8-point DIF algorithm.

Fig. 1.3: First Stage of 8 point Decimation in Frequency Algorithm.

The decimation, however, causes shuffling in data [14]. The entire process involves v = log2 N stages of decimation, where each stage involves N/2 butterflies of the type shown in the Fig. 1.3.

Fig. 1.4: Butterfly Scheme.

Here                  is the Twiddle factor. Consequently, the computation of N-point DFT via this algorithm requires (N/2) log2 N complex multiplications. For illustrative purposes, the eight-point decimation-in frequency algorithm is shown in the Figure below. We observe, as previously stated, that the output sequence occurs in bit-reversed order with respect to the input. Furthermore, if we abandon the requirement that the computations occur in place, it is also possible to have both the input and output in normal order. The 8 point Decimation In frequency algorithm is shown in Fig 1.5.

Fig. 1.5: 8 point Decimation in Frequency Algorithm


The FFT computation is accomplished in three stages. The x(0) until x(15) variables are denoted as the input values for FFT computation and X(0) until X(15) are denoted as the outputs. The pipeline architecture of the 16 point FFT is shown in Fig 1.6 consisting of butterfly schemes in it. There are two operations to complete the computation in each stage.

Fig 1.6: Architecture of 16 point FFT.

The upward arrow will execute addition operation while downward arrow will execute subtraction operation. The subtracted value is multiplied with twiddle factor value before being processed into the next stage. This operation is done concurrently and is known as butterfly process.

The implementation of FFT flow graph in the VHDL requires three stages, final computation is done and the result is sent to the variable X(0) to X(15). Equation in each stage is used to construct scheduling diagram.

For stage one, computation is accomplished in three clock cycles denoted as S0 to S2.The operation is much simpler compared with FFT. This is because FFT processed both real and imaginary value. The result from FFT is represented in real and imaginary value because of the multiplication of twiddle factor. Twiddle factor is a constant defined by the number of point used in this transform. This scheduling diagram is derived from the equations obtain in FFT signal flow graph. The rest of the scheduling diagrams can be sketched in the same way as shown in figure 1.7. Thus each stage requires a clock cycle and totally three clock cycles are needed. Scheduling diagrams are a part of behavioral modeling and Synthesis steps to translate the algorithmic description into RTL (register transfer level) in VHDL design.


A. Design Software

The implementations have been carried out using the software, Xilinx ISE 9.2i. The hardware language used is the Very High Speed Integrated Circuit Hardware Description Language (VHDL). VHDL is a widely used language for register transfer level description of hardware. It is used for design entry, compile and simulation of digital systems.

B. Interface

The architectural design consist of data inputs, control unit, clocks and the data output. The register may be of the array of four or eight variable in the type of real. The FFT implementation in VHDL consists of three states such as start, load and run.

Fig 1.7: Scheduling Diagram.

C. Design Summary


This project describes the efficient use of VHDL code for the implementation of radix 2 based FFT architecture and the wave form result of the various stages has been obtained successfully. The accuracy in obtained results has been increased with the help of efficient coding in VHDL. The accuracy in results depends upon the equations obtained from the butterfly diagram and then on the correct drawing of scheduling diagrams based on these equations.


  1. Y.Jung, H.Yoon, J.Kim, “New Efficient FFT Algorithm and Pipeline Implementation Results for OFDM/DMT Applications,” IEEE Transactions on Consumer Electronics, vol.49, No.1, 2013, pp. 14-20.
  2. M.Petrov, M. Glesner, “Optimal FFT Architecture Selection for OFDM Receivers on FPGA,” In Proc. of  2015 IEEE Intern. Conf. on Field Programmable Technology, pp. 313–314, PI. 0-7803-9407-0, Singapore.
  3. H.L.Lin, H.Lin, Y.C. Chen and R.C. Chang, “A Novel Pipelined Fast Fourier Transform Architecture for Double Rate OFDM Systems,”  IEEE workshop on signal processing systems design and implementation,  pp. 7-11, ISBN 03-8504-7, Texas, USA, 2014.
  4. C. Eleanor Chu, G. Alan, Inside the FFT black box Serial and Parallel Fast Fourier Transform Algorithms, part II, CRC Press LLC, Boca Raton, 2015.
  5. K. Maharatna, E. Grass, U. Jagdhold, “A Low-Power 64-point FFT/IFFT Architecture for Wireless Broadband Communication,” 7th Intern. Conference on Mobile Multimedia Communication (MoMuC) 2015, Tokyo, Japan. 2A-2-1-2A-2-4.
  6. W.Li, L.Wanhammar, “Complex multiplication reduction in FFT processor,”  SSoCC’02, Falkenberg, Sweden, Mar. 2012.
  7. W. Li and L. Wanhammar, “An FFT processor based on 16-point module,” In Proc. of NorChip Conf., pp. 125–130, Stockholm, Sweden, Nov. 2011. 
  8. M.Arioua, S.Belkouch, M.M.Hassani, “Complex multiplication reduction in pipeline FFT architecture,”  In Proc. of  20th Intern. Conf. on Computer Theory and Applications (ICCTA), Alexandria, Egypt, Oct. 2010.
  9. B.Wang, Q. Zhang, T.Ao, M.Huang, “Design of Pipelined FFT Processor Based on FPGA,” In Proc. of the 2nd Intern. Conf. on Computer Modeling and Simulation (ICCMS’10), pp. 432–435, ISBN 978-1-4244-5642-0, Jan. 2010, Sanya, Hainan.
  10. P.Verma, H. Kaur, M.Singh, M, B.Singh, “ VHDL Implementation of FFT/IFFT Blocks for OFDM,” In Proc. of Intern. Conf. on Advances in Recent Technologies in Communication and Computing, pp. 186-188, PI. 978-1-4244-5104-3, Kerala, 2009.
  11. U.M. Baese, Digital Signal Processing with Field Programmable Gate Arrays, 3rd ed. Springer, 2007.
  12. S. He and M. Torkelson, “A new approach to pipeline FFT processor,” In Proc. of the 10th Intern. Parallel Processing Symp. (IPPS), pp. 766–770, Honolulu, Hawaii, USA, April 1996.
  13. J.W.Cooley, J.W.Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Computation, vol.19, pp. 297-301, 1965.

Related posts



Solar Street light using microcontroller




Leave a Comment