FPGA implementation of a canonical signed digit multiplier-less based FFT processor for wireless communication applications

Mahmud Benhamid, Masuri Othman

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    4 Citations (Scopus)

    Abstract

    This paper proposes a novel fully parallel, multiplier-less FFT architecture based on Canonical Signed Digit (CSD) encoding, targeting wireless communication applications such as the IEEE 802.15.3a Wireless Personal Area Network (WPAN) baseband. The proposed architecture offers high throughput, lower latency, and smaller area. It realizes the complex multiplier with shift-and-add operations and uses CSD to optimize these operations. The design has been coded in Verilog HDL targeting the Xilinx Virtex-II FPGA series, and it is fully implemented and tested on real hardware using a Virtex-II FG456 prototype board. Based on this architecture, an 8-point FFT on Virtex-II can run at a maximum clock frequency of about 400 MHz, which leads to a throughput of about 3.2 GS/s with a latency of 6 clock cycles, using 16,580 equivalent gates. By comparison, a conventional parallel architecture of the same size runs at a maximum clock frequency of only 220 MHz, giving a throughput of 1.76 GS/s with a latency of 12 clock cycles, using 77,418 equivalent gates. Throughput therefore increases by about 82%, while the equivalent gate count and latency decrease by about 79% and 50%, respectively.
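
The shift-and-add idea in the abstract can be illustrated with a small software model. The sketch below is not the authors' Verilog design: it is a minimal Python illustration, assuming integer (fixed-point) coefficients, and the function names `to_csd` and `csd_multiply` are hypothetical. It recodes a constant into CSD digits (each -1, 0, or +1, with no two adjacent nonzero digits) and then multiplies by that constant using only shifts, additions, and subtractions.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits of an integer, least significant digit first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # Choose +1 when value % 4 == 1 and -1 when value % 4 == 3,
            # so the remainder is divisible by 4 and the next digit is 0.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x: int, coeff: int) -> int:
    """Multiply x by a constant using only shifts, adds, and subtracts."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    # 119 is 1110111 in binary (six nonzero bits); in CSD it is 128 - 8 - 1,
    # so three add/subtract terms replace six.
    print(to_csd(119))
    # Hypothetical 8-bit quantization of cos(pi/4): round(0.7071 * 256) = 181.
    print(csd_multiply(37, 181) == 37 * 181)
```

In hardware, each nonzero CSD digit corresponds to one adder or subtractor fed by a fixed wire shift, so fewer nonzero digits generally means less logic. The 8-bit coefficient width used in the example is an assumption for illustration only; the abstract does not state the word length of the actual design.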

    Original language: English
    Title of host publication: IEEE International Conference on Semiconductor Electronics, Proceedings, ICSE
    Pages: 641-645
    Number of pages: 5
    DOIs: https://doi.org/10.1109/SMELEC.2006.380712
    Publication status: Published - 2006
    Event: 2006 IEEE International Conference on Semiconductor Electronics, ICSE 2006 - Kuala Lumpur
    Duration: 29 Nov 2006 - 1 Dec 2006

    Other

    Other: 2006 IEEE International Conference on Semiconductor Electronics, ICSE 2006
    City: Kuala Lumpur
    Period: 29/11/06 - 1/12/06

    Fingerprint

    Fast Fourier transforms
    Field programmable gate arrays (FPGA)
    Clocks
    Throughput
    Communication
    Computer hardware description languages
    Personal communication systems
    Parallel architectures
    Hardware

    ASJC Scopus subject areas

    • Engineering(all)

    Cite this

    Benhamid, M., & Othman, M. (2006). FPGA implementation of a canonical signed digit multiplier-less based FFT processor for wireless communication applications. In IEEE International Conference on Semiconductor Electronics, Proceedings, ICSE (pp. 641-645). [4266695] https://doi.org/10.1109/SMELEC.2006.380712

    @inproceedings{c1822f61b57c44a5bd8c8d989e7b749b,
    title = "FPGA implementation of a canonical signed digit multiplier-less based FFT processor for wireless communication applications",
    author = "Mahmud Benhamid and Masuri Othman",
    year = "2006",
    doi = "10.1109/SMELEC.2006.380712",
    language = "English",
    isbn = "0780397312",
    pages = "641--645",
    booktitle = "IEEE International Conference on Semiconductor Electronics, Proceedings, ICSE",

    }

    TY - GEN

    T1 - FPGA implementation of a canonical signed digit multiplier-less based FFT processor for wireless communication applications

    AU - Benhamid, Mahmud

    AU - Othman, Masuri

    PY - 2006

    Y1 - 2006

    UR - http://www.scopus.com/inward/record.url?scp=35148858113&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=35148858113&partnerID=8YFLogxK

    U2 - 10.1109/SMELEC.2006.380712

    DO - 10.1109/SMELEC.2006.380712

    M3 - Conference contribution

    AN - SCOPUS:35148858113

    SN - 0780397312

    SN - 9780780397316

    SP - 641

    EP - 645

    BT - IEEE International Conference on Semiconductor Electronics, Proceedings, ICSE

    ER -