Abstract
In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields with accuracy that was previously unattainable. However, this comes with extensive computational requirements that general-purpose central processing units (CPUs) cannot meet at the desired real-time performance. At the same time, field-programmable gate arrays (FPGAs) have seen a surge of interest for accelerating CNN inference because they enable custom designs with different levels of parallelism. Furthermore, FPGAs provide better performance per watt than other computing technologies such as graphics processing units (GPUs). The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs), each tailored to a subset of layers. However, the growing complexity of CNN architectures makes it increasingly challenging to partition the resources available on the target FPGA device for optimal performance, because the number of design variables that must be considered when implementing a Multi-CLP accelerator grows exponentially with the CNN's complexity. In this paper, we present a CNN accelerator and an accompanying automated design methodology that employs metaheuristics to partition the available FPGA resources into a Multi-CLP accelerator. Specifically, the proposed design tool adopts simulated annealing (SA) and tabu search (TS) algorithms to find the number of CLPs required and their respective configurations that achieve optimal performance on a given target FPGA device. The focus is on the key specifications and hardware resources, namely digital signal processors (DSPs), block random access memories (BRAMs), and off-chip memory bandwidth. Experimental results and comparisons using four well-known benchmark CNNs demonstrate that the proposed acceleration framework is both encouraging and promising. The SA-/TS-based Multi-CLP accelerator achieves 1.31×–2.37× higher throughput than state-of-the-art Single-/Multi-CLP approaches in accelerating the AlexNet, SqueezeNet 1.1, VGGNet, and GoogLeNet architectures on the Xilinx VC707 and VC709 FPGA boards.
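To make the search concrete, the sketch below illustrates, in broad strokes, how a simulated-annealing loop can explore Multi-CLP design points under a DSP budget. It is a minimal illustration only: the layer workloads, the DSP budget, the cost model in `estimate_cycles`, and the `perturb` move set are placeholders invented for this example, not the paper's actual performance model, which also accounts for BRAMs and off-chip memory bandwidth.

```python
import math
import random

# Toy inputs invented for this illustration; the paper's design tool uses the
# actual layer dimensions and a model of DSPs, BRAMs, and memory bandwidth.
LAYER_OPS = [210e6, 448e6, 299e6, 224e6, 150e6]   # MACs per conv layer (made up)
DSP_BUDGET = 2800                                  # e.g. a Virtex-7 class device


def estimate_cycles(assign, dsps):
    """Throughput proxy: CLPs run concurrently, so the slowest CLP bounds performance."""
    per_clp = [0.0] * len(dsps)
    for layer, clp in enumerate(assign):
        per_clp[clp] += LAYER_OPS[layer]
    return max(ops / d for ops, d in zip(per_clp, dsps))


def perturb(assign, dsps):
    """Random move: reassign one layer to another CLP, or shift DSPs between CLPs."""
    assign, dsps = list(assign), list(dsps)
    if random.random() < 0.5:
        assign[random.randrange(len(assign))] = random.randrange(len(dsps))
    else:
        src, dst = random.sample(range(len(dsps)), 2)
        shift = random.randint(1, dsps[src] // 4)
        if dsps[src] - shift >= 16:                # keep every CLP minimally sized
            dsps[src] -= shift
            dsps[dst] += shift
    return assign, dsps


def simulated_annealing(n_clps=2, t0=1e6, alpha=0.95, moves_per_t=100, t_min=1.0):
    assign = [random.randrange(n_clps) for _ in LAYER_OPS]
    dsps = [DSP_BUDGET // n_clps] * n_clps         # split the DSP budget evenly at first
    best = (assign, dsps)
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            cand = perturb(assign, dsps)
            delta = estimate_cycles(*cand) - estimate_cycles(assign, dsps)
            # Metropolis criterion: always accept improvements,
            # accept worse moves with probability exp(-delta / t).
            if delta < 0 or random.random() < math.exp(-delta / t):
                assign, dsps = cand
                if estimate_cycles(assign, dsps) < estimate_cycles(*best):
                    best = (assign, dsps)
        t *= alpha                                 # geometric cooling schedule
    return best


if __name__ == "__main__":
    assign, dsps = simulated_annealing()
    print("layer-to-CLP assignment:", assign, "DSPs per CLP:", dsps)
```

A tabu-search version of the same loop would instead keep a short list of recently applied moves and forbid reversing them for a number of iterations, which is the general mechanism tabu search relies on.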
Data availability
The pre-trained models that support the findings of this study are taken from PyTorch Torchvision without any fine-tuning. The model architectures are available at https://pytorch.org/vision/0.8/models.html.
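As a small aid to reproducibility, the snippet below shows one way to obtain these pre-trained models with the Torchvision 0.8 API (hence the `pretrained=True` flag). The choice of VGG-16 as the VGGNet variant is an assumption of this example, not stated here.

```python
# Sketch: load the benchmark models from Torchvision 0.8 without fine-tuning.
# VGG-16 is assumed here as the VGGNet variant.
import torchvision.models as models

nets = {
    "AlexNet": models.alexnet(pretrained=True),
    "SqueezeNet 1.1": models.squeezenet1_1(pretrained=True),
    "VGGNet (VGG-16)": models.vgg16(pretrained=True),
    "GoogLeNet": models.googlenet(pretrained=True),
}

for name, net in nets.items():
    net.eval()                                    # inference only, no training
    params = sum(p.numel() for p in net.parameters())
    print(f"{name}: {params / 1e6:.1f} M parameters")
```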
Acknowledgements
The authors would like to thank King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia, for its support.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest that could have appeared to influence the work reported in this manuscript.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sait, S.M., El-Maleh, A., Altakrouri, M. et al. Optimization of FPGA-based CNN accelerators using metaheuristics. J Supercomput 79, 4493–4533 (2023). https://doi.org/10.1007/s11227-022-04787-8