Feasibility and Design Trade-Offs of Neural Network Accelerators Implemented on Reconfigurable Hardware

Trinh, Quang-Kien; Duong, Quang-Manh; Dao, Thi-Nga; Nguyen, Van-Thanh; Nguyen, Hong-Phong

doi:10.1007/978-3-030-63083-6_9

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 334)

Included in the following conference series:

International Conference on Industrial Networks and Intelligent Systems

Abstract

In recent years, neural-network-based algorithms have been widely applied in computer vision. FPGA technology has emerged as a promising choice for hardware acceleration owing to its high performance and flexibility, its energy efficiency compared with CPUs and GPUs, and its fast development cycle. As a result, FPGAs have gradually become a viable alternative to GPU and CPU platforms.

This work studies the practical implementation of neural network accelerators on reconfigurable hardware (FPGAs). It systematically analyzes the utilization-accuracy-performance trade-offs in FPGA implementations of neural networks and discusses the feasibility of applying those designs in practice.

We have developed a highly generic architecture for implementing a single neural network layer, from which arbitrary networks can then be constructed. As a case study, we implemented neural network accelerators on FPGA for the MNIST and CIFAR-10 datasets. The main results indicate that, compared with the software implementation, the hardware design is at least 1500 times faster when the parallel coefficient \( p \) is 1 and up to 20,000 times faster when \( p \) is 16, while the accuracy degradation in all cases is negligible, at about 0.1% lower. Regarding resource utilization, modern FPGAs can accommodate these designs: for example, the 2-layer designs with \( p = 4 \) for MNIST and CIFAR-10 occupy 26% and 32% of the LUTs on a Kintex-7 XC7K325T, respectively.
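
As a rough illustration of how a parallel coefficient can trade hardware resources for speed, the sketch below estimates ideal cycle counts for one fully connected layer. It is not the authors' architecture. It assumes that \( p \) denotes the number of multiply-accumulate (MAC) units working in parallel, that each unit completes one MAC per cycle, and that there are no memory or pipeline stalls. The function name `dense_layer_cycles` and the 784-by-64 layer size are hypothetical, chosen only because MNIST images have 784 pixels.

```python
import math


def dense_layer_cycles(n_inputs: int, n_outputs: int, p: int) -> int:
    """Ideal cycle count for one fully connected layer.

    Assumption (not from the paper): p is the number of MAC units
    running in parallel, each finishing one multiply-accumulate per
    cycle, with no stalls for memory access or pipelining.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    total_macs = n_inputs * n_outputs  # one MAC per weight
    return math.ceil(total_macs / p)   # work is split across p units


# Hypothetical layer: 784 inputs (one per MNIST pixel), 64 outputs.
N_INPUTS, N_OUTPUTS = 784, 64

baseline = dense_layer_cycles(N_INPUTS, N_OUTPUTS, p=1)
for p in (1, 4, 16):
    cycles = dense_layer_cycles(N_INPUTS, N_OUTPUTS, p)
    print(f"p={p:2d}: {cycles} cycles, ideal speedup {baseline / cycles:.1f}x")
```

Under this idealized model, moving from \( p = 1 \) to \( p = 16 \) cuts the cycle count by a factor of 16. The speedups quoted in the abstract (at least 1500 times and up to 20,000 times) are bounds rather than paired measurements, but their ratio of roughly 13 would be consistent with overheads that this model ignores.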


Acknowledgment

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2018.310.

Author information

Authors and Affiliations

Le Quy Don Technical University, Hanoi, Vietnam
Quang-Kien Trinh, Quang-Manh Duong, Thi-Nga Dao & Hong-Phong Nguyen
Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
Van-Thanh Nguyen


Corresponding author

Correspondence to Quang-Manh Duong.

Editor information

Editors and Affiliations

Faculty of Electrical and Electronics Engineering, Duy Tan University, Da Nang, Vietnam
Nguyen-Son Vo
Le Quy Don Technical University, Hanoi, Vietnam
Van-Phuc Hoang

About this paper

Cite this paper

Trinh, QK., Duong, QM., Dao, TN., Nguyen, VT., Nguyen, HP. (2020). Feasibility and Design Trade-Offs of Neural Network Accelerators Implemented on Reconfigurable Hardware. In: Vo, NS., Hoang, VP. (eds) Industrial Networks and Intelligent Systems. INISCOM 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 334. Springer, Cham. https://doi.org/10.1007/978-3-030-63083-6_9

DOI: https://doi.org/10.1007/978-3-030-63083-6_9
Published: 21 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63082-9
Online ISBN: 978-3-030-63083-6
eBook Packages: Computer Science, Computer Science (R0)
