Abstract
This paper presents a hardware-friendly compression method for deep neural networks that combines layered channel pruning with power-of-two exponential quantization. With only a small decrease in model accuracy, the method greatly reduces the computational resources needed to deploy neural networks on hardware, including memory, multiply-accumulate units (MACs), and logic gates. Layered channel pruning groups the layers according to the accuracy loss that pruning each layer causes; the layers are then pruned in a specific order, and the network is retrained after pruning. The pruning method exposes a parameter that can be adjusted to reach different pruning rates in practical applications. The quantization method converts high-precision weights into low-precision weights whose values are restricted to zero and powers of two. Similarly, a second parameter controls the quantization bit width and can be adjusted to meet different precision requirements. The proposed hardware-friendly pruning-quantization (HFPQ) method retrains the network after pruning and then quantizes the weights. Experimental results show that HFPQ compresses VGGNet, ResNet and GoogLeNet by more than 30 times while reducing the number of FLOPs by more than 85%.
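To make the two ingredients concrete, the NumPy sketch below prunes the output channels of a convolutional weight tensor by their L1 norms and then rounds the surviving weights to powers of two. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the L1-norm ranking, the function names, and the mapping from bit width to exponent levels are assumptions; the pruning ratio and quantization bit width stand in for the two adjustable parameters mentioned above.

```python
import numpy as np

def prune_channels_by_l1(weights, prune_ratio=0.5):
    """Zero out the output channels with the smallest L1 norms.

    weights: array of shape (out_channels, in_channels, kH, kW).
    prune_ratio: fraction of output channels to remove (a stand-in for
    the adjustable pruning-rate parameter described in the abstract).
    """
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_prune = int(prune_ratio * weights.shape[0])
    pruned = weights.copy()
    if n_prune > 0:
        drop = np.argsort(norms)[:n_prune]  # channels with the smallest L1 norm
        pruned[drop] = 0.0
    return pruned

def quantize_power_of_two(weights, bits=4):
    """Map every nonzero weight to a signed power of two.

    The base-2 logarithm of each nonzero weight is rounded to the nearest
    integer exponent and clipped to a range set by `bits` (a stand-in for
    the adjustable bit-width parameter).  Every quantized weight is then
    0 or +/- 2^k, so hardware can replace multipliers with shifts.
    """
    max_exp = int(np.floor(np.log2(np.abs(weights).max() + 1e-12)))
    min_exp = max_exp - (2 ** (bits - 1) - 2)  # number of exponent levels kept
    out = np.zeros_like(weights)
    nz = weights != 0
    exp = np.clip(np.round(np.log2(np.abs(weights[nz]))), min_exp, max_exp)
    quant = np.sign(weights[nz]) * (2.0 ** exp)
    # weights too small for the lowest exponent level collapse to zero
    quant[np.abs(weights[nz]) < 2.0 ** (min_exp - 1)] = 0.0
    out[nz] = quant
    return out

# Example: prune half the channels, then quantize the rest to 4-bit powers of two.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w = quantize_power_of_two(prune_channels_by_l1(w, prune_ratio=0.5), bits=4)
```

In practice the network would be retrained between the pruning and quantization steps, as the abstract describes, so the surviving weights can recover the accuracy lost to pruning before they are constrained to powers of two.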







Acknowledgements
This work was supported by the Department of Science and Technology of Jiangsu Province, China (BE2018002-2, BE2018002-3).
Cite this article
Fan, Y., Pang, W. & Lu, S. HFPQ: deep neural network compression by hardware-friendly pruning-quantization. Appl Intell 51, 7016–7028 (2021). https://doi.org/10.1007/s10489-020-01968-x