Abstract
This paper presents a hardware-friendly compression method for deep neural networks that combines layered channel pruning with power-of-two exponential quantization. With only a small decrease in model accuracy, the method greatly reduces the computational resources needed to deploy neural networks on hardware, including memory, multiply-accumulate units (MACs), and logic gates. Layered channel pruning groups the layers according to the accuracy loss that pruning each layer causes; the layers are then pruned in a specific order, and the network is retrained after pruning. The pruning method exposes a parameter that can be adjusted to reach different pruning rates in practical applications. The quantization method converts high-precision weights into low-precision weights whose values are restricted to zero and powers of two. Similarly, a second parameter controls the quantization bit width and can be adjusted to meet different precision requirements. The proposed hardware-friendly pruning-quantization (HFPQ) method retrains the network after pruning and then quantizes the weights. Experimental results show that HFPQ compresses VGGNet, ResNet and GoogLeNet by more than 30 times while reducing the number of FLOPs by more than 85%.
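To make the two ingredients concrete, the NumPy sketch below prunes the output channels of a convolutional weight tensor by their L1 norms and then rounds the surviving weights to powers of two. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the L1-norm ranking, the function names, and the mapping from bit width to exponent levels are assumptions; the pruning ratio and quantization bit width stand in for the two adjustable parameters mentioned above.

```python
import numpy as np

def prune_channels_by_l1(weights, prune_ratio=0.5):
    """Zero out the output channels with the smallest L1 norms.

    weights: array of shape (out_channels, in_channels, kH, kW).
    prune_ratio: fraction of output channels to remove (a stand-in for
    the adjustable pruning-rate parameter described in the abstract).
    """
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_prune = int(prune_ratio * weights.shape[0])
    pruned = weights.copy()
    if n_prune > 0:
        drop = np.argsort(norms)[:n_prune]  # channels with the smallest L1 norm
        pruned[drop] = 0.0
    return pruned

def quantize_power_of_two(weights, bits=4):
    """Map every nonzero weight to a signed power of two.

    The base-2 logarithm of each nonzero weight is rounded to the nearest
    integer exponent and clipped to a range set by `bits` (a stand-in for
    the adjustable bit-width parameter).  Every quantized weight is then
    0 or +/- 2^k, so hardware can replace multipliers with shifts.
    """
    max_exp = int(np.floor(np.log2(np.abs(weights).max() + 1e-12)))
    min_exp = max_exp - (2 ** (bits - 1) - 2)  # number of exponent levels kept
    out = np.zeros_like(weights)
    nz = weights != 0
    exp = np.clip(np.round(np.log2(np.abs(weights[nz]))), min_exp, max_exp)
    quant = np.sign(weights[nz]) * (2.0 ** exp)
    # weights too small for the lowest exponent level collapse to zero
    quant[np.abs(weights[nz]) < 2.0 ** (min_exp - 1)] = 0.0
    out[nz] = quant
    return out

# Example: prune half the channels, then quantize the rest to 4-bit powers of two.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w = quantize_power_of_two(prune_channels_by_l1(w, prune_ratio=0.5), bits=4)
```

In practice the network would be retrained between the pruning and quantization steps, as the abstract describes, so the surviving weights can recover the accuracy lost to pruning before they are constrained to powers of two.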







Acknowledgements
This work was supported by the Department of Science and Technology of Jiangsu Province, China (BE2018002-2, BE2018002-3).
Cite this article
Fan, Y., Pang, W. & Lu, S. HFPQ: deep neural network compression by hardware-friendly pruning-quantization. Appl Intell 51, 7016–7028 (2021). https://doi.org/10.1007/s10489-020-01968-x