Abstract
Knowledge distillation is a widely used method for model compression: it aims to compress a powerful yet cumbersome model into a lightweight model with little sacrifice of performance, so that the accuracy of the lightweight model approaches that of the cumbersome one. By convention, the powerful but bulky model is called the teacher model and the lightweight model is called the student model. Various approaches have been proposed for this purpose over the past few years. Classical distillation methods mainly distill deep features from an intermediate layer or the logits layer, and some methods combine knowledge distillation with contrastive learning. However, classical distillation methods leave a significant gap in feature representation between teacher and student, and contrastive-learning-based distillation methods require massive, diversified data for training. To address these issues, our study aims to narrow the gap in feature representation between teacher and student and to extract richer feature representations from images in limited datasets, thereby achieving better performance. The superiority of our method is validated on a general dataset (CIFAR-100) and a small-scale dataset (CIFAR-10). On CIFAR-100, we achieve top-1 errors of 19.21% and 20.01% with ResNet50 and ResNet18, respectively. Notably, ResNet50 and ResNet18 as student models achieve better performance than the pre-trained ResNet152 and ResNet34 teacher models. On CIFAR-10, we achieve a top-1 error of 4.22% with ResNet18. On both CIFAR-10 and CIFAR-100, our method achieves better performance, and the student model even outperforms the teacher.
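For readers unfamiliar with the baseline the abstract refers to, the sketch below shows the classical logit-based distillation objective of Hinton et al. (arXiv:1503.02531), i.e. the standard teacher-student setup that feature-based methods such as ours build on. This is a generic illustration, not the FPD method proposed in this paper; the temperature `T` and weighting `alpha` are hypothetical hyperparameters chosen only for the example.

```python
# Minimal sketch of classical knowledge distillation on logits (Hinton et al.).
# NOT the FPD objective of this paper; T and alpha are illustrative values.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the student's and the teacher's
    # temperature-softened class distributions, scaled by T^2 as usual.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```

During training, the teacher's logits are computed with gradients disabled and only the student is updated; feature-based approaches replace or augment the soft-target term with losses on intermediate representations.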
Acknowledgements
This research is supported by the Sichuan Science and Technology Program (No. 2020YFS0307), the Mianyang Science and Technology Program (2020YFZJ016), the Sichuan Provincial M. C. Integration Office Program, and the IEDA Laboratory of SWUST.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Q. et al. (2023). FPD: Feature Pyramid Knowledge Distillation. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_9
DOI: https://doi.org/10.1007/978-3-031-30105-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30104-9
Online ISBN: 978-3-031-30105-6
eBook Packages: Computer Science, Computer Science (R0)