Abstract
Knowledge distillation is a widely used method for model compression: it aims to compress a powerful yet cumbersome model into a lightweight model with little sacrifice of performance, so that the accuracy of the lightweight model approaches that of the cumbersome one. By convention, the powerful but bulky model is called the teacher model and the lightweight model is called the student model. Various approaches have been proposed for this purpose over the past few years. Classical distillation methods mainly distill deep features from an intermediate layer or the logits layer, and some methods combine knowledge distillation with contrastive learning. However, classical distillation methods leave a significant gap in feature representation between teacher and student, and contrastive-learning-based distillation methods require massive, diversified data for training. To address these issues, our study aims to narrow the gap in feature representation between teacher and student and to extract richer feature representations from images in limited datasets, thereby achieving better performance. The superiority of our method is validated on a general dataset (CIFAR-100) and a small-scale dataset (CIFAR-10). On CIFAR-100, we achieve top-1 errors of 19.21% and 20.01% with ResNet50 and ResNet18, respectively. Notably, ResNet50 and ResNet18 as student models achieve better performance than the pre-trained ResNet152 and ResNet34 teacher models. On CIFAR-10, we achieve a top-1 error of 4.22% with ResNet18. On both CIFAR-10 and CIFAR-100, our method achieves better performance, and the student model even outperforms the teacher.
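For readers unfamiliar with the baseline the abstract refers to, the sketch below shows the classical logit-based distillation objective of Hinton et al. (arXiv:1503.02531), i.e. the standard teacher-student setup that feature-based methods such as ours build on. This is a generic illustration, not the FPD method proposed in this paper; the temperature `T` and weighting `alpha` are hypothetical hyperparameters chosen only for the example.

```python
# Minimal sketch of classical knowledge distillation on logits (Hinton et al.).
# NOT the FPD objective of this paper; T and alpha are illustrative values.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the student's and the teacher's
    # temperature-softened class distributions, scaled by T^2 as usual.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```

During training, the teacher's logits are computed with gradients disabled and only the student is updated; feature-based approaches replace or augment the soft-target term with losses on intermediate representations.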
Acknowledgements
This research is supported by the Sichuan Science and Technology Program (No. 2020YFS0307), the Mianyang Science and Technology Program (2020YFZJ016), the Sichuan Provincial M. C. Integration Office Program, and the IEDA Laboratory of SWUST.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Q. et al. (2023). FPD: Feature Pyramid Knowledge Distillation. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_9
DOI: https://doi.org/10.1007/978-3-031-30105-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30104-9
Online ISBN: 978-3-031-30105-6
eBook Packages: Computer Science, Computer Science (R0)