Abstract
Adversarial training of lightweight models is often ineffective because of their limited capacity and the difficulty of optimizing losses defined on hard labels. Adversarial distillation is a potential solution, in which knowledge from large, adversarially pre-trained teachers guides the learning of lightweight models. However, adversarially pre-training teachers is computationally expensive because it requires iterative gradient steps with respect to the inputs. Additionally, the reliability of the teachers’ guidance diminishes as lightweight models become more robust. In this paper, we propose an adversarial distillation method called Sample-Adaptive Multi-teacher Dynamic Rectification Adversarial Distillation (SA-MDRAD). First, an adversarial distillation framework that distills logits and features from heterogeneous, standardly pre-trained teachers is developed to reduce pre-training cost and increase knowledge diversity. Second, the teachers’ knowledge is dynamically rectified and adaptively fused on a per-sample basis, according to the teachers’ predictions, before being distilled into the lightweight model, which improves the reliability of the guidance. Experiments on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets demonstrate that SA-MDRAD is more effective than existing adversarial distillation methods at enhancing the robustness of lightweight image classification models against various adversarial attacks.
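The sample-adaptive fusion idea can be illustrated with a minimal sketch (our own illustration under stated assumptions, not the authors' released implementation): each teacher's softened prediction is weighted, per sample, by the confidence that teacher assigns to the true class, and the fused target supervises the student through a standard distillation loss. The function names, the temperature value, and the confidence-based weighting scheme are illustrative assumptions; the paper's dynamic rectification step, which further adjusts targets for samples the teachers misclassify, is not covered here.

```python
import torch
import torch.nn.functional as F

def fuse_teacher_logits(teacher_logits, labels, temperature=4.0):
    """Fuse per-sample teacher predictions, weighting each teacher by the
    probability it assigns to the ground-truth class (a confidence proxy).

    teacher_logits: list of tensors, each of shape (batch, num_classes)
    labels: long tensor of shape (batch,)
    Returns fused soft targets of shape (batch, num_classes).
    """
    stacked = torch.stack(teacher_logits, dim=0)                 # (T, B, C)
    probs = F.softmax(stacked / temperature, dim=-1)             # softened predictions
    # Per-sample confidence of each teacher on the true class.
    index = labels.view(1, -1, 1).expand(stacked.size(0), -1, 1)
    conf = probs.gather(-1, index).squeeze(-1)                   # (T, B)
    weights = F.softmax(conf, dim=0).unsqueeze(-1)               # normalize over teachers
    return (weights * probs).sum(dim=0)                          # (B, C)

def distillation_loss(student_logits, fused_targets, temperature=4.0):
    """KL divergence between the student's softened prediction and the fused target."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, fused_targets, reduction="batchmean") * temperature ** 2
```

In this sketch, teachers that are confident on a given sample dominate its fused target, while samples on which all teachers are uncertain receive a nearly uniform mixture, which is one plausible way to realize sample-adaptive fusion.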
Data availability
The data that support the findings of this study are openly available in the public domain at https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz, https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz, and http://cs231n.stanford.edu/tiny-imagenet-200.zip.
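As a convenience, the following sketch downloads the CIFAR-10 archive linked above and reads one training batch; the helper name and the choice of batch are illustrative, while the archive layout and pickle keys follow the standard CIFAR-10 python release.

```python
import pickle
import tarfile
import urllib.request

# URL taken from the data availability statement above.
URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"

def load_cifar10_batch(archive_path, member="cifar-10-batches-py/data_batch_1"):
    """Read one pickled training batch from the CIFAR-10 python archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        with tar.extractfile(member) as f:
            batch = pickle.load(f, encoding="bytes")
    images = batch[b"data"].reshape(-1, 3, 32, 32)  # (10000, 3, 32, 32) uint8
    labels = batch[b"labels"]
    return images, labels

urllib.request.urlretrieve(URL, "cifar-10-python.tar.gz")
images, labels = load_cifar10_batch("cifar-10-python.tar.gz")
```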
Acknowledgements
This research work was supported by the National Key Research and Development Program of China (2021YFB1006201) and the Major Science and Technology Project of Henan Province, China (221100211200-02).
Funding
National Key Research and Development Program of China (2021YFB1006201); Major Science and Technology Project of Henan Province, China (221100211200-02).
Author information
Authors and Affiliations
Contributions
SL: proposed the idea and wrote the main manuscript text. XY: performed the data analysis. GC: performed the validation. WL: acquired the data. HH: developed the methodology. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there are no competing interests related to the content of this article.
Additional information
Communicated by Haojie Li.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, S., Yang, X., Cheng, G. et al. SA-MDRAD: sample-adaptive multi-teacher dynamic rectification adversarial distillation. Multimedia Systems 30, 225 (2024). https://doi.org/10.1007/s00530-024-01416-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00530-024-01416-7