Abstract
Convolutional neural networks trained with a margin-based softmax function achieve the highest accuracy in face recognition. The development of embedded systems such as smart intercoms has increased interest in lightweight neural networks, and lightweight models trained with a margin-based softmax function have accordingly been proposed for face identification. In this paper, we propose a distillation method that achieves higher accuracy than other methods for face recognition on the LFW, AgeDB-30, and MegaFace datasets. The main idea of our approach is to initialize the student network with the class centers of the teacher network. The student network is then trained to produce biometric vectors whose angles to the class centers equal the corresponding angles in the teacher network.
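The idea of matching angles to shared class centers can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not state the exact loss, so the mean squared angle gap used here, the function names `angles_to_centers` and `angle_matching_loss`, the random data, and the assumption that teacher and student embeddings have the same dimension are all hypothetical choices made for illustration.

```python
import numpy as np

def angles_to_centers(embeddings, centers):
    """Return the angle in radians between each embedding and each class center."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    # Clip to guard against rounding slightly outside [-1, 1] before arccos.
    cos = np.clip(e @ c.T, -1.0, 1.0)
    return np.arccos(cos)

def angle_matching_loss(student_emb, teacher_emb, centers):
    """Hypothetical loss: mean squared gap between student and teacher angles."""
    gap = angles_to_centers(student_emb, centers) - angles_to_centers(teacher_emb, centers)
    return float(np.mean(gap ** 2))

rng = np.random.default_rng(0)
teacher_centers = rng.normal(size=(5, 8))   # 5 classes, 8-dimensional embeddings (toy sizes)
student_centers = teacher_centers.copy()    # student starts from the teacher's class centers

teacher_emb = rng.normal(size=(4, 8))       # stand-ins for teacher biometric vectors
student_emb = rng.normal(size=(4, 8))       # stand-ins for untrained student vectors

loss_random = angle_matching_loss(student_emb, teacher_emb, student_centers)
loss_matched = angle_matching_loss(teacher_emb, teacher_emb, student_centers)
```

In this toy setup the loss is zero when the student reproduces the teacher's vectors exactly and positive otherwise, which is the behavior a training objective of this kind would drive toward.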
MarginDistillation: distillation for margin-based softmax. https://github.com/david-svitov/margindistillation. Accessed January 8, 2022.
Translated by V. Potapchouck
Cite this article
Svitov, D.V., Alyamkin, S.A. Distilling Face Recognition Models Trained Using Margin-Based Softmax Function. Autom Remote Control 83, 1517–1526 (2022). https://doi.org/10.1134/S00051179220100046