Abstract
Facial expression recognition in classroom scenes can help teachers understand students’ learning status and improve teaching effectiveness. To address the low accuracy of expression recognition in classroom scenarios, a novel multi-scale facial expression recognition algorithm based on an improved Res2Net is proposed. First, a bi-directional residual module, BiRes2Net, is proposed to extract multi-scale expression features bi-directionally at a fine-grained level, and a short directed connection path is introduced to give the network a self-closing capability and avoid extracting redundant expression information. Then, a Fine-Grained Coordinate Attention (FGCA) mechanism is embedded to extract the spatial location features and channel features of expressions at a fine-grained level by making full use of prior knowledge of facial expressions. Finally, a multi-class focal loss function is used to alleviate the imbalance of expression data: different weights are assigned to expression samples of different recognition difficulty so that the network is biased towards extracting features from difficult samples. Experimental results show that the proposed method achieves recognition accuracies of 79.47%, 94.06%, and 96.67% on the RAF-DB, JAFFE, and CK+ datasets, respectively, and up to 72.71% in real classroom scenes, significantly outperforming the compared algorithms.
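The weighting idea behind the multi-class focal loss can be illustrated with a minimal sketch. It assumes the standard focal loss form, FL = -alpha_t (1 - p_t)^gamma log(p_t), applied to softmax probabilities. The exact variant and hyperparameters used in the paper are not stated in this section, so `gamma=2.0`, the optional `alpha` weights, and all function names below are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss for a single sample.

    logits: raw class scores; target: index of the true class.
    gamma:  focusing parameter (gamma = 0 recovers plain cross-entropy).
    alpha:  optional per-class weights (hypothetical, not from the paper).
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(batch_logits, targets, gamma=2.0, alpha=None):
    """Average focal loss over a batch of samples."""
    losses = [focal_loss(l, t, gamma, alpha)
              for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]  # true class 0 is predicted confidently
    hard = [0.0, 1.0, 0.0]  # true class 0 is predicted poorly
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = focal_loss(logits, 0, gamma=0.0)
        fl = focal_loss(logits, 0, gamma=2.0)
        print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.4f}")
```

With `gamma > 0`, the confidently classified sample contributes far less to the loss than it would under cross-entropy, while the poorly classified sample keeps most of its weight, which is the sense in which training is biased towards difficult samples.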
Data availability statements
The classroom scenario dataset we created is available from the corresponding author on reasonable request. The access links of the publicly available dataset are as follows:
RAF-DB: http://www.whdeng.cn/RAF/model1.html
Abbreviations
- BiRes2Net: Bi-directional residual Res2Net Module
- FGCA: Fine-Grained Coordinate Attention
- RAF-DB: Real-world Affective Faces Database
- JAFFE: The Japanese Female Facial Expression Database
- CK+: The Extended Cohn-Kanade Dataset
- EMFACS: Emotional Facial Action Coding System
- SCN: Self-Cure Convolutional Neural Network
- ICID: Inter-Domain Facial Expression Recognition Feature Fusion Network
- IC: Intra-category Common feature
- ID: Inter-category Distinction feature
- FDRL: Feature Decomposition and Reconstruction Learning
- FDN: Feature Decomposition Network
- FRN: Feature Reconstruction Network
- DLP-CNN: Deep Locality-Preserving CNN
- DMFA-ResNet: Deep Multiscale Fusion Attention Residual Network
- CERT: Computer Expression Recognition Toolbox
- SE: Squeeze-and-Excitation
- CA: Coordinate Attention
- NE: Natural
- DI: Disgust
- FE: Fear
- AN: Anger
- HA: Happiness
Acknowledgements
This work was supported in part by the Postgraduate Innovation Fund Project of Xi’an Polytechnic University (chx2022012).
Author information
Contributions
Meihua Gu and Jing Feng designed and performed the research; Yalu Chu analyzed the data; all authors contributed to the writing and revisions.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Compliance with ethical standards
Informed consent
Consent for publication
Not applicable.
The ethics agreement
Code of Ethics for Socio-Economic Research and Declaration of Helsinki.
About this article
Cite this article
Gu, M., Feng, J. & Chu, Y. A novel multi-scale facial expression recognition algorithm based on improved Res2Net for classroom scenes. Multimed Tools Appl 83, 16525–16542 (2024). https://doi.org/10.1007/s11042-023-16115-0
DOI: https://doi.org/10.1007/s11042-023-16115-0