A novel multi-scale facial expression recognition algorithm based on improved Res2Net for classroom scenes

Multimedia Tools and Applications

Abstract

Facial expression recognition in classroom scenes can help teachers understand students’ learning status and improve teaching effectiveness. To address the low accuracy of expression recognition in classroom scenarios, a novel multi-scale facial expression recognition algorithm based on an improved Res2Net is proposed. First, a bi-directional residual module, BiRes2Net, is proposed to extract multi-scale expression features in both directions at a fine-grained level, and a short directed connection path is introduced to give the network a self-closing capability and avoid extracting redundant expression information. Then, a Fine-Grained Coordinate Attention (FGCA) mechanism is embedded to extract the spatial-location and channel features of expressions at a fine-grained level, making full use of prior knowledge about facial expressions. Finally, a multi-class focal loss function is used to alleviate the imbalance of expression data; it assigns different weights to expression samples of different recognition difficulty, so that the network is biased towards extracting features from difficult samples. Experimental results show that the proposed method achieves recognition accuracies of 79.47%, 94.06%, and 96.67% on the RAF-DB, JAFFE, and CK+ datasets, respectively, and up to 72.71% in real classroom scenes, significantly outperforming the compared algorithms.
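The loss described in the abstract can be illustrated with a minimal sketch. This preview does not show the paper's exact formulation, so the code below assumes the widely used focal loss form, FL = -α_t (1 - p_t)^γ log(p_t), applied to softmax probabilities. The function names, the default γ = 2, the optional per-class weights `alpha`, and the example logits are all assumptions made for illustration, not values taken from the article.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one sample's class logits."""
    peak = max(logits)
    log_total = math.log(sum(math.exp(z - peak) for z in logits))
    return [z - peak - log_total for z in logits]


def focal_loss(
    logits: Sequence[float],
    target: int,
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are assumed hyperparameters; alpha=None means no
    per-class weighting.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


# Hypothetical logits for a 7-class expression problem; class 3 is the true label.
easy = [0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0]  # confident, correct prediction
hard = [2.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]  # true class receives a low score

for name, logits in (("easy", easy), ("hard", hard)):
    ce = focal_loss(logits, target=3, gamma=0.0)  # gamma = 0 reduces to cross-entropy
    fl = focal_loss(logits, target=3, gamma=2.0)
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.4f}  ratio={fl / ce:.4f}")
```

The printed ratio equals (1 - p_t)^γ, so a well-classified sample contributes only a small fraction of its cross-entropy loss, while a poorly classified one keeps most of it. This is the sense in which training is biased towards difficult samples; class imbalance can additionally be addressed through the per-class `alpha` weights.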

Data availability statements

The classroom scenario dataset we created is available from the corresponding author on reasonable request. The access links for the publicly available datasets are as follows:

RAF-DB: http://www.whdeng.cn/RAF/model1.html

JAFFE: http://www.kasrl.org/jaffe.html

CK+: http://www.consortium.ri.cmu.edu/ckagree/

Abbreviations

BiRes2Net: Bi-directional residual Res2Net module

FGCA: Fine-Grained Coordinate Attention

RAF-DB: Real-world Affective Faces Database

JAFFE: Japanese Female Facial Expression Database

CK+: Extended Cohn-Kanade Dataset

EMFACS: Emotional Facial Action Coding System

SCN: Self-Cure Convolutional Neural Network

ICID: Inter-Domain Facial Expression Recognition Feature Fusion Network

IC: Intra-category Common feature

ID: Inter-category Distinction feature

FDRL: Feature Decomposition and Reconstruction Learning

FDN: Feature Decomposition Network

FRN: Feature Reconstruction Network

DLP-CNN: Deep Locality-Preserving CNN

DMFA-ResNet: Deep Multi-scale Fusion Attention Residual Network

CERT: Computer Expression Recognition Toolbox

SE: Squeeze-and-Excitation

CA: Coordinate Attention

NE: Neutral

DI: Disgust

FE: Fear

AN: Anger

HA: Happiness

Acknowledgements

This work was supported in part by the Postgraduate Innovation Fund Project of Xi’an Polytechnic University (chx2022012).

Author information

Contributions

Meihua Gu and Jing Feng designed and performed the research; Yalu Chu analyzed the data; all authors contributed to the writing and revisions.

Corresponding author

Correspondence to Meihua Gu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Compliance with ethical standards

Informed consent

Consent for publication

Not applicable.

The ethics agreement

Code of Ethics for Socio-Economic Research and Declaration of Helsinki.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gu, M., Feng, J. & Chu, Y. A novel multi-scale facial expression recognition algorithm based on improved Res2Net for classroom scenes. Multimed Tools Appl 83, 16525–16542 (2024). https://doi.org/10.1007/s11042-023-16115-0

  • DOI: https://doi.org/10.1007/s11042-023-16115-0
