Facial Expression Recognition Using a Hybrid ViT-CNN Aggregator

  • Conference paper
Business Intelligence (CBI 2022)

Abstract

Facial Emotion Recognition (FER) is an important and challenging task in computer vision. The difficulty stems from several issues: varying image quality, the correlation between instances of the same expression, computational complexity, and the need for a large amount of data. This paper presents a novel approach to the FER task. We are motivated by the success of the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) in image classification in general and in facial emotion recognition in particular.

The Swin Transformer (ST) is a hierarchical transformer that computes its representations using shifted windows. Its advantages include limiting self-attention computation to local windows, which gives it computational complexity that is linear in image size. This paper studies and compares ST and deep CNN architectures when they are merged by different merging layers. The proposed approach is tested on the FER2013 and CK+ datasets. Experimental results demonstrate the high performance of the Average Merging Layer (AML), and our method outperforms state-of-the-art methods on FER2013 and CK+.
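
The abstract does not define the Average Merging Layer, so the following is only a minimal sketch of one plausible reading: each branch (ST and CNN) produces a per-class probability vector, and the merging layer takes their element-wise mean. The function names, the logit values, and the choice to average softmax outputs rather than intermediate features are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_merge(probs_a, probs_b):
    """Element-wise mean of two per-class probability vectors (assumed AML)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both branches must predict the same classes")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


# The seven expression classes of FER2013.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# Hypothetical logits from each branch for a single face image.
swin_logits = [0.2, -1.0, 0.1, 2.5, 0.3, 0.9, 1.1]
cnn_logits = [0.5, -0.8, 0.0, 1.4, 0.2, 2.0, 0.7]

merged = average_merge(softmax(swin_logits), softmax(cnn_logits))
predicted = EMOTIONS[merged.index(max(merged))]
print(predicted)
```

With these made-up scores the CNN branch alone would favour "surprise", while the averaged output selects "happy", which illustrates how a merging layer lets a confident branch outweigh a less confident one. Other merging layers (for example concatenation or a maximum) would replace only `average_merge`.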

Acknowledgments

This work was supported by the Ministry of Higher Education, Scientific Research and Innovation, the Digital Development Agency (DDA), and the CNRST of Morocco (project 22).

Corresponding author

Correspondence to Rachid Bousaid.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Bousaid, R., El Hajji, M., Es-Saady, Y. (2022). Facial Expression Recognition Using a Hybrid ViT-CNN Aggregator. In: Fakir, M., Baslam, M., El Ayachi, R. (eds) Business Intelligence. CBI 2022. Lecture Notes in Business Information Processing, vol 449. Springer, Cham. https://doi.org/10.1007/978-3-031-06458-6_5

  • DOI: https://doi.org/10.1007/978-3-031-06458-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06457-9

  • Online ISBN: 978-3-031-06458-6

  • eBook Packages: Computer Science (R0)
