Facial Expression Recognition Using a Hybrid ViT-CNN Aggregator

Bousaid, Rachid; El Hajji, Mohamed; Es-Saady, Youssef

doi:10.1007/978-3-031-06458-6_5

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 449)

Included in the following conference series:

International Conference on Business Intelligence

Abstract

Facial Emotion Recognition (FER) is an important and challenging task in computer vision because of issues such as image quality, the correlation between expressions, computational complexity, and the need for a large amount of training data. This paper presents a novel approach to the FER task, motivated by the success of the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) in image classification in general and in facial emotion recognition in particular.

The Swin Transformer (ST) is a hierarchical transformer that uses shifted windows to compute its representations. Its advantages include limiting self-attention computation to local windows and computational complexity that is linear in image size. This paper studies and compares ST and deep CNN architectures when they are combined through different merging layers. The proposed approach is evaluated on the FER2013 and CK+ datasets. Experimental results demonstrate the high performance of the Average Merging Layer (AML), and our method outperforms state-of-the-art methods on both datasets.
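The abstract does not say exactly what the Average Merging Layer averages, so the following is only a minimal sketch of one plausible reading: each branch (ST and CNN) produces class scores, and the merged prediction is the element-wise mean of the two softmax distributions. The function names, the logit values, and the choice to average probabilities rather than feature vectors are all assumptions made for illustration, not the authors' implementation. The seven labels are the standard FER2013 classes.

```python
import math
from typing import List, Sequence

# Standard FER2013 class labels.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_merge(branch_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors, one vector per branch."""
    if not branch_outputs:
        raise ValueError("at least one branch output is required")
    size = len(branch_outputs[0])
    if any(len(vec) != size for vec in branch_outputs):
        raise ValueError("all branch outputs must have the same length")
    count = len(branch_outputs)
    return [sum(vec[i] for vec in branch_outputs) / count for i in range(size)]


# Hypothetical logits standing in for the outputs of the two branches.
swin_logits = [0.1, 0.0, 0.2, 2.5, 0.3, 0.4, 0.5]
cnn_logits = [0.2, 0.1, 0.1, 1.8, 0.9, 0.2, 0.6]

merged = average_merge([softmax(swin_logits), softmax(cnn_logits)])
predicted = EMOTIONS[max(range(len(merged)), key=merged.__getitem__)]
print(predicted)
```

Because the mean of two probability distributions is itself a probability distribution, the merged vector can be used directly for classification. Other merging layers (for example concatenation or element-wise maximum) would replace only the `average_merge` step.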



Acknowledgments

This work was supported by the Ministry of Higher Education, Scientific Research and Innovation, the Digital Development Agency (DDA), and the CNRST of Morocco (project 22).

Author information

Authors and Affiliations

IRF-SIC Laboratory, Ibn Zohr University, B.P 8106, Agadir, Morocco
Rachid Bousaid, Mohamed El Hajji & Youssef Es-Saady
CRMEF-SM, Avenue My Abdallah BP N, Inezgane, Morocco
Mohamed El Hajji

Corresponding author

Correspondence to Rachid Bousaid.

Editor information

Editors and Affiliations

Sultan Moulay Slimane University, Beni-Mellal, Morocco
Mohamed Fakir
Sultan Moulay Slimane University, Beni Mellal, Morocco
Mohamed Baslam
Sultan Moulay Slimane University, Beni-Mellal, Morocco
Rachid El Ayachi

About this paper

Cite this paper

Bousaid, R., El Hajji, M., Es-Saady, Y. (2022). Facial Expression Recognition Using a Hybrid ViT-CNN Aggregator. In: Fakir, M., Baslam, M., El Ayachi, R. (eds) Business Intelligence. CBI 2022. Lecture Notes in Business Information Processing, vol 449. Springer, Cham. https://doi.org/10.1007/978-3-031-06458-6_5

DOI: https://doi.org/10.1007/978-3-031-06458-6_5
Published: 13 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06457-9
Online ISBN: 978-3-031-06458-6
eBook Packages: Computer Science, Computer Science (R0)
