Abstract
Recent studies have established the success of deep learning models in facial emotion recognition. However, such models are often poorly suited to a commonly encountered problem: imbalanced classes. In real datasets, several emotion classes are heavily underrepresented, which sharply reduces the performance of classification models. In the current study, a residual variational autoencoder-based model is proposed to address imbalanced facial emotion recognition. First, to capture the most important features in the form of embeddings, a variational autoencoder equipped with residual connections is trained in an unsupervised fashion to obtain an effective latent space representation of all input images. After training, only the encoder part of the trained autoencoder is used to transform all labeled facial images into latent vectors. Next, the imbalanced latent vectors are resampled using well-known algorithms to balance the classes. Three major families of resampling algorithms are used: undersampling, oversampling, and hybrid. To establish the quality of the proposed method, various well-known classifiers are trained and then evaluated using confusion-matrix-based performance indicators computed in the test phase. All hyperparameters are selected by grid search. In addition, to understand the effect of oversampling minority class samples, a separate study observes classifier performance under varying degrees of oversampling. Experimental results and extensive comparative studies show that the residual variational autoencoder model combined with the SMOTE-ENN hybrid resampling technique improves classifier performance more than the other approaches compared.
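The hybrid resampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python toy version of SMOTE interpolation followed by Wilson's edited nearest neighbours (ENN) cleaning, written under the assumption that each latent vector is a plain list of floats. The function names (`smote`, `enn`, `smote_enn`), the neighbour count `k=3`, and the choice to oversample every smaller class up to the largest class size are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(idx, points, k):
    """Indices of the k nearest neighbours of points[idx], excluding itself."""
    others = (j for j in range(len(points)) if j != idx)
    return sorted(others, key=lambda j: euclidean(points[idx], points[j]))[:k]


def smote(X, y, minority, n_new, k=3, seed=0):
    """Append n_new synthetic samples of class `minority`.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    pool = [x for x, label in zip(X, y) if label == minority]
    if len(pool) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(pool) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(pool))
        j = rng.choice(nearest(i, pool, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(pool[i], pool[j])])
    return X + synthetic, y + [minority] * n_new


def enn(X, y, k=3):
    """Drop every sample whose label disagrees with the majority vote
    of its k nearest neighbours (Wilson's editing rule)."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, k=3, seed=0):
    """Oversample each smaller class to the largest class size, then clean with ENN."""
    counts = Counter(y)
    target = max(counts.values())
    for label, n in counts.items():
        if 1 < n < target:  # classes with a single sample cannot be interpolated
            X, y = smote(X, y, label, target - n, k, seed)
    return enn(X, y, k)


# Toy stand-in for encoder outputs: six majority and three minority 2-D vectors.
latent = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
          [10, 10], [10, 11], [11, 10]]
labels = [0] * 6 + [1] * 3
balanced_X, balanced_y = smote_enn(latent, labels)
```

In a pipeline like the one described above, `latent` would hold the encoder's latent vectors rather than hand-written points, and the resampled output would feed the downstream classifiers. For real experiments, a tested library implementation such as `SMOTEENN` from imbalanced-learn is likely a better choice than this sketch.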
Data Availability
The datasets analysed during the current study are available from the corresponding author on reasonable request.
Funding
The authors declare that no funding was received for conducting this study or preparing the manuscript.
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Chatterjee, S., Maity, S., Ghosh, K. et al. Majority biased facial emotion recognition using residual variational autoencoders. Multimed Tools Appl 83, 13659–13688 (2024). https://doi.org/10.1007/s11042-023-15888-8
DOI: https://doi.org/10.1007/s11042-023-15888-8