Abstract
Facial expression recognition (FER) is a challenging classification task. Owing to the subjectivity and ambiguity of both performers and observers, compound facial expressions are difficult to represent with a single one-hot label. In this paper, a simple yet efficient method, named real emotion seeker (RES), is proposed to recalibrate the annotation of each sample from a one-hot label to a latent expression distribution. In particular, subjective implicit knowledge is transformed through Bayesian inference into a posterior distribution specific to each FER data set, enhancing both generality and authenticity. The posterior distribution is then combined with the one-hot label to form a recalibrated annotation that serves as additional supervision, guiding the network toward more realistic predictions. The proposed method is independent of the backbone network and improves accuracy significantly, by an average of 3.16%, with no additional burden during training or inference. Extensive experiments show that RES produces predictions consistent with human subjective intuition. Results on three in-the-wild data sets demonstrate that our approach achieves advanced results: 90.38% on RAF-DB, 90.34% on FERPlus, and 62.63% on AffectNet.
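The core idea described above, blending a one-hot label with an inferred expression distribution to form a recalibrated supervision target, can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the mixing weight `alpha`, the `recalibrate` helper, and the example posterior values are all assumptions introduced for clarity.

```python
import numpy as np

def recalibrate(one_hot, posterior, alpha=0.7):
    """Blend a one-hot label with a latent expression posterior.

    alpha controls how much weight the original hard label keeps;
    the result is renormalized so it remains a valid distribution.
    """
    target = alpha * one_hot + (1.0 - alpha) * posterior
    return target / target.sum()

# Seven basic expression classes; the sample is annotated "happiness" (index 1),
# but the inferred posterior also assigns some mass to a compound expression.
one_hot = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
posterior = np.array([0.05, 0.60, 0.05, 0.20, 0.05, 0.03, 0.02])

target = recalibrate(one_hot, posterior)
# target keeps "happiness" dominant while retaining the latent ambiguity
```

A target of this form can then supervise the network with a soft-label loss (e.g. cross-entropy against the full distribution) alongside the usual one-hot objective.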
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant 62071216 and U1936202.
Cite this article
Lin, Z., She, J. & Shen, Q. Real emotion seeker: recalibrating annotation for facial expression recognition. Multimedia Systems 29, 139–151 (2023). https://doi.org/10.1007/s00530-022-00986-8