Abstract
In this paper, we present a multi-feature classification framework for the Multimodal Emotion Recognition Challenge held with the Chinese Conference on Pattern Recognition (CCPR 2016). The task of the challenge is to recognize one of eight facial emotions in short video segments extracted from Chinese films, TV plays, and talk shows. In our framework, both traditional methods and Deep Convolutional Neural Networks (DCNNs) are used to extract various features, and a separate classifier is trained on each feature to predict the video emotion label. A decision-level fusion method then aggregates these individual predictions into a final result. Results on the competition database show that our method is effective for recognizing facial emotion in Chinese audio-visual data.
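To make the decision-level fusion step concrete, the following Python snippet is a minimal sketch, not the authors' exact method: it assumes each per-feature classifier (for example, an SVM on hand-crafted features and a DCNN on face crops) outputs a probability vector over the eight emotion classes, and it fuses them by a weighted sum followed by an argmax. The function name, fusion weights, and example values are all hypothetical.

    import numpy as np

    NUM_CLASSES = 8  # eight emotion labels in the challenge task

    def fuse_predictions(prob_vectors, weights=None):
        # prob_vectors: list of length-NUM_CLASSES probability arrays,
        #               one per feature/classifier pair.
        # weights: optional per-classifier weights; defaults to uniform.
        probs = np.stack(prob_vectors)          # shape (n_classifiers, NUM_CLASSES)
        if weights is None:
            weights = np.full(len(prob_vectors), 1.0 / len(prob_vectors))
        fused = weights @ probs                 # weighted sum of class probabilities
        return int(np.argmax(fused))            # index of the fused emotion label

    # Example: three hypothetical classifiers vote on one video clip.
    audio_svm  = np.array([0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.05, 0.05])
    lbptop_svm = np.array([0.05, 0.05, 0.10, 0.50, 0.15, 0.05, 0.05, 0.05])
    dcnn       = np.array([0.05, 0.10, 0.05, 0.45, 0.15, 0.10, 0.05, 0.05])
    print(fuse_predictions([audio_svm, lbptop_svm, dcnn],
                           weights=np.array([0.3, 0.3, 0.4])))

Weighted averaging is only one of several decision-level fusion rules; majority voting or a learned fusion layer are common alternatives when per-classifier reliabilities differ across classes.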
Acknowledgements
This work was supported by the National Education Science Twelfth Five-Year Plan Key Issues program of the Ministry of Education (DCA140229).
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
Cite this paper
Sun, B., Xu, Q., He, J., Yu, L., Li, L., Wei, Q. (2016). Audio-Video Based Multimodal Emotion Recognition Using SVMs and Deep Learning. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_51
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5