Audio-Video Based Multimodal Emotion Recognition Using SVMs and Deep Learning | SpringerLink

Audio-Video Based Multimodal Emotion Recognition Using SVMs and Deep Learning

  • Conference paper
  • Published in: Pattern Recognition (CCPR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)


Abstract

In this paper, we explore a multi-feature classification framework for the Multimodal Emotion Recognition Challenge held at the Chinese Conference on Pattern Recognition (CCPR 2016). The challenge task is to recognize one of eight emotions in short video segments extracted from Chinese films, TV plays, and talk shows. In our framework, both traditional hand-crafted methods and Deep Convolutional Neural Network (DCNN) methods are used to extract a variety of audio and visual features. A separate classifier is trained on each feature to predict the video's emotion label, and a decision-level fusion method then aggregates these individual predictions into a final result. Results on the competition database show that our method is effective for Chinese facial emotion recognition.
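The decision-level fusion step described above can be sketched as a weighted average of per-model class-probability vectors followed by an argmax. This is a minimal illustration only: the emotion labels, the uniform default weights, and the example model outputs below are assumptions for demonstration, not the paper's exact configuration.

```python
import numpy as np

# Illustrative labels for an 8-class emotion task (assumed, not the
# challenge's official label set).
EMOTIONS = ["angry", "anxious", "disgust", "happy",
            "neutral", "sad", "surprise", "worried"]

def fuse_decisions(prob_matrix, weights=None):
    """Decision-level fusion: weighted average of each model's
    class-probability vector, then argmax over the fused distribution.

    prob_matrix: one row per model, one column per emotion class.
    weights: optional per-model weights; defaults to a uniform average.
    """
    probs = np.asarray(prob_matrix, dtype=float)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)
    weights = np.asarray(weights, dtype=float)
    fused = weights @ probs          # weighted sum over models
    fused /= fused.sum()             # renormalize to a distribution
    return EMOTIONS[int(np.argmax(fused))], fused

# Hypothetical posteriors from three models (e.g. an audio SVM, an
# LBP-TOP SVM, and a DCNN) for one video segment:
audio_svm   = [0.05, 0.05, 0.05, 0.55, 0.10, 0.05, 0.10, 0.05]
lbp_top_svm = [0.10, 0.05, 0.05, 0.40, 0.20, 0.05, 0.10, 0.05]
dcnn        = [0.05, 0.05, 0.05, 0.60, 0.10, 0.05, 0.05, 0.05]
label, dist = fuse_decisions([audio_svm, lbp_top_svm, dcnn])
# label == "happy": all three models agree most strongly on class 3
```

In practice the per-model weights would be tuned on a validation set, so stronger modalities (e.g. the DCNN) contribute more to the fused decision.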




Acknowledgements

This work is supported by the National Education Science Twelfth Five-Year Plan Key Issues of the Ministry of Education (DCA140229).

Author information

Correspondence to Jun He.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, B., Xu, Q., He, J., Yu, L., Li, L., Wei, Q. (2016). Audio-Video Based Multimodal Emotion Recognition Using SVMs and Deep Learning. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_51

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_51

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5
