Emotion Recognition in Sound

  • Conference paper
Advances in Neural Computation, Machine Learning, and Cognitive Research (NEUROINFORMATICS 2017)

Part of the book series: Studies in Computational Intelligence ((SCI,volume 736))

Abstract

In this paper we consider the problem of automatic emotion recognition, focusing on digital audio signal processing. We consider and verify a straightforward approach in which the classification of a sound fragment is reduced to an image recognition problem. The waveform and spectrogram are used as visual representations of the sound. The computational experiment was carried out on the open RAVDESS dataset, which covers 8 emotions: “neutral”, “calm”, “happy”, “sad”, “angry”, “scared”, “disgust”, “surprised”. Our best accuracy, 71%, was obtained with the combination “mel spectrogram + VGG-16 convolutional neural network”.
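The sound-to-image step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the synthetic 440 Hz test tone, and all parameters (`n_fft=512`, `hop=128`, `n_mels=64`, a 16 kHz sample rate, 8-bit scaling) are assumptions chosen for illustration. It uses only NumPy and stops at the image; feeding that image to a VGG-16 classifier is left out.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; the paper does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spectrogram_image(signal, sr, n_fft=512, hop=128, n_mels=64):
    """Return a log-mel spectrogram as a uint8 array of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T       # (n_mels, n_frames)
    log_mel = 10.0 * np.log10(mel + 1e-10)                  # decibel-like scale
    # Min-max scale to the 0..255 range so the result can be treated as a grayscale image.
    scaled = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-12)
    return (255 * scaled).astype(np.uint8)

# Hypothetical input: one second of a 440 Hz sine tone instead of a real recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
image = mel_spectrogram_image(tone, sr)
print(image.shape, image.dtype)
```

For a real recording, the resulting array would typically be resized and replicated to three channels before being passed to an image network such as VGG-16; those details depend on the chosen framework and are not taken from the paper.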

Acknowledgments

The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (Grant №17-05-0007) and supported by the Russian Academic Excellence Project “5-100”.

Author information

Corresponding author

Correspondence to Anastasiya S. Popova.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Popova, A.S., Rassadin, A.G., Ponomarenko, A.A. (2018). Emotion Recognition in Sound. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research. NEUROINFORMATICS 2017. Studies in Computational Intelligence, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-66604-4_18

  • DOI: https://doi.org/10.1007/978-3-319-66604-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66603-7

  • Online ISBN: 978-3-319-66604-4

  • eBook Packages: Engineering, Engineering (R0)
