Emotion Recognition in Sound

Popova, Anastasiya S.; Rassadin, Alexandr G.; Ponomarenko, Alexander A.

doi:10.1007/978-3-319-66604-4_18

Part of the book series: Studies in Computational Intelligence (SCI, volume 736)

Included in the following conference series:

International Conference on Neuroinformatics

Abstract

In this paper we consider the problem of automatic emotion recognition, specifically the case of digital audio signal processing. We consider and verify a straightforward approach in which the classification of a sound fragment is reduced to an image recognition problem. The waveform and the spectrogram are used as visual representations of the sound. The computational experiment was carried out on the open RAVDESS dataset, which covers 8 different emotions: “neutral”, “calm”, “happy”, “sad”, “angry”, “scared”, “disgust”, “surprised”. Our best accuracy, 71%, was produced by the combination “mel-spectrogram + convolutional neural network VGG-16”.
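The reduction from sound to image described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the frame length, hop size, number of mel bands, the HTK-style mel formula, and the function names (`mel_edges`, `mel_filterbank`, `log_mel_spectrogram`, `to_image`) are all assumptions chosen for illustration, and a synthetic tone stands in for a RAVDESS recording. The sketch uses only NumPy and stops at the image; resizing to the network's input size and the VGG-16 classifier itself are omitted.

```python
import numpy as np


def hz_to_mel(f):
    """Convert Hz to mel (HTK-style formula, an assumption of this sketch)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_edges(sr, n_mels):
    """Return n_mels + 2 band edges in Hz, equally spaced on the mel scale."""
    top = float(hz_to_mel(sr / 2.0))
    return mel_to_hz(np.linspace(0.0, top, n_mels + 2))


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_edges(sr, n_mels)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Each filter is a triangle that peaks at `mid` and is zero outside [lo, hi].
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(y, sr, n_fft=512, hop=128, n_mels=64):
    """Log-mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[t * hop: t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel_power + 1e-10)  # small offset avoids log(0)


def to_image(spec):
    """Scale a spectrogram to uint8 and copy it into 3 channels (H, W, 3)."""
    lo, hi = spec.min(), spec.max()
    gray = np.round(255.0 * (spec - lo) / (hi - lo + 1e-12)).astype(np.uint8)
    return np.repeat(gray[:, :, None], 3, axis=2)


# Synthetic stand-in for an audio clip: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2.0 * np.pi * 440.0 * t)

spec = log_mel_spectrogram(y, sr)
image = to_image(spec)
print(spec.shape, image.shape, image.dtype)
```

The three-channel copy is there only because image networks pretrained on colour photographs expect three input channels. A single-channel input would work equally well for a network trained from scratch.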

Acknowledgments

The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (Grant No. 17-05-0007) and supported by the Russian Academic Excellence Project “5-100”.

Author information

Authors and Affiliations

Higher School of Economics, National Research University, Nizhniy Novgorod, Russian Federation
Anastasiya S. Popova, Alexandr G. Rassadin & Alexander A. Ponomarenko

Corresponding author

Correspondence to Anastasiya S. Popova.

Editor information

Editors and Affiliations

Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia
Boris Kryzhanovsky
Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia
Witali Dunin-Barkowski
Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia
Vladimir Redko

Cite this paper

Popova, A.S., Rassadin, A.G., Ponomarenko, A.A. (2018). Emotion Recognition in Sound. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research. NEUROINFORMATICS 2017. Studies in Computational Intelligence, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-66604-4_18

DOI: https://doi.org/10.1007/978-3-319-66604-4_18
Published: 29 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66603-7
Online ISBN: 978-3-319-66604-4
eBook Packages: Engineering, Engineering (R0)
