An Improved Convolutional Neural Network for Speech Emotion Recognition

  • Conference paper
Recent Advances in Soft Computing and Data Mining (SCDM 2022)

Abstract

Speech emotion recognition is a challenging and exigent task in the field of data science. Existing studies have focused only on one-dimensional Convolutional Neural Network (CNN) architectures for speech emotion recognition. The recognition accuracy of such a one-dimensional architecture is low on the RAVDESS, TESS and URDU datasets when non-optimal parameters are used. To overcome this problem, this work proposes an efficient two-dimensional CNN architecture with an optimized combination of parameters to achieve better accuracy. The proposed method is compared with a Support Vector Machine (SVM) and a one-dimensional CNN on the RAVDESS, TESS and URDU datasets in terms of accuracy. In the conducted experiments, the proposed method outperforms the compared methods, achieving accuracies of 76.08% and 99.68%, respectively.
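
The abstract contrasts one-dimensional and two-dimensional convolution but does not give the architecture, input features, or parameters. The sketch below is only an illustration of that contrast and is not the authors' model: a one-dimensional kernel slides along a single feature sequence, while a two-dimensional kernel slides over both axes of a feature map (assumed here to be frequency by time). The function names, the toy input, and the kernels are all hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def conv1d_valid(signal: Vector, kernel: Vector) -> Vector:
    """Valid-mode 1D cross-correlation: the kernel slides along one axis."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv2d_valid(feature_map: Matrix, kernel: Matrix) -> Matrix:
    """Valid-mode 2D cross-correlation: the kernel slides along both axes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(
                feature_map[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 3 x 4 feature map (assumed layout: rows = frequency bins, columns = frames).
toy_features: Matrix = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

# A 1D kernel only sees neighbouring values within one row.
row_response = conv1d_valid(toy_features[0], [1.0, -1.0])

# A 2D kernel sees neighbouring values across rows and columns at once.
map_response = conv2d_valid(toy_features, [[1.0, 0.0], [0.0, -1.0]])

print(len(row_response))                         # output length along one axis
print(len(map_response), len(map_response[0]))   # output height and width
```

In a trained network the kernel weights are learned rather than fixed, and many kernels are stacked with nonlinearities and pooling; the sketch shows only the sliding-window operation that distinguishes the two input layouts.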

Acknowledgement

This research was supported by the Universiti Tun Hussein Onn Malaysia (UTHM) through the Multidisciplinary Research Grant (MDR) (Vote H494).

Author information

Corresponding author

Correspondence to Umer Iqbal.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Butt, S.A., Iqbal, U., Ghazali, R., Shoukat, I.A., Lasisi, A., Al-Saedi, A.K.Z. (2022). An Improved Convolutional Neural Network for Speech Emotion Recognition. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_19
