Abstract
Speech emotion recognition is a challenging and demanding task in the field of data science. Existing studies have focused only on one-dimensional Convolutional Neural Network (CNN) architectures for speech emotion recognition. The recognition accuracy of this one-dimensional architecture is low on the RAVDESS, TESS and URDU datasets when non-optimal parameters are used. To overcome this problem, this research proposes an efficient two-dimensional CNN architecture with an optimized combination of parameters to achieve better accuracy. The proposed method is compared with a Support Vector Machine (SVM) and a one-dimensional CNN on the RAVDESS, TESS and URDU datasets in terms of accuracy. The experiments show that the proposed method outperforms these baselines, achieving accuracies of 76.08% and 99.68%, respectively.
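The abstract does not state the input features or the layer configuration of the proposed network. As a rough illustration only, assuming the speech signal has been converted into a spectrogram-like time-frequency matrix, the following sketch contrasts a one-dimensional convolution (sliding along time over a single feature sequence) with a two-dimensional convolution followed by ReLU and max pooling (sliding over both frequency and time). All function names and toy values are hypothetical. They are not the authors' architecture or their optimized parameters.

```python
# Hypothetical illustration, not the authors' model: how a 1D and a 2D
# convolution treat speech features. Pure Python, no external libraries.
from typing import List

Matrix = List[List[float]]


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid 1D cross-correlation: the kernel slides along time only."""
    k = len(kernel)
    return [
        sum(signal[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2D cross-correlation: the kernel slides along frequency and time."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(cols - kw + 1)
        ]
        for r in range(rows - kh + 1)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(x[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(x[0]) - size + 1, size)
        ]
        for r in range(0, len(x) - size + 1, size)
    ]


if __name__ == "__main__":
    # Toy 4 (frequency bins) x 5 (time frames) "spectrogram" with made-up values.
    spec = [
        [0.0, 1.0, 2.0, 1.0, 0.0],
        [1.0, 3.0, 4.0, 3.0, 1.0],
        [0.0, 2.0, 5.0, 2.0, 0.0],
        [0.0, 0.0, 1.0, 0.0, 0.0],
    ]
    # A 1D kernel sees one feature sequence (here, one frequency band over time).
    print(conv1d(spec[1], [1.0, -1.0]))  # [-2.0, -1.0, 1.0, 2.0]
    # A 2D kernel responds to patterns spanning neighbouring bins and frames.
    diagonal = [[1.0, 0.0], [0.0, -1.0]]
    feature_map = relu(conv2d(spec, diagonal))
    print(feature_map)  # [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 3.0], [0.0, 1.0, 5.0, 2.0]]
    print(max_pool2d(feature_map))  # [[0.0, 3.0]]
```

The 2D kernel here mixes information from adjacent frequency bins as well as adjacent frames, which a 1D kernel applied to a single band cannot do. A real model would stack several learned kernels and layers, typically with a deep learning framework rather than hand-written loops.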
Acknowledgement
This research was supported by the Universiti Tun Hussein Onn Malaysia (UTHM) through the Multidisciplinary Research Grant (MDR) (Vote H494).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Butt, S.A., Iqbal, U., Ghazali, R., Shoukat, I.A., Lasisi, A., Al-Saedi, A.K.Z. (2022). An Improved Convolutional Neural Network for Speech Emotion Recognition. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_19
DOI: https://doi.org/10.1007/978-3-031-00828-3_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00827-6
Online ISBN: 978-3-031-00828-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)