{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,12]],"date-time":"2025-03-12T04:33:24Z","timestamp":1741754004386,"version":"3.38.0"},"reference-count":26,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDT"],"published-print":{"date-parts":[[2023,11,20]]},"abstract":"<jats:p>Emotion recognition is one of the most important components of human-computer interaction, and it can be performed using voice signals. Conventional approaches cannot optimise feature extraction and classification at the same time. Research is therefore increasingly turning to various types of \u201cdeep learning\u201d to address these difficulties. In today\u2019s world, applying deep learning algorithms to categorization problems is becoming increasingly important. However, the advantages offered by one model are not available in another, which limits the practical feasibility of such approaches. The main objective of this work is to explore hybrid deep learning models for speech-signal-based emotion identification. Two methods are explored: CNN and CNN-LSTM. The first is the conventional model and the second is the hybrid model. The TESS database is used for the experiments, and the results are analysed in terms of various accuracy measures. An average accuracy of 97% is achieved with CNN and 98% with CNN-LSTM.<\/jats:p>","DOI":"10.3233\/idt-230216","type":"journal-article","created":{"date-parts":[[2023,8,13]],"date-time":"2023-08-13T19:06:53Z","timestamp":1691953613000},"page":"1435-1453","source":"Crossref","is-referenced-by-count":0,"title":["Hybrid deep learning models based emotion recognition with speech signals"],"prefix":"10.1177","volume":"17","author":[{"given":"M. Kalpana","family":"Chowdary","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, MLR Institute of Technology, Dundigal, Hyderabad, India"}]},{"given":"E. Anu","family":"Priya","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, VIT University, Vellore, Tamil Nadu, India"}]},{"given":"Daniela","family":"Danciulescu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Craiova, Craiova, Romania"}]},{"given":"J.","family":"Anitha","sequence":"additional","affiliation":[{"name":"Department of ECE, Karunya Institute of Technology and Sciences, Coimbatore, India"}]},{"given":"D. Jude","family":"Hemanth","sequence":"additional","affiliation":[{"name":"Department of ECE, Karunya Institute of Technology and Sciences, Coimbatore, India"}]}],"member":"179","reference":[{"key":"10.3233\/IDT-230216_ref1","doi-asserted-by":"crossref","unstructured":"Lugovi\u0107 S, Dunder I, Horvat M. Techniques and applications of emotion recognition in speech. In 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 1278-1283). IEEE, 2016 May.","DOI":"10.1109\/MIPRO.2016.7522336"},{"issue":"3","key":"10.3233\/IDT-230216_ref2","first-page":"44","article-title":"Survey on speech emotion recognition: Features, classification schemes, and databases","author":"El Ayadi","year":"2011","journal-title":"Pattern Recognition"},{"issue":"9","key":"10.3233\/IDT-230216_ref3","doi-asserted-by":"crossref","first-page":"1162","DOI":"10.1016\/j.specom.2006.04.003","article-title":"Emotional speech recognition: Resources, features, and methods","volume":"48","author":"Ververidis","year":"2006","journal-title":"Speech Communication"},{"issue":"1","key":"10.3233\/IDT-230216_ref4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.9790\/4200-04110105","article-title":"Zero crossing rate and Energy of the Speech Signal of Devanagari Script","volume":"4","author":"Shete","year":"2014","journal-title":"IOSR-JVSP"},{"key":"10.3233\/IDT-230216_ref5","first-page":"19","article-title":"MFCC and its applications in speaker recognition","author":"Tiwari","year":"2010","journal-title":"International Journal on Emerging Technologies"},{"key":"10.3233\/IDT-230216_ref6","doi-asserted-by":"crossref","first-page":"4898","DOI":"10.1109\/ICMLC.2005.1527805","article-title":"Speech emotion recognition based on HMM and SVM","volume":"8","author":"Lin","year":"2005","journal-title":"2005 International Conference on Machine Learning and Cybernetics"},{"key":"10.3233\/IDT-230216_ref7","doi-asserted-by":"crossref","unstructured":"Xu S, Liu Y, Liu X. Speaker recognition and speech emotion recognition based on GMM. In 3rd International Conference on Electric and Electronics (EEIC 2013) (pp. 434-436), 2013 December.","DOI":"10.2991\/eeic-13.2013.102"},{"issue":"9-10","key":"10.3233\/IDT-230216_ref8","doi-asserted-by":"crossref","first-page":"1162","DOI":"10.1016\/j.specom.2011.06.004","article-title":"Emotion recognition using a hierarchical binary decision tree approach","volume":"53","author":"Lee","year":"2011","journal-title":"Speech Communication"},{"issue":"9-10","key":"10.3233\/IDT-230216_ref9","doi-asserted-by":"crossref","first-page":"1162","DOI":"10.1016\/j.specom.2011.06.004","article-title":"Emotion recognition using a hierarchical binary decision tree approach","volume":"53","author":"Lee","year":"2011","journal-title":"Speech Communication"},{"issue":"2","key":"10.3233\/IDT-230216_ref10","first-page":"341","article-title":"Emotion recognition from speech","volume":"3","author":"Sapra","year":"2013","journal-title":"International Journal of Emerging Technology and Advanced Engineering"},{"key":"10.3233\/IDT-230216_ref11","doi-asserted-by":"crossref","unstructured":"Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX. Deep learning and its applications to machine health monitoring. Mechanical Systems and Signal Processing 2019; 115: 213-237.","DOI":"10.1016\/j.ymssp.2018.05.050"},{"key":"10.3233\/IDT-230216_ref12","unstructured":"Qazi H, Kaushik BN. A Hybrid Technique using CNN + LSTM for Speech Emotion Recognition."},{"issue":"5","key":"10.3233\/IDT-230216_ref13","doi-asserted-by":"crossref","first-page":"5571","DOI":"10.1007\/s11042-017-5292-7","article-title":"Deep features-based speech emotion recognition for smart affective services","volume":"78","author":"Badshah","year":"2019","journal-title":"Multimedia Tools and Applications"},{"key":"10.3233\/IDT-230216_ref14","unstructured":"O\u2019Shea K, Nash R. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015."},{"key":"10.3233\/IDT-230216_ref15","unstructured":"Agarap AF. Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375, 2018."},{"key":"10.3233\/IDT-230216_ref16","doi-asserted-by":"crossref","unstructured":"Park S, Kwak N. Analysis on the dropout effect in convolutional neural networks. In Asian Conference on Computer Vision (pp.\u00a0189-204). Springer, Cham, 2016 November.","DOI":"10.1007\/978-3-319-54184-6_12"},{"key":"10.3233\/IDT-230216_ref17","unstructured":"Liu W, Wen Y, Yu Z, Yang M. Large-margin softmax loss for convolutional neural networks. In ICML (Vol. 2, No. 3, p.\u00a07), 2016 June."},{"key":"10.3233\/IDT-230216_ref18","unstructured":"Zhang Z, Sabuncu M. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems 2018; 31."},{"issue":"8","key":"10.3233\/IDT-230216_ref19","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"10.3233\/IDT-230216_ref20","doi-asserted-by":"crossref","unstructured":"Pulver A, Lyu S. LSTM with working memory. In 2017 International Joint Conference on Neural Networks (IJCNN) (pp.\u00a0845-851). IEEE, 2017 May.","DOI":"10.1109\/IJCNN.2017.7965940"},{"key":"10.3233\/IDT-230216_ref21","doi-asserted-by":"crossref","first-page":"103345","DOI":"10.1016\/j.compbiomed.2019.103345","article-title":"Brain tumor classification using deep CNN features via transfer learning","volume":"111","author":"Deepak","year":"2019","journal-title":"Computers in Biology and Medicine"},{"key":"10.3233\/IDT-230216_ref22","first-page":"1","article-title":"Impact of autoencoder based compact representation on emotion detection from audio","author":"Patel","year":"2021","journal-title":"Journal of Ambient Intelligence and Humanized Computing"},{"key":"10.3233\/IDT-230216_ref23","doi-asserted-by":"crossref","unstructured":"Salian B, Narvade O, Tambewagh R, Bharne S. Speech Emotion Recognition using Time Distributed CNN and LSTM. In ITM Web of Conferences (Vol. 40, p.\u00a003006). EDP Sciences, 2021.","DOI":"10.1051\/itmconf\/20214003006"},{"issue":"4","key":"10.3233\/IDT-230216_ref24","doi-asserted-by":"crossref","first-page":"1919","DOI":"10.1007\/s40747-021-00295-z","article-title":"Emotion classification from speech signal based on empirical mode decomposition and non-linear features","volume":"7","author":"Krishnan","year":"2021","journal-title":"Complex & Intelligent Systems"},{"key":"10.3233\/IDT-230216_ref25","unstructured":"Huang A, Bao P. Human vocal sentiment analysis. arXiv preprint arXiv:1905.08632, 2019."},{"issue":"11","key":"10.3233\/IDT-230216_ref26","doi-asserted-by":"crossref","first-page":"1577","DOI":"10.3844\/jcssp.2018.1577.1587","article-title":"Deep learning models for speech emotion recognition","volume":"14","author":"Praseetha","year":"2018","journal-title":"Journal of Computer Science"}],"container-title":["Intelligent Decision Technologies"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDT-230216","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T10:43:07Z","timestamp":1741689787000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDT-230216"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,20]]},"references-count":26,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/idt-230216","relation":{},"ISSN":["1872-4981","1875-8843"],"issn-type":[{"type":"print","value":"1872-4981"},{"type":"electronic","value":"1875-8843"}],"subject":[],"published":{"date-parts":[[2023,11,20]]}}}