Abstract
Speech Emotion Recognition (SER) is an emerging field in human-computer interaction. The quality of a human-computer interface that mimics human speech emotions relies heavily on the types of features used and on the classifier employed for recognition. The main purpose of this paper is to present a wide range of features employed for speech emotion recognition, together with the acoustic characteristics of those features. We also analyze the performance of these features in terms of several important measures, namely precision, recall, F-measure, and recognition rate, using two commonly used emotional speech databases: the Berlin emotional database and the Danish emotional database. Emotional speech recognition is increasingly applied in modern human-computer interfaces, and this paper also presents an overview of ten applications to illustrate the importance of the technique.
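As a concrete illustration (not taken from the paper itself), the sketch below extracts a few of the prosodic and spectral features commonly surveyed for SER, such as pitch, short-time energy, and MFCCs, and reduces them to utterance-level statistics. It assumes the librosa library and a hypothetical recording utterance.wav; the exact feature configuration is illustrative, not the paper's.

```python
# Minimal sketch: common acoustic features for speech emotion recognition.
# Assumes librosa is installed and "utterance.wav" is a hypothetical input file.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # mono waveform at 16 kHz

# Prosodic features: fundamental frequency (pitch) contour and short-time energy
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
energy = librosa.feature.rms(y=y)[0]

# Spectral features: 13 Mel-frequency cepstral coefficients per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Collapse frame-level contours into utterance-level statistics,
# a common fixed-length representation fed to a classifier
features = np.hstack([
    [np.nanmean(f0), np.nanstd(f0)],   # pitch mean/spread (NaN where unvoiced)
    [energy.mean(), energy.std()],     # energy mean/spread
    mfcc.mean(axis=1), mfcc.std(axis=1),
])
print(features.shape)  # (30,) = 2 + 2 + 13 + 13
```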
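For reference, the evaluation measures named above follow their standard definitions. Writing TP, FP, and FN for a class's true positives, false positives, and false negatives:

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\]

The recognition rate is the overall accuracy, i.e. the fraction of test utterances assigned the correct emotion label.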