Abstract
Speech Emotion Recognition (SER) is an emerging field in human-computer interaction. The quality of a human-computer interface that mimics human speech emotions relies heavily on the types of features used and on the classifier employed for recognition. The main purpose of this paper is to present a wide range of features employed for speech emotion recognition, together with the acoustic characteristics of those features. We also analyze the performance of these features in terms of several important measures, namely precision, recall, F-measure, and recognition rate, using two commonly used emotional speech databases: the Berlin emotional database and the Danish emotional database. Emotional speech recognition is increasingly applied in modern human-computer interfaces, and this paper also presents an overview of ten applications to illustrate the importance of the technique.
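As a concrete illustration (not taken from the paper itself), the sketch below extracts a few of the prosodic and spectral features commonly surveyed for SER, such as pitch, short-time energy, and MFCCs, and reduces them to utterance-level statistics. It assumes the librosa library and a hypothetical recording utterance.wav; the exact feature configuration is illustrative, not the paper's.

```python
# Minimal sketch: common acoustic features for speech emotion recognition.
# Assumes librosa is installed and "utterance.wav" is a hypothetical input file.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # mono waveform at 16 kHz

# Prosodic features: fundamental frequency (pitch) contour and short-time energy
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
energy = librosa.feature.rms(y=y)[0]

# Spectral features: 13 Mel-frequency cepstral coefficients per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Collapse frame-level contours into utterance-level statistics,
# a common fixed-length representation fed to a classifier
features = np.hstack([
    [np.nanmean(f0), np.nanstd(f0)],   # pitch mean/spread (NaN where unvoiced)
    [energy.mean(), energy.std()],     # energy mean/spread
    mfcc.mean(axis=1), mfcc.std(axis=1),
])
print(features.shape)  # (30,) = 2 + 2 + 13 + 13
```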
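For reference, the evaluation measures named above follow their standard definitions. Writing TP, FP, and FN for a class's true positives, false positives, and false negatives:

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\]

The recognition rate is the overall accuracy, i.e. the fraction of test utterances assigned the correct emotion label.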