Negative Emotion Recognition in Spoken Dialogs

Abstract

Automatic emotion recognition in human speech has recently attracted increasing attention. This paper presents an approach for recognizing negative emotions in spoken dialogs at the utterance level. Our approach comprises two main parts. First, in addition to traditional acoustic features, linguistic features based on distributed representations are extracted from text transcribed by an automatic speech recognition (ASR) system. Second, we propose a novel deep learning model, multi-feature stacked denoising autoencoders (MSDA), which fuses high-level representations of the acoustic and linguistic features along with dialog context to classify emotions. Experimental results demonstrate that our proposed method yields an absolute improvement of 5.2% over the traditional method.
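For concreteness, here is a minimal sketch of one common way to build utterance-level linguistic features from distributed representations: mean-pooling pretrained word embeddings (e.g. word2vec) over the tokens of an ASR transcript. The `word_vectors` table, the tokenization, and the 100-dimensional embedding size are illustrative assumptions, not details taken from the paper.

```python
# Sketch: utterance-level linguistic features via mean-pooled word embeddings.
# `word_vectors` (word -> vector) and dim=100 are hypothetical placeholders.
import numpy as np

def utterance_embedding(tokens, word_vectors, dim=100):
    """Average the embeddings of in-vocabulary tokens from an ASR transcript."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:                       # all tokens out-of-vocabulary
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```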

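Likewise, the sketch below shows a single denoising-autoencoder layer of the kind MSDA stacks (in the spirit of Vincent et al., ICML 2008), applied to a plain concatenation of acoustic and linguistic feature vectors. All dimensions, the corruption rate, the learning rate, and the toy training loop are assumptions for illustration, not the paper's MSDA architecture or hyper-parameters.

```python
# Sketch: one denoising-autoencoder layer with tied weights, trained to
# reconstruct a clean input from a masked (corrupted) copy. Hyper-parameters
# are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)   # hidden bias
        self.c = np.zeros(n_in)       # reconstruction bias
        self.corruption, self.lr = corruption, lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def train_step(self, x):
        # Masking noise: zero out a random fraction of the input, then
        # reconstruct the *clean* input from the corrupted copy.
        x_corr = x * (rng.random(x.shape) > self.corruption)
        h = self.encode(x_corr)
        z = sigmoid(h @ self.W.T + self.c)        # tied-weight decoder
        dz = (z - x) * z * (1 - z)                # grad at decoder pre-activation
        dh = (dz @ self.W) * h * (1 - h)          # grad at encoder pre-activation
        self.W -= self.lr * (np.outer(dz, h) + np.outer(x_corr, dh))
        self.c -= self.lr * dz
        self.b -= self.lr * dh
        return float(0.5 * np.sum((z - x) ** 2))  # squared reconstruction error

# Fuse hypothetical acoustic and linguistic features by concatenation,
# then learn a higher-level joint representation for a downstream classifier.
acoustic = rng.random(384)     # placeholder acoustic feature vector
linguistic = rng.random(100)   # placeholder utterance embedding
x = np.concatenate([acoustic, linguistic])
dae = DenoisingAutoencoder(n_in=x.size, n_hidden=256)
for _ in range(10):
    dae.train_step(x)
representation = dae.encode(x)  # would feed a classifier (e.g. an SVM)
```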




Acknowledgments

Our work is supported by the National High Technology Research and Development Program of China (863 Program, No. 2015AA015402), the National Natural Science Foundation of China (Nos. 61370117 and 61433015), and the Major National Social Science Fund of China (No. 12&ZD227).

Author information

Correspondence to Xiaodong Zhang.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, X., Wang, H., Li, L., Zhao, M., Li, Q. (2015). Negative Emotion Recognition in Spoken Dialogs. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds.) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL/NLP-NABD 2015). Lecture Notes in Computer Science, vol. 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_9


  • DOI: https://doi.org/10.1007/978-3-319-25816-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25815-7

  • Online ISBN: 978-3-319-25816-4

  • eBook Packages: Computer Science, Computer Science (R0)
