
Improving Speaker Recognition by Training on Emotion-Added Models

  • Conference paper
Affective Computing and Intelligent Interaction (ACII 2005)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Abstract

In speaker recognition applications, changes in a speaker's emotional state are a major source of errors. The ongoing work described in this contribution aims to improve the performance of automatic speaker recognition (ASR) systems on emotional speech. Two procedures that require only a small quantity of affective training data are applied to the ASR task, which makes them practical in real-world situations. The method classifies emotional states by acoustic features and generates an emotion-added model based on the resulting emotion grouping. Experiments performed on the Emotional Prosody Speech (EPS) corpus show significant improvements in equal error rates (EERs) and identification rates (IRs) over the baseline and comparative experiments.
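The paper itself does not include code; as a rough illustration of the approach the abstract describes, the sketch below fits a per-speaker Gaussian mixture model on neutral speech augmented with a small amount of emotional data pooled across emotion groups (the "emotion-added" model), scores verification trials against a background model, and computes the EER used for evaluation. The function names, the 32-component diagonal-covariance GMM back end, and the pooling step are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an "emotion-added" speaker model, assuming a GMM
# back end in the style of Reynolds et al. (2000). Everything here
# (names, 32-component diagonal GMMs, the pooling step) is an
# illustrative assumption; the paper does not publish an implementation.
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.mixture import GaussianMixture


def train_emotion_added_model(neutral_feats, emotional_feats_by_group,
                              n_components=32, seed=0):
    """Fit a speaker GMM on neutral features plus a small quantity of
    emotional features from each acoustically derived emotion group."""
    pooled = np.vstack([neutral_feats] + list(emotional_feats_by_group.values()))
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(pooled)
    return gmm


def verification_score(speaker_gmm, background_gmm, test_feats):
    """Average per-frame log-likelihood ratio of speaker vs. background."""
    return speaker_gmm.score(test_feats) - background_gmm.score(test_feats)


def equal_error_rate(scores, labels):
    """EER: the operating point where the false-accept rate equals the
    false-reject rate (labels: 1 = target trial, 0 = impostor trial)."""
    far, tpr, _ = roc_curve(labels, scores)
    frr = 1.0 - tpr
    idx = np.nanargmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

Here `emotional_feats_by_group` stands in for the acoustic emotion-grouping step the abstract mentions: emotions are first clustered by acoustic similarity, and a small amount of data from each group is then added to the speaker model.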






Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, T., Yang, Y., Wu, Z. (2005). Improving Speaker Recognition by Training on Emotion-Added Models. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_49

Download citation

  • DOI: https://doi.org/10.1007/11573548_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29621-8

  • Online ISBN: 978-3-540-32273-3

  • eBook Packages: Computer Science (R0)
