Abstract
In speaker recognition applications, changes in a speaker's emotional state are a major cause of errors. The ongoing work described in this contribution attempts to enhance the performance of automatic speaker recognition (ASR) systems on emotional speech. Two procedures that require only a small quantity of affective training data are applied to the ASR task, which is very practical in real-world situations. The method consists of classifying emotional states by acoustic features and generating an emotion-added model based on the resulting emotion grouping. Experiments performed on the Emotional Prosody Speech (EPS) corpus show significant improvements in equal error rates (EERs) and identification rates (IRs) over the baseline and comparative experiments.
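The abstract does not spell out how the "emotion-added" model is built. Below is a minimal sketch of one plausible reading, assuming Reynolds-style GMM speaker models with MAP mean adaptation toward a small set of emotional-speech features; the function names, synthetic data, and relevance factor are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: build an "emotion-added" speaker model by MAP-adapting
# the means of a GMM trained on neutral speech, using only a small amount
# of emotional data. Feature extraction (e.g. MFCCs) and emotion grouping
# are assumed already done; all names and constants here are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_neutral_model(neutral_feats, n_components=4, seed=0):
    """Baseline speaker GMM trained on neutral-speech feature frames."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(neutral_feats)
    return gmm

def emotion_added_model(gmm, emo_feats, relevance=16.0):
    """MAP-adapt the component means toward a small emotional data set
    (Reynolds-style relevance-factor adaptation; only means updated)."""
    resp = gmm.predict_proba(emo_feats)            # (T, M) posteriors
    n_k = resp.sum(axis=0)                         # soft counts per component
    ex_k = resp.T @ emo_feats / np.maximum(n_k[:, None], 1e-10)
    alpha = (n_k / (n_k + relevance))[:, None]     # data-dependent mixing
    adapted = GaussianMixture(n_components=gmm.n_components,
                              covariance_type="diag")
    # copy the baseline parameters, then interpolate only the means
    adapted.weights_ = gmm.weights_
    adapted.covariances_ = gmm.covariances_
    adapted.precisions_cholesky_ = gmm.precisions_cholesky_
    adapted.means_ = alpha * ex_k + (1.0 - alpha) * gmm.means_
    return adapted

rng = np.random.default_rng(0)
neutral = rng.normal(0.0, 1.0, size=(500, 12))     # stand-in for MFCC frames
emotional = neutral[:50] + 0.5                     # small, shifted "emotional" set
base = train_neutral_model(neutral)
added = emotion_added_model(base, emotional)
```

Because only a convex combination of the old means and the EM means is used, the adapted model's likelihood on the small emotional set cannot decrease, which matches the paper's premise that a small quantity of affective data suffices.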
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, T., Yang, Y., Wu, Z. (2005). Improving Speaker Recognition by Training on Emotion-Added Models. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)