Abstract
In this work we present an approach to text-independent speaker recognition. As features we used Mel Frequency Cepstrum Coefficients (MFCCs) and Temporal Patterns (TRAPs). For each speaker we trained Gaussian Mixture Models (GMMs) with different numbers of densities. The database consists of very noisy close-talking recordings of 36 speakers. For training, a Universal Background Model (UBM) is first estimated from all available training data with the EM algorithm. This UBM is then used to create a speaker-dependent model for each speaker, in one of two ways: by taking the UBM as the initial model for EM training, or by Maximum A Posteriori (MAP) adaptation. On the 36-speaker database, using TRAPs instead of MFCCs improves the frame-wise recognition rate by 12.0 %. MAP adaptation improves the recognition rate by a further 14.2 %.
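The UBM-plus-MAP step can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs and mean-only adaptation with the relevance-factor formulation of Reynolds et al. (2000): for each density k, the soft count n_k and the posterior-weighted mean E_k[x] of the speaker's frames give alpha_k = n_k / (n_k + r), and the adapted mean is alpha_k * E_k[x] + (1 - alpha_k) * mu_k. The abstract does not say which parameters were adapted or what relevance factor was used, so mean-only adaptation and r = 16 are assumptions here. The names DiagGMM, map_adapt_means and identify are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


class DiagGMM:
    """Gaussian mixture with diagonal covariances (weights, means, variances)."""

    def __init__(self, weights: Sequence[float], means: Sequence[Vector],
                 variances: Sequence[Vector]):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def _log_weighted_density(self, k: int, x: Vector) -> float:
        # log( w_k * N(x; mu_k, diag(var_k)) )
        ll = math.log(self.weights[k])
        for xd, md, vd in zip(x, self.means[k], self.variances[k]):
            ll += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        return ll

    def posteriors(self, x: Vector):
        """Return (per-density posteriors, frame log-likelihood) via log-sum-exp."""
        logs = [self._log_weighted_density(k, x) for k in range(len(self.weights))]
        top = max(logs)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        return [e / total for e in exps], top + math.log(total)

    def log_likelihood(self, frames: Sequence[Vector]) -> float:
        # Frames are treated as independent, as is usual for GMM scoring.
        return sum(self.posteriors(x)[1] for x in frames)


def map_adapt_means(ubm: DiagGMM, frames: Sequence[Vector],
                    relevance: float = 16.0) -> DiagGMM:
    """Assumed mean-only MAP adaptation; UBM weights and variances are copied."""
    K, D = len(ubm.weights), len(ubm.means[0])
    n = [0.0] * K                       # soft counts n_k
    first = [[0.0] * D for _ in range(K)]  # sum_t P(k|x_t) * x_t
    for x in frames:
        post, _ = ubm.posteriors(x)
        for k in range(K):
            n[k] += post[k]
            for d in range(D):
                first[k][d] += post[k] * x[d]
    new_means = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)
        # If a density saw no data, alpha is 0 and the UBM mean is kept.
        e = [v / n[k] for v in first[k]] if n[k] > 0 else ubm.means[k]
        new_means.append([alpha * e[d] + (1.0 - alpha) * ubm.means[k][d]
                          for d in range(D)])
    return DiagGMM(ubm.weights, new_means, ubm.variances)


def identify(models: Dict[str, DiagGMM], frames: Sequence[Vector]) -> str:
    """Pick the speaker model with the highest total log-likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(frames))
```

A toy usage example with one-dimensional "features" (real systems would use MFCC or TRAP vectors): adapting a two-density UBM toward each speaker's frames and then scoring test frames.

```python
ubm = DiagGMM([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
speakers = {"A": map_adapt_means(ubm, [[2.0], [2.2], [1.8]]),
            "B": map_adapt_means(ubm, [[-2.0], [-2.2], [-1.8]])}
print(identify(speakers, [[1.9], [2.1]]))  # prints "A"
```

The relevance factor controls how far a density moves from the UBM: densities that see many frames of a speaker move toward that speaker's data, while rarely observed densities stay close to the background model.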
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bocklet, T., Maier, A., Nöth, E. (2007). Text-Independent Speaker Identification Using Temporal Patterns. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_42
DOI: https://doi.org/10.1007/978-3-540-74628-7_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)