Text-Independent Speaker Identification Using Temporal Patterns

  • Conference paper
Text, Speech and Dialogue (TSD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Abstract

In this work we present an approach to text-independent speaker recognition. As features we use Mel Frequency Cepstrum Coefficients (MFCCs) and Temporal Patterns (TRAPs). For each speaker we train Gaussian Mixture Models (GMMs) with different numbers of densities. The database comprises 36 speakers and consists of very noisy close-talking recordings. For training, a Universal Background Model (UBM) is first built with the EM algorithm from all available training data. This UBM is then used to create a speaker-dependent model for each speaker, which can be done in two ways: by taking the UBM as the initial model for EM training, or by Maximum A Posteriori (MAP) adaptation. On the 36-speaker database, using TRAPs instead of MFCCs improves the frame-wise recognition rate by 12.0 %. MAP adaptation raises the recognition rate by a further 14.2 %.
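The step from a UBM to speaker-dependent models and on to a frame-wise decision can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal covariances, adapts only the component means (a common choice in GMM-UBM systems; the abstract does not say which parameters are adapted), and uses a relevance factor of 16, which is also an assumption. The UBM parameters are set by hand instead of being trained with EM, and random 2-D vectors stand in for MFCC or TRAP features. All function and variable names are hypothetical.

```python
import numpy as np


def component_log_probs(X, weights, means, variances):
    """Log of weight * diagonal-Gaussian density, shape (frames, components)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_dens = -0.5 * (np.sum(diff ** 2 / variances, axis=2) + log_norm)
    return np.log(weights) + log_dens


def frame_log_likelihood(X, weights, means, variances):
    """Per-frame GMM log-likelihood via a stable log-sum-exp."""
    lp = component_log_probs(X, weights, means, variances)
    peak = lp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(lp - peak).sum(axis=1, keepdims=True)))[:, 0]


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM towards the frames in X."""
    lp = component_log_probs(X, weights, means, variances)
    lp -= lp.max(axis=1, keepdims=True)
    gamma = np.exp(lp)
    gamma /= gamma.sum(axis=1, keepdims=True)        # component posteriors
    n = gamma.sum(axis=0)                            # soft frame counts
    data_means = (gamma.T @ X) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]           # data-dependent mixing
    return alpha * data_means + (1.0 - alpha) * means


# Toy UBM with two components (assumed values, not trained with EM here).
rng = np.random.default_rng(0)
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[-2.0, 0.0], [2.0, 0.0]])
ubm_vars = np.ones((2, 2))


def draw_frames(offset, count):
    """Synthetic frames: UBM centres shifted along the second axis."""
    centers = ubm_means + np.array([0.0, offset])
    picks = rng.integers(0, 2, size=count)
    return centers[picks] + rng.normal(size=(count, 2))


# One adapted model per (synthetic) speaker.
adapted_a = map_adapt_means(draw_frames(+1.0, 400), ubm_weights, ubm_means, ubm_vars)
adapted_b = map_adapt_means(draw_frames(-1.0, 400), ubm_weights, ubm_means, ubm_vars)

# Frame-wise decision on unseen frames of speaker A.
test_a = draw_frames(+1.0, 200)
score_a = frame_log_likelihood(test_a, ubm_weights, adapted_a, ubm_vars)
score_b = frame_log_likelihood(test_a, ubm_weights, adapted_b, ubm_vars)
frame_rate = float(np.mean(score_a > score_b))
print("fraction of frames assigned to speaker A:", frame_rate)
```

In this sketch the mixing coefficient `alpha` grows with the soft count of frames assigned to a component, so components that see little speaker data stay close to the UBM. That behaviour is one reason MAP adaptation is often preferred over re-running EM when per-speaker training data is scarce.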

Editor information

Václav Matoušek, Pavel Mautner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bocklet, T., Maier, A., Nöth, E. (2007). Text-Independent Speaker Identification Using Temporal Patterns. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_42

  • DOI: https://doi.org/10.1007/978-3-540-74628-7_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74627-0

  • Online ISBN: 978-3-540-74628-7

  • eBook Packages: Computer Science, Computer Science (R0)
