Abstract
This paper presents the LIMSI RT-07S speaker diarization system for conference and lecture meetings. The system builds upon the RT-06S diarization system designed for lecture data. The baseline system combines agglomerative clustering based on the Bayesian information criterion (BIC) with a second clustering stage using state-of-the-art speaker identification (SID) techniques. Because the baseline system has a high speech activity detection (SAD) error of around 10% on lecture data, several acoustic representations with various normalization techniques are investigated within the framework of a log-likelihood ratio (LLR) based speech activity detector. Universal background models (UBMs) trained on the different types of acoustic features are also examined in the SID clustering stage. All SAD acoustic models and UBMs are trained using forced-alignment segmentations of the conference data. The diarization system integrating the new SAD models and UBM gives comparable results on both the RT-07S conference and lecture evaluation data for the multiple distant microphone (MDM) condition.
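The idea behind an LLR-based speech activity detector can be illustrated with a minimal sketch. It is not the LIMSI implementation: it assumes a single diagonal-covariance Gaussian per class in place of the trained SAD acoustic models, toy one-dimensional features, and a moving-average smoother. The function names, the threshold, and the window size are all hypothetical choices made for illustration.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def frame_llrs(frames, speech, nonspeech):
    """Per-frame LLR: log p(x | speech) - log p(x | non-speech).

    Each model is a (mean, variance) pair; a real system would use
    richer models here (assumption: one Gaussian per class).
    """
    return [
        gaussian_loglik(x, *speech) - gaussian_loglik(x, *nonspeech)
        for x in frames
    ]

def smooth(values, half_window):
    """Moving average over a symmetric window, truncated at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def detect_speech(frames, speech, nonspeech, threshold=0.0, half_window=2):
    """Label a frame as speech when its smoothed LLR exceeds the threshold."""
    llrs = smooth(frame_llrs(frames, speech, nonspeech), half_window)
    return [llr > threshold for llr in llrs]

# Toy one-dimensional features (hypothetical values, not real data).
speech_model = ([5.0], [1.0])      # (mean, variance) for speech
nonspeech_model = ([0.0], [1.0])   # (mean, variance) for non-speech
frames = [[0.1], [-0.2], [0.3], [4.8], [5.2], [5.1], [4.9], [0.0], [0.2]]
labels = detect_speech(frames, speech_model, nonspeech_model, half_window=0)
```

In this sketch, raising `threshold` trades false alarms for missed speech, and a larger `half_window` suppresses isolated label flips at the cost of blurring segment boundaries.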
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, X., Barras, C., Lamel, L., Gauvain, J.-L. (2008). Multi-stage Speaker Diarization for Conference and Lecture Meetings. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_49
DOI: https://doi.org/10.1007/978-3-540-68585-2_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)