Abstract
We propose a multi-stream articulator model (MSAM) for audio-visual speech recognition (AVSR). The model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. It uses a multiple-stream structure with a shared articulator layer to mimic the speech production process. We also present an adaptive reliability measure (ARM), based on two local dispersion indicators, that integrates the audio and visual streams according to their local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the help of the ARM, our model even performs best at some test SNRs.
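The idea of weighting streams by a dispersion-based reliability can be illustrated with a minimal sketch. This is not the paper's ARM: the abstract does not define its two dispersion indicators, so the indicator below (mean pairwise absolute difference of class log-likelihoods) and the function names `dispersion`, `audio_weight`, and `fuse` are assumptions chosen for illustration. The sketch assumes only that a stream whose class scores are more spread out is more discriminative at that frame and should receive a larger weight.

```python
def dispersion(log_likelihoods):
    """Mean pairwise absolute difference of class log-likelihoods.

    Hypothetical reliability indicator: a flat score vector (low dispersion)
    suggests the stream cannot tell the classes apart at this frame.
    """
    n = len(log_likelihoods)
    if n < 2:
        return 0.0
    total = sum(
        abs(a - b)
        for i, a in enumerate(log_likelihoods)
        for b in log_likelihoods[i + 1:]
    )
    return 2.0 * total / (n * (n - 1))


def audio_weight(audio_ll, video_ll):
    """Audio stream weight in [0, 1]; the visual stream gets 1 - weight."""
    d_audio = dispersion(audio_ll)
    d_video = dispersion(video_ll)
    if d_audio + d_video == 0.0:
        return 0.5  # neither stream is informative: weight them equally
    return d_audio / (d_audio + d_video)


def fuse(audio_ll, video_ll):
    """Weighted sum of per-class log-likelihoods for one frame."""
    lam = audio_weight(audio_ll, video_ll)
    return [lam * a + (1.0 - lam) * v for a, v in zip(audio_ll, video_ll)]


if __name__ == "__main__":
    audio = [-1.0, -5.0, -9.0]   # well-separated scores (clean audio)
    video = [-4.0, -4.0, -4.0]   # flat scores (uninformative video)
    print(audio_weight(audio, video), fuse(audio, video))
```

In this example the flat visual scores give the visual stream zero weight, so the fused scores equal the audio scores. Because the weight is recomputed per frame, it can follow changes in noise over time, which is the sense of "local, temporal reliability" in the abstract.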
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xie, L., Liu, ZQ. (2006). Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition. In: Yeung, D.S., Liu, ZQ., Wang, XZ., Yan, H. (eds) Advances in Machine Learning and Cybernetics. Lecture Notes in Computer Science, vol 3930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11739685_104
DOI: https://doi.org/10.1007/11739685_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33584-9
Online ISBN: 978-3-540-33585-6
eBook Packages: Computer Science, Computer Science (R0)