Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition

  • Conference paper
Advances in Machine Learning and Cybernetics

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3930)

Abstract

We propose a multi-stream articulator model (MSAM) for audio visual speech recognition (AVSR). This model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. The model uses a multiple-stream structure with a shared articulator layer to mimic the speech production process. We also present an adaptive reliability measure (ARM), based on two local dispersion indicators, that integrates the audio and visual streams according to their local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the help of the ARM, our model performs best at some of the tested SNRs.
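
The abstract does not say which two local dispersion indicators the ARM uses or how they are mapped to stream weights. As a minimal sketch only, the code below assumes two indicators common in the AVSR literature (N-best log-likelihood difference and N-best log-likelihood dispersion) and a hypothetical normalised-ratio mapping from dispersion to an audio weight. The function names, the default `n=4`, and the mapping are illustrative assumptions, not the authors' method.

```python
def nbest_difference(loglikes, n=4):
    """Mean gap between the best score and the other top-n scores."""
    top = sorted(loglikes, reverse=True)[:n]
    if len(top) < 2:
        return 0.0
    return sum(top[0] - s for s in top[1:]) / (len(top) - 1)


def nbest_dispersion(loglikes, n=4):
    """Mean pairwise gap among the top-n scores."""
    top = sorted(loglikes, reverse=True)[:n]
    pairs = [(i, j) for i in range(len(top)) for j in range(i + 1, len(top))]
    if not pairs:
        return 0.0
    return sum(top[i] - top[j] for i, j in pairs) / len(pairs)


def audio_weight(audio_ll, visual_ll, n=4):
    """Hypothetical mapping: the more dispersed (peaked) stream gets more weight."""
    da = nbest_dispersion(audio_ll, n)
    dv = nbest_dispersion(visual_ll, n)
    total = da + dv
    return 0.5 if total == 0 else da / total


def fuse(audio_ll, visual_ll, n=4):
    """Weighted sum of per-class log-likelihoods for one frame."""
    lam = audio_weight(audio_ll, visual_ll, n)
    return [lam * a + (1.0 - lam) * v for a, v in zip(audio_ll, visual_ll)]


# One frame, four classes: the audio scores are peaked, the visual scores are flat.
audio_ll = [-1.0, -9.0, -10.0, -11.0]
visual_ll = [-5.0, -5.5, -6.0, -6.5]
lam = audio_weight(audio_ll, visual_ll)
fused = fuse(audio_ll, visual_ll)
```

Because the weight is recomputed for each frame, a stream whose class scores flatten out (for example, audio under heavy noise) contributes less to the fused score at that moment. This is the sense in which such a reliability measure is local and temporal.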

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xie, L., Liu, ZQ. (2006). Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition. In: Yeung, D.S., Liu, ZQ., Wang, XZ., Yan, H. (eds) Advances in Machine Learning and Cybernetics. Lecture Notes in Computer Science, vol 3930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11739685_104

  • DOI: https://doi.org/10.1007/11739685_104

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33584-9

  • Online ISBN: 978-3-540-33585-6

  • eBook Packages: Computer Science, Computer Science (R0)
