
Using the Tandem Approach for AF Classification in an AVSR System

  • Conference paper
Advances in Neural Networks - ISNN 2008 (ISNN 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5264)


Abstract

This paper describes an audio-visual speech recognition (AVSR) system based on articulatory features (AF). It implements a tandem approach in which artificial neural networks (ANN), in particular multi-layer perceptrons (MLP), serve as posterior probability estimators that transform raw input data into more abstract articulatory features. Such an approach is particularly well suited when relatively little training data is available, a situation typical for AVSR. In addition, we present the MLP feature-extraction results together with an analysis of recognition accuracy and confusions. Our AF-based AVSR system was trained on the audio-visual speech corpus VIDTIMIT, which contains conversational speech based on a medium-sized vocabulary of more than 1200 words.
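The tandem idea described in the abstract can be illustrated with a small sketch. It assumes a single AF group with a hypothetical six-class place-of-articulation inventory, a randomly initialised one-hidden-layer MLP (no training loop is shown), and synthetic input frames; `PLACE_CLASSES`, `TinyMLP` and `tandem_features` are illustrative names, not taken from the paper. Following the tandem scheme as it is commonly described in the literature, the per-frame softmax posteriors are log-transformed and decorrelated with PCA before they would be passed to a conventional HMM recogniser. The abstract does not state the paper's exact network topology or post-processing, so treat this as a sketch under those assumptions.

```python
import math
import random

# Hypothetical class inventory for one AF group (place of articulation).
# Illustrative only; the paper's actual AF groups and classes are not given here.
PLACE_CLASSES = ["bilabial", "labiodental", "alveolar", "velar", "glottal", "silence"]


def softmax(z):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """One-hidden-layer MLP whose softmax outputs are read as per-frame
    posteriors P(AF class | input frame). Weights are random here (assumption);
    a real system would train them with cross-entropy on labelled frames."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def posteriors(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return softmax(logits)


def covariance(rows):
    """Per-dimension mean and sample covariance matrix of a list of vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def top_eigenvectors(cov, k, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration
    with deflation (adequate for the small matrices used here)."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    basis = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        # Remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        basis.append(v)
    return basis


def tandem_features(mlp, frames, k):
    """Tandem post-processing: log posteriors, mean-centre, then project onto
    the top-k principal components (PCA/KLT decorrelation)."""
    logp = [[math.log(max(p, 1e-10)) for p in mlp.posteriors(f)] for f in frames]
    mean, cov = covariance(logp)
    basis = top_eigenvectors(cov, k)
    return [[sum(v[j] * (row[j] - mean[j]) for j in range(len(row))) for v in basis]
            for row in logp]


if __name__ == "__main__":
    rng = random.Random(1)
    # 50 synthetic 13-dimensional frames standing in for acoustic or visual features.
    frames = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(50)]
    mlp = TinyMLP(n_in=13, n_hidden=20, n_out=len(PLACE_CLASSES))
    feats = tandem_features(mlp, frames, k=3)
    print(len(feats), len(feats[0]))  # 50 3
```

The log step spreads out the bounded, skewed posteriors and the projection decorrelates them, which suits HMMs with diagonal-covariance Gaussian mixtures. In practice the PCA basis would be estimated once on training data and reused on test data; the sketch estimates it on the same frames only for brevity.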




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gan, T., Menzel, W., Zhang, J. (2008). Using the Tandem Approach for AF Classification in an AVSR System. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87734-9_94


  • DOI: https://doi.org/10.1007/978-3-540-87734-9_94

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87733-2

  • Online ISBN: 978-3-540-87734-9

  • eBook Packages: Computer Science, Computer Science (R0)
