Abstract
Designing a robust Human-Computer Interaction (HCI) system is a challenging task, especially for automatic speech recognition (ASR) operating in adverse environments. This paper proposes an ASR system that uses bimodal information (i.e., speech along with visual input) to improve robustness. Static and dynamic (∆) audio features are extracted using Mel-Frequency Cepstral Coefficients (MFCC). The visual features are extracted using the Two-Dimensional Discrete Wavelet Transform (2D-DWT). Audio-visual recognition is performed over different combinations of visual features using a Hidden Markov Model (HMM) under clean and noisy environmental conditions. The Aligarh Muslim University Audio Visual (AMUAV) Hindi database is used as the baseline data. In addition, performance on noisy speech signals is evaluated at different Signal-to-Noise Ratios (SNR: 30 dB to -20 dB). Finally, adding visual information to ASR is reported to increase accuracy in smart assistive environments, i.e., in applications that may not have noise-free background conditions.
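The noisy-condition evaluation over the stated SNR range can be illustrated with a minimal sketch. It is not the authors' procedure. The white Gaussian noise type, the 10 dB step between 30 dB and -20 dB, the sine-wave stand-in for a speech waveform, and the helper names `add_noise_at_snr` and `measured_snr_db` are all assumptions made for illustration.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then mix."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)); solve for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def measured_snr_db(clean, noisy):
    """Recompute the SNR from the clean signal and the noisy mixture."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220.0 * t)    # stand-in for a speech waveform (assumption)
noise = rng.standard_normal(len(speech))  # white Gaussian noise (assumed noise type)

snr_levels = [30, 20, 10, 0, -10, -20]    # assumed 10 dB steps over the stated range
noisy_versions = {snr: add_noise_at_snr(speech, noise, snr) for snr in snr_levels}

for snr in snr_levels:
    print(snr, round(measured_snr_db(speech, noisy_versions[snr]), 3))
```

Each noisy version could then be passed through the same audio feature extraction as the clean signal, so that audio-only and audio-visual accuracy are compared at matching SNR levels.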
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Upadhyaya, P., Farooq, O., Abidi, M.R., Varshney, P. (2015). Performance Evaluation of Bimodal Hindi Speech Recognition under Adverse Environment. In: Satapathy, S., Biswal, B., Udgata, S., Mandal, J. (eds) Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014. Advances in Intelligent Systems and Computing, vol 328. Springer, Cham. https://doi.org/10.1007/978-3-319-12012-6_38
DOI: https://doi.org/10.1007/978-3-319-12012-6_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12011-9
Online ISBN: 978-3-319-12012-6
eBook Packages: Engineering, Engineering (R0)