Abstract
Several research studies have shown that the robustness and performance of speech recognition systems can be improved by using a physiologically inspired filterbank based on Gabor filters. In this paper, we propose a feature extraction method based on a filterbank of 59 two-dimensional Gabor filters. This set of filters aims to extract specific modulation frequencies while limiting redundancy at the feature level. The recognition performance of the proposed feature extraction method is evaluated on isolated words extracted from the TIMIT corpus. The results show that the proposed method yields better recognition rates than the classic MFCC, PLP and LPC methods.
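To make the idea of two-dimensional Gabor filter features more concrete, here is a minimal sketch. It assumes the general construction used for spectro-temporal Gabor filterbanks: a Hann envelope multiplied by a complex sinusoidal carrier, applied to a spectrogram stored as `spec[channel][frame]` (for example, log-mel energies). The modulation frequencies, envelope widths, DC-removal step and `channel_step` subsampling are illustrative assumptions. The sketch does not reproduce the paper's 59-filter configuration, and all function names (`hann`, `gabor_kernel`, `apply_filter`, `gabor_features`) are hypothetical.

```python
import cmath
import math

Spectrogram = list[list[float]]  # spec[channel][frame], e.g. log-mel energies
Kernel = list[list[complex]]


def hann(width: int) -> list[float]:
    """Symmetric Hann window of `width` points with its zero endpoints excluded."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (width + 1)) for i in range(width)]


def gabor_kernel(omega_k: float, omega_n: float, size_k: int, size_n: int) -> Kernel:
    """2-D spectro-temporal Gabor kernel: separable Hann envelope times a complex carrier.

    omega_k: spectral modulation frequency in radians per channel.
    omega_n: temporal modulation frequency in radians per frame.
    size_k, size_n: odd envelope widths in channels and frames (illustrative choices).
    """
    env_k, env_n = hann(size_k), hann(size_n)
    ck, cn = (size_k - 1) / 2, (size_n - 1) / 2
    env = [[env_k[k] * env_n[n] for n in range(size_n)] for k in range(size_k)]
    kernel = [[env[k][n] * cmath.exp(1j * (omega_k * (k - ck) + omega_n * (n - cn)))
               for n in range(size_n)] for k in range(size_k)]
    if omega_k != 0 or omega_n != 0:
        # Assumed design choice: subtract a scaled envelope so the kernel sums to zero,
        # i.e. band-pass filters do not respond to regions of constant energy.
        dc = sum(map(sum, kernel)) / sum(map(sum, env))
        kernel = [[kernel[k][n] - dc * env[k][n] for n in range(size_n)] for k in range(size_k)]
    return kernel


def apply_filter(spec: Spectrogram, kernel: Kernel) -> Spectrogram:
    """'Same'-size 2-D filtering with zero padding; returns the real part of the output."""
    n_ch, n_fr = len(spec), len(spec[0])
    size_k, size_n = len(kernel), len(kernel[0])
    off_k, off_n = size_k // 2, size_n // 2
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for k in range(n_ch):
        for n in range(n_fr):
            acc = 0j
            for i in range(size_k):
                kk = k + i - off_k
                if not 0 <= kk < n_ch:
                    continue  # outside the spectrogram: zero padding
                for j in range(size_n):
                    nn = n + j - off_n
                    if 0 <= nn < n_fr:
                        acc += kernel[i][j] * spec[kk][nn]
            out[k][n] = acc.real
    return out


def gabor_features(spec: Spectrogram, kernels: list[Kernel], channel_step: int = 1) -> list[list[float]]:
    """Per-frame feature vectors: filter outputs stacked over filters and channels.

    channel_step > 1 keeps only every channel_step-th channel, a simple (hypothetical)
    way to reduce redundancy between neighbouring channels.
    """
    outputs = [apply_filter(spec, kern) for kern in kernels]
    return [[out[k][n] for out in outputs for k in range(0, len(spec), channel_step)]
            for n in range(len(spec[0]))]


if __name__ == "__main__":
    # Toy spectrogram: 23 channels x 60 frames, modulated in time with a 10-frame period.
    omega0 = 2 * math.pi / 10
    spec = [[math.cos(omega0 * n) for n in range(60)] for _ in range(23)]
    bank = [gabor_kernel(0.0, 0.0, 5, 5),          # low-pass (no modulation)
            gabor_kernel(0.0, omega0, 5, 15),      # temporal modulation only
            gabor_kernel(math.pi / 2, 0.0, 7, 5)]  # spectral modulation only
    feats = gabor_features(spec, bank, channel_step=2)
    print(len(feats), len(feats[0]))  # prints "60 36": 60 frames, 3 filters x 12 channels
```

Each filter is tuned to one combination of spectral and temporal modulation. In the toy example, the purely temporal filter responds strongly to the time-modulated input, while the purely spectral filter gives essentially no response away from the edges. The stacked real-valued outputs would then form the per-frame feature vector passed to the recognizer.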
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Missaoui, I., Lachiri, Z. (2014). Gabor Filterbank Features for Robust Speech Recognition. In: Elmoataz, A., Lezoray, O., Nouboud, F., Mammass, D. (eds) Image and Signal Processing. ICISP 2014. Lecture Notes in Computer Science, vol 8509. Springer, Cham. https://doi.org/10.1007/978-3-319-07998-1_76
DOI: https://doi.org/10.1007/978-3-319-07998-1_76
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07997-4
Online ISBN: 978-3-319-07998-1
eBook Packages: Computer Science, Computer Science (R0)