Abstract
This paper describes in detail the improvements made to the recently implemented Kannada speech recognition system. The Kannada automatic speech recognition (ASR) system consists of ASR models built with Kaldi, an IVRS call flow, and databases of weather and agricultural commodity price information. The task-specific speech data used in the recently developed spoken dialogue system contained high levels of varied background noise. The different types of noise present in the collected speech data adversely affected both online and offline speech recognition performance. Therefore, to improve the recognition accuracy of the Kannada ASR system, a noise reduction algorithm was developed that fuses spectral subtraction with voice activity detection (SS-VAD) and a minimum mean square error spectrum power estimator based on zero crossing (MMSE-SPZC). The noise elimination algorithm is applied before the feature extraction stage. Alternative ASR models were created using subspace Gaussian mixture model (SGMM) and deep neural network (DNN) modeling techniques. The experimental results show that the fusion of the noise elimination technique with SGMM/DNN-based modeling yields a relative accuracy improvement of 7.68% over the recently developed GMM-HMM based ASR system. The acoustic models with the lowest word error rate (WER) can be used in the spoken dialogue system. The developed spoken query system was tested by farmers in Karnataka under uncontrolled conditions.
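The abstract's noise reduction stage combines spectral subtraction with other estimators applied before feature extraction. As an illustrative sketch only (not the authors' implementation), the core magnitude spectral subtraction step can be written as below; the function name, the over-subtraction factor `alpha`, the spectral floor `beta`, and the frame settings are assumptions chosen for the example:

```python
import numpy as np

def spectral_subtraction(noisy, noise_ref, frame_len=256, hop=128,
                         alpha=2.0, beta=0.01):
    """Basic magnitude spectral subtraction (illustrative sketch).

    noisy     : 1-D array of noisy speech samples
    noise_ref : 1-D array containing a noise-only segment
    """
    window = np.hanning(frame_len)
    # Estimate the noise magnitude spectrum from the noise-only segment
    noise_mag = np.abs(np.fft.rfft(noise_ref[:frame_len] * window))
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Over-subtract the noise estimate; floor the residual at
        # beta * noise_mag to limit musical noise
        clean_mag = np.maximum(mag - alpha * noise_mag, beta * noise_mag)
        # Resynthesize with the noisy phase and overlap-add
        out[start:start + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase))
    return out
```

In a full SS-VAD pipeline the noise spectrum would be re-estimated from frames that a voice activity detector marks as non-speech, rather than from a fixed leading segment as assumed here.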
Acknowledgements
This study was supported by the Department of Electronics and Information Technology (DeitY), Ministry of Communications and Information Technology, Government of India.
Thimmaraja Yadava, G., Jayanna, H.S. Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling. Int J Speech Technol 23, 149–167 (2020). https://doi.org/10.1007/s10772-020-09671-5
Keywords
- Speech
- Speech recognition
- Interactive voice response system (IVRS)
- Automatic speech recognition (ASR)
- Spectral subtraction with voice activity detection (SS-VAD)
- Minimum mean square error spectrum power estimator based on zero crossing (MMSE-SPZC)
- Minimum mean square error spectrum power (MMSE-SP)
- Maximum a Posteriori (MAP)