Abstract
In this work, we present recent advancements to our earlier automatic continuous Kannada speech recognition (ACKSR) system under real-time conditions. In our previous research, we collected task-specific Kannada speech data from 2400 speakers in field conditions and proposed a robust noise elimination technique to enhance the degraded speech data. The speech recognition models were developed using the Kaldi toolkit, and the experimental results showed somewhat high word error rates because the amount of speech data available was insufficient for training deep neural networks. Building on these findings, the current work addresses this limitation by expanding the database: we collected continuous Kannada speech data from an additional 300 speakers under real-time conditions and enhanced the updated degraded speech database with the proposed noise elimination technique. The results demonstrate a significant improvement in the performance of the ACKSR system, particularly in speech recognition accuracy, compared with our earlier work.
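For readers unfamiliar with the evaluation metric, the sketch below shows a minimal word error rate (WER) computation in Python. It is illustrative only: the paper's experiments use Kaldi's own scoring tools rather than this function, and the example sentences are hypothetical placeholders.

```python
# Minimal sketch of word error rate (WER) computation, the evaluation metric
# reported for the ACKSR system. Illustrative only; the paper's experiments
# rely on Kaldi's scoring pipeline rather than this function.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Hypothetical romanised Kannada utterances, used purely as placeholders.
    reference = "indina havamana varadi enu"
    hypothesis = "indina havamana enu"
    print(f"WER: {wer(reference, hypothesis):.2%}")  # one deletion in four words -> 25.00%
```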
Data availability
Enquiries about data availability should be directed to the authors.
Funding
Not applicable
Author information
Contributions
GTY: Conceptualization, software, data collection, validation, formal analysis, investigation and original draft preparation. BGN: Conceptualization, methodology, validation, writing–review and editing, and investigation. GPR: Software and formal analysis.
Ethics declarations
Conflict of interest
The authors have no conflict of interest.
Ethical approval
Not applicable
Consent to participate
Not applicable
Consent for publication
On behalf of all authors, I consent to publish our manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Thimmaraja Yadava, G., Nagaraja, B.G. & Raghudathesh, G.P. Real-Time Automatic Continuous Speech Recognition System for Kannada Language/Dialects. Wireless Pers Commun 134, 209–223 (2024). https://doi.org/10.1007/s11277-024-10903-z