Abstract
In this work, we present recent advancements to our earlier automatic continuous Kannada speech recognition (ACKSR) system under real-time conditions. In our previous research, we collected task-specific Kannada speech data from 2400 speakers in field conditions and proposed a robust noise elimination technique to enhance the degraded speech data. The speech recognition models were developed using the Kaldi toolkit, and the experimental results showed somewhat high word error rates because the amount of speech data available was insufficient for training deep neural networks. Building on these findings, the current work addresses this limitation by expanding the database: we collected continuous Kannada speech data from an additional 300 speakers under real-time conditions and enhanced the updated degraded speech database with the proposed noise elimination technique. The results demonstrate a significant improvement in the performance of the ACKSR system, particularly in speech recognition accuracy, compared with our earlier work.
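For readers unfamiliar with the evaluation metric, the sketch below shows a minimal word error rate (WER) computation in Python. It is illustrative only: the paper's experiments use Kaldi's own scoring tools rather than this function, and the example sentences are hypothetical placeholders.

```python
# Minimal sketch of word error rate (WER) computation, the evaluation metric
# reported for the ACKSR system. Illustrative only; the paper's experiments
# rely on Kaldi's scoring pipeline rather than this function.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Hypothetical romanised Kannada utterances, used purely as placeholders.
    reference = "indina havamana varadi enu"
    hypothesis = "indina havamana enu"
    print(f"WER: {wer(reference, hypothesis):.2%}")  # one deletion in four words -> 25.00%
```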
Data availability
Enquiries about data availability should be directed to the authors.
Funding
Not applicable
Author information
Contributions
GTY: Conceptualization, software, data collection, validation, formal analysis, investigation and original draft preparation. BGN: Conceptualization, methodology, validation, writing–review and editing, and investigation. GPR: Software and formal analysis.
Ethics declarations
Conflict of interest
The authors have no conflict of interest.
Ethical approval
Not applicable
Consent to participate
Not applicable
Consent for publication
On behalf of all authors, I consent to publish our manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Thimmaraja Yadava, G., Nagaraja, B.G. & Raghudathesh, G.P. Real-Time Automatic Continuous Speech Recognition System for Kannada Language/Dialects. Wireless Pers Commun 134, 209–223 (2024). https://doi.org/10.1007/s11277-024-10903-z