Real-Time Automatic Continuous Speech Recognition System for Kannada Language/Dialects


Wireless Personal Communications

Abstract

In this work, we present recent advances to our earlier automatic continuous Kannada speech recognition (ACKSR) system under real-time conditions. In our previous research, we collected task-specific Kannada speech data from 2400 speakers in field conditions and proposed a robust noise elimination technique to enhance the degraded speech data. The automatic speech recognition models were developed using Kaldi, and the experimental results showed slightly elevated word error rates, which we attributed to the substantial amount of speech data required to train deep neural networks. The current work addresses this limitation by expanding the database: we collected continuous Kannada speech data from an additional 300 speakers under real-time conditions. The updated degraded speech database was enhanced using the proposed noise elimination technique. The results demonstrate a significant improvement in the speech recognition accuracy of the ACKSR system compared to our earlier work.
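The abstract reports performance as word error rate (WER). As a minimal illustrative sketch, assuming the standard definition of WER (word-level edit distance divided by the number of reference words), the metric can be computed as follows. The function name and the placeholder transcripts are hypothetical and are not taken from the paper or from Kaldi's own scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: whitespace tokenization; real systems may normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical placeholder transcripts: one substitution and one deletion
# against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

Recognition accuracy is often quoted as 1 minus WER, although the two are not always reported interchangeably, so the definition used should be stated alongside any figures.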


Data availability

Enquiries about data availability should be directed to the authors.


Funding

Not applicable

Author information

Contributions

GTY: Conceptualization, software, data collection, validation, formal analysis, investigation and original draft preparation. BGN: Conceptualization, methodology, validation, writing–review and editing, and investigation. GPR: Software and formal analysis.

Corresponding author

Correspondence to G. Thimmaraja Yadava.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

Ethical approval

Not applicable

Consent to participate

Not applicable

Consent for publication

On behalf of all authors, I consent to publish our manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Thimmaraja Yadava, G., Nagaraja, B.G. & Raghudathesh, G.P. Real-Time Automatic Continuous Speech Recognition System for Kannada Language/Dialects. Wireless Pers Commun 134, 209–223 (2024). https://doi.org/10.1007/s11277-024-10903-z


  • DOI: https://doi.org/10.1007/s11277-024-10903-z
