A spoken query system to access the real time agricultural commodity prices and weather information in Kannada language/dialects | Multimedia Tools and Applications Skip to main content
Log in

A spoken query system to access the real time agricultural commodity prices and weather information in Kannada language/dialects

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

We develop two improvements over our previously proposed spectral subtraction with voice activity detection and minimum mean square error spectrum power estimator based on zero crossing (SS-VAD + MMSE-SPZC) enhancement for a real-time spoken query system (SQS). Firstly, we introduce a time delay neural network (TDNN) based modeling technique. Secondly, to properly train the models, we increase the size of the database by collecting the Kannada speech data from an additional 500 farmers under real-time conditions. The proposed combined enhancement technique effectively removes background noise and improves speech quality. When evaluated on the updated degraded speech corpus, our proposed automatic speech recognition (ASR) system achieves better performance compared to previous framework. Moreover, experimental results demonstrate an improvement of 1.32% and 1.48% in terms of speech recognition accuracy for noisy and enhanced speech data respectively, compared to our earlier work.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Data Availability

https://sites.google.com/view/thimmarajayadavag/downloads

Code Availability

https://sites.google.com/view/thimmarajayadavag/downloads

References

  1. Li J (2022) Recent advances in end-to-end automatic speech recognition, Apsipa Transactions on Signal and Information Processing 11(1)

  2. Jainar SJ, Sale PL, Nagaraja BG (2020) VAD, feature extraction and mod- elling techniques for speaker recognition: a review. International Journal of Signal and Imaging Systems Engineering 12(1–2):1–18

  3. Wu F, Kim K, Watanabe S, Han KJ, McDonald R, Weinberger KQ, Artzi Y (2023) Wav2seq: Pre-training speech-to-text encoder-decoder models using pseudo languages, In ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing 1–5

  4. Chang E, Seide F, Meng HM, Chen Z, Shi Y, Li YC (2002) A system for spoken query information retrieval on mobile devices. IEEE Trans Audio Speech Lang Process 10(8):531–541

    Article  Google Scholar 

  5. Rabiner LR (1997) Applications of speech recognition in the area of telecom- munications, IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings 501–510

  6. Malik M, Malik MK, Mehmood K, Makhdoom I (2021) Automatic speech recognition: a survey. Multimed Tools Appl 80:9411–9457

    Article  Google Scholar 

  7. Zhang Y, Park DS, Han W, Qin J, Gulati A, Shor J, Jansen A, Xu Y, Huang Y, Wang S, Zhou Z (2022) Bigssl: exploring the frontier of large-scale semi-supervised learning for automatic speech recognition. IEEE J Sel Top Signal Process 16(6):15191532

    Article  Google Scholar 

  8. Kotkar P, Thies W, Amarasinghe S (2008) An audio wiki for publishing user- generated content in the developing world, in HCI for Community and International Development

  9. Nagaraja BG, Jayanna HS (2013) Kannada language parameters for speaker identification with the constraint of limited data. International Journal of Image, Graphics and Signal Processing 5(9):14

    Article  Google Scholar 

  10. Davies M, Guenther B, Leavy J, Mitchell T, Tanner T (2009) Climate change adaptation, disaster risk reduction and social protection: complementary roles in agriculture and rural growth?. IDS Working Papers 01–37

  11. Wu C, Li X, Guo Y, Wang J, Ren Z, Wang M, Yang Z (2022) Natural language processing for smart construction: Current status and future directions. Automation in Construction 134:104059

    Article  Google Scholar 

  12. Zhang Y, Han W, Qin J, Wang Y, Bapna A, Chen Z, Chen N, Li B, Axelrod V, Wang G, Meng Z (2023) Google usm: scaling automatic speech recognition beyond 100 languages, arXiv:2303.01037

  13. Shahamiri SR (2021) Speech vision: An end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Trans Neural Syst Rehabilitation Eng 29:852–861

    Article  Google Scholar 

  14. Schultz BG, Tarigoppula VSA, Noffs G, Rojas S, van der Walt A, Grayden DB, Vogel AP (2021) Automatic speech recognition in neurodegener- ative disease. Int J Speech Technol 24(3):771–779

    Article  Google Scholar 

  15. Dai Y, Wu Z (2021) Mobile-assisted pronunciation learning with feedback from peers and/or automatic speech recognition: a mixed-methods study, Computer Assisted Language Learning 1–24

  16. Yadava TG, Jayanna HS (2018) Speech enhancement by combining spectral subtraction and minimum mean square error-spectrum power estimator based on zero crossing. Int J Speech Technol 22(3):639–648

    Article  Google Scholar 

  17. Povey D et al (2011) The Kaldi speech recognition toolkit. IEEE Signal Processing Society, IEEE Work- shop on Automatic Speech Recognition and Understanding

    Google Scholar 

  18. Shahnawazuddin S, Thotappa D, Sarma BD, Deka A, Prasanna SRM, Sinha R (2013) Assamese spoken query system to access the price of agricultural commodities, National Conference on Communications 1–5

  19. Leggetter CJ, Woodland PC (1995) Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer, Speech and Language 9(2):171–185

    Article  Google Scholar 

  20. Kuhn R, Junqua JC, Nguyen P, Niedzielski N (2000) Rapid speaker adapta- tion in Eigenvoice space, in IEEE Trans Speech Audio Processing 8(6):695–707

  21. Ali A, Zhang Y, Cardinal P, Dahak N, Vogel S, Glass J (2014) A complete KALDI recipe for building Arabic speech recognition systems, IEEE Spoken Language Technology Workshop 525–529

  22. Cardinal P, Ali A, Dehak N, Zhang Y, Hanai TA, Zhang Y, Glass JR, Vogel S (2014) Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera 2088–2092

  23. Karpov A, Markov K, Kipyatkova I, Vazhenina D, Ronzhin A (2014) Large vocabulary Russian speech recognition using syntactico-statistical language modeling. Speech Communication 56(3):213–228

    Article  Google Scholar 

  24. Feng S, Kudina O, Halpern BM, Scharenborg O (2021) Quantifying bias in automatic speech recognition, arXiv:2103.15122

  25. Miao Y, Gowayyed M, Metze F (2015) End-to-end speech recognition using deep (RNN) models and WFST-based decoding, arXiv:1507.08240

  26. Shahnawazuddin S, Thotappa D, Dey A, Imani S, Prasanna SRM, Sinha R (2016) Improvements in IITG Assamese spoken query system: background noise suppression and alternate acoustic modeling, 1–6

  27. Li J (2022) Recent advances in end-to-end automatic speech recognition, APSIPA Transactions on Signal and Information Processing 11(1)

  28. Meng L, Xu J, Tan X, Wang J, Qin T, Xu B (2021) MixSpeech: data augmentation for low-resource automatic speech recognition, In IEEE international conference on acoustics, speech and signal processing, pp 7008–7012

  29. Sailor H, Patil H (2018) Neural Networks-based automatic speech recognition for agricultural commodity in Gujarati language, proc. 6th workshop on spoken language technologies for under-resourced languages 162–166

  30. Das R, Dey A, Lalhminghlui W, Sarmah P, Vijaya S, Sinha R (2020) Mizo spoken query system enhanced with prosodic information, IEEE 23rd conference of the oriental COCOSDA international committee for the co-ordination and standardisation of speech databases and assessment techniques 83–88

  31. Mantena GV, Rajendran S, Gangashetty SV, Yegnanarayana B, Prahallad K (2011) Development of a spoken dialogue system for accessing agricultural information in Telugu, In Proceedings of ICON-2011, 9th international conference on natural language processing

  32. Perero-Codosero JM, Espinoza-Cuadros FM, Hernández-Gómez LA, Luis A (2022) A comparison of hybrid and end-to-end ASR systems for the IberSpeech-RTVE 2020 speech-to-text transcription challenge. Applied Sciences 12(2):903

    Article  CAS  Google Scholar 

  33. Zhang F, Wang Y, Zhang X, Liu C, Saraf Y, Zweig G (2020) Faster, simpler and more accurate hybrid asr systems using wordpieces, arXiv preprint arXiv:2005.09150

  34. Yadava TG, Nagaraja BG, Jayanna HS (2022) Performance evaluation of spectral subtraction with vad and timefrequency ltering for speech enhancement, In Emerging Research in Computing, Information, Commu- nication and Applications 407–414

  35. Defrancq B, Fantinuoli C (2021) Automatic speech recognition in the booth: assessment of system performance, interpreters performances, and inter- actions in the context of numbers. Target 33(1):73–102

    Article  Google Scholar 

  36. Yadav H, Sitaram S (2022) A survey of multilingual models for automatic speech recognition, arXiv:2202.12576

  37. Aldarmaki H, Ullah A, Ram S, Zaki N (2022) Unsupervised automatic speech recognition: a review. Speech Communication 139:76–91

    Article  Google Scholar 

  38. Miao H, Cheng G, Zhang P, Yan Y (2020) Online hybrid CTC/atten- tion end-to-end automatic speech recognition architecture. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28:1452–1465

    Article  Google Scholar 

  39. Yadava TG, Jayanna HS (2018) Improvements in spoken query system to access the agricultural commodity prices and weather information in Kan- nada language/dialects. Journal of Intelligent Systems 29(1):664–687

    Article  Google Scholar 

Download references

Funding

This work was a part of consortium project on “Speech-based Access of Agricultural Commodity Prices and Weather Information in 11 Indian Languages /Dialects, funded by the Technology Development for Indian Languages (TDIL) programme initiated by the Department of Electronics & Information Technology (DeitY), Ministry of Communication & Information Technology (MC &IT), Govt. of India (Grant number: 11(18)/2012-HCC(TDIL)).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Thimmaraja Yadava G.

Ethics declarations

Conflict of interest

Authors have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Nagaraja B G, Jayanna H S and Shivakumar B R are contributed equally to this work.

Appendices

Appendix A: Considerations and challenges of the research approach

The following limitations should be considered when interpreting and applying this research findings to real-world SQS and ASR applications.

  • This research focuses on developing improvements to the ASR system specifically for the Kannada language/dialects. As a result, the findings and conclusions may not be directly applicable to other languages or dialects, limiting the generalizability of the approach.

  • The challenges of real-time data collection, such as background noise variations, environmental conditions, and other contextual factors, may impact the quality and diversity of the collected data.

Table 6 Description of speech database collection
Table 7 Comprehensive comparison of ASR toolkits

Appendix B: Speech database description

The Table 6 presents the speech data collected for this study, encompassing Kannada language participants (male and female) across diverse dialect regions of Karnataka state.

Appendix C: Comparison of ASR toolkits

Appendix D: List of Acronyms

Table 8 List of acronyms used in the research work

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

G, T.Y., G, N.B., S, J.H. et al. A spoken query system to access the real time agricultural commodity prices and weather information in Kannada language/dialects. Multimed Tools Appl 83, 28675–28688 (2024). https://doi.org/10.1007/s11042-023-16554-9

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-023-16554-9

Keywords

Navigation