Abstract
We develop two improvements over our previously proposed enhancement technique, spectral subtraction with voice activity detection combined with a minimum mean square error spectrum power estimator based on zero crossing (SS-VAD + MMSE-SPZC), for a real-time spoken query system (SQS). First, we introduce a time delay neural network (TDNN) based modeling technique. Second, to properly train the models, we increase the size of the database by collecting Kannada speech data from an additional 500 farmers under real-time conditions. The proposed combined enhancement technique effectively removes background noise and improves speech quality. When evaluated on the updated degraded speech corpus, our proposed automatic speech recognition (ASR) system achieves better performance than the previous framework. Experimental results demonstrate improvements of 1.32% and 1.48% in speech recognition accuracy for noisy and enhanced speech data, respectively, compared to our earlier work.
Funding
This work was part of the consortium project “Speech-based Access of Agricultural Commodity Prices and Weather Information in 11 Indian Languages/Dialects”, funded by the Technology Development for Indian Languages (TDIL) programme initiated by the Department of Electronics & Information Technology (DeitY), Ministry of Communication & Information Technology (MC&IT), Govt. of India (Grant number: 11(18)/2012-HCC(TDIL)).
Ethics declarations
Conflict of interest
The authors have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Nagaraja B G, Jayanna H S and Shivakumar B R contributed equally to this work.
Appendices
Appendix A: Considerations and challenges of the research approach
The following limitations should be considered when interpreting these findings and applying them to real-world SQS and ASR applications.
- This research focuses on developing improvements to the ASR system specifically for the Kannada language/dialects. As a result, the findings and conclusions may not be directly applicable to other languages or dialects, limiting the generalizability of the approach.
- The challenges of real-time data collection, such as background noise variations, environmental conditions, and other contextual factors, may impact the quality and diversity of the collected data.
Appendix B: Speech database description
Table 6 presents the speech data collected for this study, encompassing male and female Kannada-speaking participants across diverse dialect regions of Karnataka state.
Appendix C: Comparison of ASR toolkits
Appendix D: List of Acronyms
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
G, T.Y., G, N.B., S, J.H. et al. A spoken query system to access the real time agricultural commodity prices and weather information in Kannada language/dialects. Multimed Tools Appl 83, 28675–28688 (2024). https://doi.org/10.1007/s11042-023-16554-9
DOI: https://doi.org/10.1007/s11042-023-16554-9