Abstract
This work presents improvements to an isolated Kannada automatic speech recognition (ASR) system for accessing agricultural commodity prices and weather information in uncontrolled environments. The system comprises an interactive voice response system (IVRS), ASR models, and databases of weather and agricultural commodity price information. The previous system suffered reduced accuracy because of background noise during both offline and online speech recognition. To address this, the proposed system introduces a background noise reduction module ahead of the speech feature extraction stage. The experimental results indicate that the proposed noise reduction algorithm outperforms traditional signal processing algorithms, leaving no audible musical noise or other background noise in the enhanced NOIZEUS speech corpus and the isolated Kannada speech data. Combining this noise suppression algorithm with the time delay neural network (TDNN) ASR modeling technique yields a 1.1% improvement in speech recognition accuracy over the previous deep neural network-hidden Markov model (DNN-HMM) based system. The enhanced isolated Kannada system was tested online by 500 speakers/users accessing real-time agricultural commodity prices and weather information in Kannada language/dialects under noise-corrupted environments. The source code of the algorithms and the ASR models are publicly available.
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest regarding this manuscript.
Additional information
https://sites.google.com/view/thimmarajayadavag/downloads.
Nagaraja B G and Jayanna H S contributed equally to this work.
About this article
Cite this article
G, T.Y., G, N.B. & S, J.H. Improvements in ASR system to access the real-time agricultural commodity prices and weather information in Kannada language/dialects. Multimed Tools Appl 83, 4195–4217 (2024). https://doi.org/10.1007/s11042-023-15350-9
DOI: https://doi.org/10.1007/s11042-023-15350-9