Abstract
In this paper, the role of a speech recognition system in the assessment of dysarthric speech is studied using a method called the Elman back propagation network (EBN). Dysarthria is a neurological disability that impairs control of the motor speech articulators. Persons with dysarthria may have speech intelligibility ranging from low (2 %) to high (95 %). The EBN is a recurrent network: a fully connected neural network is built in which speech characteristics are represented simultaneously by neuron activation states, and it supports an efficient self-supervised training algorithm. For parametric representation of the speech signal, glottal features are used along with mel-frequency cepstral coefficients (MFCCs). The outputs obtained from the two feature sets are then compared after evaluation with different neural networks and modeling methods. The proposed method is evaluated on a subset of the Universal Access Research database consisting of 9 of its 19 dysarthric speakers, each uttering 100 words repeated 3 times. The promising performance of the proposed system suggests it can be applied to assist those who work with persons having voice disorders.
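To illustrate the kind of architecture the abstract describes, the following is a minimal sketch of an Elman (simple recurrent) network forward pass over a sequence of MFCC frames. It is not the authors' implementation: the layer sizes (13 MFCCs per frame, 16 hidden units, 2 output classes) and the weight names `W_xh`, `W_hh`, `W_hy` are illustrative assumptions, chosen only to show how the hidden state feeds back as a context input at the next time step.

```python
import numpy as np

# Hypothetical dimensions: 13 MFCC inputs per frame, 16 hidden/context
# units, 2 output classes (e.g. dysarthric vs. unimpaired speech).
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 13, 16, 2

W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # context -> hidden
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden -> output

def elman_forward(frames):
    """Run a sequence of feature frames through the Elman network.

    The hidden state at time t is fed back as the 'context' input at
    time t+1; this recurrence is what distinguishes an Elman network
    from a plain feedforward MLP.
    """
    h = np.zeros(n_hid)                      # context units start at zero
    for x in frames:
        h = np.tanh(W_xh @ x + W_hh @ h)     # combine frame and context
    logits = W_hy @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # softmax over the classes

# Usage with 50 frames of random stand-in "MFCC" features:
probs = elman_forward(rng.normal(size=(50, n_in)))
```

In practice the weights would be trained by back propagation through time on labeled utterances; the random weights here only demonstrate the data flow.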
Selva Nidhyananthan, S., Shantha Selva Kumari, R., & Shenbagalakshmi, V. Assessment of dysarthric speech using Elman back propagation network (recurrent network) for speech recognition. Int J Speech Technol 19, 577–583 (2016). https://doi.org/10.1007/s10772-016-9349-1