Abstract
The Russian language is characterized by very flexible word order, which limits the ability of standard n-gram models to capture important regularities in the data. Moreover, it is a highly inflectional language with rich morphology, which leads to high out-of-vocabulary (OOV) word rates. In this paper, we present a comparison of two advanced language modeling techniques, the factored language model (FLM) and the recurrent neural network (RNN) language model, applied to Russian large vocabulary speech recognition. Evaluation experiments showed that the FLM, built using a training corpus of 10M words, performed better, reducing the perplexity and word error rate (WER) by 20% and 4.0%, respectively. A further WER reduction of 7.4% was achieved when the training data were increased to 40M words and the 3-gram, FLM and RNN language models were combined by linear interpolation.
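The linear interpolation and perplexity mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `interpolate` and `perplexity`, the per-word probabilities and the weights are all hypothetical, and in practice the interpolation weights would be tuned on held-out data.

```python
import math


def interpolate(prob_streams, weights):
    """Linearly interpolate per-word probabilities from several language models.

    prob_streams: one list of per-word probabilities per model, all the same length.
    weights: one non-negative weight per model; the weights must sum to 1.
    """
    if len(prob_streams) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return [
        sum(w * p for w, p in zip(weights, probs))
        for probs in zip(*prob_streams)
    ]


def perplexity(word_probs):
    """Perplexity: exp of the average negative log-probability per word."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


# Made-up probabilities that three models assign to the same 4-word test text.
p_ngram = [0.10, 0.02, 0.30, 0.05]
p_flm = [0.12, 0.05, 0.25, 0.04]
p_rnn = [0.08, 0.10, 0.20, 0.09]

weights = [0.4, 0.3, 0.3]  # illustrative values only, not tuned
p_mix = interpolate([p_ngram, p_flm, p_rnn], weights)

for name, probs in [("3-gram", p_ngram), ("FLM", p_flm), ("RNN", p_rnn), ("mix", p_mix)]:
    print(f"{name}: perplexity = {perplexity(probs):.2f}")
```

Because each model contributes a weighted share of every word's probability, a word that one model scores very low can still receive a reasonable probability from the others, which is the usual motivation for combining models this way.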
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Vazhenina, D., Markov, K. (2013). Evaluation of Advanced Language Modeling Techniques for Russian LVCSR. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_17
DOI: https://doi.org/10.1007/978-3-319-01931-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)