Evaluation of Advanced Language Modeling Techniques for Russian LVCSR

Vazhenina, Daria; Markov, Konstantin

doi:10.1007/978-3-319-01931-4_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The Russian language is characterized by very flexible word order, which limits the ability of standard n-grams to capture important regularities in the data. Moreover, it is a highly inflectional language with rich morphology, which leads to high out-of-vocabulary (OOV) word rates. In this paper, we present a comparison of two advanced language modeling techniques, the factored language model (FLM) and the recurrent neural network (RNN) language model, applied to Russian large vocabulary speech recognition. Evaluation experiments showed that the FLM, built using a training corpus of 10M words, performed better, reducing perplexity and word error rate (WER) by 20% and 4.0%, respectively. A further WER reduction of 7.4% was achieved when the training data were increased to 40M words and the 3-gram, FLM, and RNN language models were combined by linear interpolation.
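
As an illustration of the two quantities the abstract relies on, the following is a minimal sketch of linear interpolation of several language models and of perplexity computed from per-word probabilities. It is not the authors' implementation: the function names, the interpolation weights, and the per-word probabilities are all hypothetical values chosen only to make the example runnable.

```python
import math


def interpolate(probs, weights):
    """Linearly interpolate one word's probability across several models.

    probs   -- probabilities assigned to the same word by each model
    weights -- non-negative interpolation weights that sum to 1
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, probs))


def perplexity(word_probs):
    """Perplexity of a word sequence given the probability of each word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical per-word probabilities for a three-word test sequence,
# as they might be produced by a 3-gram, an FLM, and an RNN model.
ngram_probs = [0.10, 0.02, 0.30]
flm_probs = [0.12, 0.05, 0.25]
rnn_probs = [0.08, 0.06, 0.35]

# Hypothetical weights; in practice they would be tuned on held-out data.
weights = (0.4, 0.3, 0.3)

combined = [
    interpolate(per_word, weights)
    for per_word in zip(ngram_probs, flm_probs, rnn_probs)
]

print("3-gram perplexity:  ", perplexity(ngram_probs))
print("combined perplexity:", perplexity(combined))
```

Because each interpolated probability is a weighted average, it always lies between the smallest and largest probability the individual models assign to that word, so the combination helps most when the models make different errors.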

Author information

Authors and Affiliations

Human Interface Laboratory, The University of Aizu, Japan
Daria Vazhenina & Konstantin Markov

Editor information

Editors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Miloš Železný
University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal
Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation for the Russian Academy of Sciences, 14-th line, 39, 199178, St. Petersburg, Russia
Andrey Ronzhin

Cite this paper

Vazhenina, D., Markov, K. (2013). Evaluation of Advanced Language Modeling Techniques for Russian LVCSR. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_17

DOI: https://doi.org/10.1007/978-3-319-01931-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)
