Abstract
In this paper, we describe research on a recurrent neural network (RNN) language model (LM) for N-best list rescoring in automatic continuous Russian speech recognition and compare it with a factored language model (FLM). We experimented with RNNs with different numbers of units in the hidden layer. For FLM creation, we used five linguistic factors: word, lemma, stem, part of speech, and morphological tag. All models were trained on a text corpus of 350M words. We also performed linear interpolation of the RNN LM and the FLM with the baseline 3-gram LM. We achieved a relative WER reduction of 8 % using the FLM and of 14 % using the RNN LM with respect to the baseline model.
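To make the rescoring step concrete, the following Python sketch shows how N-best rescoring with linear interpolation of LM probabilities typically works. This is a minimal illustration, not the authors' implementation: the data layout, interpolation weight `lam`, LM scale `lm_weight`, and all scores are hypothetical.

```python
import math

def interpolate_logprob(p_ngram, p_rescore, lam):
    """Linearly interpolate two word probabilities (not log-probs):
    P(w|h) = lam * P_rescore(w|h) + (1 - lam) * P_ngram(w|h)."""
    return math.log(lam * p_rescore + (1.0 - lam) * p_ngram)

def rescore_nbest(nbest, lam=0.5, lm_weight=10.0):
    """Rerank an N-best list. Each hypothesis carries an acoustic score
    plus per-word probabilities from the baseline 3-gram LM and from the
    rescoring model (RNN LM or FLM). All values here are illustrative."""
    rescored = []
    for hyp in nbest:
        lm_logprob = sum(
            interpolate_logprob(p3, pr, lam)
            for p3, pr in zip(hyp["p_3gram"], hyp["p_rescore"])
        )
        total = hyp["acoustic"] + lm_weight * lm_logprob
        rescored.append((total, hyp["words"]))
    return max(rescored)  # hypothesis with the best combined score

# Toy 2-best list with made-up scores:
nbest = [
    {"words": "hypothesis one", "acoustic": -250.0,
     "p_3gram": [0.20, 0.05, 0.10], "p_rescore": [0.30, 0.08, 0.12]},
    {"words": "hypothesis two", "acoustic": -248.0,
     "p_3gram": [0.20, 0.01, 0.02], "p_rescore": [0.25, 0.01, 0.03]},
]
print(rescore_nbest(nbest))
```

In this setup the stronger model (RNN LM or FLM) only rescores the short N-best list produced with the 3-gram LM, so the expensive model never has to run inside the first-pass decoder.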
Acknowledgments
This research is partially supported by the Council for Grants of the President of Russia (Projects No. MK-5209.2015.8 and MD-3035.2015.8), by the Russian Foundation for Basic Research (Projects No. 15-07-04415 and 15-07-04322), and by the Government of the Russian Federation (Grant No. 074-U01).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kipyatkova, I., Karpov, A. (2015). A Comparison of RNN LM and FLM for Russian Speech Recognition. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol. 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_5
DOI: https://doi.org/10.1007/978-3-319-23132-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7