On the Most Frequent Sequences of Words in Russian Spoken Everyday Language (Bigrams and Trigrams): An Experience of Classification

  • Conference paper
Speech and Computer (SPECOM 2023)

Abstract

This article describes the most frequent bigrams and trigrams obtained by applying n-gram analysis to a representative sample of spoken Russian. N-gram analysis yields frequency lists of sequences of n graphical words, which is important for describing corpus material from various theoretical and applied perspectives. The source data was a sample of 388 episodes of everyday speech communication from the ORD corpus (about 110 hours of audio). The resulting frequency lists make it possible to construct a typology of the most common bigrams and trigrams in Russian oral communication, and they bear equally on grammar, pragmatics, lexicon, and phraseology. The most frequent bigrams and trigrams include grammatical structures (U TEBYA, YA NE PONIMAYU, MNE KAZHETSYA), idioms in a broad sense of the term (VSYO RAVNO, TO ZHE SAMOE), introductory units (TAK SKAZAT’, S DRUGOY STORONY), and a number of sequences typical only of oral speech, such as one-word pragmatic markers (NU VOT, KAK BY, NU V OBSHCHEM), amplifications (DA-DA, TAK-TAK-TAK), and hesitations-vocalizations (E-E, M-M-M). The obtained frequency lists can be useful for many applied natural language processing tasks.
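
As a rough illustration of what an n-gram frequency list is, the following Python sketch counts bigrams and trigrams over a handful of utterances. It is not the authors' pipeline: the function names, the whitespace tokenization, and the toy transliterated utterances are assumptions made for this example, and none of the data comes from the ORD corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(utterances, n):
    """Count n-grams per utterance, so sequences never cross utterance boundaries."""
    counts = Counter()
    for utterance in utterances:
        # Assumption: a "graphical word" is a whitespace-delimited token.
        tokens = utterance.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


# Invented toy utterances (transliterated); not data from the ORD corpus.
toy_utterances = [
    "nu vot ya ne ponimayu",
    "mne kazhetsya vsyo ravno",
    "nu vot mne kazhetsya tak",
    "ya ne ponimayu kak by",
]

bigrams = ngram_frequencies(toy_utterances, 2)
trigrams = ngram_frequencies(toy_utterances, 3)

# Print the top of each frequency list.
for gram, freq in bigrams.most_common(5):
    print(" ".join(gram), freq)
for gram, freq in trigrams.most_common(5):
    print(" ".join(gram), freq)
```

Counting within each utterance is one possible design choice; a real study would also need to decide how to treat hyphenated repetitions such as DA-DA and how to tokenize hesitation vocalizations.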

Acknowledgements

The presented research was supported by the Russian Science Foundation, project No. 22-18-00189 “Structure and functionality of stable multiword units in Russian everyday speech”.

Author information

Corresponding author

Correspondence to Maria V. Khokhlova.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Khokhlova, M.V., Blinova, O.V., Bogdanova-Beglarian, N., Sherstinova, T. (2023). On the Most Frequent Sequences of Words in Russian Spoken Everyday Language (Bigrams and Trigrams): An Experience of Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_37

  • DOI: https://doi.org/10.1007/978-3-031-48309-7_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48308-0

  • Online ISBN: 978-3-031-48309-7

  • eBook Packages: Computer Science (R0)
