Abstract
The article describes the most frequent bigrams and trigrams obtained by applying n-gram analysis to a representative sample of spoken Russian. N-gram analysis yields frequency lists of sequences of n graphical words, which is important for describing corpus material from various theoretical and applied perspectives. The source data were a sample of 388 episodes of everyday speech communication from the ORD corpus (about 110 hours of audio). The resulting frequency lists of word sequences make it possible to construct a typology of the most common bigrams and trigrams in Russian oral communication, and they bear equally on grammar, pragmatics, lexicon, and phraseology. The list of the most frequent bigrams and trigrams contains grammatical structures (U TEBYA, YA NE PONIMAYU, MNE KAZHETSYA), idioms in a broad sense of the term (VSYO RAVNO, TO ZHE SAMOE), introductory units (TAK SKAZAT’, S DRUGOY STORONY), as well as a number of sequences typical only of oral speech, such as one-word pragmatic markers (NU VOT, KAK BY, NU V OBSHCHEM), amplifications (DA-DA, TAK-TAK-TAK), and hesitations-vocalizations (E-E, M-M-M). The obtained frequency lists can be useful for many applied natural language processing tasks.
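As a rough illustration of how frequency lists of bigrams and trigrams can be built, the following minimal sketch counts n-grams over whitespace-separated graphical words. It is not the authors' procedure: the sample utterances are invented from items mentioned in the abstract, the function names are hypothetical, and the choices to lowercase tokens, to keep hyphenated forms such as "da-da" as single words, and to stop n-grams at utterance boundaries are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(utterances, n):
    """Count n-grams across utterances without crossing utterance boundaries.

    Assumption: a graphical word is any whitespace-separated token,
    so hyphenated forms like "da-da" count as one word.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


# Invented sample utterances (transliterated), for illustration only.
sample = [
    "nu vot ya ne ponimayu",
    "mne kazhetsya vsyo ravno",
    "nu vot mne kazhetsya da-da",
    "ya ne ponimayu kak by",
]

bigrams = ngram_frequencies(sample, 2)
trigrams = ngram_frequencies(sample, 3)

# Print the frequency lists, most frequent sequences first.
for gram, freq in bigrams.most_common(4):
    print(" ".join(gram), freq)
for gram, freq in trigrams.most_common(1):
    print(" ".join(gram), freq)
```

Sorting the counts by frequency gives the kind of ranked list that a typology of frequent sequences could start from; a real corpus would also need decisions about transcription symbols and speaker turns that this sketch leaves out.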
Acknowledgements
The presented research was supported by the Russian Science Foundation, project No. 22-18-00189 “Structure and functionality of stable multiword units in Russian everyday speech”.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khokhlova, M.V., Blinova, O.V., Bogdanova-Beglarian, N., Sherstinova, T. (2023). On the Most Frequent Sequences of Words in Russian Spoken Everyday Language (Bigrams and Trigrams): An Experience of Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_37
DOI: https://doi.org/10.1007/978-3-031-48309-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)