Abstract
The ORD corpus is a representative resource of everyday spoken Russian that contains about 1000 h of long-term audio recordings of daily communication made in real settings by research volunteers. ORD macro episodes are large communication episodes, each united by a setting or scene of communication, the social roles of the participants, and their general activity. The paper describes the annotation principles used for tagging macro episodes, provides current statistics on the communication situations represented in the corpus, and identifies their most common types. Because communication situations are annotated, their codes can be used as filters for selecting audio data, which makes it possible to study Russian everyday speech in different communication situations and to identify and describe various registers of spoken Russian. As an example, several high-frequency word lists for different communication situations are compared. The annotation of macro episodes in the ORD corpus is a prerequisite for its further pragmatic annotation.
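The idea of using situation codes as filters and then comparing high-frequency word lists can be illustrated with a minimal sketch. It is not the ORD tooling: the record fields (`setting`, `text`), the setting labels, the toy transcripts, and the function `frequency_list` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical macro-episode records. The field names and setting labels
# are assumptions for this sketch, not the actual ORD annotation scheme.
episodes = [
    {"setting": "home", "text": "nu da ja tebe govorju da"},
    {"setting": "home", "text": "da nu ladno"},
    {"setting": "work", "text": "tak eto nado sdelat tak"},
]


def frequency_list(records, setting, top_n=3):
    """Return the top_n most frequent tokens in episodes with the given setting."""
    counts = Counter()
    for rec in records:
        # The setting code acts as a filter on the data.
        if rec["setting"] == setting:
            # Naive whitespace tokenization, sufficient for the toy data.
            counts.update(rec["text"].lower().split())
    return counts.most_common(top_n)


# One frequency list per communication situation, ready for comparison.
home_top = frequency_list(episodes, "home")
work_top = frequency_list(episodes, "work")
```

Comparing `home_top` with `work_top` then shows which words dominate each situation; a real study would of course need proper tokenization and much larger samples.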
Notes
1. For more information on how to use frequency word list structure in the study of language styles, see [8].
References
Asinovsky, A., Bogdanova, N., Rusakova, M., Ryko, A., Stepanova, S., Sherstinova, T.: The ORD speech corpus of Russian everyday communication “One Speaker’s Day”: creation principles and annotation. In: Matoušek, V., Mautner, P. (eds.) TSD 2009. LNCS, vol. 5729, pp. 250–257. Springer, Heidelberg (2009)
Sherstinova, T.: Communikativnyje macroepizody v korpuse povsednevnoj russkoj rechi “Odin rechevoj den’”: principy annotirovanija i rezul’taty statisticheskoj obrabotki. In: Zakharov, V., Mitrofanova, O., Khokhlova, M. (eds.) Proceedings of the International Conference “Corpus linguistics-2013”, pp. 449–456. St. Petersburg State University, St. Petersburg (2013)
Sherstinova, T.: Pragmaticheskoe annotirovanie kommunikativnykh jedinic v korpuse ORD: mikroepisody i rechevye akty. In: Proceedings of the International Conference “Corpus linguistics-2015”, pp. 436–446 (2015) (in Russian)
Potapova, R.K.: Rech: kommunikacija, informatika, kibernetika. URSS, Moscow (2003)
Sherstinova, T.: The structure of the ORD speech corpus of Russian everyday communication. In: Matoušek, V., Mautner, P. (eds.) TSD 2009. LNCS, vol. 5729, pp. 258–265. Springer, Heidelberg (2009)
Chebanov, S., Martynenko, G.: Semiotika opisatel’nykh tekstov: tipologicheskij aspekt. St. Petersburg State University, St. Petersburg (1999)
Ottenheimer, H.J.: The Anthropology of Language: An Introduction to Linguistic Anthropology. Wadsworth Cengage Learning, Belmont, CA (2006)
Martynenko, G.: Osnovy stilemetrii. Leningrad State University, Leningrad (1988)
Acknowledgements
The annotation principles for tagging macro episodes were developed with the support of the Russian Foundation for Humanities (project # 12-04-12017, “Information System of Communication Scenarios of Russian Spontaneous Speech”). The presented statistics were obtained within the framework of the project “Everyday Russian Language in Different Social Groups”, supported by the Russian Scientific Foundation (project # 14-18-02070).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sherstinova, T. (2015). Macro Episodes of Russian Everyday Oral Communication: Towards Pragmatic Annotation of the ORD Speech Corpus. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_33
DOI: https://doi.org/10.1007/978-3-319-23132-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science (R0)