Abstract
Advances in emotional speech recognition and synthesis depend heavily on the availability of annotated emotional speech corpora. As a low-resource language, Thai critically lacks emotional speech corpora, although a few corpora have been constructed for speech recognition and synthesis. This paper presents the design of a Thai emotional speech corpus, named EMOLA, its construction and annotation process, and an analysis of the resulting data. In the corpus design, four basic emotion types, with twelve subtypes, are defined with reference to the Pleasure-Arousal-Dominance (PAD) emotional state model. To construct the corpus, a series of Thai television dramas (1397 min in total) was selected, and approximately 868 min of its video clips were annotated. As a result, 8987 transcriptions of conversation turns were derived, each tagged with one basic type and a few subtypes. Finally, an analysis was conducted to characterize the corpus through three sets of statistics: collection-level, annotator-oriented, and actor-oriented statistics.
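The design summarized above amounts to a simple annotation schema: each conversation turn carries exactly one of the four basic emotion types plus any applicable subtypes, with the type inventory grounded in the PAD model. The following is a minimal, hypothetical sketch of such a record in Python; the class, field names, validation helper, and optional PAD triple are illustrative assumptions, not the paper's actual data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class EmolaTurn:
    """One annotated conversation turn (hypothetical layout, for illustration)."""
    clip_id: str                                       # source video clip identifier
    transcription: str                                 # Thai text of the turn
    basic_type: str                                    # exactly one of the four basic types
    subtypes: List[str] = field(default_factory=list)  # tagged subtypes (of the twelve)
    pad: Optional[Tuple[float, float, float]] = None   # (pleasure, arousal, dominance), if rated

def is_valid(turn: EmolaTurn,
             basic_types: Set[str],
             subtype_map: Dict[str, Set[str]]) -> bool:
    """Check a turn against the corpus design: one known basic type, and every
    tagged subtype drawn from that basic type's subtype inventory."""
    return (turn.basic_type in basic_types and
            all(s in subtype_map.get(turn.basic_type, set()) for s in turn.subtypes))
```

Under a layout like this, the corpus-level, annotator-oriented, and actor-oriented statistics reported in the analysis reduce to straightforward aggregations over the turn records.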
Acknowledgements
This work was partially supported by a SIIT graduate student scholarship; the Center of Excellence in Intelligent Informatics, Speech and Language Technology and Service Innovation (CILS), Thammasat University; the Center of Excellence in Intelligent Informatics and Service Innovation (IISI), SIIT, Thammasat University; and the Thailand Research Fund under Grant Number RTA6080013.
Cite this article
Kasuriya, S., Theeramunkong, T., Wutiwiwatchai, C. et al. Developing a Thai emotional speech corpus from Lakorn (EMOLA). Lang Resources & Evaluation 53, 17–55 (2019). https://doi.org/10.1007/s10579-018-9428-9