Visuo-auditory Multimodal Emotional Structure to Improve Human-Robot-Interaction

Prado, José Augusto; Simplício, Carlos; Lori, Nicolás F.; Dias, Jorge

doi:10.1007/s12369-011-0134-7

Published: 20 December 2011

International Journal of Social Robotics, volume 4, pages 29–51 (2012)

José Augusto Prado¹, Carlos Simplício¹,², Nicolás F. Lori³ & Jorge Dias¹

Abstract

We propose an approach to analyze and synthesize a set of human facial and vocal expressions, and then use the classified expressions to decide the robot’s response in a human-robot interaction. During a human-to-human conversation, a person senses the interlocutor’s face and voice, perceives her/his emotional expressions, and processes this information to decide which response to give. The response may be aggressive, funny (henceforth meaning humorous) or just neutral, depending not only on the observed emotions but also on the personality of the person. The purpose of our proposed structure is to endow robots with the capability to model human emotions, which requires solving several subproblems: feature extraction, classification, decision and synthesis. In the proposed approach we integrate two classifiers for emotion recognition, one from audio and one from video, and then use a new method to fuse their outputs with the social behavior profile. The social behavior profile governs the personality of the robot. To keep the person engaged in the interaction, after each iteration of analysis the robot synthesizes human voice with both lip synchronization and facial expressions. The structure and workflow of the synthesis and decision stages are addressed, and the Bayesian networks are discussed. We also study how to analyze and synthesize emotion from facial and vocal expressions. The result is a new probabilistic structure that enables a higher level of interaction between a human and a robot.
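
The fusion and decision steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' Bayesian network, which the abstract does not specify. It assumes a naive-Bayes style fusion in which the face and voice classifiers are conditionally independent given the emotion, and it treats the social behavior profile as a table of P(response | emotion). The emotion labels, the function names, the HUMOROUS_PROFILE table and every number below are hypothetical and chosen only for illustration. Only the three response styles (aggressive, funny, neutral) come from the abstract.

```python
EMOTIONS = ("happy", "sad", "angry", "neutral")   # hypothetical label set
RESPONSES = ("aggressive", "funny", "neutral")    # response styles named in the abstract


def fuse_posteriors(p_face, p_voice, prior=None):
    """Fuse two per-modality posteriors over emotions.

    Assumes the modalities are conditionally independent given the emotion:
        P(e | face, voice) is proportional to P(e | face) * P(e | voice) / P(e)
    """
    if prior is None:
        # Assumption: uniform prior over emotions.
        prior = {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    unnormalized = {e: p_face[e] * p_voice[e] / prior[e] for e in EMOTIONS}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("modalities assign zero joint probability to every emotion")
    return {e: v / total for e, v in unnormalized.items()}


def choose_response(p_emotion, profile):
    """Pick the response with the highest probability under the profile.

    profile[e][r] is a hypothetical P(response r | emotion e) for one
    social behavior profile; the emotion is marginalized out.
    """
    scores = {
        r: sum(p_emotion[e] * profile[e][r] for e in EMOTIONS)
        for r in RESPONSES
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical profile for a robot with a humorous personality.
HUMOROUS_PROFILE = {
    "happy":   {"aggressive": 0.1, "funny": 0.7, "neutral": 0.2},
    "sad":     {"aggressive": 0.1, "funny": 0.3, "neutral": 0.6},
    "angry":   {"aggressive": 0.2, "funny": 0.2, "neutral": 0.6},
    "neutral": {"aggressive": 0.1, "funny": 0.4, "neutral": 0.5},
}

# Made-up classifier outputs for one iteration of analysis.
p_face = {"happy": 0.7, "sad": 0.1, "angry": 0.1, "neutral": 0.1}
p_voice = {"happy": 0.5, "sad": 0.1, "angry": 0.2, "neutral": 0.2}

fused = fuse_posteriors(p_face, p_voice)
response, scores = choose_response(fused, HUMOROUS_PROFILE)
print(fused)
print(response, scores)
```

With these made-up inputs both modalities favor "happy", so the fused posterior is sharper than either input and the humorous profile selects the "funny" response. Swapping in a different profile table would change the chosen response without touching the classifiers, which is one plausible reading of how a profile could shape the robot's personality.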

Author information

Authors and Affiliations

Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
José Augusto Prado, Carlos Simplício & Jorge Dias
Polytechnic Institute of Leiria, Leiria, Portugal
Carlos Simplício
Institute of Biomedical Research in Light and Image (IBILI), Faculty of Medicine, University of Coimbra, Coimbra, Portugal
Nicolás F. Lori

Corresponding author

Correspondence to José Augusto Prado.

Additional information

The authors gratefully acknowledge support from the Institute of Systems and Robotics at the University of Coimbra (ISR-UC), the Portuguese Foundation for Science and Technology (FCT) [SFRH/BD/60954/2009, Ciencia2007, PTDC/SAU-BEB/100147/2008], and the Polytechnic Institute of Leiria (IPL).

About this article

Cite this article

Prado, J.A., Simplício, C., Lori, N.F. et al. Visuo-auditory Multimodal Emotional Structure to Improve Human-Robot-Interaction. Int J of Soc Robotics 4, 29–51 (2012). https://doi.org/10.1007/s12369-011-0134-7

Accepted: 19 November 2011
Published: 20 December 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s12369-011-0134-7
