Abstract
Integrating virtual agents into real-time interactive virtual applications raises several challenges. The rendering of the character's movements in the virtual scene (locomotion or head rotation) and the 3D binaural rendering of its synthetic speech during these movements must be spatially coordinated. Furthermore, the system must adapt the agent's expressive audiovisual signals to the user's ongoing actions in real time. In this paper, we describe a platform designed to address these challenges: (1) modules for real-time synthesis and spatial rendering of synthetic speech, (2) modules for real-time 3D rendering of facial expressions using a GPU-based 3D graphics engine, and (3) the integration of these modules within an experimental platform that uses gesture as an input modality. A new model of phoneme-dependent human speech directivity patterns is included in the speech synthesis system, so that the agent can move through the virtual scene with realistic 3D visual and audio rendering. Future applications of this platform include perceptual studies of multimodal perception and interaction, expressive real-time question answering systems, and interactive arts.
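To picture the directivity model, consider the following minimal sketch in Python. It is not the paper's implementation; the gain tables, function names, and sampling resolution are all hypothetical assumptions for illustration. It shows how per-phoneme directivity gains, indexed by the angle between the talking head's facing direction and the listener, could scale each synthesis frame before it is handed to a separate binaural renderer.

import math

# Hypothetical per-phoneme directivity tables: linear gain sampled every
# 30 degrees of azimuth (0..180) relative to the talker's facing direction.
# Real measurements are far denser; these values are illustrative only.
DIRECTIVITY = {
    "a": [1.00, 0.95, 0.85, 0.70, 0.55, 0.45, 0.40],  # open vowel, broad pattern
    "s": [1.00, 0.90, 0.70, 0.45, 0.30, 0.20, 0.15],  # fricative, more directive
}

def directivity_gain(phoneme, azimuth_deg):
    """Linearly interpolate the directivity gain for a phoneme at a given
    azimuth (0 = listener directly in front of the talking head)."""
    table = DIRECTIVITY.get(phoneme, DIRECTIVITY["a"])  # default pattern
    az = abs(azimuth_deg) % 360.0
    if az > 180.0:
        az = 360.0 - az                 # patterns assumed left/right symmetric
    pos = az / 30.0                     # position in the 30-degree-step table
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac

def render_frame(samples, phoneme, agent_yaw_deg, listener_azimuth_deg):
    """Scale one synthesis frame by the phoneme-dependent directivity
    before passing it on to the binaural renderer."""
    relative = listener_azimuth_deg - agent_yaw_deg
    gain = directivity_gain(phoneme, relative)
    return [s * gain for s in samples]

For example, render_frame(frame, "s", agent_yaw_deg=90.0, listener_azimuth_deg=0.0) attenuates a sibilant to under half its frontal level once the agent turns away, illustrating the kind of coupling between scene motion and audio rendering that the platform coordinates.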
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Martin, J.-C., et al. (2007). 3D Audiovisual Rendering and Real-Time Interactive Control of Expressivity in a Talking Head. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) Intelligent Virtual Agents. IVA 2007. Lecture Notes in Computer Science, vol. 4722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74997-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74996-7
Online ISBN: 978-3-540-74997-4