
3D Audiovisual Rendering and Real-Time Interactive Control of Expressivity in a Talking Head

  • Conference paper
Intelligent Virtual Agents (IVA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4722)


Abstract

The integration of virtual agents into real-time interactive virtual applications raises several challenges. The rendering of the virtual character's movements in the virtual scene (locomotion of the character or rotation of its head) and the binaural 3D rendering of the synthetic speech during these movements need to be spatially coordinated. Furthermore, the system must enable real-time adaptation of the agent's expressive audiovisual signals to the user's ongoing actions. In this paper, we describe a platform designed to address these challenges, comprising: (1) modules for real-time synthesis and spatial rendering of synthetic speech, (2) modules for real-time 3D rendering of facial expressions using a GPU-based 3D graphics engine, and (3) the integration of these modules within an experimental platform that uses gesture as an input modality. A new model of phoneme-dependent human speech directivity patterns is included in the speech synthesis system, so that the agent can move in the virtual scene with realistic 3D visual and audio rendering. Future applications of this platform include perceptual studies of multimodal perception and interaction, expressive real-time question-answering systems, and interactive arts.
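To make the phoneme-dependent directivity idea concrete, the minimal Python sketch below shows how a per-phoneme off-axis gain might be interpolated and applied as the character's head turns relative to the listener. The directivity table, phoneme labels, and function names are illustrative assumptions for exposition, not the paper's measured patterns or actual implementation.

```python
# A minimal sketch (not the authors' implementation) of how a
# phoneme-dependent directivity pattern could modulate the gain of a
# synthetic speech source as the virtual character turns its head.
# The gain values below are illustrative assumptions, not measured data.

# Hypothetical per-phoneme directivity: gain at 0, 90, and 180 degrees
# off-axis, linearly interpolated in between.
DIRECTIVITY = {
    "a": (1.00, 0.80, 0.55),   # open vowel: relatively omnidirectional
    "s": (1.00, 0.55, 0.25),   # fricative: more strongly forward-directed
}

def directivity_gain(phoneme: str, angle_deg: float) -> float:
    """Interpolate the off-axis gain for a phoneme at a given angle."""
    g0, g90, g180 = DIRECTIVITY.get(phoneme, (1.0, 0.7, 0.4))
    a = abs(angle_deg) % 360.0
    if a > 180.0:
        a = 360.0 - a
    if a <= 90.0:
        return g0 + (g90 - g0) * (a / 90.0)
    return g90 + (g180 - g90) * ((a - 90.0) / 90.0)

def off_axis_angle(head_yaw_deg: float, listener_azimuth_deg: float) -> float:
    """Angle between the talker's facing direction and the listener,
    wrapped into [-180, 180) degrees."""
    return (listener_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# Example: as the character turns away from a listener at azimuth 0,
# an /s/ is attenuated more steeply than an /a/.
for yaw in (0.0, 45.0, 90.0, 135.0):
    angle = off_axis_angle(yaw, listener_azimuth_deg=0.0)
    print(yaw, round(directivity_gain("a", angle), 2),
               round(directivity_gain("s", angle), 2))
```

In a full binaural renderer, a gain of this kind would be applied per phoneme segment before spatialization, so that the audio rendering stays coordinated with the visual rotation of the character's head.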





Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martin, J.-C., et al. (2007). 3D Audiovisual Rendering and Real-Time Interactive Control of Expressivity in a Talking Head. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) Intelligent Virtual Agents. IVA 2007. Lecture Notes in Computer Science, vol. 4722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74997-4_3


  • DOI: https://doi.org/10.1007/978-3-540-74997-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74996-7

  • Online ISBN: 978-3-540-74997-4

  • eBook Packages: Computer Science (R0)
