A Study on 2D Photo-Realistic Facial Animation Generation Using 3D Facial Feature Points and Deep Neural Networks

  • Conference paper
Advances in Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2017)

Abstract

This paper proposes a technique for generating 2D photo-realistic facial animation from input text. The technique uses deep neural networks (DNNs) to map labels derived from the text to a face model built on 3D facial feature points. Our previous approach worked only in 2D space, using hidden Markov models (HMMs) and DNNs. However, the 2D facial pixels it generated were sensitive to rotation of the face in the training data. In this study, we alleviate this problem by using 3D facial feature points obtained with a Kinect sensor. The face shape and color information is parameterized by the 3D facial feature points, and during model training, DNNs model the relation between the text-derived labels and the face-model parameters. As a preliminary experiment, we show that the proposed technique can generate 2D facial animation from arbitrary input text.
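The abstract's central argument is that 2D pixel features change whenever the head turns, while 3D feature points let that rotation be factored out. The abstract does not describe how this normalization is done, so the following is only a minimal sketch under stated assumptions: every frame supplies the same feature points in the same order (as a face tracker would), and only yaw, the rotation about the vertical y-axis, is removed. The helper names `centroid`, `rotate_yaw`, and `align_yaw` are hypothetical. After centering both point sets, the least-squares yaw angle has the closed form θ = atan2(Σ(q_x p_z − q_z p_x), Σ(q_x p_x + q_z p_z)), where p are the observed points and q the reference points.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def centroid(points: List[Point]) -> Point:
    """Mean of a list of 3D points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def rotate_yaw(p: Point, theta: float) -> Point:
    """Rotate a point about the vertical (y) axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)


def align_yaw(points: List[Point], reference: List[Point]) -> List[Point]:
    """Hypothetical helper: remove translation and yaw of `points`
    relative to `reference` (same feature points, same order).

    Assumption: only rotation about the y-axis is corrected; pitch and
    roll are left untouched.
    """
    if len(points) != len(reference) or not points:
        raise ValueError("point sets must be non-empty and the same length")
    cp, cq = centroid(points), centroid(reference)
    # Center both sets so that only the rotation remains to be estimated.
    p0 = [(x - cp[0], y - cp[1], z - cp[2]) for x, y, z in points]
    q0 = [(x - cq[0], y - cq[1], z - cq[2]) for x, y, z in reference]
    # Minimizing sum ||R(theta) p_i - q_i||^2 is equivalent to maximizing
    # a*cos(theta) + b*sin(theta), whose maximizer is atan2(b, a).
    a = sum(q[0] * p[0] + q[2] * p[2] for p, q in zip(p0, q0))
    b = sum(q[0] * p[2] - q[2] * p[0] for p, q in zip(p0, q0))
    theta = math.atan2(b, a)
    # Rotate the centered points, then place them at the reference centroid.
    return [
        tuple(r + c for r, c in zip(rotate_yaw(p, theta), cq))
        for p in p0
    ]
```

Applied to every frame before face-model parameters are extracted, a step like this would make the shape and color parameters independent of head yaw. A full treatment would estimate all three rotation angles, for example with the SVD-based Kabsch algorithm or iterative closest point registration.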



Acknowledgment

Part of this work was supported by JSPS KAKENHI Grant Numbers JP15H02720 and JP26280055.

Author information

Corresponding author

Correspondence to Kazuki Sato.


Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Sato, K., Nose, T., Ito, A., Chiba, Y., Ito, A., Shinozaki, T. (2018). A Study on 2D Photo-Realistic Facial Animation Generation Using 3D Facial Feature Points and Deep Neural Networks. In: Pan, JS., Tsai, PW., Watada, J., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2017. Smart Innovation, Systems and Technologies, vol 82. Springer, Cham. https://doi.org/10.1007/978-3-319-63859-1_15


  • DOI: https://doi.org/10.1007/978-3-319-63859-1_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63858-4

  • Online ISBN: 978-3-319-63859-1

  • eBook Packages: Engineering, Engineering (R0)
