A Study on 2D Photo-Realistic Facial Animation Generation Using 3D Facial Feature Points and Deep Neural Networks

Sato, Kazuki; Nose, Takashi; Ito, Akira; Chiba, Yuya; Ito, Akinori; Shinozaki, Takahiro

doi:10.1007/978-3-319-63859-1_15

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 82)

Included in the following conference series:

International Conference on Intelligent Information Hiding and Multimedia Signal Processing

Abstract

This paper proposes a technique for generating 2D photo-realistic facial animation from input text. The technique is based on a mapping from 3D facial feature points using deep neural networks (DNNs). Our previous approach operated only in 2D space, using hidden Markov models (HMMs) and DNNs. That approach has the disadvantage that the generated 2D facial pixels are sensitive to rotation of the face in the training data. In this study, we alleviate the problem by using 3D facial feature points obtained with Kinect. The face shape and color information is parameterized by the 3D facial feature points. In model training, the relation between the labels derived from text and the face-model parameters is modeled by DNNs. In a preliminary experiment, we show that the proposed technique can generate 2D facial animation from arbitrary input text.
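The abstract does not give the network architecture, the label format, or the dimensionality of the face-model parameters, so the following is only a minimal sketch of the general idea: a small feedforward network trained to regress from per-frame label vectors to face-model parameter vectors. Everything concrete here is an assumption made for illustration, including the sizes (20 label features, 16 hidden units, 6 output parameters), the tanh hidden layer, the squared-error loss, and the synthetic data; none of it is taken from the paper.

```python
import numpy as np


def init_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Create weights for a one-hidden-layer regression network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def forward(params, x):
    """Map label vectors x (N, in_dim) to predicted parameters (N, out_dim)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    y = h @ params["W2"] + params["b2"]
    return h, y


def train_step(params, x, target, lr=0.05):
    """One full-batch gradient-descent step; returns the loss before the update."""
    n = x.shape[0]
    h, y = forward(params, x)
    err = y - target
    loss = float(np.mean(np.sum(err ** 2, axis=1)))

    # Backpropagate the mean (over frames) of the summed squared error.
    grad_y = 2.0 * err / n
    grad_W2 = h.T @ grad_y
    grad_b2 = grad_y.sum(axis=0)
    grad_h = (grad_y @ params["W2"].T) * (1.0 - h ** 2)
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    params["W1"] -= lr * grad_W1
    params["b1"] -= lr * grad_b1
    params["W2"] -= lr * grad_W2
    params["b2"] -= lr * grad_b2
    return loss


# Synthetic stand-ins (hypothetical): binary "label" features per frame and
# smooth "face-model parameter" targets produced by a hidden random mapping.
data_rng = np.random.default_rng(1)
labels = data_rng.integers(0, 2, size=(200, 20)).astype(float)
hidden_map = data_rng.normal(0.0, 0.3, (20, 6))
face_params = np.tanh(labels @ hidden_map)

params = init_mlp(in_dim=20, hidden_dim=16, out_dim=6)
losses = [train_step(params, labels, face_params) for _ in range(300)]

_, predicted = forward(params, labels)
print(f"loss: first step {losses[0]:.4f}, last step {losses[-1]:.4f}")
print("predicted parameter array shape:", predicted.shape)
```

In a real system the inputs would be label features derived from the text and the targets would be parameters extracted from recorded face data; the sketch only shows the shape of the regression problem.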

Acknowledgment

Part of this work was supported by JSPS KAKENHI Grant Numbers JP15H02720 and JP26280055.

Author information

Authors and Affiliations

Graduate School of Engineering, Tohoku University, 6-6-05 Aramaki Aza Aoba, Aoba-ku, Sendai, Miyagi, 980-8579, Japan
Kazuki Sato, Takashi Nose, Akira Ito, Yuya Chiba & Akinori Ito
Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, 226-8502, Japan
Takahiro Shinozaki

Corresponding author

Correspondence to Kazuki Sato.

Editor information

Editors and Affiliations

Fujian Provincial Key Lab of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian, China
Jeng-Shyang Pan
Swinburne University of Technology, Hawthorn, Victoria, Australia
Pei-Wei Tsai
Universiti Teknologi Petronas, Teronoh, Malaysia
Junzo Watada
University of Canberra, Bruce, Australian Capital Territory, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Sato, K., Nose, T., Ito, A., Chiba, Y., Ito, A., Shinozaki, T. (2018). A Study on 2D Photo-Realistic Facial Animation Generation Using 3D Facial Feature Points and Deep Neural Networks. In: Pan, JS., Tsai, PW., Watada, J., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2017. Smart Innovation, Systems and Technologies, vol 82. Springer, Cham. https://doi.org/10.1007/978-3-319-63859-1_15

DOI: https://doi.org/10.1007/978-3-319-63859-1_15
Published: 18 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63858-4
Online ISBN: 978-3-319-63859-1
eBook Packages: Engineering, Engineering (R0)
