A novel attribute-based generation architecture for facial image editing | Multimedia Tools and Applications


Multimedia Tools and Applications

Abstract

Facial image editing has become a hot topic in recent years owing to rapid progress in deep generative models. Current models are based on either the variational autoencoder (VAE) or the generative adversarial network (GAN). However, VAE-based models usually generate over-smoothed images, while purely GAN-based models cannot randomly generate images with specific attributes and suffer from unstable training. To overcome these limitations, a novel attribute-disentangled generative model combining a VAE and a GAN is proposed for facial image editing. The model manipulates specific attributes and synthesizes facial images conditioned on specified attributes. In the encoder-decoder architecture of the proposed model, the latent space mapped by the encoder is split into two subspaces: an attribute-irrelevant space and an attribute-relevant space. The attribute-irrelevant space characterizes factors such as identity, position and background, which are expected to remain unchanged during editing. The attribute-relevant space represents the attributes to be manipulated, such as hair color, gender and age. We train the model with an adversarial scheme in which images generated by the model are fed back into the encoder. This pushes their distribution close to the real data distribution in the attribute-irrelevant subspace and ensures they are correctly classified in the attribute-relevant subspace, without the explicit discriminators used in GANs. To evaluate the proposed model, quantitative and qualitative comparisons with other state-of-the-art algorithms were conducted on the CelebA dataset. The results show that the proposed model can effectively generate high-quality facial images with diverse specified attributes.
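The abstract relies on two mechanisms. First, the latent code is split into an attribute-irrelevant part and an attribute-relevant part. Second, re-encoded generated images are scored by the encoder itself rather than by a separate discriminator. The sketch below illustrates both in plain Python. It is illustrative only, and several details are our own assumptions rather than details given in the abstract:

- the latent sizes (`D_IRR`, `D_ATTR`);
- using the KL divergence to a standard normal prior as the "realness" score in the attribute-irrelevant subspace;
- the margin-based hinge, borrowed from introspective VAEs such as IntroVAE;
- using the attribute-relevant coordinates directly as classification logits.

All function names are hypothetical.

```python
import math

# Hypothetical sizes: the abstract does not state the latent dimensionality.
D_IRR = 4   # attribute-irrelevant dims (identity, position, background, ...)
D_ATTR = 3  # attribute-relevant dims, one per edited attribute (e.g. hair color, gender, age)


def split_latent(z):
    """Split a latent vector into (attribute-irrelevant, attribute-relevant) parts."""
    assert len(z) == D_IRR + D_ATTR
    return z[:D_IRR], z[D_IRR:]


def edit_latent(z, target_attrs):
    """Keep the attribute-irrelevant part and overwrite the attribute-relevant part."""
    z_irr, _ = split_latent(z)
    assert len(target_attrs) == D_ATTR
    return list(z_irr) + list(target_attrs)


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions.

    Assumed here to act as the 'realness' score in the attribute-irrelevant subspace.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def attribute_bce(logits, labels):
    """Mean binary cross-entropy between attribute logits and 0/1 target labels.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    total = 0.0
    for x, y in zip(logits, labels):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def encoder_adv_loss(kl_real, kl_fake, margin):
    """Encoder side (assumed hinge): keep real codes near the prior and
    push re-encoded generated images at least `margin` away from it."""
    return kl_real + max(0.0, margin - kl_fake)


def generator_adv_loss(kl_fake):
    """Decoder side: make re-encoded generated images look like real codes (low KL)."""
    return kl_fake
```

For example, `edit_latent([0.1, -0.2, 0.3, 0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0])` keeps the first four (attribute-irrelevant) coordinates and replaces only the last three. In a training loop, one would decode the edited code and re-encode the result. Under these assumptions, the attribute-irrelevant part of the new code would feed the two adversarial losses, and its attribute-relevant part would be compared with the target labels via `attribute_bce`.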



Acknowledgments

This research was funded by the Natural Science Foundation of China under grant numbers 61673018, 61272338 and 61703443, by the Guangzhou Science and Technology Founding Committee under grant No. 201707010222, and by the Guangdong Province Key Laboratory of Computer Science.

Author information


Corresponding author

Correspondence to Weifu Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, D., Zhang, M., Zhang, L. et al. A novel attribute-based generation architecture for facial image editing. Multimed Tools Appl 80, 4881–4902 (2021). https://doi.org/10.1007/s11042-020-09858-7


  • DOI: https://doi.org/10.1007/s11042-020-09858-7

Keywords