Abstract
Facial image editing has become a hot topic in recent years owing to the rapid development of deep generative models. Current models are based on either the variational autoencoder (VAE) or the generative adversarial network (GAN). However, VAE-based models usually generate over-smoothed images, while purely GAN-based models cannot randomly generate images with specific attributes and suffer from unstable training. To overcome these limitations, a novel attribute-disentangled generative model that combines VAE and GAN is proposed for facial image editing; it manipulates specific attributes and synthesizes facial images conditioned on the specified attributes. In the encoder-decoder architecture of the proposed model, the latent space mapped by the encoder is split into two subspaces: the attribute-irrelevant space and the attribute-relevant space. The attribute-irrelevant space characterizes factors such as identity, position, and background, which are expected to remain unchanged during editing. The attribute-relevant space represents the attributes we want to manipulate, such as hair color, gender, and age. The model is trained with an adversarial scheme in which images generated by the model are fed back to the encoder, which ensures that their distribution is close to the real data distribution in the attribute-irrelevant subspace and that they are correctly classified in the attribute-relevant subspace, without the explicit discriminators used in GANs. To evaluate the proposed model, quantitative and qualitative comparisons with other state-of-the-art algorithms were conducted on the CelebA dataset. The evaluation results show that the proposed model can effectively generate high-quality facial images with diverse specified attributes.
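The abstract does not state the loss functions, so the following pure-Python sketch only illustrates one plausible way to organize the two latent subspaces and the encoder-as-discriminator training signal it describes. Everything here is an assumption, not the article's method: the hypothetical encoder output layout `[mu | logvar | attribute logits]`, a Gaussian prior on the attribute-irrelevant code whose KL divergence serves as a "realness" score, an IntroVAE-style hinge with a hypothetical `margin`, and binary cross-entropy for the attributes. All function names are illustrative; the networks and the reconstruction loss are omitted.

```python
import math


def split_encoder_output(h, d_irr, n_attr):
    """Split a flat encoder output into (mu, logvar, attribute logits).
    Hypothetical layout: [mu (d_irr) | logvar (d_irr) | logits (n_attr)]."""
    assert len(h) == 2 * d_irr + n_attr
    return h[:d_irr], h[d_irr:2 * d_irr], h[2 * d_irr:]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over the attribute-irrelevant dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def bce_with_logits(logits, targets):
    """Multi-label binary cross-entropy in the numerically stable form
    max(x, 0) - x*y + log(1 + exp(-|x|)), summed over attributes."""
    return sum(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
               for x, y in zip(logits, targets))


def introspective_step_losses(h_real, h_fake, real_attrs, target_attrs, d_irr, margin):
    """Assumed adversarial losses when a generated image is fed back to the encoder."""
    mu_r, lv_r, logit_r = split_encoder_output(h_real, d_irr, len(real_attrs))
    mu_f, lv_f, logit_f = split_encoder_output(h_fake, d_irr, len(target_attrs))
    kl_real = kl_to_standard_normal(mu_r, lv_r)
    kl_fake = kl_to_standard_normal(mu_f, lv_f)
    # Encoder (acting as the discriminator): keep real codes near the prior, push
    # generated codes at least `margin` away, and classify real attributes correctly.
    enc_loss = kl_real + max(0.0, margin - kl_fake) + bce_with_logits(logit_r, real_attrs)
    # Decoder (generator): make generated images look real in the irrelevant
    # subspace and carry the requested target attributes.
    dec_loss = kl_fake + bce_with_logits(logit_f, target_attrs)
    return enc_loss, dec_loss


def make_edit_code(mu_irr, target_attrs):
    """Editing: keep the source image's attribute-irrelevant code (identity, pose,
    background) and replace only the attribute part before decoding (decoder not shown)."""
    return list(mu_irr) + [float(a) for a in target_attrs]
```

In this sketch the encoder plays the role a separate discriminator would play in a GAN, which matches the abstract's claim that no explicit discriminator is needed; the choice of a KL hinge is borrowed from IntroVAE and may differ from the article's actual objective.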
Acknowledgments
This research was funded by the Natural Science Foundation of China under grant numbers 61673018, 61272338, and 61703443, by the Guangzhou Science and Technology Founding Committee under grant No. 201707010222, and by the Guangdong Province Key Laboratory of Computer Science.
About this article
Cite this article
Li, D., Zhang, M., Zhang, L. et al. A novel attribute-based generation architecture for facial image editing. Multimed Tools Appl 80, 4881–4902 (2021). https://doi.org/10.1007/s11042-020-09858-7
DOI: https://doi.org/10.1007/s11042-020-09858-7