Automatic Generation of Semantic Parts for Face Image Synthesis

  • Conference paper
  • Part of: Image Analysis and Processing – ICIAP 2023 (ICIAP 2023)

Abstract

Semantic image synthesis (SIS) refers to the problem of generating realistic imagery given a semantic segmentation mask that defines the spatial layout of object classes. Most approaches in the literature, beyond improving the quality of the generated images, put effort into increasing generation diversity in terms of style, i.e., texture. However, they all neglect a different feature: the possibility of manipulating the layout provided by the mask. Currently, the only way to do so is manually, by means of graphical user interfaces. In this paper, we describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks, with a specific focus on human faces. Our proposed model embeds the mask class-wise into a latent space where each class embedding can be independently edited. Then, a bi-directional LSTM block and a convolutional decoder output a new, locally manipulated mask. We report quantitative and qualitative results on the CelebAMask-HQ dataset, which show that our model can both faithfully reconstruct and modify a segmentation mask at the class level. We also show that our model can be placed before a SIS generator, opening the way to fully automatic control of both shape and texture in the generation process. Code is available at https://github.com/TFonta/Semantic-VAE.
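
As a concrete illustration of the pipeline sketched in the abstract, below is a minimal PyTorch sketch of a class-wise mask autoencoder: a shared encoder produces one latent code per semantic class, a bi-directional LSTM fuses the per-class codes, and a convolutional decoder reconstructs the mask, so that editing a single class code changes only that part of the layout. All layer sizes, the latent dimension, the 19-class setting (as in CelebAMask-HQ), and the use of a plain deterministic autoencoder instead of a VAE are illustrative assumptions, not the authors' implementation; refer to the linked repository for the actual code.

    # Minimal sketch (PyTorch) of a class-wise mask autoencoder in the spirit of
    # the abstract. Layer sizes, latent dimension, and class count are assumptions.
    import torch
    import torch.nn as nn


    class ClassWiseMaskAutoencoder(nn.Module):
        def __init__(self, num_classes=19, latent_dim=256, mask_size=64):
            super().__init__()
            self.num_classes = num_classes
            # Shared encoder applied to each one-hot class channel independently,
            # producing one latent vector per semantic class.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(128 * (mask_size // 8) ** 2, latent_dim),
            )
            # Bi-directional LSTM treats the per-class embeddings as a sequence,
            # letting each class code interact with the others before decoding.
            self.lstm = nn.LSTM(latent_dim, latent_dim, batch_first=True,
                                bidirectional=True)
            # Convolutional decoder maps the fused codes back to a full mask.
            self.fc = nn.Linear(2 * latent_dim * num_classes,
                                128 * (mask_size // 8) ** 2)
            self.decoder = nn.Sequential(
                nn.Unflatten(1, (128, mask_size // 8, mask_size // 8)),
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
            )

        def encode(self, mask):
            # mask: (B, num_classes, H, W) one-hot segmentation mask.
            b, c, h, w = mask.shape
            per_class = mask.view(b * c, 1, h, w)
            return self.encoder(per_class).view(b, c, -1)  # (B, C, latent_dim)

        def decode(self, z):
            fused, _ = self.lstm(z)                          # (B, C, 2*latent_dim)
            return self.decoder(self.fc(fused.flatten(1)))   # (B, C, H, W) logits

        def forward(self, mask):
            return self.decode(self.encode(mask))


    if __name__ == "__main__":
        model = ClassWiseMaskAutoencoder()
        with torch.no_grad():
            onehot_mask = torch.zeros(2, 19, 64, 64)
            onehot_mask[:, 0] = 1.0                      # toy background-only mask
            recon_logits = model(onehot_mask)
            print(recon_logits.shape)                    # torch.Size([2, 19, 64, 64])
            # Local editing: perturb a single class embedding before decoding.
            z = model.encode(onehot_mask)
            z[:, 5] = torch.randn_like(z[:, 5])
            edited_logits = model.decode(z)
            print(edited_logits.shape)                   # torch.Size([2, 19, 64, 64])

In this sketch, the per-class factorization is what allows a single facial part (e.g., one class channel) to be regenerated or swapped while leaving the rest of the mask untouched; the edited mask could then be fed to any SIS generator.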


Acknowledgments

This work was supported by PRIN 2020 “LEGO.AI: LEarning the Geometry of knOwledge in AI systems”, grant no. 2020TA3K9N funded by the Italian MIUR.

Author information

Corresponding author: Tomaso Fontanini.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fontanini, T., Ferrari, C., Bertozzi, M., Prati, A. (2023). Automatic Generation of Semantic Parts for Face Image Synthesis. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_18

  • DOI: https://doi.org/10.1007/978-3-031-43148-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43147-0

  • Online ISBN: 978-3-031-43148-7

  • eBook Packages: Computer Science, Computer Science (R0)
