Automatic Generation of Semantic Parts for Face Image Synthesis

Fontanini, Tomaso; Ferrari, Claudio; Bertozzi, Massimo; Prati, Andrea

doi:10.1007/978-3-031-43148-7_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14233)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

Semantic image synthesis (SIS) refers to the problem of generating realistic imagery from a semantic segmentation mask that defines the spatial layout of object classes. Besides the quality of the generated images, most approaches in the literature focus on increasing generation diversity in terms of style, i.e., texture. However, they all neglect a different aspect: the possibility of manipulating the layout provided by the mask. Currently, this can only be done manually, by means of graphical user interfaces. In this paper, we describe a network architecture that addresses the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks, with a specific focus on human faces. Our proposed model embeds the mask class-wise into a latent space in which each class embedding can be edited independently. Then, a bi-directional LSTM block and a convolutional decoder output a new, locally manipulated mask. We report quantitative and qualitative results on the CelebAMask-HQ dataset, which show that our model can both faithfully reconstruct and modify a segmentation mask at the class level. We also show that our model can be placed before a SIS generator, opening the way to fully automatic control over the generation of both shape and texture. Code is available at https://github.com/TFonta/Semantic-VAE.
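The abstract outlines a pipeline in which the mask is split per class, each class is embedded on its own, a single class embedding can be edited, and a bi-directional LSTM plus a decoder produce a new mask. The NumPy sketch below illustrates only that data flow and its tensor shapes, under loudly stated assumptions: all weights are random and untrained, the per-class encoder and the decoder are single linear maps rather than the convolutional networks used in the paper, and feeding the class embeddings to the LSTM as a sequence ordered by class index is an assumption of this sketch. All names (`mask_to_class_channels`, `LSTMParams`, `bilstm`, `encode`, `decode`) and sizes are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG = 16   # mask height/width (toy size)
C = 5      # number of semantic classes (toy value)
D = 8      # per-class latent size (hypothetical)
HID = 8    # LSTM hidden size per direction (hypothetical)


def mask_to_class_channels(mask, num_classes):
    """Split an integer label mask (H, W) into binary per-class channels (C, H, W)."""
    classes = np.arange(num_classes)[:, None, None]
    return (classes == mask[None, :, :]).astype(np.float32)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMParams:
    """Random, untrained weights for one LSTM direction (illustration only)."""

    def __init__(self, in_dim, hid_dim):
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim))
        self.U = rng.normal(0.0, 0.1, (4 * hid_dim, hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid_dim = hid_dim


def run_lstm(p, seq):
    """Standard LSTM recurrence over seq (T, in_dim); returns hidden states (T, hid_dim)."""
    h = np.zeros(p.hid_dim)
    c = np.zeros(p.hid_dim)
    outs = []
    for x in seq:
        # input, forget, candidate and output gate pre-activations
        i, f, g, o = np.split(p.W @ x + p.U @ h + p.b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.stack(outs)


def bilstm(fwd, bwd, seq):
    """Bi-directional LSTM: a forward pass plus a pass over the reversed sequence, concatenated."""
    h_fwd = run_lstm(fwd, seq)
    h_bwd = run_lstm(bwd, seq[::-1])[::-1]   # re-reverse so row t aligns with input t
    return np.concatenate([h_fwd, h_bwd], axis=1)


# Hypothetical stand-ins: the paper uses a convolutional decoder; here both maps are linear.
ENC = rng.normal(0.0, 0.1, (D, IMG * IMG))
DEC = rng.normal(0.0, 0.1, (IMG * IMG, 2 * HID))


def encode(channels):
    """Embed each class channel independently: (C, H, W) -> (C, D)."""
    return channels.reshape(channels.shape[0], -1) @ ENC.T


def decode(latents, fwd, bwd):
    """Mix class embeddings with the Bi-LSTM, then map back to a label mask (H, W)."""
    ctx = bilstm(fwd, bwd, latents)                # (C, 2*HID)
    logits = (ctx @ DEC.T).reshape(-1, IMG, IMG)   # (C, H, W) per-class scores
    return logits.argmax(axis=0)


# Usage: encode a toy mask, edit only class 2, decode a new mask.
mask = rng.integers(0, C, size=(IMG, IMG))
z = encode(mask_to_class_channels(mask, C))
fwd, bwd = LSTMParams(D, HID), LSTMParams(D, HID)
z_edit = z.copy()
z_edit[2] = rng.normal(size=D)   # replace one class embedding, leave the others untouched
new_mask = decode(z_edit, fwd, bwd)
print(new_mask.shape)  # (16, 16)
```

Because the bi-directional LSTM lets every output position depend on all class embeddings, this untrained sketch can change the decoded mask outside the edited class; keeping such an edit local is something a trained model has to learn, which the sketch does not attempt.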

Acknowledgments

This work was supported by PRIN 2020 “LEGO.AI: LEarning the Geometry of knOwledge in AI systems”, grant no. 2020TA3K9N funded by the Italian MIUR.

Author information

Authors and Affiliations

IMP Lab, Department of Engineering and Architecture, University of Parma, Parma, Italy
Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi & Andrea Prati

Corresponding author

Correspondence to Tomaso Fontanini.

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Gian Luca Foresti
University of Udine, Udine, Italy
Andrea Fusiello
University of York, York, UK
Edwin Hancock

About this paper

Cite this paper

Fontanini, T., Ferrari, C., Bertozzi, M., Prati, A. (2023). Automatic Generation of Semantic Parts for Face Image Synthesis. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_18

DOI: https://doi.org/10.1007/978-3-031-43148-7_18
Published: 05 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7
eBook Packages: Computer Science, Computer Science (R0)
