LOC3DIFF: Local Diffusion for 3D Human Head Synthesis and Editing

Abstract
We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach constructs an implicit representation of 3D human heads anchored on a parametric face model. To enhance representational capability and encode spatial information, we represent each semantically consistent head region with a local tri-plane modulated by a 3D Gaussian. Additionally, we parameterize these tri-planes in a 2D UV space via a 3DMM, enabling effective use of a diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads, with flexible global and fine-grained region-based editing of facial structure, appearance, and expression. Extensive experiments demonstrate the effectiveness of our method.
Work done while Yushi Lan was an intern at Google.
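The abstract gives no implementation details, but the core representation it describes (per-region local tri-planes, each modulated by a 3D Gaussian anchored on the parametric face model) can be illustrated concretely. Below is a minimal NumPy sketch, not the authors' code: `bilinear_sample`, `query_feature`, the plane keys, and the `sigma` Gaussian extent are all illustrative assumptions about how a 3D query point's feature might be blended from nearby regions.

```python
# Minimal sketch (assumed, not the authors' implementation): query a feature
# at a 3D point from per-region local tri-planes, each weighted by a 3D
# Gaussian centred at its anchor on the parametric face model.
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample an (H, W, C) feature plane at uv in [-1, 1]^2."""
    H, W, _ = plane.shape
    uv = np.clip(uv, -1.0, 1.0)
    u = (uv[0] * 0.5 + 0.5) * (W - 1)
    v = (uv[1] * 0.5 + 0.5) * (H - 1)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[v0, u0]
            + du * (1 - dv) * plane[v0, u1]
            + (1 - du) * dv * plane[v1, u0]
            + du * dv * plane[v1, u1])

def query_feature(x, anchors, triplanes, sigma=0.05):
    """Blend features from the local tri-planes of all regions, each
    modulated by a 3D Gaussian of extent sigma around its anchor."""
    feat_sum, weight_sum = 0.0, 1e-8
    for center, planes in zip(anchors, triplanes):
        d = x - center                           # offset in the anchor's local frame
        w = np.exp(-d @ d / (2.0 * sigma ** 2))  # Gaussian modulation weight
        # Project the local offset onto the three axis-aligned planes of
        # this region's tri-plane, normalised by the Gaussian's support.
        f = (bilinear_sample(planes["xy"], d[[0, 1]] / (3.0 * sigma))
             + bilinear_sample(planes["xz"], d[[0, 2]] / (3.0 * sigma))
             + bilinear_sample(planes["yz"], d[[1, 2]] / (3.0 * sigma)))
        feat_sum = feat_sum + w * f
        weight_sum += w
    return feat_sum / weight_sum

# Toy usage: four anchors, each with random 32x32x8 local feature planes.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 3)) * 0.1
triplanes = [{k: rng.normal(size=(32, 32, 8)) for k in ("xy", "xz", "yz")}
             for _ in anchors]
print(query_feature(np.array([0.0, 0.05, 0.0]), anchors, triplanes).shape)  # (8,)
```

The Gaussian weighting keeps each tri-plane's influence local to its semantic region, which is what makes fine-grained region-based editing possible: swapping or resampling one region's tri-plane leaves distant regions essentially unchanged.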
Cite this paper
Lan, Y. et al. (2025). LOC3DIFF: Local Diffusion for 3D Human Head Synthesis and Editing. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15123. Springer, Cham. https://doi.org/10.1007/978-3-031-73650-6_4