LOC3DIFF: Local Diffusion for 3D Human Head Synthesis and Editing

Abstract
We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach constructs an implicit representation of 3D human heads anchored on a parametric face model. To enhance representational capability and encode spatial information, we represent each semantically consistent head region with a local tri-plane modulated by a 3D Gaussian. Additionally, we parameterize these tri-planes in a 2D UV space via a 3DMM, enabling effective use of a diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads, with flexible global and fine-grained region-based editing of facial structure, appearance, and expression. Extensive experiments demonstrate the effectiveness of our method.
Work done while Yushi Lan was an intern at Google.
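The abstract gives no implementation details, but the core representation it describes (per-region local tri-planes, each modulated by a 3D Gaussian anchored on the parametric face model) can be illustrated concretely. Below is a minimal NumPy sketch, not the authors' code: `bilinear_sample`, `query_feature`, the plane keys, and the `sigma` Gaussian extent are all illustrative assumptions about how a 3D query point's feature might be blended from nearby regions.

```python
# Minimal sketch (assumed, not the authors' implementation): query a feature
# at a 3D point from per-region local tri-planes, each weighted by a 3D
# Gaussian centred at its anchor on the parametric face model.
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample an (H, W, C) feature plane at uv in [-1, 1]^2."""
    H, W, _ = plane.shape
    uv = np.clip(uv, -1.0, 1.0)
    u = (uv[0] * 0.5 + 0.5) * (W - 1)
    v = (uv[1] * 0.5 + 0.5) * (H - 1)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[v0, u0]
            + du * (1 - dv) * plane[v0, u1]
            + (1 - du) * dv * plane[v1, u0]
            + du * dv * plane[v1, u1])

def query_feature(x, anchors, triplanes, sigma=0.05):
    """Blend features from the local tri-planes of all regions, each
    modulated by a 3D Gaussian of extent sigma around its anchor."""
    feat_sum, weight_sum = 0.0, 1e-8
    for center, planes in zip(anchors, triplanes):
        d = x - center                           # offset in the anchor's local frame
        w = np.exp(-d @ d / (2.0 * sigma ** 2))  # Gaussian modulation weight
        # Project the local offset onto the three axis-aligned planes of
        # this region's tri-plane, normalised by the Gaussian's support.
        f = (bilinear_sample(planes["xy"], d[[0, 1]] / (3.0 * sigma))
             + bilinear_sample(planes["xz"], d[[0, 2]] / (3.0 * sigma))
             + bilinear_sample(planes["yz"], d[[1, 2]] / (3.0 * sigma)))
        feat_sum = feat_sum + w * f
        weight_sum += w
    return feat_sum / weight_sum

# Toy usage: four anchors, each with random 32x32x8 local feature planes.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 3)) * 0.1
triplanes = [{k: rng.normal(size=(32, 32, 8)) for k in ("xy", "xz", "yz")}
             for _ in anchors]
print(query_feature(np.array([0.0, 0.05, 0.0]), anchors, triplanes).shape)  # (8,)
```

The Gaussian weighting keeps each tri-plane's influence local to its semantic region, which is what makes fine-grained region-based editing possible: swapping or resampling one region's tri-plane leaves distant regions essentially unchanged.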
Cite this paper
Lan, Y. et al. (2025). LOC3DIFF: Local Diffusion for 3D Human Head Synthesis and Editing. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15123. Springer, Cham. https://doi.org/10.1007/978-3-031-73650-6_4