
LOC3DIFF: Local Diffusion for 3D Human Head Synthesis and Editing

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15123)


Abstract

We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach constructs an implicit representation of 3D human heads, anchored on a parametric face model. To enhance representational capability and encode spatial information, we represent each semantically consistent head region with a local tri-plane, modulated by a 3D Gaussian. Additionally, we parameterize these tri-planes in a 2D UV space via a 3DMM, enabling effective use of a diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads with flexible global and fine-grained region-based editing over facial structure, appearance, and expression. Extensive experiments demonstrate the effectiveness of our method.
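To make the abstract's core mechanism concrete, the sketch below illustrates one plausible reading of a region-local tri-plane query windowed by a 3D Gaussian. This is a minimal PyTorch sketch under our own assumptions, not the authors' released implementation: all names, shapes, and the choice to sum the three plane features are hypothetical.

import torch
import torch.nn.functional as F

def sample_plane(plane, coords2d):
    # Bilinearly sample a (C, H, W) feature plane at N points in [-1, 1]^2.
    grid = coords2d.view(1, -1, 1, 2)                       # (1, N, 1, 2)
    feats = F.grid_sample(plane.unsqueeze(0), grid,
                          mode="bilinear", align_corners=True)
    return feats.view(plane.shape[0], -1).t()               # (N, C)

def query_local_triplane(points, planes, center, sigma):
    # points: (N, 3) queries in the region's local frame, roughly [-1, 1]^3.
    # planes: dict of 'xy', 'xz', 'yz' feature planes, each (C, H, W).
    # center: (3,) anchor of this head region (e.g. a 3DMM vertex location);
    #         center and sigma are illustrative assumptions.
    f = (sample_plane(planes["xy"], points[:, [0, 1]])
         + sample_plane(planes["xz"], points[:, [0, 2]])
         + sample_plane(planes["yz"], points[:, [1, 2]]))   # (N, C)
    # The 3D Gaussian window fades the region's features smoothly with
    # distance from the anchor, so overlapping regions blend without seams.
    w = torch.exp(-((points - center) ** 2).sum(-1) / (2 * sigma ** 2))
    return f * w.unsqueeze(-1)                              # (N, C)

A full model would decode these blended features into density and color with a small MLP and aggregate many such regions across the head; the sketch shows only the per-region lookup.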

Work done while Yushi Lan was an intern at Google.



Author information

Correspondence to Yushi Lan.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1765 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lan, Y. et al. (2025). LOC3DIFF: Local Diffusion for 3D Human Head Synthesis and Editing. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15123. Springer, Cham. https://doi.org/10.1007/978-3-031-73650-6_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73650-6_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73649-0

  • Online ISBN: 978-3-031-73650-6

  • eBook Packages: Computer Science, Computer Science (R0)
