
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15132)


Abstract

We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data. However, this process is often inefficient due to the need for iterative updates of costly 3D representations, such as neural radiance fields, either through individual view edits or score distillation sampling. A major disadvantage of this approach is the slow convergence caused by aggregating inconsistent information across views, as the guidance from 2D models is not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method that addresses these issues in two stages. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. To do so, we propose a training-free approach that integrates cues from the 3D geometry of the underlying scene. Second, given a multi-view consistent edited sequence of images, we directly and efficiently optimize the 3D representation, which is based on 3D Gaussian Splatting. Because it avoids incremental and iterative edits, DGE is significantly more accurate and efficient than existing approaches and offers additional benefits, such as enabling selective editing of parts of the scene.
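
For intuition, the following is a minimal, runnable sketch of the two-stage recipe described above, written under loose assumptions: all names (edit_views_consistently, fit_scene_to_edits) are hypothetical placeholders rather than the authors' API, the 2D editor is a caller-supplied per-view function, and the Gaussian-splat renderer is replaced by a toy linear map per camera so the example stays self-contained (NumPy only).

# Minimal sketch of the two-stage pipeline; every name is a placeholder.
import numpy as np

def edit_views_consistently(views, instruction, edit_fn):
    """Stage 1 (sketch): edit every rendered view with a 2D editor.
    DGE makes an editor such as InstructPix2Pix multi-view consistent in a
    training-free, geometry-aware way; here a caller-supplied per-view edit
    function stands in for that step."""
    return [edit_fn(view, instruction) for view in views]

def fit_scene_to_edits(init_params, cameras, edited_views, steps=400, lr=0.02):
    """Stage 2 (sketch): directly fit the 3D representation to the already
    consistent edited views, without iterative re-editing. The renderer is
    modelled as a fixed linear map per camera to keep the demo runnable;
    the real method optimizes 3D Gaussian Splatting with a rasterizer."""
    params = init_params.copy()
    for _ in range(steps):
        grad = np.zeros_like(params)
        for A, target in zip(cameras, edited_views):
            residual = A @ params - target         # photometric residual per view
            grad += A.T @ residual / len(cameras)  # gradient of 0.5 * ||residual||^2
        params -= lr * grad
    return params

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cameras = [rng.normal(size=(8, 4)) for _ in range(5)]  # toy "renderers"
    scene = rng.normal(size=4)                              # toy scene parameters
    views = [A @ scene for A in cameras]                    # toy rendered views
    edited = edit_views_consistently(views, "make it darker",
                                     lambda view, _: 0.5 * view)
    fitted = fit_scene_to_edits(np.zeros(4), cameras, edited)
    print("fit error:", np.linalg.norm(fitted - 0.5 * scene))

The structural point the sketch illustrates is that stage 2 receives views that already agree with one another, so a single direct fit of the 3D representation replaces the incremental per-view updates or score-distillation loops used by earlier editing pipelines.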



Acknowledgements

This research is supported by ERC-CoG UNION 101001212. I. L. is also partially supported by the VisualAI EPSRC grant (EP/T028572/1).

Author information


Corresponding author

Correspondence to Minghao Chen.


Ethics declarations

Ethics

For further details on ethics, data protection, and copyright, please see https://www.robots.ox.ac.uk/~vedaldi/research/union/ethics.html.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, M., Laina, I., Vedaldi, A. (2025). DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_5


  • DOI: https://doi.org/10.1007/978-3-031-72904-1_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72903-4

  • Online ISBN: 978-3-031-72904-1

  • eBook Packages: Computer Science, Computer Science (R0)
