Text-guided 3D editing based on neural radiance fields and 3D Gaussian splatting: a review
Pages: 1-19 (2025)
Published Online: 17 February 2025
Accepted: 2025-02-13
DOI: 10.11834/jig.240589
Lu Lihua, Zhang Xiaohui, Wei Hui, et al. Text-guided 3D editing based on neural radiance fields and 3D Gaussian splatting: a review [J]. Journal of Image and Graphics,
Text-guided 3D editing modifies the geometry and appearance of existing 3D assets under the guidance of a target text, thereby creating diverse and high-quality 3D assets. In recent years, the emergence and development of key technologies such as advanced neural 3D representations and text-guided image generation and editing have driven progress in text-guided 3D editing. This paper focuses on the latest advances in text-guided 3D editing based on neural radiance fields and 3D Gaussian splatting, organizing and summarizing existing research along two dimensions: methodological essence and editing capability. Specifically, existing methods are categorized by their editing constraints into three classes, namely unconstrained, implicitly constrained, and explicitly constrained, so as to analyze the essence of each method in depth. In addition, the editing capabilities of existing methods are discussed from multiple aspects, including editing type (e.g., geometry, appearance), editing scope (e.g., object, scene), and editing robustness (e.g., global or local controllability). Finally, the challenges facing current research are analyzed and potential future research directions are discussed.
Artificial Intelligence Generated Content (AIGC) refers to the use of artificial intelligence to generate digital content such as text, images, videos, and 3D assets; it has developed rapidly in the past few years and triggered a technological revolution. Within 3D AIGC, text-guided 3D editing is a direction of both research significance and application value: guided by a target text, it changes the geometry and appearance of existing 3D assets to create diverse, high-quality 3D assets. Compared with other guiding conditions (such as reference images or sketches), the natural-language-guided 3D editing paradigm offers friendly interaction, high efficiency, and strong practicability, with broad application potential in virtual/augmented reality, autonomous driving, robotics, and other fields. In recent years, the emergence and development of a series of key technologies, such as advanced neural representations, generative models, and text-guided image generation and editing, have driven significant progress in text-guided 3D editing.

However, editing 3D content under text guidance remains challenging. Unlike text-guided 3D generation, which creates 3D assets from scratch, text-guided 3D editing modifies the geometric structure, appearance, and other attributes of existing 3D assets to obtain new assets that match the target text description. First, the core problem is to complete the edits required by the target text while leaving non-edited regions unaffected. Second, it is difficult to correctly understand the target text and produce edits that are semantically consistent with it, especially when the text describes complex scenes containing multiple objects with different attributes. Third, selecting a 3D representation suitable for editing is nontrivial: both explicit representations (e.g., voxels, meshes) and implicit representations (e.g., neural radiance fields, signed distance functions) have advantages and disadvantages in representational power and efficiency. Finally, the lack of large-scale text-3D datasets and the difficulty of maintaining multi-view consistency make text-guided 3D editing even more challenging.

In recent years, neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS) have been proposed; thanks to their continuity and highly photorealistic rendering, they have enabled significant progress in high-quality 3D scene reconstruction and rendering. Combined with large pre-trained text-image alignment models, neural radiance fields were further extended to text-guided 3D generation. A straightforward way to implement text-guided 3D editing is therefore to fine-tune a pre-trained text-guided 3D generation model, modifying the geometry or appearance of the 3D asset until it matches the new target text description. Earlier methods supervised the adjustment of the radiance field with a Contrastive Language-Image Pre-training (CLIP) loss to align it with the new target text, whereas most recent methods optimize the radiance field with a Score Distillation Sampling (SDS) loss. However, this fine-tuning approach can only change 3D assets globally and does not support fine-grained 3D editing.
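To make the two supervision signals concrete (these are the standard formulations rather than any single paper's notation): let g(θ, c) denote a rendering of the 3D representation θ from camera c, and let y be the target text. CLIP-based methods minimize

    L_CLIP(θ) = −sim(E_I(g(θ, c)), E_T(y)),

where E_I and E_T are CLIP's image and text encoders (Radford et al., 2021) and sim(·,·) is cosine similarity. SDS-based methods, following DreamFusion (Poole et al., 2022), instead backpropagate the noise-prediction residual of a frozen text-conditioned diffusion model ε_φ:

    ∇_θ L_SDS = E_{t,ε,c}[ w(t) (ε_φ(x_t; y, t) − ε) ∂x/∂θ ],  with x = g(θ, c),

where x_t is the rendering x noised to timestep t with injected noise ε, and w(t) is a timestep-dependent weighting function.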
On the other hand, the emergence of large text-image datasets and pre-trained text-image alignment models has fueled the rapid development of text-guided image editing, and introducing representative image editing techniques into 3D editing is a promising route to text-guided 3D editing. This paradigm avoids the need for text-3D data pairs by lifting 2D image edits into the radiance field, enabling key advances in text-guided 3D editing. Early methods edit the images rendered from the existing 3D model so that they conform to the target text, then use the edited images to reconstruct the target 3D model (a schematic of this render-edit-reconstruct loop is sketched at the end of this introduction). Subsequent methods further improve editing quality and efficiency through multi-view consistent editing and generalizable editing. However, such methods rely on the capability of text-guided image editing and can only impose implicit constraints through it, without explicit control over the 3D editing process, which falls short of high-quality 3D editing. To achieve more accurate edits, recent work introduces explicit constraints that confine 3D editing to the editable region, satisfying the target text while avoiding unintended changes; the editing region can be determined automatically from the semantic correspondence between the target text and the image. These methods achieve impressively high-quality 3D editing.

In view of the significant advances outlined above, a systematic summary and analysis of the literature is needed for researchers interested in text-guided 3D editing. This paper focuses on the latest advances in text-guided 3D editing based on neural radiance fields and 3D Gaussian splatting, summarizing existing research from the aspects of methodological essence and editing capability. Specifically, this paper categorizes current research into three types according to editing constraints: unconstrained, implicitly constrained, and explicitly constrained, in order to deeply analyze the essence of each method. In addition, the paper discusses the editing capabilities of these methods from various perspectives, including editing type (e.g., geometry, appearance), editing scope (e.g., objects, scenes), and editing robustness (e.g., global or local controllability). Finally, the paper analyzes the challenges faced by current research and offers prospects for potential future directions. To sum up, the contributions of this paper are as follows: (1) the first review of text-guided 3D editing based on neural radiance fields and 3D Gaussian splatting; (2) an effective classification criterion that summarizes existing work by the essence of its methods; (3) a discussion, built on this classification, of the 3D editing capabilities of existing studies.
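The following is a minimal sketch of the render-edit-reconstruct paradigm described above, in the spirit of Instruct-NeRF2NeRF's iterative dataset update (Haque et al., 2023b). The objects scene (a trained NeRF/3DGS model exposing render and optimization steps) and editor (a text-conditioned 2D image editor such as InstructPix2Pix) are hypothetical interfaces for illustration, not the API of any released system.

import random

def text_guided_edit(scene, editor, cameras, target_text, steps=3000, edit_every=10):
    # Start from renderings of the original scene as the training views.
    dataset = {cam: scene.render(cam) for cam in cameras}

    for step in range(steps):
        if step % edit_every == 0:
            # Periodically replace one training view with a 2D edit of the
            # *current* rendering, conditioned on the target text. As the
            # optimization proceeds, the partially edited 3D scene pulls the
            # per-view 2D edits toward multi-view consistency.
            cam = random.choice(cameras)
            dataset[cam] = editor.edit(scene.render(cam), prompt=target_text)

        # Standard reconstruction update against the gradually edited views.
        cam = random.choice(cameras)
        scene.optimize_step(prediction=scene.render(cam), target=dataset[cam])

    return scene

Because the 2D editor only supervises the 3D model through rendered views, this loop provides implicit constraints at best, which is precisely the limitation that motivates the explicitly constrained methods discussed above.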
Liu Anan, Su Yuting, Wang Lanjun, Li Bin, Qian Zhenxing, Zhang Weiming, Zhou Linna, Zhang Xinpeng, Zhang Yongdong, Huang Jiwu, Yu Nenghai, 2024. Review on the progress of the AIGC visual content generation and traceability [J]. Journal of Image and Graphics, 29(6): 1535-1554 [DOI: 10.11834/jig.240003]
Zhang Haoyu, Wang Tianbao, Li Mengze, Zhao Zhou, Pu Shiliang, Wu Fei, 2022. Comprehensive review of visual-language-oriented multimodal pre-training methods [J]. Journal of Image and Graphics, 27(9): 2652-2682 [DOI: 10.11834/jig.220173]
Brooks T, Holynski A, Efros A A, 2023b. InstructPix2Pix: Learning to Follow Image Editing Instructions [C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, BC, Canada: IEEE: 18392-18402 [DOI: 10.1109/CVPR52729.2023.01764]
Chan E R, Lin C Z, Chan M A, Nagano K, Pan B, De Mello S, Gallo O, Guibas L, Tremblay J, Khamis S, Karras T, Wetzstein G, 2022. Efficient Geometry-aware 3D Generative Adversarial Networks [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 16102-16112 [DOI: 10.1109/CVPR52688.2022.01565]
Chen A, Xu Z, Geiger A, Yu J, Su H, 2022. TensoRF: Tensorial Radiance Fields [M]// Avidan S, Brostow G, Cissé M, Farinella G M, Hassner T, eds. Computer Vision – ECCV 2022, vol. 13692. Cham: Springer Nature Switzerland: 333-350 [DOI: 10.1007/978-3-031-19824-3_20]
Chen M, Xie J, Laina I, Vedaldi A, 2024. Shap-Editor: Instruction-guided Latent 3D Editing in Seconds [C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE: 26446-26456 [DOI: 10.1109/CVPR52733.2024.02498]
Chen Y, Chen Z, Zhang C, Wang F, Yang X, Wang Y, Cai Z, Yang L, Liu H, Lin G, 2024. GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting [C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE: 21476-21485 [DOI: 10.1109/CVPR52733.2024.02029]
Chen Y, Shao G, Shum K C, Hua B S, Yeung S K, 2023. Advances in 3D Neural Stylization: A Survey [A/OL]. arXiv [2023-12-07]. http://arxiv.org/abs/2311.18328
Cheng X, Yang T, Wang J, Li Y, Zhang L, Zhang J, Yuan L, 2023. Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts [A/OL]. arXiv [2023-10-23]. http://arxiv.org/abs/2310.11784
Choudhary T, Dewangan V, Chandhok S, Priyadarshan S, Jain A, Singh A K, Srivastava S, Jatavallabhula K M, Krishna K M, 2024. Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving [C]// 2024 IEEE International Conference on Robotics and Automation (ICRA). Yokohama, Japan: IEEE: 16345-16352 [DOI: 10.1109/ICRA57147.2024.10611485]
Dong J, Wang Y X, 2023. ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields [C]// Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Fang S, Wang Y, Yang Y, Tsai Y H, Ding W, Zhou S, Yang M H, 2023. DN2N: Editing 3D Scenes via Text Prompts without Retraining [A/OL]. arXiv [2023-12-05]. http://arxiv.org/abs/2309.04917
Foo L G, Rahmani H, Liu J, 2023. AI-Generated Content (AIGC) for Various Data Modalities: A Survey [A/OL]. arXiv [2023-12-13]. http://arxiv.org/abs/2308.14177
Fridovich-Keil S, Yu A, Tancik M, Chen Q, Recht B, Kanazawa A, 2022. Plenoxels: Radiance Fields without Neural Networks [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 5491-5500 [DOI: 10.1109/CVPR52688.2022.00542]
Haque A, Tancik M, Efros A A, Holynski A, Kanazawa A, 2023b. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions [C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE: 19683-19693 [DOI: 10.1109/ICCV51070.2023.01808]
He R, Huang S, Nie X, Hui T, Liu L, Dai J, Han J, Li G, Liu S, 2024. Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE: 6966-6975 [DOI: 10.1109/CVPR52733.2024.00665]
Ho J, Jain A, Abbeel P, 2020. Denoising Diffusion Probabilistic Models [A/OL]. arXiv [2024-09-29]. http://arxiv.org/abs/2006.11239
Hyung J, Hwang S, Kim D, Lee H, Choo J, 2023. Local 3D Editing via 3D Distillation of CLIP Knowledge [C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, BC, Canada: IEEE: 12674-12684 [DOI: 10.1109/CVPR52729.2023.01219]
Jun H, Nichol A, 2023. Shap-E: Generating Conditional 3D Implicit Functions [A/OL]. arXiv [2023-09-06]. http://arxiv.org/abs/2305.02463
Kamata H, Sakuma Y, Hayakawa A, Ishii M, Narihira T, 2023. Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion [A/OL]. arXiv [2023-09-11]. http://arxiv.org/abs/2303.15780
Karim N, Khalid U, Iqbal H, Hua J, Chen C, 2023. Free-Editor: Zero-shot Text-driven 3D Scene Editing [A/OL]. arXiv [2023-12-25]. http://arxiv.org/abs/2312.13663
Khalid U, Iqbal H, Karim N, Hua J, Chen C, 2023. LatentEditor: Text Driven Local Editing of 3D Scenes [A/OL]. arXiv [2023-12-22]. http://arxiv.org/abs/2312.09313
Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg A C, Lo W Y, Dollár P, Girshick R, 2023. Segment Anything [C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE: 3992-4003 [DOI: 10.1109/ICCV51070.2023.00371]
Li X, Zhang Q, Kang D, Cheng W, Gao Y, Zhang J, Liang Z, Liao J, Cao Y P, Shan Y, 2024. Advances in 3D Generation: A Survey [A/OL]. arXiv [2024-02-13]. http://arxiv.org/abs/2401.17807
Lin J, Li Z, Tang X, Liu J, Liu S, Liu J, Lu Y, Wu X, Xu S, Yan Y, Yang W, 2024. VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction [A/OL]. arXiv [2024-09-29]. http://arxiv.org/abs/2402.17427
Mikaeili A, Perel O, Safaee M, Cohen-Or D, Mahdavi-Amiri A, 2023. SKED: Sketch-guided Text-based 3D Editing [C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE: 14561-14573 [DOI: 10.1109/ICCV51070.2023.01343]
Mildenhall B, Srinivasan P P, Tancik M, Barron J T, Ramamoorthi R, Ng R, 2022. NeRF: representing scenes as neural radiance fields for view synthesis [J]. Communications of the ACM, 65(1): 99-106 [DOI: 10.1145/3503250]
Mirzaei A, Aumentado-Armstrong T, Brubaker M A, Kelly J, Levinshtein A, Derpanis K G, Gilitschenski I, 2023. Watch Your Steps: Local Image and Scene Editing by Text Instructions [A/OL]. arXiv [2023-12-05]. http://arxiv.org/abs/2308.08947
Müller T, Evans A, Schied C, Keller A, 2022. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [J]. ACM Transactions on Graphics, 41(4): 1-15 [DOI: 10.1145/3528223.3530127]
Nichol A, Dhariwal P, Ramesh A, Shyam P, Mishkin P, McGrew B, Sutskever I, Chen M, 2022. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [A/OL]. arXiv [2024-04-09]. http://arxiv.org/abs/2112.10741
Palandra F, Sanchietti A, Baieri D, Rodolà E, 2024. GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting [A/OL]. arXiv [2024-03-19]. http://arxiv.org/abs/2403.05154
Park J, Kwon G, Ye J C, 2023. ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF [A/OL]. arXiv [2023-12-07]. http://arxiv.org/abs/2310.02712
Patashnik O, Wu Z, Shechtman E, Cohen-Or D, Lischinski D, 2021. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, QC, Canada: IEEE: 2065-2074 [DOI: 10.1109/ICCV48922.2021.00209]
Poole B, Jain A, Barron J T, Mildenhall B, 2022. DreamFusion: Text-to-3D using 2D Diffusion [A/OL]. arXiv [2022-12-02]. http://arxiv.org/abs/2209.14988
Radford A, Kim J W, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I, 2021. Learning Transferable Visual Models From Natural Language Supervision [A/OL]. arXiv [2024-02-08]. http://arxiv.org/abs/2103.00020
Raj A, Kaza S, Poole B, Niemeyer M, Ruiz N, Mildenhall B, Zada S, Aberman K, Rubinstein M, Barron J, Li Y, Jampani V, 2023. DreamBooth3D: Subject-Driven Text-to-3D Generation [C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE: 2349-2359 [DOI: 10.1109/ICCV51070.2023.00223]
Reed S, Akata Z, Yan X, Logeswaran L, Schiele B, Lee H, 2016. Generative Adversarial Text to Image Synthesis [A/OL]. arXiv [2024-04-09]. http://arxiv.org/abs/1605.05396
Ren T, Liu S, Zeng A, Lin J, Li K, Cao H, Chen J, Huang X, Chen Y, Yan F, Zeng Z, Zhang H, Li F, Yang J, Li H, Jiang Q, Zhang L, 2024. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks [A/OL]. arXiv [2024-04-09]. http://arxiv.org/abs/2401.14159
Rombach R, Blattmann A, Lorenz D, Esser P, Ommer B, 2022. High-Resolution Image Synthesis with Latent Diffusion Models [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 10674-10685 [DOI: 10.1109/CVPR52688.2022.01042]
Ruiz N, Li Y, Jampani V, Pritch Y, Rubinstein M, Aberman K, 2023b. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, BC, Canada: IEEE: 22500-22510 [DOI: 10.1109/CVPR52729.2023.02155]
Saharia C, Chan W, Saxena S, Li L, Whang J, Denton E, Ghasemipour S K S, Ayan B K, Mahdavi S S, Lopes R G, Salimans T, Ho J, Fleet D J, Norouzi M, 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [A/OL]. arXiv [2024-04-09]. http://arxiv.org/abs/2205.11487
Sanghi A, Chu H, Lambourne J G, Wang Y, Cheng C Y, Fumero M, Malekshan K R, 2022. CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 18582-18592 [DOI: 10.1109/CVPR52688.2022.01805]
Sella E, Fiebelman G, Hedman P, Averbuch-Elor H, 2023b. Vox-E: Text-guided Voxel Editing of 3D Objects [C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE: 430-440 [DOI: 10.1109/ICCV51070.2023.00046]
Song L, Cao L, Gu J, Jiang Y, Yuan J, Tang H, 2023. Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models [A/OL]. arXiv [2023-12-18]. http://arxiv.org/abs/2312.08563
Sun C, Sun M, Chen H T, 2022. Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 5449-5459 [DOI: 10.1109/CVPR52688.2022.00538]
Tang J, Ren J, Zhou H, Liu Z, Zeng G, 2023. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation [A/OL]. arXiv [2023-12-18]. http://arxiv.org/abs/2309.16653
Tseng K W, Huang J Y, Chen Y S, Chen C S, Hung Y P, 2022. Pseudo-3D Scene Modeling for Virtual Reality Using Stylized Novel View Synthesis [C]// ACM SIGGRAPH 2022 Posters. Vancouver, BC, Canada: ACM: 1-2 [DOI: 10.1145/3532719.3543232]
Wang C, Chai M, He M, Chen D, Liao J, 2022. CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 3825-3834 [DOI: 10.1109/CVPR52688.2022.00381]
Wang C, Jiang R, Chai M, He M, Chen D, Liao J, 2024. NeRF-Art: Text-Driven Neural Radiance Fields Stylization [J]. IEEE Transactions on Visualization and Computer Graphics, 30(8): 4983-4996 [DOI: 10.1109/TVCG.2023.3283400]
Wang Y, Yi X, Wu Z, Zhao N, Chen L, Zhang H, 2024. VcEdit: View-Consistent 3D Editing with Gaussian Splatting [A/OL]. arXiv [2024-03-27]. http://arxiv.org/abs/2403.11868
Wu J, Bian J W, Li X, Wang G, Reid I, Torr P, Prisacariu V A, 2024. GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing [A/OL]. arXiv [2024-03-19]. http://arxiv.org/abs/2403.08733
Yu A, Ye V, Tancik M, Kanazawa A, 2021. pixelNeRF: Neural Radiance Fields from One or Few Images [A/OL]. arXiv [2022-10-27]. http://arxiv.org/abs/2012.02190
Zhang B, Cheng Y, Yang J, Wang C, Zhao F, Tang Y, Chen D, Guo B, 2024. GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling [A/OL]. arXiv [2024-09-29]. http://arxiv.org/abs/2403.19655
Zhang L, Rao A, Agrawala M, 2023b. Adding Conditional Control to Text-to-Image Diffusion Models [C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE: 3813-3824 [DOI: 10.1109/ICCV51070.2023.00355]
Zhou X, He Y, Yu F R, Li J, Li Y, 2023. RePaint-NeRF: NeRF Editing via Semantic Masks and Diffusion Models [C]// Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI). Macau, SAR, China: International Joint Conferences on Artificial Intelligence Organization: 1813-1821 [DOI: 10.24963/ijcai.2023/201]
Zhuang J, Kang D, Cao Y P, Li G, Lin L, Shan Y, 2024. TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts [J]. ACM Transactions on Graphics, 43(4): 1-12 [DOI: 10.1145/3658205]
Zhuang J, Wang C, Lin L, Liu L, Li G, 2023. DreamEditor: Text-Driven 3D Scene Editing with Neural Fields [C]// SIGGRAPH Asia 2023 Conference Papers. Sydney, NSW, Australia: ACM: 1-10 [DOI: 10.1145/3610548.3618190]