
Zeng Zhihong (University of Chinese Academy of Sciences; Aerospace Information Research Institute)

Abstract
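Objective 3D scene reconstruction and novel view synthesis based on neural radiance fields are receiving broad attention from researchers. However, existing neural radiance field methods usually build a highly specialized representation of a given scene and represent the scene's geometry and appearance as a "mixed field", which hinders geometry and appearance editing, scene generalization, and the use of 3D resources. Method This paper proposes a neural radiance field classification network that learns the intrinsic attributes of objects. It removes highlights and shadows through image enhancement and performs color decomposition through classification; that is, it extracts the intrinsic attributes of semantic-level objects in indoor scenes from real scenes and reconstructs the neural radiance field on this basis. The method introduces a front-point dominance module and a color classification module. The front-point dominance module optimizes the intrinsic attribute represented by each ray in the volume rendering stage, which improves the semantic consistency of the neural radiance field. The color classification module, in the radiance field reconstruction stage, optimizes the classification of intrinsic attributes with a fully connected network, which improves the semantic and multi-view consistency of the radiance field. Together, the two modules give the reconstructed radiance field good generalization with respect to appearance and support tasks such as scene recoloring, relighting, and shadow and highlight editing. Result Compared with Intrinsic NeRF, an existing method that performs intrinsic decomposition based on neural radiance field learning, extensive experiments on the Replica dataset show that, under limited GPU memory and running time, the intrinsic-attribute neural radiance field reconstructed by this method has semantic and multi-view consistency. For the front-point dominance module, which improves semantic consistency, the method improves on the baseline model Semantic NeRF by 4.1% and on the variant without this module by 3.9%. For the color classification module, which improves the semantic and multi-view consistency of the intrinsic decomposition, the method improves on the intrinsic decomposition of Intrinsic NeRF by 10.2% and on the variant without the color classification layer by 1.7%. Conclusion Experiments show that the intrinsic-attribute neural radiance field built by this method has semantic and multi-view consistency, can describe the geometric relationships of complex scenes, and generalizes well with respect to appearance. It achieves realistic, multi-view-consistent results in scene recoloring, relighting, and shadow and highlight editing.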
A neural radiance field reconstruction method based on intrinsic attribute classification network


Objective The reconstruction of indoor and outdoor 3D scenes and the placement of 3D resources in the real world are an important development direction in computer vision. Early researchers used voxels, occupancy, grids, and other computer graphics representations, which achieve good storage and rendering efficiency in a variety of mature application areas. However, these methods rely on time-consuming and laborious manual modeling by experienced modelers. The newer implicit field representation also offers good computational and storage performance. Above all, it simplifies the difficult modeling process and thus greatly broadens the application prospects of 3D scene reconstruction. By querying and evaluating the implicit field representation, researchers can obtain a realistic scene end to end, without the complicated process of traditional modeling. The neural radiance field (NeRF) is the most popular implicit field representation. Compared with other implicit field methods, it is known for its simplicity and ease of use, but it still has problems that are rooted in the defects of the implicit field itself. An implicit field is a multidimensional function defined on spatial and directional coordinates that encodes the geometry of the scene together with its appearance color. The independent attributes of the target are therefore entangled in a single representation, which makes the resulting 3D resources inconvenient to use. An important research direction for implicit fields is the "disentanglement" of geometry and appearance, and one solution lies in intrinsic decomposition. First, intrinsic decomposition uses physical priors to avoid the initialization of complex networks. Second, it preprocesses the image into an albedo image that is independent of the observation direction and a shading image that depends on it.
Intrinsic NeRF was the first work to apply intrinsic decomposition in NeRF, but the decomposition it uses does not produce sufficiently reasonable appearance editing results. Method In this study, a neural radiance field classification network is proposed to learn the intrinsic properties of objects and target characteristics. It separates specular factors from 2D images by image enhancement, extracts the intrinsic colors (that is, performs intrinsic decomposition) by classification, and then derives the shading maps and direct illumination of semantic-level objects in the scene from the intrinsic decomposition expression. The neural radiance field is learned on this basis. Semantic consistency is provided by the "front-point dominance module", which operates in the volume rendering stage and optimizes the albedo using "front points". Consistency between views of the scene is provided by the "color classification layer module", a fully connected neural network in the reconstruction stage that fixes the albedo across different perspectives. Finally, a neural radiance field representing the intrinsic properties of the scene is reconstructed. After the rays are generated from the intrinsic and extrinsic camera parameters of each image, the positions of the sampling points and the directions of the rays are computed and passed to the neural network, which produces the corresponding 3D properties. In the embedding layer, the position and direction are transformed into high-dimensional embedding features, which form the input of the network. After the 8 fully connected layers of a multi-layer perceptron (MLP), the network outputs a 1-dimensional volume density, a 256-dimensional feature vector, and an N-dimensional semantic vector (where N is the number of semantic classes). The 256-dimensional feature vector is then fed into separate single-layer fully connected networks to obtain the color, shading map, and direct illumination, respectively.
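The network layout described above can be sketched in a few lines of NumPy. This is only an illustrative forward pass with random, untrained weights, not the authors' implementation. Several details are assumptions that the abstract does not state: the sinusoidal embedding and its frequency counts (`pos_freqs`, `dir_freqs`), the hidden width of 256, the output activations, and the choice to feed the direction embedding only to the shading and direct illumination heads. The class name `IntrinsicFieldSketch` is hypothetical.

```python
import numpy as np


def positional_encoding(x, num_freqs):
    """Embed coordinates as [x, sin(2^k x), cos(2^k x)] for k = 0..num_freqs-1."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(np.sin((2.0 ** k) * x))
        feats.append(np.cos((2.0 ** k) * x))
    return np.concatenate(feats, axis=-1)


def init_linear(rng, fan_in, fan_out):
    """Random weights and zero bias for one fully connected layer."""
    weight = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    return weight, np.zeros(fan_out)


def linear(x, params):
    weight, bias = params
    return x @ weight + bias


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class IntrinsicFieldSketch:
    """Hypothetical layout: 8-layer trunk, then density/feature/semantic outputs,
    then one single-layer head each for albedo, shading, and direct illumination."""

    def __init__(self, num_classes, pos_freqs=10, dir_freqs=4, width=256, seed=0):
        rng = np.random.default_rng(seed)
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs
        pos_dim = 3 * (1 + 2 * pos_freqs)
        dir_dim = 3 * (1 + 2 * dir_freqs)
        dims = [pos_dim] + [width] * 8
        self.trunk = [init_linear(rng, dims[i], dims[i + 1]) for i in range(8)]
        self.density_out = init_linear(rng, width, 1)
        self.feature_out = init_linear(rng, width, 256)
        self.semantic_out = init_linear(rng, width, num_classes)
        # Assumption: albedo is view-independent, so its head sees the feature only.
        self.albedo_head = init_linear(rng, 256, 3)
        self.shading_head = init_linear(rng, 256 + dir_dim, 1)
        self.direct_head = init_linear(rng, 256 + dir_dim, 3)

    def __call__(self, positions, directions):
        h = positional_encoding(positions, self.pos_freqs)
        d = positional_encoding(directions, self.dir_freqs)
        for layer in self.trunk:
            h = np.maximum(linear(h, layer), 0.0)  # ReLU
        feature = linear(h, self.feature_out)
        feature_and_dir = np.concatenate([feature, d], axis=-1)
        return {
            "density": np.maximum(linear(h, self.density_out), 0.0),
            "feature": feature,
            "semantic_logits": linear(h, self.semantic_out),
            "albedo": sigmoid(linear(feature, self.albedo_head)),
            "shading": sigmoid(linear(feature_and_dir, self.shading_head)),
            "direct": sigmoid(linear(feature_and_dir, self.direct_head)),
        }
```

Calling `IntrinsicFieldSketch(num_classes=N)(positions, directions)` with arrays of shape `(num_points, 3)` returns one array per property, with trailing dimensions 1, 256, N, 3, 1, and 3. The sketch omits the skip connection used in the original NeRF trunk and says nothing about how the color classification layer is trained.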
In the inference stage, the model uses Monte Carlo integration to transform properties defined at sampling points into properties defined on rays (i.e., pixels), which yields the synthesized novel view. The model can disentangle the attributes that are independent of the observer from those that depend on the observer. The resulting albedo output is independent of the observation direction and has semantic and multi-view consistency. The implicit field generalizes well with respect to appearance and supports scene re-coloring, re-lighting, and the editing of shadows and specular factors. Result Compared with the existing Intrinsic NeRF method for intrinsic decomposition based on neural radiance field learning, experiments on the Replica dataset show that, under limited GPU memory and running time, our work obtains intrinsic decomposition results with semantic and multi-view consistency. For the "front-point dominance module", which ensures semantic consistency, our work improves by 4.1% over Semantic NeRF, and an ablation study shows an improvement of 3.9% over the model without this module. For the "color classification layer module", which improves semantic and multi-view consistency, our work improves by 10.2% over the intrinsic decomposition method of Intrinsic NeRF and by 1.7% over the model without the color classification layer. Conclusion In this study, we proposed a novel neural radiance field classification network that learns the intrinsic properties of objects and target characteristics. Experiments show that this work produces intrinsic decomposition results with semantic and multi-view consistency. Moreover, it constructs an implicit field of albedo classification that can describe the geometric relationships of complex scenes and generalizes well with respect to appearance. Realistic and multi-view-consistent effects are achieved in scene re-coloring, re-lighting, and shadow and specular factor editing.
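The inference step that turns per-sample properties into per-ray (per-pixel) properties can be illustrated with the standard NeRF volume rendering quadrature. The sketch below assumes that form; the abstract only says "Monte Carlo integration", so the exact estimator used in the paper may differ. It does not model the front-point dominance module. The function name `composite_along_ray` is hypothetical.

```python
import math


def composite_along_ray(sigmas, deltas, values):
    """Blend per-sample values into one per-ray value.

    sigmas: volume density at each sample along the ray (front to back)
    deltas: distance between consecutive samples
    values: per-sample property vectors (e.g. albedo, shading, semantic logits)
    Returns the rendered vector and the per-sample blending weights.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    weights = []
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    channels = len(values[0])
    rendered = [
        sum(w * v[c] for w, v in zip(weights, values)) for c in range(channels)
    ]
    return rendered, weights
```

Because the weights depend only on density, the same weights can be reused to render color, albedo, shading, direct illumination, and semantic logits for a ray. A sample behind a nearly opaque sample receives a weight close to zero, so the rendered property is dominated by the first surface the ray hits.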