Neural Radiance Field Reconstruction from Sparse Indoor Panoramas

Xiao Qiang, Chen Minglin, Zhang Ye, Huang Xiaohong (School of Electronics and Communication Engineering, Sun Yat-sen University)

Abstract
Objective Neural radiance fields can provide immersive environments for virtual reality applications such as digital humans and interactive games. However, existing neural radiance field algorithms typically rely on panoramas captured at many positions to reconstruct large-scale indoor scenes, and their reconstruction quality degrades under sparse panoramic input. To address this, this paper proposes a neural radiance field reconstruction algorithm for sparse indoor panoramas, enabling low-cost, high-quality indoor novel view synthesis. Method To handle sparse input, the algorithm first designs a depth supervision strategy that allocates more sampling points near object surfaces, yielding finer geometric reconstruction. It then introduces a ray distortion loss under unobserved viewpoints to strengthen ray constraints, effectively improving indoor reconstruction quality under sparse input. Result The proposed algorithm is compared with recent neural radiance field reconstruction algorithms on several indoor panoramic datasets. With two input panoramas from the Replica dataset, it improves PSNR by 6% over the baseline; even with a single input panorama from the PNVS dataset, it improves PSNR by 11% over the baseline. Conclusion Experimental results show that the proposed neural radiance field reconstruction algorithm can reconstruct high-quality scenes from sparse indoor panoramas and achieve highly realistic novel view synthesis.
Keywords
Neural radiance fields reconstruction for sparse indoor panoramas

Xiao Qiang, Chen Minglin, Zhang Ye, Huang Xiaohong (School of Electronics and Communication Engineering, Sun Yat-sen University)

Abstract
Objective Neural radiance fields (NeRF) are a crucial technology for creating immersive environments in various applications, including digital human simulations, interactive gaming, and virtual reality property tours. These applications benefit significantly from the highly realistic rendering capabilities of NeRF, which can generate detailed and interactive 3D spaces. However, the reconstruction of NeRF typically necessitates a dense set of multi-view images of indoor scenes, which can be difficult to obtain. Current algorithms that address sparse image inputs often fall short in reconstructing indoor scenes accurately, leading to suboptimal results. To overcome these challenges, we introduce a novel NeRF reconstruction algorithm specifically designed for sparse indoor panoramas. This algorithm aims to enhance the reconstruction process by improving the allocation of sampling points and refining the geometric structure, even with limited image data. By doing so, it promises to enable the creation of high-quality, realistic virtual environments that can be synthesized from sparsely available indoor panoramas, thus advancing the potential applications of NeRF in various fields. Method First, our algorithm implements a distortion-aware sampling strategy during the ray sampling phase, which is specifically designed to focus on regions of lower latitude. This strategic approach ensures that more rays are sampled from the central areas of the panorama, which are typically richer in visual information and less distorted compared to the peripheral regions. By concentrating on these areas, we are able to achieve a marked improvement in rendering quality, as the algorithm can better capture the essential features and details of the scene. To further enhance the reconstruction process, especially when dealing with sparse image inputs, we employ a panoramic depth estimation network.
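As an illustration of the latitude-weighted ray sampling described above, the following minimal sketch (our own Python example, not the paper's implementation; the function name is hypothetical) draws more pixel rows near the equator of an equirectangular panorama, where distortion is lowest:

```python
import numpy as np

def sample_panorama_pixels(height, width, n_rays, rng=None):
    """Sample pixel coordinates of an equirectangular panorama with row
    probability proportional to cos(latitude), favouring the less-distorted
    equatorial band over the stretched polar regions."""
    rng = np.random.default_rng(rng)
    # Latitude of each pixel row: close to +pi/2 at the top, -pi/2 at the bottom.
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    row_prob = np.cos(lat)
    row_prob /= row_prob.sum()
    rows = rng.choice(height, size=n_rays, p=row_prob)
    cols = rng.integers(0, width, size=n_rays)
    return rows, cols
```

In this weighting, rows near the poles of the panorama (which cover little solid angle but many pixels) are sampled rarely, so the training batch is dominated by the informative central band.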
This network is tasked with generating a depth map that provides crucial information about the spatial arrangement of objects within the scene. With the estimated depth map, our algorithm incorporates a depth sampling auxiliary strategy and a depth loss supervision strategy. These strategies work in tandem to guide the network's learning process, with the depth sampling strategy allocating a significant portion of the sampling points in a Gaussian distribution around the estimated depth. This targeted approach allows the network to gain a more nuanced understanding of the object surfaces, which is essential for accurate scene reconstruction. During the testing phase, our algorithm adopts a coarse-to-fine sampling strategy that aligns with the principles of neural radiance fields. This methodical approach ensures that the network can progressively refine its understanding of the scene, starting with a broad overview and gradually zooming in on finer details. To ensure that both color and depth accuracy are maintained throughout the training process, we have integrated a depth loss function. This function effectively limits the variance of the sampling point distribution, resulting in a more focused and accurate rendering of the scene. Additionally, we tackle artifacts and improve geometry by introducing a distortion loss for unobserved viewpoints. This loss function effectively constrains the distribution of unobserved rays in space, resulting in more realistic and visually pleasing renderings. Moreover, to address the issue of low rendering speed in neural rendering, we have developed a real-time neural rendering algorithm that is divided into two distinct stages. The first stage involves partitioning the bounding box of the scene into a series of octree grids, with each grid's density determined by its spatial location.
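The depth-guided Gaussian sampling and depth supervision described earlier in this section can be sketched as follows (a minimal illustration assuming one scalar estimated depth per ray; the function names and the standard deviation value are our assumptions, not the paper's):

```python
import numpy as np

def depth_guided_samples(depth_est, n_points, sigma=0.05, rng=None):
    """Draw sample distances along each ray from a Gaussian centred on the
    estimated depth, concentrating points near the object surface.
    depth_est: (n_rays,) depths predicted by a panoramic depth network."""
    rng = np.random.default_rng(rng)
    t = rng.normal(loc=depth_est[:, None], scale=sigma,
                   size=(depth_est.shape[0], n_points))
    # Keep samples in front of the camera and sort them along each ray.
    t = np.clip(t, 1e-3, None)
    return np.sort(t, axis=1)

def depth_loss(rendered_depth, depth_est):
    """Simple L2 supervision pulling the rendered depth toward the estimate,
    which also discourages a spread-out sample weight distribution."""
    return np.mean((rendered_depth - depth_est) ** 2)
```

Because most samples fall within a few sigma of the surface, far fewer points are wasted in empty space than with uniform stratified sampling, which is what makes this strategy effective under sparse inputs.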
This process allows us to efficiently manage the complexity of the scene, ensuring that the rendering process is optimized for both speed and quality. By further screening these grids, we can identify the octree leaf nodes, which are essential for reducing video memory consumption and improving overall performance. In the second stage, our algorithm leverages the network to predict the color values of the leaf nodes from various viewing directions. Spherical harmonics are employed to accurately fit the colors, ensuring that the rendered scene is both vibrant and true to life. By caching the network model as an octree structure, we enable real-time rendering, which is crucial for applications that demand a seamless and immersive experience. This approach not only significantly improves rendering speed but also maintains the high-quality results that are essential for creating realistic virtual environments. Result We evaluate the effectiveness of our proposed algorithm on three panoramic datasets, including two synthetic datasets (i.e., the Replica and PNVS datasets) and one real dataset (i.e., the WanHuaTong dataset). This diverse selection of datasets allows for a thorough assessment of the algorithm's performance across various conditions and complexities. The outcomes of our evaluation clearly illustrate the effectiveness of our algorithm, revealing its superiority over existing reconstruction methods. Specifically, when tested on the Replica dataset with only two panoramic images as input, our algorithm exhibits a significant leap over the current state-of-the-art DDPNeRF algorithm. It achieves a 6% improvement in Peak Signal-to-Noise Ratio (PSNR) and an 8% reduction in Root Mean Square Error (RMSE), indicators that reflect enhanced image quality and accuracy.
Moreover, our algorithm demonstrates an impressive rendering speed of 70 frames per second (FPS) on the WanHuaTong dataset, underscoring its capability to handle real-depth data with equal proficiency. The algorithm's adaptability is further highlighted in scenarios where panoramic images present challenges such as top cropping and partial depth occlusion. Despite these obstacles, our method effectively recovers complete depth information, showcasing its robustness and reliability in practical applications. Conclusion We propose a neural radiance fields reconstruction algorithm for sparse indoor panoramas, enabling highly realistic rendering from arbitrary viewpoints within the scene. By implementing a panorama-based ray sampling strategy and depth supervision, the algorithm enhances the geometric reconstruction quality by focusing on object surfaces. Additionally, it incorporates a distortion loss for unobserved viewpoints, which strengthens ray constraints and elevates reconstruction quality under sparse input conditions. Experimental validation on various panoramic datasets demonstrates that our algorithm outperforms current techniques in terms of color and geometry metrics. This leads to the creation of highly realistic novel views and supports real-time rendering, with potential applications in indoor navigation, VR house viewing, mixed reality games, and digital human scene synthesis.
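The spherical-harmonic color fitting used in the second, caching stage of the real-time renderer can be sketched as follows (a degree-1 example for one octree leaf; the coefficient layout and function name are illustrative assumptions, and practical systems typically cache higher-degree coefficients):

```python
import numpy as np

def eval_sh_color(sh_coeffs, view_dir):
    """Evaluate a degree-1 spherical-harmonic colour for one viewing
    direction. sh_coeffs: (3, 4) RGB coefficients cached at an octree
    leaf; view_dir: (3,) viewing direction, need not be normalised."""
    x, y, z = view_dir / np.linalg.norm(view_dir)
    # Real SH basis up to degree 1: constant term plus three linear terms.
    basis = np.array([0.28209479, 0.48860251 * y,
                      0.48860251 * z, 0.48860251 * x])
    return sh_coeffs @ basis  # (3,) RGB, before any activation/clamping
```

Caching such coefficients at every leaf replaces a network evaluation with a handful of multiply-adds per sample, which is what makes real-time, view-dependent rendering possible.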
Keywords
