Neural radiance field reconstruction for sparse indoor panoramas
Vol. 29, Issue 9, Pages: 2596-2609 (2024)
Published: 16 September 2024
DOI: 10.11834/jig.230643
Xiao Qiang, Chen Minglin, Zhang Ye, Huang Xiaohong. 2024. Neural radiance field reconstruction for sparse indoor panoramas. Journal of Image and Graphics, 29(09):2596-2609
Objective
Neural radiance fields (NeRF) can provide immersive environments for virtual-reality applications such as digital humans and interactive games. However, existing NeRF algorithms typically rely on panoramas captured at many positions to reconstruct large-scale indoor scenes, and their reconstruction quality degrades under sparse panorama inputs. To address this problem, we propose a NeRF reconstruction algorithm for sparse indoor panoramas that achieves low-cost, high-quality indoor novel view synthesis.
Method
To address the sparse-input problem, we first design a depth supervision strategy that allocates more sampling points near object surfaces, yielding a finer geometric reconstruction. We then introduce a distortion loss on rays cast from unobserved viewpoints to strengthen the ray constraints, which effectively improves indoor reconstruction quality under sparse inputs.
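As a concrete illustration of the depth supervision idea, the sketch below draws per-ray sample distances from a Gaussian centered on a depth prior and penalizes the rendered depth against that prior. It is a minimal PyTorch-style sketch with hypothetical names (`depth_guided_samples`, `sigma`, `depth_supervision_loss`); the paper's exact formulation may differ.

```python
import torch

def depth_guided_samples(est_depth, n_samples=64, sigma=0.1):
    """Place most sample distances near the estimated surface by drawing them
    from a Gaussian centered on the panoramic depth prior (hypothetical names)."""
    # est_depth: (n_rays,) depths predicted by a panoramic depth estimation network
    noise = torch.randn(est_depth.shape[0], n_samples, device=est_depth.device)
    t_vals = est_depth.unsqueeze(-1) + sigma * noise   # (n_rays, n_samples)
    t_vals = t_vals.clamp_min(1e-3)                    # keep samples in front of the camera
    return torch.sort(t_vals, dim=-1).values           # volume rendering expects sorted distances

def depth_supervision_loss(rendered_depth, est_depth, valid_mask):
    """One plausible depth loss: mean squared error between the volume-rendered
    depth and the depth prior on pixels with a valid prior."""
    return ((rendered_depth - est_depth)[valid_mask] ** 2).mean()
```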
Result
We compare the proposed algorithm with recent NeRF reconstruction algorithms on several indoor panoramic datasets. With two input panoramas from the Replica dataset, our algorithm improves the peak signal-to-noise ratio (PSNR) over the baseline by 6%. Even with a single input panorama from the PNVS dataset, it improves PSNR over the baseline by 11%.
Conclusion
Experimental results show that the proposed NeRF reconstruction algorithm can reconstruct high-quality scenes from sparse indoor panoramas and achieve highly realistic novel view synthesis.
Objective
Neural radiance fields (NeRF) are a crucial technology for creating immersive environments in applications such as digital human simulation, interactive gaming, and virtual-reality property tours. These applications benefit considerably from the highly realistic rendering capability of NeRF, which can generate detailed and interactive 3D spaces. However, NeRF reconstruction typically requires a dense set of multiview images of the indoor scene, which can be difficult to obtain. Existing algorithms that address sparse image inputs often fail to reconstruct indoor scenes accurately, which leads to suboptimal results. To overcome these challenges, we introduce a novel NeRF reconstruction algorithm designed specifically for sparse indoor panoramas. The algorithm improves the reconstruction process by allocating sampling points more effectively and refining the geometric structure despite the limited image data. In this manner, high-quality, realistic virtual environments can be synthesized from sparse indoor panoramas, which broadens the potential applications of NeRF in various fields.
Method
Our algorithm first applies a distortion-aware sampling strategy during the ray sampling phase, which concentrates on the lower-latitude regions of the panorama. This strategy draws more rays from the central areas of the panorama, which are richer in visual information and less distorted than the peripheral regions, and thereby markedly improves rendering quality because the algorithm can better capture the essential features and details of the scene. To further improve reconstruction under sparse image inputs, a panoramic depth estimation network is employed. This network generates a depth map that provides crucial information about the spatial arrangement of objects in the scene. With the estimated depth map, our algorithm incorporates an auxiliary depth sampling strategy and a depth loss supervision strategy, which work in tandem to guide the learning of the network. The depth sampling strategy places a considerable portion of the sampling points in a Gaussian distribution around the estimated depth. This targeted approach allows the network to develop a more nuanced understanding of object surfaces, which is essential for accurate scene reconstruction. During the testing phase, the algorithm adopts the coarse-to-fine sampling strategy of NeRF, so that the network progressively refines its understanding of the scene, starting from a broad overview and gradually zooming in on finer details. To maintain color and depth accuracy throughout training, we integrate a depth loss function that limits the variance of the sampling point distribution, which results in focused and accurate rendering of the scene. In addition, we address artifacts and improve geometry by introducing a distortion loss for unobserved viewpoints, which constrains the spatial distribution of unobserved rays and yields realistic and visually pleasing renderings. Moreover, to overcome the low rendering speed of neural rendering, we develop a real-time neural rendering algorithm with two stages. The first stage partitions the bounding box of the scene into a series of octree grids, with the density of each grid determined by its spatial location. This process manages the complexity of the scene efficiently and ensures that rendering is optimized for both speed and quality. Further screening of these grids identifies the octree leaf nodes, which reduces memory consumption and improves performance. In the second stage, the algorithm uses the network to predict the color of each leaf node from various viewing directions, and spherical harmonics are fitted to these colors so that the rendered scene remains vibrant and true to life. By caching the network model as an octree structure, we enable real-time rendering, which is crucial for applications that demand a seamless and immersive experience. This approach substantially improves rendering speed while maintaining the high-quality results essential for realistic virtual environments.
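To make the distortion-aware ray sampling concrete, the sketch below samples panorama pixels with probability proportional to cos(latitude), so that the low-distortion central rows of an equirectangular image receive more rays than the stretched polar rows. It is a minimal PyTorch-style sketch with hypothetical names (`sample_panorama_pixels`); the paper's exact weighting may differ.

```python
import torch

def sample_panorama_pixels(height, width, n_rays, device="cpu"):
    """Distortion-aware pixel sampling for an equirectangular panorama:
    rows near the equator cover more solid angle (weight ~ cos(latitude)),
    so they receive proportionally more rays."""
    rows = torch.arange(height, device=device, dtype=torch.float32)
    latitude = (rows + 0.5) / height * torch.pi - torch.pi / 2   # in [-pi/2, pi/2]
    row_weights = torch.cos(latitude).clamp_min(0.0)             # per-row solid-angle weight
    row_idx = torch.multinomial(row_weights, n_rays, replacement=True)
    col_idx = torch.randint(0, width, (n_rays,), device=device)  # uniform in longitude
    return row_idx, col_idx
```

The second rendering stage caches view-dependent color as spherical-harmonic (SH) coefficients at octree leaves, in the spirit of PlenOctrees (Yu et al., 2021a). The snippet below is a simplified degree-1 evaluation under assumed tensor shapes; the actual implementation may use a higher SH degree.

```python
import torch

def eval_sh_color(sh_coeffs, view_dirs):
    """Evaluate cached leaf colors from degree-1 real spherical harmonics.
    sh_coeffs: (n_leaves, 3, 4) -- four SH coefficients per RGB channel
    view_dirs: (n_leaves, 3)    -- unit viewing directions
    """
    x, y, z = view_dirs.unbind(-1)
    basis = torch.stack([
        0.282095 * torch.ones_like(x),  # Y_0^0
        -0.488603 * y,                  # Y_1^-1
        0.488603 * z,                   # Y_1^0
        -0.488603 * x,                  # Y_1^1
    ], dim=-1)                          # (n_leaves, 4)
    return torch.sigmoid((sh_coeffs * basis.unsqueeze(1)).sum(-1))  # (n_leaves, 3) RGB
```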
Result
We evaluated the effectiveness of the proposed algorithm on three panoramic datasets: two synthetic datasets (Replica and PNVS) and one real dataset (WanHuaTong). This diverse selection allows a thorough assessment of the algorithm's performance under various conditions and scene complexities. The evaluation demonstrates the effectiveness of our algorithm and its superiority over existing reconstruction methods. Specifically, when tested on the Replica dataset with two panoramic images as input, our algorithm clearly surpassed the state-of-the-art dense depth priors for NeRF (DDP-NeRF) algorithm, achieving a 6% improvement in peak signal-to-noise ratio (PSNR) and an 8% reduction in root mean square error (RMSE), which reflect gains in image quality and geometric accuracy. Moreover, our algorithm reached a rendering speed of 70 frames per second on the WanHuaTong dataset, which underscores its ability to handle real depth data with equal proficiency. Its adaptability is further highlighted in scenarios with challenging panoramic images, such as top cropping and partial depth occlusion; despite these obstacles, our method effectively recovers complete depth information, which showcases its robustness and reliability in practical applications.
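For reference, the two reported metrics follow their standard definitions (a generic restatement, assuming the RMSE is computed on depth, as is common for depth-supervised NeRF evaluation):

$$\mathrm{PSNR} = 10\log_{10}\frac{I_{\max}^{2}}{\mathrm{MSE}}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(d_i - \hat{d}_i\right)^{2}},$$

where $I_{\max}$ is the maximum pixel value, MSE is the mean squared error between the rendered and reference images, and $d_i$ and $\hat{d}_i$ are the reference and predicted depths at pixel $i$.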
Conclusion
We propose a NeRF reconstruction algorithm for sparse indoor panoramas that enables highly realistic rendering from arbitrary viewpoints within the scene. Through a panorama-based ray sampling strategy and depth supervision, the algorithm improves geometric reconstruction quality by focusing on object surfaces. In addition, it incorporates a distortion loss for unobserved viewpoints, which strengthens the ray constraints and raises reconstruction quality under sparse input conditions. Experimental validation on different panoramic datasets demonstrates that our algorithm outperforms current techniques in terms of both color and geometry metrics. It thus produces highly realistic novel views and supports real-time rendering, with potential applications in indoor navigation, virtual-reality house viewing, mixed-reality games, and digital human scene synthesis.
neural radiance field (NeRF) reconstruction; sparse input; panorama; novel view synthesis; virtual reality; digital human
Attal B, Ling S, Gokaslan A, Richardt C and Tompkin J. 2020. MatryODShka: real-time 6DoF video view synthesis using multi-sphere images//Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 441-459 [DOI: 10.1007/978-3-030-58452-8_26]
Barron J T, Mildenhall B, Tancik M, Hedman P, Martin-Brualla R and Srinivasan P P. 2021. Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 5835-5844 [DOI: 10.1109/ICCV48922.2021.00580]
Barron J T, Mildenhall B, Verbin D, Srinivasan P P and Hedman P. 2022. Mip-NeRF 360: unbounded anti-aliased neural radiance fields//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 5460-5469 [DOI: 10.1109/CVPR52688.2022.00539]
Chang A, Dai A, Funkhouser T, Halber M, Nießner M, Savva M, Song S R, Zeng A and Zhang Y D. 2017. Matterport3D: learning from RGB-D data in indoor environments//Proceedings of 2017 International Conference on 3D Vision (3DV). Qingdao, China: IEEE: 667-676 [DOI: 10.1109/3dv.2017.00081]
Chen A P, Xu Z X, Zhao F Q, Zhang X S, Xiang F B, Yu J Y and Su H. 2021a. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 14104-14113 [DOI: 10.1109/ICCV48922.2021.01386]
Chen H X, Li K H, Fu Z H, Liu M Y, Chen Z H and Guo Y L. 2021b. Distortion-aware monocular depth estimation for omnidirectional images. IEEE Signal Processing Letters, 28: 334-338 [DOI: 10.1109/LSP.2021.3050712]
Deng K L, Liu A, Zhu J Y and Ramanan D. 2022. Depth-supervised NeRF: fewer views and faster training for free//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 12872-12881 [DOI: 10.1109/CVPR52688.2022.01254]
Hsu C Y, Sun C and Chen H T. 2021. Moving in a 360 world: synthesizing panoramic parallaxes from a single panorama [EB/OL]. [2023-09-04]. https://arxiv.org/pdf/2106.10859.pdf
Jain A, Tancik M and Abbeel P. 2021. Putting NeRF on a diet: semantically consistent few-shot view synthesis//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 5865-5874 [DOI: 10.1109/ICCV48922.2021.00583]
Kellnhofer P, Jebe L C, Jones A, Spicer R, Pulli K and Wetzstein G. 2021. Neural lumigraph rendering//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 4285-4295 [DOI: 10.1109/CVPR46437.2021.00427]
Kim M, Seo S and Han B. 2022. InfoNeRF: ray entropy minimization for few-shot neural volume rendering//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 12902-12911 [DOI: 10.1109/CVPR52688.2022.01257]
Kulkarni S, Yin P and Scherer S. 2023. 360FusionNeRF: panoramic neural radiance fields with joint guidance//Proceedings of 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Detroit, USA: IEEE: 7202-7209 [DOI: 10.1109/IROS55552.2023.10341346]
Li D, Zhang Y D, Häne C, Tang D H, Varshney A and Du R F. 2022a. OmniSyn: synthesizing 360 videos with wide-baseline panoramas//Proceedings of 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW). Christchurch, New Zealand: IEEE: 670-671 [DOI: 10.1109/VRW55335.2022.00186]
Li M, Wang S B, Yuan W H, Shen W C, Sheng Z and Dong Z L. 2023. S2Net: accurate panorama depth estimation on spherical surface. IEEE Robotics and Automation Letters, 8(2): 1053-1060 [DOI: 10.1109/LRA.2023.3234820]
Lin K E, Xu Z X, Mildenhall B, Srinivasan P P, Hold-Geoffroy Y, Diverdi S, Sun Q, Sunkavalli K and Ramamoorthi R. 2020. Deep multi depth panoramas for view synthesis//Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 328-344 [DOI: 10.1007/978-3-030-58601-0_20]
Liu L J, Gu J T, Lin K Z, Chua T S and Theobalt C. 2020. Neural sparse voxel fields//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #1313
Long X X, Cheng X J, Zhu H, Zhang P J, Liu H M, Li J, Zheng L T, Hu Q Y, Liu H, Cao X, Yang R G, Wu Y H, Zhang G F, Liu Y B, Xu K, Guo Y L and Chen B Q. 2021. Recent progress in 3D vision. Journal of Image and Graphics, 26(6): 1389-1428 [DOI: 10.11834/jig.210043]
Mildenhall B, Srinivasan P P, Tancik M, Barron J T, Ramamoorthi R and Ng R. 2020. NeRF: representing scenes as neural radiance fields for view synthesis//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 405-421 [DOI: 10.1007/978-3-030-58452-8_24]
Niemeyer M, Barron J T, Mildenhall B, Sajjadi M S M, Geiger A and Radwan N. 2022. RegNeRF: regularizing neural radiance fields for view synthesis from sparse inputs//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 5470-5480 [DOI: 10.1109/CVPR52688.2022.00540]
Otonari T, Ikehata S and Aizawa K. 2022. Non-uniform sampling strategies for NeRF on 360° images [EB/OL]. [2023-09-04]. https://arxiv.org/pdf/2212.03635v1.pdf
Pittaluga F, Koppal S J, Kang S B and Sinha S N. 2019. Revealing scenes by inverting structure from motion reconstructions//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 145-154 [DOI: 10.1109/CVPR.2019.00023]
Roessle B, Barron J T, Mildenhall B, Srinivasan P P and Nießner M. 2022. Dense depth priors for neural radiance fields from sparse input views//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 12882-12891 [DOI: 10.1109/CVPR52688.2022.01255]
Straub J, Whelan T, Ma L N, Chen Y F, Wijmans E, Green S, Engel J J, Mur-Artal R, Ren C, Verma S, Clarkson A, Yan M F, Budge B, Yan Y J, Pan X Q, Yon J, Zou Y Y, Leon K, Carter N, Briales J, Gillingham T, Mueggler E, Pesqueira L, Savva M, Batra D, Strasdat H M, De Nardi R, Goesele M, Lovegrove S and Newcombe R. 2019. The Replica dataset: a digital replica of indoor spaces [EB/OL]. [2023-09-04]. https://arxiv.org/pdf/1906.05797.pdf
Wang G C, Chen Z X, Loy C C and Liu Z W. 2023. SparseNeRF: distilling depth ranking for few-shot novel view synthesis//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE: 9031-9042 [DOI: 10.1109/ICCV51070.2023.00832]
Wang P, Liu L J, Liu Y, Theobalt C, Komura T and Wang W P. 2021. NeuS: learning neural implicit surfaces by volume rendering for multi-view reconstruction//Proceedings of the 35th International Conference on Neural Information Processing Systems. [s.l.]: [s.n.]: 27171-27183
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/TIP.2003.819861]
Xu J L, Zheng J, Xu Y Y, Tang R and Gao S H. 2021. Layout-guided novel view synthesis from a single indoor panorama//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16433-16442 [DOI: 10.1109/CVPR46437.2021.01617]
Yang H, Chen R, An S P, Wei H and Zhang H. 2023. The growth of image-related three dimensional reconstruction techniques in deep learning-driven era: a critical summary. Journal of Image and Graphics, 28(8): 2396-2409 [DOI: 10.11834/jig.220376]
Yariv L, Gu J T, Kasten Y and Lipman Y. 2021. Volume rendering of neural implicit surfaces//Proceedings of the 35th International Conference on Neural Information Processing Systems. [s.l.]: [s.n.]: 4805-4815
Yu A, Li R L, Tancik M, Li H, Ng R and Kanazawa A. 2021a. PlenOctrees for real-time rendering of neural radiance fields//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 5732-5741 [DOI: 10.1109/ICCV48922.2021.00570]
Yu A, Ye V, Tancik M and Kanazawa A. 2021b. pixelNeRF: neural radiance fields from one or few images//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 4576-4585 [DOI: 10.1109/CVPR46437.2021.00455]
Yuan Y J, Lai Y K, Huang Y H, Kobbelt L and Gao L. 2023. Neural radiance fields from sparse RGB-D images for high-quality view synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7): 8713-8728 [DOI: 10.1109/TPAMI.2022.3232502]
Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 586-595 [DOI: 10.1109/CVPR.2018.00068]
Zhou T H, Tucker R, Flynn J, Fyffe G and Snavely N. 2018. Stereo magnification: learning view synthesis using multiplane images. ACM Transactions on Graphics, 37(4): #65 [DOI: 10.1145/3197517.3201323]