The growth of image-related three dimensional reconstruction techniques in deep learning-driven era: a critical summary
2023, Vol. 28, No. 8, pages 2396-2409
Print publication date: 2023-08-16
DOI: 10.11834/jig.220376
Yang Hang, Chen Rui, An Shipeng, Wei Hao, Zhang Heng. 2023. The growth of image-related three dimensional reconstruction techniques in deep learning-driven era: a critical summary. Journal of Image and Graphics, 28(08): 2396-2409
Three dimensional (3D) reconstruction refers to the process of reconstructing the 3D model of an object from one or more 2D images and applying texture mapping to that model. 3D reconstruction yields a color-textured model that can be observed from arbitrary viewpoints, and it is an important research direction in computer vision. Traditional 3D reconstruction methods usually require a large number of input images and involve multiple steps, including camera parameter estimation, dense point cloud reconstruction, surface reconstruction and texture mapping. In recent years, image-based 3D reconstruction in the context of deep learning has attracted wide attention and has shown superior performance and promising prospects. This paper presents a comprehensive survey of the techniques, evaluation methods and datasets of deep learning based image 3D reconstruction. We first categorize 3D reconstruction methods: by the representation of the 3D model, image-based 3D reconstruction can be divided into voxel-based, point cloud based and mesh-based reconstruction; by the type of input images, it can be divided into single-image and multi-image 3D reconstruction. We then review the methods in each category, summarizing them in terms of their input, 3D model representation, model texture and color, the type of ground truth used to train the reconstruction network, and their characteristics. We further compile the datasets commonly used for deep learning based image 3D reconstruction and compare representative methods experimentally. Finally, we summarize the open problems in image 3D reconstruction and directions for future research.
Image-related three dimensional (3D) reconstruction refers to the process of reconstructing the 3D model of an object from a single image or from multi-view images, producing a color-textured model that can be rendered from any viewpoint. Traditional 3D reconstruction methods typically require a large number of input images and involve multiple steps, such as camera parameter estimation, sparse point cloud reconstruction, dense point cloud reconstruction, surface reconstruction and texture mapping. In recent years, deep learning-driven image 3D reconstruction has attracted wide attention, yet most existing surveys focus on traditional methods or on the reconstruction of special object categories; a critical summary of image-based 3D reconstruction in the deep learning context is therefore needed. We summarize the recent progress of deep learning based 3D reconstruction from images. First, 3D reconstruction is introduced from two aspects: traditional methods and deep learning based methods. Three kinds of 3D model representations are considered: the voxel model, the point cloud model and the mesh model. A voxel is a small cube in 3D space, the 3D counterpart of a pixel in a 2D image. A mesh is a polyhedral structure composed of triangles, used to approximate the surface of complex objects. A point cloud is a collection of points in a coordinate system, each point carrying information such as 3D coordinates, color and class labels. For the voxel model, the 2D convolution used in image analysis can be extended straightforwardly to 3D space, but reconstructing a voxel model usually requires a large amount of computing memory.
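As a rough illustration of the memory cost mentioned above, the sketch below computes the storage of a dense occupancy grid at two resolutions. This is not from the paper; the resolutions (32 and 256) and the float32 dtype are illustrative assumptions.

```python
import numpy as np

def voxel_grid_bytes(resolution: int, dtype=np.float32) -> int:
    """Memory needed for a dense occupancy grid of size resolution^3."""
    return resolution ** 3 * np.dtype(dtype).itemsize

# Doubling the resolution multiplies memory (and 3D convolution cost) by 8,
# i.e., requirements grow cubically with the voxel resolution.
small = voxel_grid_bytes(32)    # 32^3 float32 voxels -> 131072 bytes
large = voxel_grid_bytes(256)   # 256^3 float32 voxels -> 64 MiB
print(large // small)           # (256 / 32)^3 = 512
```

This cubic growth is why voxel-based networks in the survey are typically limited to coarse resolutions, and why octree-style representations were proposed as an alternative.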
The memory and computation requirements of voxel-based methods grow cubically with the resolution of the voxel model. Point cloud based shape reconstruction produces smoother shapes and uses less memory than voxel-based methods. Compared with the voxel and point cloud models, the mesh model can describe the object surface more completely. We then classify image-based 3D reconstruction methods from two aspects: the representation of the 3D model and the type of input images. According to the type of input, existing methods fall into two categories: single-image reconstruction and multi-view reconstruction. Single-image 3D reconstruction methods are further divided into three categories according to the output representation: voxel-based, point cloud based and mesh-based. Multi-view 3D reconstruction methods are divided into two categories: voxel-based and mesh-based. Existing image-based 3D reconstruction methods are then reviewed in detail and summarized critically with respect to the input of the reconstruction method, the 3D model representation, the model texture and color, the ground truth, and the properties of the reconstruction network. The experimental side of 3D reconstruction is analyzed from three aspects: evaluation methods, datasets and method comparison.
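The abstract does not name a specific metric for comparing point cloud reconstructions, but the Chamfer distance is the score commonly used for this purpose in the surveyed literature (e.g., point set generation networks). A minimal NumPy sketch, offered as an assumption about the standard formulation:

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3).

    For each point in one set, take the squared distance to its nearest
    neighbor in the other set; average both directions and sum them.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(pred, gt))  # identical sets -> 0.0
```

The broadcast pairwise-distance matrix is O(NM) in memory, so real implementations use KD-trees or GPU batching for dense clouds; the formula itself is unchanged.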
For the experimental aspect, the commonly used 3D reconstruction datasets are introduced, e.g., the ShapeNet dataset (a repository of shapes represented by 3D CAD models), the PASCAL 3D+ dataset (pattern analysis, statistical modeling and computational learning), the ModelNet dataset (a 3D CAD model dataset), the ObjectNet3D dataset (a database for 3D object recognition), the Pix3D dataset (a benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment), the DTU dataset (Danmarks Tekniske Universitet), the NYU depth dataset (New York University), and the KITTI dataset (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago). The ShapeNet dataset is selected as the benchmark for comparison, and the pros and cons of existing methods are analyzed on it. Finally, the challenging problems and future potentials of image-based 3D reconstruction are summarized from five aspects: the generalization ability of 3D reconstruction methods; the fineness of the reconstructed models; the combination of 3D reconstruction with segmentation and recognition methods; the texture mapping of 3D models; and the evaluation system for 3D reconstruction.
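For voxel-based comparison on benchmarks such as ShapeNet, the score usually reported is the intersection over union (IoU) between predicted and ground-truth occupancy grids. The sketch below is an illustrative assumption about that computation; the binarization threshold of 0.5 is a conventional choice, not something stated in the abstract.

```python
import numpy as np

def voxel_iou(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between a predicted occupancy-probability grid and a binary ground truth."""
    p = pred >= threshold          # binarize predicted probabilities
    g = gt.astype(bool)
    intersection = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return float(intersection) / float(union) if union > 0 else 1.0

gt = np.zeros((4, 4, 4))
gt[:2] = 1.0                       # ground truth occupies half the grid
pred = np.zeros((4, 4, 4))
pred[:1] = 0.9                     # prediction recovers a quarter of the grid
print(voxel_iou(pred, gt))         # 16 occupied in common / 32 in union = 0.5
```

IoU of 1.0 means a perfect reconstruction at the given resolution; papers typically report the mean IoU per object category.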
Keywords: three dimensional reconstruction; deep learning; voxel model; point cloud model; mesh model
Aanæs H, Jensen R R, Vogiatzis G, Tola E and Dahl A B. 2016. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, 120(2): 153-168 [DOI: 10.1007/s11263-016-0902-9]
Abrevaya V F, Boukhayma A, Torr P H S and Boyer E. 2020. Cross-modal deep face normals with deactivable skip connections//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4978-4988 [DOI: 10.1109/CVPR42600.2020.00503]
Andrew A M. 2001. Multiple view geometry in computer vision. Kybernetes, 30(9/10): 1333-1341 [DOI: 10.1108/k.2001.30.9_10.1333.2]
Bautista M A, Talbott W, Zhai S F, Srivastava N and Susskind J M. 2021. On the generalization of learning-based 3D reconstruction//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 2179-2188 [DOI: 10.1109/WACV48630.2021.00223]
Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q X, Li Z M, Savarese S, Savva M, Song S R, Su H, Xiao J X, Yi L and Yu F. 2015. ShapeNet: an information-rich 3D model repository [EB/OL]. [2022-04-22]. http://arxiv.org/pdf/1512.03012.pdf
Chen G Y, Han K and Wong K Y K. 2018. PS-FCN: a flexible learning framework for photometric stereo//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01240-3_1]
Chen W Z, Gao J, Ling H, Smith E J, Lehtinen J, Jacobson A and Fidler S. 2019a. Learning to predict 3D objects with an interpolation-based differentiable renderer//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates, Inc.: 862
Chen X T, Chen X J and Zha Z J. 2019b. Structure-aware residual pyramid network for monocular depth estimation//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: ijcai.org: 694-700 [DOI: 10.24963/ijcai.2019/98]
Chen Z Q and Zhang H. 2019. Learning implicit fields for generative shape modeling//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5932-5941 [DOI: 10.1109/CVPR.2019.00609]
Choy C B, Xu D F, Gwak J, Chen K and Savarese S. 2016. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 628-644 [DOI: 10.1007/978-3-319-46484-8_38]
Eigen D, Puhrsch C and Fergus R. 2014. Depth map prediction from a single image using a multi-scale deep network//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2366-2374
Fan H Q, Su H and Guibas L. 2017. A point set generation network for 3D object reconstruction from a single image//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2463-2471 [DOI: 10.1109/CVPR.2017.264]
Geiger A, Lenz P and Urtasun R. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 3354-3361 [DOI: 10.1109/CVPR.2012.6248074]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
Groueix T, Fisher M, Kim V G, Russell B C and Aubry M. 2018. A papier-mâché approach to learning 3D surface generation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 216-224 [DOI: 10.1109/CVPR.2018.00030]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hu J J, Ozay M, Zhang Y and Okatani T. 2019. Revisiting single image depth estimation: toward higher resolution maps with accurate object boundaries//Proceedings of 2019 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 1043-1051 [DOI: 10.1109/WACV.2019.00116]
Jiang L, Shi S S, Qi X J and Jia J Y. 2018. GAL: geometric adversarial loss for single-view 3D-object reconstruction//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 820-834 [DOI: 10.1007/978-3-030-01237-3_49]
Ju Y, Dong J Y and Chen S. 2021. Recovering surface normal and arbitrary images: a dual regression network for photometric stereo. IEEE Transactions on Image Processing, 30: 3676-3690 [DOI: 10.1109/TIP.2021.3064230]
Kanazawa A, Black M J, Jacobs D W and Malik J. 2018. End-to-end recovery of human shape and pose//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7122-7131 [DOI: 10.1109/CVPR.2018.00744]
Kar A, Häne C and Malik J. 2017. Learning a multi-view stereo machine//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 364-375
Karras T, Laine S and Aila T. 2019. A style-based generator architecture for generative adversarial networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4401-4410 [DOI: 10.1109/CVPR.2019.00453]
Kolotouros N, Pavlakos G and Daniilidis K. 2019. Convolutional mesh regression for single-image human shape reconstruction//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4496-4505 [DOI: 10.1109/CVPR.2019.00463]
Lin K, Wang L J and Liu Z C. 2021. End-to-end human pose and mesh reconstruction with transformers//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1954-1963 [DOI: 10.1109/CVPR46437.2021.00199]
Liu F, Tran L and Liu X M. 2021. Fully understanding generic objects: modeling, segmentation, and reconstruction//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 7419-7429 [DOI: 10.1109/CVPR46437.2021.00734]
Liu S K, Giles L and Ororbia A. 2018. Learning a hierarchical latent-variable model of 3D shapes//Proceedings of 2018 International Conference on 3D Vision. Verona, Italy: IEEE: 542-551 [DOI: 10.1109/3DV.2018.00068]
Long X X, Cheng X J, Zhu H, Zhang P J, Liu H M, Li J, Zheng L T, Hu Q Y, Liu H, Cao X, Yang R G, Wu Y H, Zhang G F, Liu Y B, Xu K, Guo Y L and Chen B Q. 2021. Recent progress in 3D vision. Journal of Image and Graphics, 26(6): 1389-1428 [DOI: 10.11834/jig.210043]
Mandikal P, Navaneet K L, Agarwal M and Radhakrishnan V B. 2018. 3D-LMNet: latent embedding matching for accurate and diverse 3D point cloud reconstruction from a single image//Proceedings of the British Machine Vision Conference 2018. Newcastle, UK: BMVA Press: #55
Mandikal P and Radhakrishnan V B. 2019. Dense 3D point cloud reconstruction using a deep pyramid network//Proceedings of 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE: 1052-1060 [DOI: 10.1109/WACV.2019.00117]
Mescheder L, Oechsle M, Niemeyer M, Nowozin S and Geiger A. 2019. Occupancy networks: learning 3D reconstruction in function space//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4455-4465 [DOI: 10.1109/CVPR.2019.00459]
Murez Z, van As T, Bartolozzi J, Sinha A, Badrinarayanan V and Rabinovich A. 2020. Atlas: end-to-end 3D scene reconstruction from posed images//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 414-431 [DOI: 10.1007/978-3-030-58571-6_25]
Niemeyer M, Mescheder L, Oechsle M and Geiger A. 2020. Differentiable volumetric rendering: learning implicit 3D representations without 3D supervision//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 3501-3512 [DOI: 10.1109/CVPR42600.2020.00356]
Popov S, Bauszat P and Ferrari V. 2020. CoReNet: coherent 3D scene reconstruction from a single RGB image//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 366-383 [DOI: 10.1007/978-3-030-58536-5_22]
Richardson E, Sela M, Or-El R and Kimmel R. 2017. Learning detailed face reconstruction from a single image//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5553-5562 [DOI: 10.1109/CVPR.2017.589]
Saito S, Huang Z, Natsume R, Morishima S, Li H and Kanazawa A. 2019. PIFu: pixel-aligned implicit function for high-resolution clothed human digitization//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2304-2314 [DOI: 10.1109/ICCV.2019.00239]
Saito S, Simon T, Saragih J and Joo H. 2020. PIFuHD: multi-level pixel-aligned implicit function for high-resolution 3D human digitization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 81-90 [DOI: 10.1109/CVPR42600.2020.00016]
Sengupta S, Kanazawa A, Castillo C D and Jacobs D W. 2018. SfSNet: learning shape, reflectance and illuminance of faces “in the Wild”//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6296-6305 [DOI: 10.1109/CVPR.2018.00659]
Shi B X, Tan P, Matsushita Y and Ikeuchi K. 2014. Bi-polynomial modeling of low-frequency reflectances. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6): 1078-1091 [DOI: 10.1109/TPAMI.2013.196]
Shrestha R, Fan Z W, Su Q K, Dai Z Z, Zhu S Y and Tan P. 2021. MeshMVS: multi-view stereo guided mesh reconstruction//Proceedings of 2021 International Conference on 3D Vision. London, UK: IEEE: 1290-1300 [DOI: 10.1109/3DV53792.2021.00136]
Silberman N, Hoiem D, Kohli P and Fergus R. 2012. Indoor segmentation and support inference from RGBD images//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer: 746-760 [DOI: 10.1007/978-3-642-33715-4_54]
Song W, Zhu M F, Zhang M H, Zhao D F and He Q. 2022. A review of monocular depth estimation techniques based on deep learning. Journal of Image and Graphics, 27(2): 292-328 [DOI: 10.11834/jig.210554]
Spezialetti R, Tan D J, Tonioni A, Tateno K and Tombari F. 2020. A divide et impera approach for 3D shape reconstruction from multiple views//Proceedings of 2020 International Conference on 3D Vision. Fukuoka, Japan: IEEE: 160-170 [DOI: 10.1109/3DV50981.2020.00026]
Sun J M, Xie Y M, Chen L H, Zhou X W and Bao H J. 2021. NeuralRecon: real-time coherent 3D reconstruction from monocular video//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 15593-15602 [DOI: 10.1109/CVPR46437.2021.01534]
Sun X Y, Wu J J, Zhang X M, Zhang Z T, Zhang C K, Xue T F, Tenenbaum J B and Freeman W T. 2018. Pix3D: dataset and methods for single-image 3D shape modeling//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2974-2983 [DOI: 10.1109/CVPR.2018.00314]
Tang J P, Han X G, Pan J Y, Jia K and Tong X. 2019. A skeleton-bridged deep learning approach for generating meshes of complex topologies from single RGB images//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4536-4545 [DOI: 10.1109/CVPR.2019.00467]
Tatarchenko M, Dosovitskiy A and Brox T. 2017. Octree generating networks: efficient convolutional architectures for high-resolution 3D outputs//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2107-2115 [DOI: 10.1109/ICCV.2017.230]
Tulsiani S, Efros A A and Malik J. 2018. Multi-view consistency as supervisory signal for learning shape and pose prediction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2897-2905 [DOI: 10.1109/CVPR.2018.00306]
Tulsiani S, Zhou T H, Efros A A and Malik J. 2017. Multi-view supervision for single-view reconstruction via differentiable ray consistency//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 209-217 [DOI: 10.1109/CVPR.2017.30]
Wang D, Cui X R, Chen X, Zou Z X, Shi T Y, Salcudean S, Wang Z J and Ward R. 2021. Multi-view 3D reconstruction with transformers//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 5702-5711 [DOI: 10.1109/ICCV48922.2021.00567]
Wang N Y, Zhang Y D, Li Z W, Fu Y W, Liu W and Jiang Y G. 2018a. Pixel2Mesh: generating 3D mesh models from single RGB images//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 55-71 [DOI: 10.1007/978-3-030-01252-6_4]
Wang P S, Sun C Y, Liu Y and Tong X. 2018b. Adaptive O-CNN: a patch-based deep representation of 3D shapes. ACM Transactions on Graphics, 37(6): #217 [DOI: 10.1145/3272127.3275050]
Wang W Y, Xu Q G, Ceylan D, Mech R and Neumann U. 2019. DISN: deep implicit surface network for high-quality single-view 3D reconstruction//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #45
Wen C, Zhang Y D, Li Z W and Fu Y W. 2019. Pixel2Mesh++: multi-view 3D mesh generation via deformation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1042-1051 [DOI: 10.1109/ICCV.2019.00113]
Woodham R J. 1980. Photometric method for determining surface orientation from multiple images. Optical Engineering, 19(1): #191139 [DOI: 10.1117/12.7972479]
Wu B J and Huang H. 2020. Survey on 3D reconstruction of transparent objects. Journal of Computer-Aided Design and Computer Graphics, 32(2): 173-180 [DOI: 10.3724/SP.J.1089.2020.18101]
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1912-1920 [DOI: 10.1109/CVPR.2015.7298801]
Xiang Y, Kim W, Chen W, Ji J W, Choy C, Su H, Mottaghi R, Guibas L and Savarese S. 2016. ObjectNet3D: a large scale database for 3D object recognition//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 160-176 [DOI: 10.1007/978-3-319-46484-8_10]
Xiang Y, Mottaghi R and Savarese S. 2014. Beyond PASCAL: a benchmark for 3D object detection in the wild//Proceedings of 2014 IEEE Winter Conference on Applications of Computer Vision. Steamboat Springs, USA: IEEE: 75-82 [DOI: 10.1109/WACV.2014.6836101]
Xie H Z, Yao H X, Sun X S, Zhou S C and Zhang S P. 2019. Pix2Vox: context-aware 3D reconstruction from single and multi-view images//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2690-2698 [DOI: 10.1109/ICCV.2019.00278]
Xie H Z, Yao H X, Zhang S P, Zhou S C and Sun W X. 2020. Pix2Vox++: multi-scale context-aware 3D object reconstruction from single and multiple images. International Journal of Computer Vision, 128(12): 2919-2935 [DOI: 10.1007/s11263-020-01347-6]
Yang B, Rosa S, Markham A, Trigoni N and Wen H K. 2019. Dense 3D object reconstruction from a single depth view. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(12): 2820-2834 [DOI: 10.1109/TPAMI.2018.2868195]
Yao Y, Schertler N, Rosales E, Rhodin H, Sigal L and Sheffer A. 2020. Front2Back: single view 3D shape reconstruction via front to back prediction//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 531-540 [DOI: 10.1109/CVPR42600.2020.00061]
Yu H and Oh J. 2022. Anytime 3D object reconstruction using multi-modal variational autoencoder. IEEE Robotics and Automation Letters, 7(2): 2162-2169 [DOI: 10.1109/LRA.2022.3142439]
Yuan Y, Tang J L and Zou Z X. 2021. Vanet: a view attention guided network for 3D reconstruction from single and multi-view images//Proceedings of 2021 IEEE International Conference on Multimedia and Expo. Shenzhen, China: IEEE: 1-6 [DOI: 10.1109/ICME51207.2021.9428171]
Zhang Y X, Chen W Z, Ling H, Gao J, Zhang Y N, Torralba A and Fidler S. 2021a. Image GANs meet differentiable rendering for inverse graphics and interpretable 3D neural rendering//Proceedings of the 9th International Conference on Learning Representations. Vienna, Austria: OpenReview.net
Zhang Z, Ge Y, Chen R W, Tai Y, Yan Y, Yang J, Wang C J, Li J L and Huang F Y. 2021b. Learning to aggregate and personalize 3D face from in-the-wild photo collection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 14209-14219 [DOI: 10.1109/CVPR46437.2021.01399]
Zheng T X, Huang S, Li Y F and Feng M C. 2020. Key techniques for vision based 3D reconstruction: a review. Acta Automatica Sinica, 46(4): 631-652 [DOI: 10.16383/j.aas.2017.c170502]
Zhu X Y, Liu X M, Lei Z and Li S Z. 2019. Face alignment in full pose range: a 3D total solution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1): 78-92 [DOI: 10.1109/TPAMI.2017.2778152]