Recent progress in 3D vision
Vol. 26, Issue 6, Pages 1389-1428 (2021)
Received: 21 January 2021,
Revised: 2 February 2021,
Accepted: 9 February 2021,
Published: 16 June 2021
DOI: 10.11834/jig.210043
Driven by applications such as autonomous driving, robotics, digital cities, and virtual/mixed reality, 3D vision has attracted widespread attention. Research on 3D vision mainly centers on depth image acquisition, visual localization and mapping, 3D modeling, and 3D understanding. This paper provides a comprehensive review and comparative analysis of domestic and international research progress on these 3D vision tasks. First, for depth image acquisition, progress in stereo matching is reviewed from three aspects: non-end-to-end, end-to-end, and unsupervised stereo matching; progress in monocular depth estimation is reviewed from two aspects: depth regression networks and depth completion networks. Second, for visual localization and mapping, progress in visual localization in large-scale scenes is reviewed from two aspects: end-to-end and non-end-to-end visual localization; progress in simultaneous localization and mapping (SLAM) is reviewed from two aspects: visual SLAM and SLAM fused with other sensors. Third, for 3D modeling, progress in 3D geometric modeling is reviewed from four aspects: deep 3D representation learning, deep 3D generative models, structured representation learning and generative models, and deep learning-based 3D reconstruction; progress in dynamic human modeling is reviewed from three aspects: multi-view RGB reconstruction, single- and multi-depth-camera methods, and single-view RGB methods. Finally, for 3D understanding, progress in point cloud semantic understanding is reviewed from two aspects: point cloud semantic segmentation and point cloud instance segmentation. On this basis, future research trends in 3D vision are presented, with the aim of providing a reference for researchers in related fields.
3D vision has numerous applications in various areas, such as autonomous vehicles, robotics, digital city, virtual/mixed reality, human-machine interaction, entertainment, and sports. It covers a broad variety of research topics, ranging from 3D data acquisition, 3D modeling, shape analysis, and rendering to interaction. With the rapid development of 3D acquisition sensors (such as low-cost LiDARs, depth cameras, and 3D scanners), 3D data have become increasingly accessible and available. Moreover, advances in deep learning techniques have further boosted the development of 3D vision, with a large number of algorithms proposed recently. We provide a comprehensive review of the progress of 3D vision algorithms in recent years, mostly in the last year. This survey covers seven topics: stereo matching, monocular depth estimation, visual localization in large-scale scenes, simultaneous localization and mapping (SLAM), 3D geometric modeling, dynamic human modeling, and point cloud understanding. Although several surveys are already available in the area of 3D vision, this survey differs in a few aspects. First, it covers a wide range of topics in 3D vision and can therefore benefit a broad research community. In contrast, most existing surveys mainly focus on a specific topic, such as depth estimation or point cloud learning. Second, this survey mainly focuses on the progress of very recent years and can therefore provide readers with up-to-date information. Third, this paper presents a direct comparison between the progress in China and abroad. The recent progress in depth image acquisition, including stereo matching and monocular depth estimation, is reviewed first. Stereo matching algorithms are divided into non-end-to-end, end-to-end, and unsupervised stereo matching algorithms. Monocular depth estimation algorithms are categorized into depth regression networks and depth completion networks, with depth regression networks further divided into encoder-decoder networks and composite networks. Then, the recent progress in visual localization, including visual localization in large-scale scenes and SLAM, is reviewed. Visual localization algorithms for large-scale scenes are divided into end-to-end and non-end-to-end algorithms; the non-end-to-end algorithms are further categorized into deep learning-based feature description algorithms, 2D image retrieval-based visual localization algorithms, 2D-3D matching-based visual localization algorithms, and visual localization algorithms based on the fusion of 2D image retrieval and 2D-3D matching. SLAM algorithms are divided into visual SLAM algorithms and multi-sensor fusion-based SLAM algorithms. The recent progress in 3D modeling and understanding, including 3D geometric modeling, dynamic human modeling, and point cloud understanding, is then reviewed. 3D geometric modeling algorithms consist of several components, including deep 3D representation learning, deep 3D generative models, structured representation learning and generative models, and deep learning-based 3D modeling. Dynamic human modeling algorithms are divided into multi-view RGB modeling algorithms, single-depth-camera-based and multi-depth-camera-based algorithms, and single-view RGB modeling methods. Point cloud understanding algorithms are further categorized into semantic segmentation and instance segmentation methods for point clouds. The paper is organized as follows. In Section 1, we present the progress in 3D vision outside China. In Section 2, we introduce the progress of 3D vision in China. In Section 3, the 3D vision techniques developed in China and abroad are compared and analyzed. In Section 4, we point out several future research directions in the area.
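To make the encoder-decoder depth regression category mentioned above more concrete, the following is a minimal, illustrative sketch (not taken from any of the surveyed papers): a convolutional encoder downsamples an RGB image into features, and a decoder upsamples them back to a dense per-pixel depth map. The class name, layer widths, and input size are assumptions chosen only for illustration.

import torch
import torch.nn as nn

class TinyDepthRegressor(nn.Module):
    """Illustrative encoder-decoder network for monocular depth regression."""
    def __init__(self):
        super().__init__()
        # Encoder: two strided convolutions halve the spatial resolution twice.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: two transposed convolutions restore the input resolution
        # and predict one depth value per pixel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

if __name__ == "__main__":
    model = TinyDepthRegressor()
    rgb = torch.randn(1, 3, 128, 416)   # dummy RGB image batch
    depth = model(rgb)                  # dense depth prediction
    print(depth.shape)                  # torch.Size([1, 1, 128, 416])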
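Similarly, the point cloud semantic segmentation category can be illustrated by a minimal sketch of the shared point-wise MLP idea popularized by PointNet-style networks: each point is embedded by a shared MLP, a global feature is obtained by max pooling, and per-point class scores are predicted from the concatenation of local and global features. The network and all sizes below are illustrative assumptions, not any specific surveyed method.

import torch
import torch.nn as nn

class TinyPointSegNet(nn.Module):
    """Illustrative point-wise MLP network for point cloud semantic segmentation."""
    def __init__(self, num_classes=13):
        super().__init__()
        # Shared per-point MLP implemented with 1x1 convolutions over the point axis.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(inplace=True),
            nn.Conv1d(64, 128, 1), nn.ReLU(inplace=True),
        )
        # Classifier over concatenated per-point (128) and global (128) features.
        self.classifier = nn.Sequential(
            nn.Conv1d(256, 128, 1), nn.ReLU(inplace=True),
            nn.Conv1d(128, num_classes, 1),
        )

    def forward(self, xyz):                                        # xyz: (batch, 3, num_points)
        local_feat = self.point_mlp(xyz)                           # (batch, 128, num_points)
        global_feat = local_feat.max(dim=2, keepdim=True).values   # (batch, 128, 1)
        global_feat = global_feat.expand(-1, -1, xyz.shape[2])     # broadcast to every point
        fused = torch.cat([local_feat, global_feat], dim=1)        # (batch, 256, num_points)
        return self.classifier(fused)                              # per-point class logits

if __name__ == "__main__":
    net = TinyPointSegNet()
    points = torch.randn(2, 3, 1024)    # two clouds of 1 024 points
    print(net(points).shape)            # torch.Size([2, 13, 1024])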
Agamennoni G, Fontana S,Siegwart R Y and Sorrenti D G. 2016. Point clouds registration with probabilistic data association//Proceedings of 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems. Daejeon, Korea (South): IEEE: 4092-4098[ DOI: 10.1109/IROS.2016.7759602 http://dx.doi.org/10.1109/IROS.2016.7759602 ]
Aleotti F, Tosi F, Zhang L, Poggi M and Mattoccia S. 2020. Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation[EB/OL]. [2020-10-10] . https://arxiv.org/pdf/2008.07130.pdf https://arxiv.org/pdf/2008.07130.pdf
Almalioglu Y, Saputra M R U, de Gusmão P P B, Markham A and Trigoni N. 2019. GANVO: unsupervised deep monocular visual odometry and depth estimation with generative adversarial networks//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 5474-5480[ DOI: 10.1109/ICRA.2019.8793512 http://dx.doi.org/10.1109/ICRA.2019.8793512 ]
Arandjelovic R, Gronat P, Torii A, Pajdla T and Sivic J. 2016. NetVLAD: CNN architecture for weakly supervised place recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5297-5307[ DOI: 10.1109/CVPR.2016.572 http://dx.doi.org/10.1109/CVPR.2016.572 ]
Averkiou M, Kim V G, Zheng Y Y and Mitra N J. 2014. ShapeSynth: parameterizing model collections for coupled shape exploration and synthesis. Computer Graphics Forum, 33(2): 125-134[DOI:10.1111/cgf.12310]
Badki A, Troccoli A, Kim K, Kautz J, Sen P and Gallo O. 2020. Bi3D: stereo depth estimation via binary classifications//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1597-1605[ DOI: 10.1109/CVPR42600.2020.00167 http://dx.doi.org/10.1109/CVPR42600.2020.00167 ]
Balntas V, Lenc K, Vedaldi A and Mikolajczyk K. 2017. HPatches: a benchmark and evaluation of handcrafted and learned local descriptors//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3852-3861[ DOI: 10.1109/CVPR.2017.410 http://dx.doi.org/10.1109/CVPR.2017.410 ]
Bao W, Wang W, Xu Y H, Guo Y L, Hong S Y and Zhang X H. 2020. InStereo2K: a large real dataset for stereo matching in indoor scenes. Science China Information Sciences, 63(11): #212101[DOI:10.1007/s11432-019-2803-x]
Barron J T and Poole B. 2016. The fast bilateral solver//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 617-632[ DOI: 10.1007/978-3-319-46487-9_38 http://dx.doi.org/10.1007/978-3-319-46487-9_38 ]
Bhowmik A, Gumhold S, Rother C and Brachmann E. 2020. Reinforced feature points: optimizing feature detection and description for a high-level task//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4947-4956[ DOI: 10.1109/CVPR42600.2020.00500 http://dx.doi.org/10.1109/CVPR42600.2020.00500 ]
Bloesch M, Czarnowski J, Clark R, Leutenegger S and Davison A J. 2018. CodeSLAM-learning a compact, optimisable representation for dense visual SLAM//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2560-2568[ DOI: 10.1109/CVPR.2018.00271 http://dx.doi.org/10.1109/CVPR.2018.00271 ]
Brachmann E, Krull A, Nowozin S, Shotton J, Michel F, Gumhold S and Rother C. 2017. DSAC-differentiable RANSAC for camera localization//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2492-2500[ DOI: 10.1109/CVPR.2017.267 http://dx.doi.org/10.1109/CVPR.2017.267 ]
Brachmann E and Rother C. 2018. Learning less is more-6D camera localization via 3D surface regression//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4654-4662[ DOI: 10.1109/CVPR.2018.00489 http://dx.doi.org/10.1109/CVPR.2018.00489 ]
Brachmann E and Rother C. 2019. Expert sample consensus applied to camera re-localization//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7524-7533[ DOI: 10.1109/ICCV.2019.00762 http://dx.doi.org/10.1109/ICCV.2019.00762 ]
Brahmbhatt S, Gu J W, Kim K, Hays J and Kautz J. 2018. Geometry-aware learning of maps for camera localization//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2616-2625[ DOI: 10.1109/CVPR.2018.00277 http://dx.doi.org/10.1109/CVPR.2018.00277 ]
Brandao P, Mazomenos E and Stoyanov D. 2019. Widening siamese architectures for stereo matching. Pattern Recognition Letters, 120: 75-81[DOI:10.1016/j.patrec.2018.12.002]
Cadena C, Carlone L, Carrillo H, Latif Y, Scaramuzza D, Neira J, Reid I and Leonard J J. 2016. Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Transactions on Robotics, 32(6): 1309-1332[DOI:10.1109/TRO.2016.2624754]
Campos C, Elvira R, Rodríguez J J G, Montiel J M M and Tardós J D. 2020. ORB-SLAM3: an accurate open-source library for visual, visual-inertial and multi-map SLAM[EB/OL]. [2020-07-23] . https://arxiv.org/pdf/2007.11898.pdf https://arxiv.org/pdf/2007.11898.pdf
Caselitz T, Steder B, Ruhnke M and Burgard W. 2016. Monocular camera localization in 3D LiDAR maps//Proceedings of 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems. Daejeon, Korea (South): IEEE: 1926-1931[ DOI: 10.1109/IROS.2016.7759304 http://dx.doi.org/10.1109/IROS.2016.7759304 ]
Chakrabarti A, Shao J and Shakhnarovich G. 2016. Depth from a single image by harmonizing overcomplete local network predictions//Advances in Neural Information Processing Systems. Barcelona, Spain: Curran Associates, Inc. : 2658-2666
Chan S H, Wu P T and Fu L C. 2018. Robust 2D indoor localization through laser SLAM and visual SLAM fusion//Proceedings of 2018 IEEE International Conference on Systems, Man, and Cybernetics. Miyazaki, Japan: IEEE, 2018: 1263-1268[DOI:10.1109/SMC.2018.00221]
Chang J R and Chen Y S. 2018. Pyramid stereo matching network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5410-5418[ DOI: 10.1109/CVPR.2018.00567 http://dx.doi.org/10.1109/CVPR.2018.00567 ]
Chen C H, Rosa S, Miao Y S, Lu C X, Wu W, Markham A and Trigoni N. 2019. Selective sensor fusion for neural visual-inertial odometry//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10534-10543[ DOI: 10.1109/CVPR.2019.01079 http://dx.doi.org/10.1109/CVPR.2019.01079 ]
Chen H X, Li K H, Fu Z H, Liu M Y, Chen Z H and Guo Y L. 2021. Distortion-aware monocular depth estimation for omnidirectional images. IEEE Signal Processing Letters, 28: 334-338[DOI:10.1109/LSP.2021.3050712]
Chen M X, Yang S W, Yi X D and Wu D. 2017. Real-time 3D mapping using a 2D laser scanner and IMU-aided visual SLAM//Proceedings of 2017 IEEE International Conference on Real-time Computing and Robotics. Okinawa, Japan: IEEE: 297-302[ DOI: 10.1109/RCAR.2017.8311877 http://dx.doi.org/10.1109/RCAR.2017.8311877 ]
Chen Z, Badrinarayanan V, Drozdov G and Rabinovich A. 2018. Estimating depth from rgb and sparse sensing//Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 167-182[ DOI: 10.1007/978-3-030-01225-0_11 http://dx.doi.org/10.1007/978-3-030-01225-0_11 ]
Chen Z Y, Sun X, Wang L, Yu Y and Huang C. 2015. A deep visual correspondence embedding model for stereo matching costs//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 972-980[ DOI: 10.1109/ICCV.2015.117 http://dx.doi.org/10.1109/ICCV.2015.117 ]
Cheng X J, Wang P, Guan C Y and Yang R G. 2020a. CSPN++: learning context and resource aware convolutional spatial propagation networks for depth completion. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7): 10615-10622[DOI:10.1609/aaai.v34i07.6635]
Cheng X J, Wang P and Yang R G. 2018. Depth estimation via affinity learned with convolutional spatial propagation network//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 108-125[ DOI: 10.1007/978-3-030-01270-0_7 http://dx.doi.org/10.1007/978-3-030-01270-0_7 ]
Cheng X J, Wang P and Yang R G. 2020b. Learning depth with convolutional spatial propagation network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10): 2361-2379[DOI:10.1109/TPAMI.2019.2947374]
Chodosh N, Wang C Y and Lucey S. 2019. Deep convolutional compressed sensing for LiDAR depth completion//Proceedings of Asian Conference on Computer Vision. Cham: Springer: 499-513[ DOI: 10.1007/978-3-030-20887-5_31 http://dx.doi.org/10.1007/978-3-030-20887-5_31 ]
Choy C, Gwak J and Savarese S. 2019. 4D spatio-temporal convnets: minkowski convolutional neural networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3070-3079[ DOI: 10.1109/CVPR.2019.00319 http://dx.doi.org/10.1109/CVPR.2019.00319 ]
Clark R, Wang S, Wen H K, Markham A and Trigoni N. 2017. VINet: visual-inertial odometry as a sequence-to-sequence learning problem[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1701.08376.pdf https://arxiv.org/pdf/1701.08376.pdf
Dai A and Nieβner M. 2018. 3DMV: joint 3D-multi-view prediction for 3D semantic scene segmentation//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 458-474[ DOI: 10.1007/978-3-030-01249-6_28 http://dx.doi.org/10.1007/978-3-030-01249-6_28 ]
Davison A J. 2003. Real-time simultaneous localisation and mapping with a single camera//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France: IEEE, 1403-1410[ DOI: 10.1109/ICCV.2003.1238654 http://dx.doi.org/10.1109/ICCV.2003.1238654 ]
DeTone D, Malisiewicz T and Rabinovich A. 2018. SuperPoint: self-supervised interest point detection and description//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 337-349[ DOI: 10.1109/CVPRW.2018.00060 http://dx.doi.org/10.1109/CVPRW.2018.00060 ]
Doria D and Radke R J. 2012. Filling large holes in LiDAR data by inpainting depth gradients//Proceedings of 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, USA: IEEE: 65-72[ DOI: 10.1109/CVPRW.2012.6238916 http://dx.doi.org/10.1109/CVPRW.2012.6238916 ]
Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V, van der Smagt P, Cremers D and Brox T. 2015. FlowNet: learning optical flow with convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2758-2766[ DOI: 10.1109/ICCV.2015.316 http://dx.doi.org/10.1109/ICCV.2015.316 ]
Dou M S, Khamis S, Degtyarev Y, Davidson P, Fanello S R, Kowdle A, Escolano S O, Rhemann C, Kim D, Taylor J, Kohli P, Tankovich V and Izadi S. 2016. Fusion4D: real-time performance capture of challenging scenes. ACM Transactions on Graphics, 35(4): #114[DOI:10.1145/2897824.2925969]
Du H, Wang W, Xu C W, Xiao R and Sun C Y. 2020. Real-time onboard 3D state estimation of an unmanned aerial vehicle in multi-environments using multi-sensor data fusion. Sensors, 20(3): #919[DOI:10.3390/s20030919]
Dusmanu M, Rocco I, Pajdla T, Pollefeys M, Sivic J, Torii A and Sattler T. 2019. D2-Net: a trainable CNN for joint description and detection of local features//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8084-8093[ DOI: 10.1109/CVPR.2019.00828 http://dx.doi.org/10.1109/CVPR.2019.00828 ]
Eigen D and Fergus R. 2015. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2650-2658[ DOI: 10.1109/ICCV.2015.304 http://dx.doi.org/10.1109/ICCV.2015.304 ]
Eigen D, Puhrsch C and Fergus R. 2014. Depth map prediction from a single image using a multi-scale deep network//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press: 2366-2374
Engel J, Schöps T and Cremers D. 2014. LSD-SLAM: large-scale direct monocular SLAM//Proceedings of European Conference on Computer Vision. Zurich, Switzerland: Springer: 834-849[ DOI: 10.1007/978-3-319-10605-2_54 http://dx.doi.org/10.1007/978-3-319-10605-2_54 ]
Engel J, Stückler J and Cremers D. 2015. Large-scale direct SLAM with stereo cameras//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE: 1935-1942[ DOI: 10.1109/IROS.2015.7353631 http://dx.doi.org/10.1109/IROS.2015.7353631 ]
Engel J, Koltun V and Cremers D. 2017. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3): 611-625
Engelmann F, Bokeloh M, Fathi A,Leibe B and Nieβner M. 2020. 3D-MPA: multi-proposal aggregation for 3D semantic instance segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9028-9037[ DOI: 10.1109/CVPR42600.2020.00905 http://dx.doi.org/10.1109/CVPR42600.2020.00905 ]
Engelmann F, Kontogianni T, Hermans A and Leibe B. 2017. Exploring spatial context for 3D semantic segmentation of point clouds//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy: IEEE: 716-724[ DOI: 10.1109/ICCVW.2017.90 http://dx.doi.org/10.1109/ICCVW.2017.90 ]
Fan H Q, Su H and Guibas L. 2017. A point set generation network for 3D object reconstruction from a single image//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2463-2471[ DOI: 10.1109/CVPR.2017.264 http://dx.doi.org/10.1109/CVPR.2017.264 ]
Feng Y J, Fan L Xand Wu Y H. 2016. Fast localization in large-scale environments using supervised indexing of binary features. IEEE Transactions on Image Processing, 25(1): 343-358[DOI:10.1109/TIP.2015.2500030]
Ferstl D, Reinbacher C, Ranftl R, Rather M and Bischof H. 2013. Image guided depth upsampling using anisotropic total generalized variation//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 993-1000[ DOI: 10.1109/ICCV.2013.127 http://dx.doi.org/10.1109/ICCV.2013.127 ]
Flynn J, Neulander I, Philbin J and Snavely N. 2016. Deep stereo: learning to predict new views from the world's imagery//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5515-5524[ DOI: 10.1109/CVPR.2016.595 http://dx.doi.org/10.1109/CVPR.2016.595 ]
Forster C, Pizzoli M, Scaramuzza D. SVO: Fast semi-direct monocular visual odometry[C]//2014 IEEE international conference on robotics and automation (ICRA). IEEE, 2014: 15-22
Fu H, Gong M M, Wang C H, Batmanghelich K and Tao D C. 2018. Deep ordinal regression network for monocular depth estimation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2002-2011[ DOI: 10.1109/CVPR.2018.00214 http://dx.doi.org/10.1109/CVPR.2018.00214 ]
Gálvez-López D and Tardos J D. 2012. Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 28(5): 1188-1197[DOI:10.1109/TRO.2012.2197158]
Gan Y K, Xu X Y, Sun W X and Lin L. 2018. Monocular depth estimation with affinity, vertical pooling, and label enhancement//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 232-247[ DOI: 10.1007/978-3-030-01219-9_14 http://dx.doi.org/10.1007/978-3-030-01219-9_14 ]
Gao X, Wang R, Demmel N and Cremers D. 2018. LDSO: direct sparse odometry with loop closure//Proceeding of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE: 2198-2204[ DOI: 10.1109/IROS.2018.8593376 http://dx.doi.org/10.1109/IROS.2018.8593376 ]
Garg R, Bg V K, Carneiro G and Reid I. 2016. Unsupervised cnn for single view depth estimation: geometry to the rescue//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 740-756[ DOI: 10.1007/978-3-319-46484-8_45 http://dx.doi.org/10.1007/978-3-319-46484-8_45 ]
Gawel A, Cieslewski T, DubéR, Bosse M, Siegwart R and Nieto J. 2016. Structure-based vision-laser matching//Proceedings of 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems. Daejeon, Korea (South): IEEE: 182-188[ DOI: 10.1109/IROS.2016.7759053 http://dx.doi.org/10.1109/IROS.2016.7759053 ]
Ge Y X, Wang H B, Zhu F, Zhao R and Li H S. 2020. Self-supervising fine-grained region similarities for large-scale image localization[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/2006.03926.pdf https://arxiv.org/pdf/2006.03926.pdf
Genova K, Cole F, Sud A, Sarna A and Funkhouser T. 2019. Deep structured implicit functions[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1912.06126.pdf https://arxiv.org/pdf/1912.06126.pdf
Gidaris S and Komodakis N. 2017. Detect, replace, refine: deep structured prediction for pixel wise labeling//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 7187-7196[ DOI: 10.1109/CVPR.2017.760 http://dx.doi.org/10.1109/CVPR.2017.760 ]
Girdhar R, Fouhey D F, Rodriguez M and Gupta A. 2016. Learning a predictable and generative vector representation for objects//Proceedings of European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 484-499[ DOI: 10.1007/978-3-319-46466-4_29 http://dx.doi.org/10.1007/978-3-319-46466-4_29 ]
Godard C, Mac Aodha O and Brostow G J. 2017. Unsupervised monocular depth estimation with left-right consistency//Proceedings of 2017 IEEE Conference on Computer Vision andPattern Recognition. Honolulu, USA: IEEE: 6602-6611[ DOI: 10.1109/CVPR.2017.699 http://dx.doi.org/10.1109/CVPR.2017.699 ]
Graham B, Engelcke M and van der Maaten L. 2018. 3D semantic segmentation with submanifold sparse convolutional networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 9224-9232[ DOI: 10.1109/CVPR.2018.00961 http://dx.doi.org/10.1109/CVPR.2018.00961 ]
Groueix T, Fisher M, Kim V G, Russell B C and Aubry M. 2018. A papier-mache approach to learning 3D surface generation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 216-224[ DOI: 10.1109/CVPR.2018.00030 http://dx.doi.org/10.1109/CVPR.2018.00030 ]
Gu X D, Fan Z W, Zhu S Y, Dai Z Z, Tan F T and Tan P. 2020. Cascade cost volume for high-resolution multi-view stereo and stereo matching//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2492-2501[ DOI: 10.1109/CVPR42600.2020.00257 http://dx.doi.org/10.1109/CVPR42600.2020.00257 ]
Güney F and Geiger A. 2015. Displets: resolving stereo ambiguities using object knowledge//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 4165-4175[ DOI: 10.1109/CVPR.2015.7299044 http://dx.doi.org/10.1109/CVPR.2015.7299044 ]
Guo K W, Xu F, Wang Y G, Liu Y B and Dai Q H. 2015. Robust non-rigid motion tracking and surface reconstruction using L0regularization//Proceeding of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3083-3091[ DOI: 10.1109/ICCV.2015.353 http://dx.doi.org/10.1109/ICCV.2015.353 ]
Guo K W, Xu F, Yu T, Liu X Y, Dai Q H and Liu Y B. 2017. Real-time geometry, albedo, and motion reconstruction using a single RGB-D camera. ACM Transactions on Graphics, 36(3): #32[DOI:10.1145/3083722]
Guo X Y, Yang K, Yang W K, Wang X G and Li H S. 2019. Group-wise correlation stereo network//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3268-3277[ DOI: 10.1109/CVPR.2019.00339 http://dx.doi.org/10.1109/CVPR.2019.00339 ]
Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L and Bennamoun M. 2020. Deep learning for 3D point clouds: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence: #3005434[ DOI: 10.1109/TPAMI.2020.3005434 http://dx.doi.org/10.1109/TPAMI.2020.3005434 ]
Hambarde P and Murala S. 2020. S2DNet: depth estimation from single image and sparse samples. IEEE Transactions on Computational Imaging, 6: 806-817[DOI:10.1109/TCI.2020.2981761]
Han L, Zheng T, Xu L and Fang L. 2020. OccuSeg: occupancy-aware 3D instance segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2937-2946[ DOI: 10.1109/CVPR42600.2020.00301 http://dx.doi.org/10.1109/CVPR42600.2020.00301 ]
Han X G, Gao C and Yu Y Z. 2017. DeepSketch2Face: a deep learning based sketching system for 3D face and caricature modeling. ACM Transactions on Graphics, 36(4): #126[DOI:10.1145/3072959.3073629]
Han X F, Leung T, Jia Y Q, Sukthankar R and Berg A C. 2015. MatchNet: unifying feature and metric learning for patch-based matching//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3279-3286[ DOI: 10.1109/CVPR.2015.7298948 http://dx.doi.org/10.1109/CVPR.2015.7298948 ]
He K, Lu Y and Sclaroff S. 2018. Local descriptors optimized for average precision//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 596-605[ DOI: 10.1109/CVPR.2018.00069 http://dx.doi.org/10.1109/CVPR.2018.00069 ]
Helmer S and Lowe D. 2010. Using stereo for object recognition//Proceedings of 2010 IEEE International Conference on Robotics and Automation. Anchorage, USA: IEEE: 3121-3127[ DOI: 10.1109/ROBOT.2010.5509826 http://dx.doi.org/10.1109/ROBOT.2010.5509826 ]
Hou J, Dai A and Nieβner M. 2019. 3D-SIS: 3D semantic instance segmentation of RGB-D scans//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4416-4425[ DOI: 10.1109/CVPR.2019.00455 http://dx.doi.org/10.1109/CVPR.2019.00455 ]
Houseago C, Bloesch M and Leutenegger S. 2019. KO-fusion: dense visual SLAM with tightly-coupled kinematic and odometric tracking//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 4054-4060[ DOI: 10.1109/ICRA.2019.8793471 http://dx.doi.org/10.1109/ICRA.2019.8793471 ]
Hu Q Y, Yang B, Xie L H, Rosa S, Guo Y L, Wang Z H, Trigoni N and Markham A. 2020. RandLA-Net: efficient semantic segmentation of large-scale point clouds//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Location: Seattle, USA: IEEE: 11105-11114[ DOI: 10.1109/CVPR42600.2020.01112 http://dx.doi.org/10.1109/CVPR42600.2020.01112 ]
Huang Z X, Fan J M, Cheng S G, Yi S, Wang X G and Li H S. 2019a. HMS-Net: hierarchical multi-scale sparsity-invariant network for sparse depth completion. IEEE Transactions on Image Processing, 29: 3429-3441[DOI:10.1109/TIP.2019.2960589]
Huang Z Y, Xu Y, Shi J P, Zhou X W, Bao H J and Zhang G F. 2019b. Prior guided dropout for robust visual localization in dynamic environments//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2791-2800[ DOI: 10.1109/ICCV.2019.00288 http://dx.doi.org/10.1109/ICCV.2019.00288 ]
Innmann M, Zollhöfer M, Nieβner M, Theobalt C and Stamminger M. 2020. VolumeDeform: real-time volumetric non-rigid reconstruction//Proceedings of European Conference on Computer Vision. Glasgow, United Kingdom: Springer: 362-379[ DOI: 10.1007/978-3-319-46484-8_22 http://dx.doi.org/10.1007/978-3-319-46484-8_22 ]
Jain A, Thormählen T, Ritschel T and Seidel H P. 2012. Exploring shape variations by 3D-model decomposition and part-based recombination. Computer Graphics Forum, 31: 631-640[DOI:10.1111/j.1467-8659.2012.03042.x]
Jaritz M, De Charette R, Wirbel E, Perrotton X and Nashashibi F. 2018. Sparse and dense data with CNNs: depth completion and semantic segmentation//Proceedings of 2018 International Conference on 3D Vision. Verona, Italy: IEEE: 52-60[ DOI: 10.1109/3DV.2018.00017 http://dx.doi.org/10.1109/3DV.2018.00017 ]
Jaritz M, Gu J Y and Su H. 2019. Multi-view PointNet for 3D scene understanding//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 3995-4003[ DOI: 10.1109/ICCVW.2019.00494 http://dx.doi.org/10.1109/ICCVW.2019.00494 ]
Jiang L, Zhao H S, Liu S, Shen X Y, Fu C W and Jia J Y. 2019. Hierarchical point-edge interaction network for point cloud semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 10432-10440[ DOI: 10.1109/ICCV.2019.01053 http://dx.doi.org/10.1109/ICCV.2019.01053 ]
Jiang L, Zhao H S, Shi S S, Liu S, Fu C W and Jia J Y. 2020. PointGroup: dual-set point grouping for 3D instance segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE 4866-4875[ DOI: 10.1109/CVPR42600.2020.00492 http://dx.doi.org/10.1109/CVPR42600.2020.00492 ]
Jiang M Y, Wu Y R, Zhao T Q, Zhao Z L and Lu C W. 2018. PointSIFT: a SIFT-like network module for 3D point cloud semantic segmentation[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1807.00652.pdf https://arxiv.org/pdf/1807.00652.pdf
Jie Z Q, Wang P F, Ling Y G, Zhao B, Wei Y C, Feng J S and Liu W. 2018. Left-right comparative recurrent model for stereo matching//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3838-3846[ DOI: 10.1109/CVPR.2018.00404 http://dx.doi.org/10.1109/CVPR.2018.00404 ]
Kanazawa A, Zhang Y J, Felsen P and Malik J. 2019. Learning 3D human dynamics from video//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5607-5616[ DOI: 10.1109/CVPR.2019.00576 http://dx.doi.org/10.1109/CVPR.2019.00576 ]
Kanhere O and Rappaport T S. 2019. Position locationing for millimeter wave systems//Proceedings of 2018 IEEE Global Communications Conference. Abu Dhabi, United Arab Emirates: IEEE: 206-212[ DOI: 10.1109/GLOCOM.2018.8647983 http://dx.doi.org/10.1109/GLOCOM.2018.8647983 ]
Kar A, Häne C and Malik J. 2017. Learning a multi-view stereo machine//Advances in Neural Information Processing Systems. Long Beach, USA: [s. n.]: 364-375
Kendall A, Grimes M and Cipolla R. 2015. PoseNet: a convolutional network for real-time 6-DOF camera relocalization//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2938-2946[ DOI: 10.1109/ICCV.2015.336 http://dx.doi.org/10.1109/ICCV.2015.336 ]
Kendall A, Martirosyan H, Dasgupta S, Henry P, Kennedy R, Bachrach A and Bry A. 2017. End-to-end learning of geometry and context for deep stereo regression//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 66-75[ DOI: 10.1109/ICCV.2017.17 http://dx.doi.org/10.1109/ICCV.2017.17 ]
Khattak S, Papachristos C and Alexis K. 2019. Keyframe-based direct thermal-inertial odometry//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 3563-3569[ DOI: 10.1109/ICRA.2019.8793927 http://dx.doi.org/10.1109/ICRA.2019.8793927 ]
Kiechle M, Hawe S and Kleinsteuber M. 2013. A joint intensity and depth co-sparse analysis model for depth map super-resolution//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1545-1552[ DOI: 10.1109/ICCV.2013.195 http://dx.doi.org/10.1109/ICCV.2013.195 ]
Kim K R and Kim C S. 2016. Adaptive smoothness constraints for efficient stereo matching using texture and edge information//Proceedings of 2016 IEEE International Conference on Image Processing. Phoenix, USA: IEEE: 3429-3433[ DOI: 10.1109/ICIP.2016.7532996 http://dx.doi.org/10.1109/ICIP.2016.7532996 ]
Kim Y, Jeong J and Kim A. 2018. Stereo camera localization in 3D LiDAR maps//Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE: 1-9[ DOI: 10.1109/IROS.2018.8594362 http://dx.doi.org/10.1109/IROS.2018.8594362 ]
Klein G and Murray D. 2007. Parallel tracking and mapping for small AR workspaces//Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality. Nara, Japan: IEEE: 225-234[ DOI: 10.1109/ISMAR.2007.4538852 http://dx.doi.org/10.1109/ISMAR.2007.4538852 ]
Knöbelreiter P, Reinbacher C, Shekhovtsov A and Pock T. 2017. End-to-end training of hybrid CNN-CRF models for stereo//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1456-1465[ DOI: 10.1109/CVPR.2017.159 http://dx.doi.org/10.1109/CVPR.2017.159 ]
Kusupati U, Cheng S, Chen R and Su H. 2020. Normal assisted stereo depth estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2186-2196[ DOI: 10.1109/cvpr42600.2020.00226 http://dx.doi.org/10.1109/cvpr42600.2020.00226 ]
Lahoud J, Ghanem B, Oswald M R and Pollefeys M. 2019. 3D instance segmentation via multi-task metric learning//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9255-9265[ DOI: 10.1109/ICCV.2019.00935 http://dx.doi.org/10.1109/ICCV.2019.00935 ]
Laina I, Rupprecht C, Belagiannis V, Tombari F and Navab N. 2016. Deeper depth prediction with fully convolutional residual networks//Proceedings of the 4th International Conference on 3DVision. Stanford, USA: IEEE: 239-248[ DOI: 10.1109/3DV.2016.32 http://dx.doi.org/10.1109/3DV.2016.32 ]
Landrieu L and Boussaha M. 2019. Point cloud oversegmentation with graph-structured deep metric learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7432-7441[ DOI: 10.1109/CVPR.2019.00762 http://dx.doi.org/10.1109/CVPR.2019.00762 ]
Landrieu L and Simonovsky M. 2018. Large-scale point cloud semantic segmentation with superpoint graphs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4558-4567[ DOI: 10.1109/CVPR.2018.00479 http://dx.doi.org/10.1109/CVPR.2018.00479 ]
Lawin F J, Danelljan M, Tosteberg P, Bhat G, Khan F S and Felsberg M. 2017. Deep projective 3D semantic segmentation//Proceedings of International Conference on Computer Analysis of Images and Patterns. Ystad, Sweden: Springer: 95-107[ DOI: 10.1007/978-3-319-64689-3_8 http://dx.doi.org/10.1007/978-3-319-64689-3_8 ]
Lee S H and Civera J. 2019. Loosely-coupled semi-direct monocular SLAM. IEEE Robotics and Automation Letters, 4(2): 399-406[DOI:10.1109/LRA.2018.2889156]
Lee W, Eckenhoff K, Geneva P and Huang G Q. 2020. Intermittent GPS-aided VIO: online initialization and calibration//Proceedings of 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE: 5724-5731[ DOI: 10.1109/ICRA40945.2020.9197029 http://dx.doi.org/10.1109/ICRA40945.2020.9197029 ]
Leutenegger S, Lynen S, Bosse M. 2015. Keyframe-based visual-inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 34(3): 314-334
Li B, Shen C H, Dai Y C, Van Den Hengel A and He M Y. 2015. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1119-1127[ DOI: 10.1109/CVPR.2015.7298715 http://dx.doi.org/10.1109/CVPR.2015.7298715 ]
Li B Y, Zou D P, Sartori D, Pei L and Yu W X. 2020a. TextSLAM: visual SLAM with planar text features//Proceedings of 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE: 2102-2108[ DOI: 10.1109/ICRA40945.2020.9197233 http://dx.doi.org/10.1109/ICRA40945.2020.9197233 ]
Li C J, Pan H, Liu Y, Tong X, Sheffer A and Wang W P. 2018a. Robust flow-guided neural prediction for sketch-based freeform surface modeling. ACM Transactions on Graphics, 37(6): #238[DOI:10.1145/3272127.3275051]
Li J Y, Bao H J and Zhang G. 2019a. Rapid and robust monocular visual-inertial initialization with gravity estimation via vertical edges//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE: 6230-6236[ DOI: 10.1109/IROS40897.2019.8968456 http://dx.doi.org/10.1109/IROS40897.2019.8968456 ]
Li J, Niu C J and Xu K. 2020c. Learning part generation and assembly for structure-aware shape synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7): 11362-11369[DOI:10.1609/aaai.v34i07.6798]
Li J Q, Pei L, Zou D P, Xia S P C, Wu Q, Li T, Sun Z and Yu W X. 2020b. Attention-SLAM: a visual monocular SLAM learning from human gaze[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/2009.06886.pdf https://arxiv.org/pdf/2009.06886.pdf
Li J, Xu K, Chaudhuri S, Yumer E, Zhang H and Guibas L. 2017. GRASS: generative recursive autoencoders for shape structures. ACM Transactions on Graphics, 36(4): #52[DOI:10.1145/3072959.3073637]
Li J Y, Yang B B, Huang K, Zhang G F and Bao H J. 2019b. Robust and efficient visual-inertial odometry with multi-plane priors//Proceedings of Chinese Conference on Pattern Recognition and Computer Vision. Xi'an, China: Springer: 283-295[ DOI: 10.1007/978-3-030-31726-3_24 http://dx.doi.org/10.1007/978-3-030-31726-3_24 ]
Li M Y, Patil A G, Xu K, Chaudhuri S, Khan O, Shamir A, Tu C H, Chen B Q, Cohen-Or D and Zhang H. 2019c. GRAINS: generative recursive autoencoders for indoor scenes. ACM Transactions on Graphics, 38(2): #12[DOI:10.1145/3303766]
Li R H, Wang S, Long Z Q and Gu D B. 2018b. UnDeepVO: monocular visual odometry through unsupervised deep learning//Proceedings of 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE: 7286-7291[ DOI: 10.1109/ICRA.2018.8461251 http://dx.doi.org/10.1109/ICRA.2018.8461251 ]
Li S K, Xue F, Wang X, Yan Z K and Zha H B. 2019d. Sequential adversarial learning for self-supervised deep visual odometry//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2851-2860[ DOI: 10.1109/ICCV.2019.00294 http://dx.doi.org/10.1109/ICCV.2019.00294 ]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018c. PointCNN: convolution on X -transformed points//Advances in Neural Information Processing Systems. Montréal, Canada: [s. n.]: 820-830
Li Y, Ushiku Y and Harada T. 2019e. Pose graph optimization for unsupervised monocular visual odometry//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 5439-5445[ DOI: 10.1109/ICRA.2019.8793706 http://dx.doi.org/10.1109/ICRA.2019.8793706 ]
Liang M, Guo X, Li H, Wang X and Song Y. 2019. Unsupervised cross-spectral stereo matching by learning to synthesize//Proceedings of the AAAI Conference on Artificial Intelligence, 33: 8706-8713[ DOI: 10.1609/aaai.v33i01.33018706 http://dx.doi.org/10.1609/aaai.v33i01.33018706 ]
Liang Z F, Feng Y L, Guo Y L, Liu H Z, Chen W, Qiao L B, Zhou L and Zhang J F. 2018. Learning for disparity estimation through feature constancy//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2811-2820[ DOI: 10.1109/CVPR.2018.00297 http://dx.doi.org/10.1109/CVPR.2018.00297 ]
Liang Z F, Guo Y L, Feng Y L, Chen W, Qiao L B, Zhou L, Zhang J F and Liu H Z. 2021. Stereo matching using multi-level cost volume and multi-scale feature constancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1): 300-315[DOI:10.1109/TPAMI.2019.2928550]
Liao M, Lu F X, Zhou D F, Zhang S B, Li W and Yang R G. 2020. DVI: depth guided video inpainting for autonomous driving[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/2007.08854.pdf https://arxiv.org/pdf/2007.08854.pdf
Liao Y Y, Huang L C, Wang Y, Kodagoda S, Yu Y and Liu Y. 2017. Parse geometry from a line: monocular depth estimation with partial laser observation//Proceedings of 2017 IEEE International Conference on Robotics and Automation. Singapore, Singapore: IEEE: 5059-5066[ DOI: 10.1109/ICRA.2017.7989590 http://dx.doi.org/10.1109/ICRA.2017.7989590 ]
Liao Z W. 2016. Research on Autonomous Mapping and Navigation Technology in Indoor Environment based on Lidar and MEMS Intertial Components. Nanjing: Nanjing University of Aeronautics and Astronautics
廖自威. 2016. 激光雷达/微惯性室内自主建图与导航技术研究. 南京: 南京航空航天大学
Liebel L and Körner M. 2019. MultiDepth: single-image depth estimation via multi-task regression and classification//Proceedings of 2019 IEEE Intelligent Transportation Systems Conference. Auckland, New Zealand: IEEE: 1440-1447[ DOI: 10.1109/ITSC.2019.8917177 http://dx.doi.org/10.1109/ITSC.2019.8917177 ]
Liu F Y, Li S P, Zhang L Q, Zhou C H, Ye R T, Wang Y B and Lu J W. 2017a. 3DCNN-DQN-RNN: a deep reinforcement learning framework for semantic parsing of large-scale 3D point clouds//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5679-5688[ DOI: 10.1109/ICCV.2017.605 http://dx.doi.org/10.1109/ICCV.2017.605 ]
Liu H M, Chen M Y, Zhang G F and Bao H J. 2018. Ice-ba: Incremental, consistent and efficient bundle adjustment for visual-inertial slam//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1974-1982
Liu F Y, Shen C H, Lin G S and Reid I. 2016. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10): 2024-2039[DOI:10.1109/TPAMI.2015.2505283]
Liu H, Guo Y L, Ma Y N, Lei Y J and Wen G J. 2020a. Semantic context encoding for accurate 3D point cloud segmentation. IEEE Transactions on Multimedia[ DOI: 10.1109/TMM.2020.3007331 http://dx.doi.org/10.1109/TMM.2020.3007331 ]
Liu L, Li H D and Dai Y C. 2017b. Efficient global 2D-3D matching for camera localization in a large-scale 3D map//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2391-2400[ DOI: 10.1109/ICCV.2017.260 http://dx.doi.org/10.1109/ICCV.2017.260 ]
Liu L, Li H D and Dai Y C. 2019a. Stochastic attraction-repulsion embedding for large scale image localization//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2570-2579[ DOI: 10.1109/ICCV.2019.00266 http://dx.doi.org/10.1109/ICCV.2019.00266 ]
Liu P P, King I, Lyu M R and Xu J. 2020b. Flow2Stereo: effective self-supervised learning of optical flow and stereo matching//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 6648-6657[ DOI: 10.1109/CVPR42600.2020.00668 http://dx.doi.org/10.1109/CVPR42600.2020.00668 ]
Liu Y C, Fan B, Xiang S M and Pan C H. 2019b. Relation-shape convolutional neural network for point cloud analysis//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8887-8896[ DOI: 10.1109/CVPR.2019.00910 http://dx.doi.org/10.1109/CVPR.2019.00910 ]
Liu Y, Shen Z H, Lin Z X, Peng S D, Bao H J and Zhou X W. 2019c. GIFT: learning transformation-invariant dense visual descriptors via group CNNs//Advances in Neural Information Processing Systems. Vancouver, Canada: [s. n.]: 6992-7003
Liu Y B, Ye G Z, Wang Y G, Dai Q H and Theobalt C. 2014. Human performance capture using multiple handheld kinects//Computer Vision and Machine Learning with RGB-D Sensors. Switzerland: Springer: 91-108[ DOI: 10.1007/978-3-319-08651-4_5 http://dx.doi.org/10.1007/978-3-319-08651-4_5 ]
Liu Z J, Tang H T, Lin Y J and Han S. 2019d. Point-voxel CNN for efficient 3D deep learning//Advances in Neural Information Processing Systems. Vancouver, Canada: [s. n.]
Lu C H, Uchiyama H, Thomas D, Shimada A and Taniguchi R I. 2018. Sparse cost volume for efficient stereo matching. Remote Sensing, 10(11): #1844[DOI:10.3390/rs10111844]
Lu G Y, Yan Y, Ren L, Song J K, Sebe N and Kambhamettu C. 2015. Localize me anywhere, anytime: a multi-task point-retrieval approach//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2434-2442[ DOI: 10.1109/ICCV.2015.280 http://dx.doi.org/10.1109/ICCV.2015.280 ]
Lu K Y, Barnes N, Anwar S and Zheng L. 2020. From depth what can you see? Depth completion via auxiliary image reconstruction//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11303-11312[ DOI: 10.1109/CVPR42600.2020.01132 http://dx.doi.org/10.1109/CVPR42600.2020.01132 ]
Luo W J, Schwing A G and Urtasun R. 2016. Efficient deep learning for stereo matching//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, Nevada, USA: IEEE: 5695-5703[ DOI: 10.1109/CVPR.2016.614 http://dx.doi.org/10.1109/CVPR.2016.614 ]
Luo Y, Ren J, Lin M D, Pang J H, Sun W X, Li H S and Lin L. 2018a. Single view stereo matching//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 155-163[ DOI: 10.1109/CVPR.2018.00024 http://dx.doi.org/10.1109/CVPR.2018.00024 ]
Luo Z X, Shen T W, Zhou L, Zhu S Y, Zhang R, Yao Y, Fang T and Quan L. 2018b. GeoDesc: learning local descriptors by integrating geometry constraints//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 170-185[ DOI: 10.1007/978-3-030-01240-3_11 http://dx.doi.org/10.1007/978-3-030-01240-3_11 ]
Luo Z X, Zhou L, Bai X Y, Chen H K, Zhang J H, Yao Y, Li S W, Fang T and Quan L. 2020. Aslfeat: learning local features of accurate shape and localization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 6588-6597[ DOI: 10.1109/CVPR42600.2020.00662 http://dx.doi.org/10.1109/CVPR42600.2020.00662 ]
Lynen S, Achtelik M W, Weiss S, Chli M and Siegwart R. 2013. A robust and modular multi-sensor fusion approach applied to MAV navigation//Proceedings of 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE: 3923-3929[ DOI: 10.1109/IROS.2013.6696917 http://dx.doi.org/10.1109/IROS.2013.6696917 ]
Lynen S, Sattler T, Bosse M, Hesch J, Pollefeys M and Siegwart R. 2015. Get out of my lab: large-scale, real-Time visual-inertial localization//Proceedings of Robotics: Science and Systems. Rome, Italy: [s. n.][ DOI: 10.15607/RSS.2015.XI.037 http://dx.doi.org/10.15607/RSS.2015.XI.037 ]
Ma F C and Karaman S. 2018. Sparse-to-dense: depth prediction from sparse depth samples and a single image//Proceedings of 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE: 4796-4803[ DOI: 10.1109/icra.2018.8460184 http://dx.doi.org/10.1109/icra.2018.8460184 ]
Ma F C, Cavalheiro G V and Karaman S. 2019. Self-supervised sparse-to-dense: self-supervised depth completion from LiDAR and monocular camera//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 3288-3295[ DOI: 10.1109/ICRA.2019.8793637 http://dx.doi.org/10.1109/ICRA.2019.8793637 ]
Ma Y N, Guo Y L, Liu H, Lei Y J and Wen G J. 2020. Global context reasoning for semantic segmentation of 3D point clouds//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass, USA: IEEE: 2920-2929[ DOI: 10.1109/WACV45572.2020.9093411 http://dx.doi.org/10.1109/WACV45572.2020.9093411 ]
Mac Aodha O, Campbell N D F, Nair A and Brostow G J. 2012. Patch based synthesis for single depth image super-resolution//Proceedings of European Conference on Computer Vision. Florence, Italy: Springer: 71-84[ DOI: 10.1007/978-3-642-33712-3_6 http://dx.doi.org/10.1007/978-3-642-33712-3_6 ]
Maddern W, Stewart A D and Newman P. 2014. LAPS-Ⅱ: 6-DOF day and night visual localisation with prior 3D structure for autonomous road vehicles//Proceedings of 2014 IEEE Intelligent Vehicles Symposium Proceedings. Dearborn, USA: IEEE: 330-337[ DOI: 10.1109/IVS.2014.6856471 http://dx.doi.org/10.1109/IVS.2014.6856471 ]
Mao J G, Wang X G and Li H S.2019. Interpolated convolutional networks for 3D point cloud understanding//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1578-1587[ DOI: 10.1109/ICCV.2019.00166 http://dx.doi.org/10.1109/ICCV.2019.00166 ]
Mascaro R, Teixeira L, Hinzmann T, Siegwart R and Chli M. 2018. GOMSF: graph-optimization based multi-sensor fusion for robust UAV pose estimation//Proceedings of 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE: 1421-1428[ DOI: 10.1109/ICRA.2018.8460193 http://dx.doi.org/10.1109/ICRA.2018.8460193 ]
Matsuo K and Aoki Y. 2015. Depth image enhancement using local tangent plane approximations//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3574-3583[ DOI: 10.1109/CVPR.2015.7298980 http://dx.doi.org/10.1109/CVPR.2015.7298980 ]
Matusik W, Buehler C, Raskar R, Gortler S J and McMillan L. 2000. Image-based visual hulls//Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques. New York, United States: ACM: 369-374[ DOI: 10.1145/344779.344951 http://dx.doi.org/10.1145/344779.344951 ]
Mayer N, Ilg E, Häusser P, Fischer P, Cremers D, Dosovitskiy A and Brox T. 2016. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4040-4048[ DOI: 10.1109/CVPR.2016.438 http://dx.doi.org/10.1109/CVPR.2016.438 ]
Menze M and Geiger A. 2015. Object scene flow for autonomous vehicles//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3061-3070[ DOI: 10.1109/CVPR.2015.7298925 http://dx.doi.org/10.1109/CVPR.2015.7298925 ]
Mishchuk A, Mishkin D, Radenović F and Matas J. 2017. Working hard to know your neighbor's margins: local descriptor learning loss//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, United States: Curran Associates Inc. : 4829-4840
Mitra N J, Wand M, Zhang H, Cohen-Or D, Kim V and Huang Q X. 2014. Structure-aware shape processing//ACM SIGGRAPH 2014 Courses. Vancouver, Canada: ACM: 1-21[ DOI: 10.1145/2614028.2615401 http://dx.doi.org/10.1145/2614028.2615401 ]
Mo K C, Guerrero P, Li Y, Su H, Wonka P, Mitra N and Guibas L J. 2019a. StructEdit: learning structural shape variations[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1911.11098.pdf https://arxiv.org/pdf/1911.11098.pdf
Mo K C, Guerrero P, Li Y, Su H, Wonka P, Mitra N and Guibas L J. 2019b. StructurEnet: hierarchical graph networks for 3D shape generation[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1908.00575.pdf https://arxiv.org/pdf/1908.00575.pdf
Mourikis A I and Roumeliotis S I. 2007. A multi-state constraint kalman filter for vision-aided inertial navigation//Proceedings of 2007 IEEE International Conference on Robotics and Automation. Rome, Italy: IEEE: 3565-3572[ DOI: 10.1109/ROBOT.2007.364024 http://dx.doi.org/10.1109/ROBOT.2007.364024 ]
Mur-Artal R, Montiel J M M and Tardós J D. 2015. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5): 1147-1163[DOI:10.1109/TRO.2015.2463671]
Mur-Artal R and Tardós J D. 2017. ORB-SLAM2: an open-source SLAM system for monocular, stereo,and rgb-d cameras. IEEE Transactions on Robotics, 33(5): 1255-1262[DOI:10.1109/TRO.2017.2705103]
Neubert P, Schubert S and Protzel P. 2017. Sampling-based methods for visual navigation in 3D maps by synthesizing depth images//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver BC, Canada: IEEE: 2492-2498[ DOI: 10.1109/IROS.2017.8206067 http://dx.doi.org/10.1109/IROS.2017.8206067 ]
Newcombe R A, Fox D and Seitz S M. 2015. DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 343-352[ DOI: 10.1109/CVPR.2015.7298631 http://dx.doi.org/10.1109/CVPR.2015.7298631 ]
Ng T, Balntas V, Tian Y and Mikolajczyk K. 2020. SOLAR: second-order loss and attention for image retrieval[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/2001.08972.pdf https://arxiv.org/pdf/2001.08972.pdf
Niu C J, Li J and Xu K. 2018. Im2Struct: recovering 3D shape structure from a single RGB image//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4521-4529[ DOI: 10.1109/CVPR.2018.00475 http://dx.doi.org/10.1109/CVPR.2018.00475 ]
Pang J H, Sun W X, Ren J S J, Yang C X and Yan Q. 2017. Cascade residual learning: a two-stage convolutional neural network for stereo matching//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy: IEEE: 878-886[ DOI: 10.1109/ICCVW.2017.108 http://dx.doi.org/10.1109/ICCVW.2017.108 ]
Park C, Kim S, Moghadam P, Guo J D, Sridharan S and Fookes C. 2019a. Robust photogeometric localization over time for map-centric loop closure. IEEE Robotics and Automation Letters, 4(2): 1768-1775[DOI:10.1109/LRA.2019.2895262]
Park J J, Florence P, Straub J, Newcombe R and Lovegrove S. 2019b. DeepSDF: learning continuous signed distance functions for shape representation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 165-174[ DOI: 10.1109/CVPR.2019.00025 http://dx.doi.org/10.1109/CVPR.2019.00025 ]
Park J, Joo K, Hu Z, Liu C K and Kweon I S. 2020. Non-local spatial propagation network for depth completion//Proceedings of the European Conference on Computer Vision. Glasgow, United Kingdom: Springer: 120-136[ DOI: 10.1007/978-3-030-58601-0_8 http://dx.doi.org/10.1007/978-3-030-58601-0_8 ]
Pascoe G, Maddern W, Stewart A D and Newman P. 2015. FARLAP: fast robust localisation using appearance priors//Proceedings of 2015 IEEE International Conference on Robotics and Automation. Seattle, USA: IEEE, 2015: 6366-6373[DOI:10.1109/ICRA.2015.7140093]
Patil V, Van Gansbeke W, Dai D X and Van Gool L. 2020. Don't forget the past: recurrent depth estimation from monocular video. IEEE Robotics and Automation Letters, 5(4): 6813-6820[DOI:10.1109/LRA.2020.3017478]
Pham Q H, Nguyen T, Hua B S, Roig G and Yeung S K. 2019. JSIS3D: joint semantic-instance segmentation of 3D point clouds with multi-task pointwise networks and multi-value conditional random fields//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8819-8828[ DOI: 10.1109/CVPR.2019.00903 http://dx.doi.org/10.1109/CVPR.2019.00903 ]
Poggi M and Mattoccia S. 2017. Learning to predict stereo reliability enforcing local consistency of confidence maps//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4541-4550[ DOI: 10.1109/CVPR.2017.483 http://dx.doi.org/10.1109/CVPR.2017.483 ]
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85[ DOI: 10.1109/CVPR.2017.16 http://dx.doi.org/10.1109/CVPR.2017.16 ]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Advances in Neural Information Processing Systems. Long Beach, USA: [s. n.]
Qin T, Li P L and Shen S J. 2018. VINS-mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4): 1004-1020[DOI:10.1109/TRO.2018.2853729]
Qin T, Pan J, Cao S Z and Shen S J. 2019. A general optimization-based framework for local odometry estimation with multiple sensors[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1901.03638.pdf https://arxiv.org/pdf/1901.03638.pdf
Radenović F, Tolias G and Chum O. 2019. Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7): 1655-1668[DOI:10.1109/TPAMI.2018.2846566]
Rappaport TS, Xing Y C, Kanhere O, Ju S H, Madanayake A, Mandal S, Alkhateeb A and Trichopoulos G C. 2019. Wireless communications and applications above 100 GHz: opportunities and challenges for 6G and beyond. IEEE Access, 7: 78729-78757[DOI:10.1109/ACCESS.2019.2921522]
Revaud J, Weinzaepfel P, De Souza C, Pion N, Csurka G, Cabon Y and Humenberger M. 2019. R2D2: repeatable and reliable detector and descriptor[EB/OL]. [2021-01-21] . https://arxov.org/pdf/1906.06195.pdf https://arxov.org/pdf/1906.06195.pdf
Rosu R A, Schutt P, Quenzel J and Behnke S. 2019. LatticeNet: fast point cloud segmentation using permutohedral lattices[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/1912.05905.pdf
Roy A and Todorovic S. 2016. Monocular depth estimation using neural regression forest//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5506-5514[ DOI: 10.1109/CVPR.2016.594 http://dx.doi.org/10.1109/CVPR.2016.594 ]
Saito S, Huang Z, Natsume R, Morishima S, Li H and Kanazawa A. 2019. PIFu: pixel-aligned implicit function for high-resolution clothed human digitization//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2304-2314[ DOI: 10.1109/ICCV.2019.00239 http://dx.doi.org/10.1109/ICCV.2019.00239 ]
Saito S, Simon T, Saragih J and Joo H. 2020. PIFuHD: multi-level pixel-aligned implicit function for high-resolution 3D human digitization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 81-90[ DOI: 10.1109/CVPR42600.2020.00016 http://dx.doi.org/10.1109/CVPR42600.2020.00016 ]
Saputra M R U, de Gusmao P P B, Wang S, Markham A and Trigoni N. 2019. Learning monocular visual odometry through geometry-aware curriculum learning//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 3549-3555[ DOI: 10.1109/ICRA.2019.8793581 http://dx.doi.org/10.1109/ICRA.2019.8793581 ]
Sarlin P E, Cadena C, Siegwart R and Dymczyk M. 2019. From coarse to fine: robust hierarchical localization at large scale//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 12708-12717[ DOI: 10.1109/CVPR.2019.01300 http://dx.doi.org/10.1109/CVPR.2019.01300 ]
Sarlin P E, Debraine F, Dymczyk M, Siegwart R and Cadena C. 2018. Leveraging deep visual descriptors for hierarchical efficient localization[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/1809.01019.pdf
Sarlin P E, DeTone D, Malisiewicz T and Rabinovich A. 2020. SuperGlue: learning feature matching with graph neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4937-4946[ DOI: 10.1109/CVPR42600.2020.00499 http://dx.doi.org/10.1109/CVPR42600.2020.00499 ]
Sattler T, Leibe B and Kobbelt L. 2017. Efficient and effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9): 1744-1756[DOI:10.1109/TPAMI.2016.2611662]
Sattler T, Maddern W, Toft C, Torii A, Hammarstrand L, Stenborg E, Safari D, Okutomi M, Pollefeys M, Sivic J and Kahl F. 2018. Benchmarking 6DOF outdoor visual localization in changing conditions//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8601-8610[ DOI: 10.1109/CVPR.2018.00897 http://dx.doi.org/10.1109/CVPR.2018.00897 ]
Sattler T, Zhou Q J, Pollefeys M and Leal-Taixé L. 2019. Understanding the limitations of CNN-based absolute camera pose regression//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3297-3307[ DOI: 10.1109/CVPR.2019.00342 http://dx.doi.org/10.1109/CVPR.2019.00342 ]
Saxena A, Sun M and Ng A Y. 2009. Make3D: learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5): 824-840[DOI:10.1109/TPAMI.2008.132]
Saxena A, Sung C H and Ng A Y. 2005. Learning depth from single monocular images//Proceedings of the 18th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press: 1161-1168
Schmid K, Tomic T, Ruess F, Hirschmüller H and Suppa M. 2013. Stereo vision based indoor/outdoor navigation for flying robots//Proceedings of 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE: 3955-3962[ DOI: 10.1109/IROS.2013.6696922 http://dx.doi.org/10.1109/IROS.2013.6696922 ]
Seki A and Pollefeys M. 2017. SGM-Nets: semi-global matching with neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6640-6649[ DOI: 10.1109/CVPR.2017.703 http://dx.doi.org/10.1109/CVPR.2017.703 ]
Shamwell E J, Lindgren K, Leung S and Nothwang W D. 2020. Unsupervised deep visual-inertial odometry with online error correction for RGB-D imagery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10): 2478-2493[DOI:10.1109/TPAMI.2019.2909895]
Shao W Z, Vijayarangan S, Li C and Kantor G. 2019. Stereo visual inertial LiDAR simultaneous localization and mapping//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE: 370-377[ DOI: 10.1109/IROS40897.2019.8968012 http://dx.doi.org/10.1109/IROS40897.2019.8968012 ]
Shean D E, Alexandrov O, Moratto Z M, Smith B E, Joughin I R, Porter C and Morin P. 2016. An automated, open-source pipeline for mass production of digital elevation models (DEMs) from very-high-resolution commercial stereo satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 116: 101-117[DOI:10.1016/j.isprsjprs.2016.03.012]
Sheng L, Xu D, Ouyang W L and Wang X G. 2019. Unsupervised collaborative learning of keyframe detection and visual odometry towards monocular deep SLAM//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 4301-4310[ DOI: 10.1109/ICCV.2019.00440 http://dx.doi.org/10.1109/ICCV.2019.00440 ]
Shi T X, Cui H N, Song Z and Shen S H. 2020. Dense semantic 3D map based long-term visual localization with hybrid features[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/2005.10766.pdf
Shi T X, Shen S H, Gao X and Zhu L J. 2019. Visual localization using sparse semantic 3D map//Proceedings of 2019 IEEE International Conference on Image Processing. Taipei, China: IEEE: 315-319[ DOI: 10.1109/ICIP.2019.8802957 http://dx.doi.org/10.1109/ICIP.2019.8802957 ]
Shivakumar S S, Nguyen T, Miller I D, Chen S W, Kumar V and Taylor C J. 2019. DFuseNet: deep fusion of RGB and sparse depth information for image guided dense depth completion//Proceedings of 2019 IEEE Intelligent Transportation Systems Conference. Auckland, New Zealand: IEEE: 13-20[ DOI: 10.1109/ITSC.2019.8917294 http://dx.doi.org/10.1109/ITSC.2019.8917294 ]
Simo-Serra E, Trulls E, Ferraz L, Kokkinos I, Fua P and Moreno-Noguer F. 2015. Discriminative learning of deep convolutional feature point descriptors//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 118-126[ DOI: 10.1109/ICCV.2015.22 http://dx.doi.org/10.1109/ICCV.2015.22 ]
Sinha A, Bai J and Ramani K. 2016. Deep learning 3D shape surfaces using geometry images//Proceedings of European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 223-240
Song X, Zhao X, Hu H W and Fang L J. 2018. EdgeStereo: a context integrated residual pyramid network for stereo matching//Proceedings of 2018 Asian Conference on Computer Vision. Perth, Australia: Springer: 20-35[ DOI: 10.1007/978-3-030-20873-8_2 http://dx.doi.org/10.1007/978-3-030-20873-8_2 ]
Stewart A D and Newman P. 2012. LAPS-localisation using appearance of prior structure: 6-DoF monocular camera localisation using prior pointclouds//Proceedings of 2012 IEEE International Conference on Robotics and Automation. Saint Paul, USA: IEEE: 2625-2632[ DOI: 10.1109/ICRA.2012.6224750 http://dx.doi.org/10.1109/ICRA.2012.6224750 ]
Strasdat H, Davison A J, Montiel J M M and Konolige K. 2011. Double window optimisation for constant time visual SLAM//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE: 2352-2359[ DOI: 10.1109/ICCV.2011.6126517 http://dx.doi.org/10.1109/ICCV.2011.6126517 ]
Strasdat H, Montiel J M M and Davison A J. 2010. Scale drift-aware large scale monocular SLAM//Robotics: Science and Systems VI. Zaragoza, Spain: [s. n.]
Su H, Jampani V, Sun D Q, Maji S, Kalogerakis E, Yang M H and Kautz J. 2018. SPLATNet: sparse lattice networks for point cloud processing//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2530-2539[ DOI: 10.1109/CVPR.2018.00268 http://dx.doi.org/10.1109/CVPR.2018.00268 ]
Su H, Maji S, Kalogerakis E and Learned-Miller E. 2015. Multi-view convolutional neural networks for 3D shape recognition//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 945-953[ DOI: 10.1109/ICCV.2015.114 http://dx.doi.org/10.1109/ICCV.2015.114 ]
Su Z, Xu L, Zheng Z R, Yu T, Liu Y B and Fang L. 2020. RobustFusion: human volumetric capture with data-driven visual cues using a RGBD camera//Proceedings of European Conference on Computer Vision. Glasgow, United Kingdom: Springer: 246-264[ DOI: 10.1007/978-3-030-58548-8_15 http://dx.doi.org/10.1007/978-3-030-58548-8_15 ]
Sun X, Xie Y F, Luo P and Wang L. 2017. A dataset for benchmarking image-based localization//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5641-5649[ DOI: 10.1109/CVPR.2017.598 http://dx.doi.org/10.1109/CVPR.2017.598 ]
Svärm L, Enqvist O, Kahl F and Oskarsson M. 2017. City-scale localization for cameras with known vertical direction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(7): 1455-1461[DOI:10.1109/TPAMI.2016.2598331]
Tan F T, Zhu H, Cui Z P, Zhu S Y, Pollefeys M and Tan P. 2020. Self-supervised human depth estimation from monocular videos//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 647-656[ DOI: 10.1109/CVPR42600.2020.00073 http://dx.doi.org/10.1109/CVPR42600.2020.00073 ]
Tang F L, Li H P and Wu Y H. 2019. FMD stereo SLAM: fusing MVG and direct formulation towards accurate and fast stereo SLAM//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 133-139[ DOI: 10.1109/ICRA.2019.8793664 http://dx.doi.org/10.1109/ICRA.2019.8793664 ]
Taniai T, Matsushita Y, Sato Y and Naemura T. 2018. Continuous 3D label stereo matching using local expansion moves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(11): 2725-2739[DOI:10.1109/TPAMI.2017.2766072]
Tatarchenko M, Park J, Koltun V and Zhou Q Y. 2018. Tangent convolutions for dense prediction in 3D//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3887-389[ DOI: 10.1109/CVPR.2018.00409 http://dx.doi.org/10.1109/CVPR.2018.00409 ]
Tateno K, Tombari F, Laina I and Navab N. 2017. CNN-SLAM: real-time dense monocular SLAM with learned depth prediction//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6565-6574[ DOI: 10.1109/CVPR.2017.695 http://dx.doi.org/10.1109/CVPR.2017.695 ]
Tchapmi L, Choy C, Armeni I, Gwak J and Savarese S. 2017. SEGCloud: semantic segmentation of 3D point clouds//Proceedings of 2017 International Conference on 3D Vision. Qingdao, China: IEEE: 537-547[ DOI: 10.1109/3DV.2017.00067 http://dx.doi.org/10.1109/3DV.2017.00067 ]
Thomas H, Qi C R, Deschaud J E, Marcotegui B, Goulette F and Guibas L. 2019. KPConv: flexible and deformable convolution for point clouds//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6410-6419[ DOI: 10.1109/ICCV.2019.00651 http://dx.doi.org/10.1109/ICCV.2019.00651 ]
Tian Y R, Fan B and Wu F C. 2017. L2-net: deep learning of discriminative patch descriptor in euclidean space//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6128-6136[ DOI: 10.1109/CVPR.2017.649 http://dx.doi.org/10.1109/CVPR.2017.649 ]
Tian Y R, Yu X, Fan B, Wu F C, Heijnen H and Balntas V. 2019. SOSNet: second order similarity regularization for local descriptor learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 11008-11017[ DOI: 10.1109/CVPR.2019.01127 http://dx.doi.org/10.1109/CVPR.2019.01127 ]
Torii A, Arandjelović R, Sivic J, Okutomi M and Pajdla T. 2015. 24/7 place recognition by view synthesis//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1808-1817[ DOI: 10.1109/CVPR.2015.7298790 http://dx.doi.org/10.1109/CVPR.2015.7298790 ]
Uhrig J, Schneider N, Schneider L, Franke U, Brox T and Geiger A. 2017. Sparsity invariant CNNs//Proceedings of 2017 International Conference on 3D Vision. Qingdao, China: IEEE: 11-20[ DOI: 10.1109/3DV.2017.00012 http://dx.doi.org/10.1109/3DV.2017.00012 ]
Ummenhofer B, Zhou H Z, Uhrig J, Mayer N, Ilg E, Dosovitskiy A and Brox T. 2017. DeMoN: depth and motion network for learning monocular stereo//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5622-5631[ DOI: 10.1109/CVPR.2017.596 http://dx.doi.org/10.1109/CVPR.2017.596 ]
Vlasic D, Peers P, Baran I, Debevec P, Popović J, Rusinkiewicz S and Matusik W. 2009. Dynamic shape capture using multi-view photometric stereo. ACM Transactions on Graphics, 28(5): #174[DOI:10.1145/1618452.1618520]
Wang B, Chen C H, Lu C X, Zhao P J, Trigoni N and Markham A. 2020a. AtLoc: attention guided camera localization. Proceedings of the AAAI Conference on Artificial Intelligence, 34(6): 10393-10401[DOI:10.1609/aaai.v34i06.6608]
Wang H C, Liu Q, Yue X Y, Lasenby J and Kusner M J. 2020b. Pre-training by completing point clouds[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/2005.10766.pdf
Wang H, Schor N, Hu R Z, Huang H B, Cohen-Or D and Huang H. 2020c. Global-to-local generative model for 3D shapes. ACM Transactions on Graphics, 37(6): #214[DOI:10.1145/3272127.3275025]
Wang L G, Guo Y L, Wang Y Q, Liang Z F, Lin Z P, Yang J G and An W. 2020d. Parallax attention for unsupervised stereo correspondence learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020: #3026899[DOI:10.1109/TPAMI.2020.3026899]
Wang L, Huang Y C, Hou Y L, Zhang S M and Shan J. 2019a. Graph attention convolution for point cloud semantic segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10288-10297[ DOI: 10.1109/CVPR.2019.01054 http://dx.doi.org/10.1109/CVPR.2019.01054 ]
Wang P S, Liu Y, Guo Y X, Sun C Y and Tong X. 2017a. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics, 36(4): #72[DOI:10.1145/3072959.3073608]
Wang P S, Sun C Y, Liu Y and Tong X. 2018a. Adaptive O-CNN: a patch-based deep representation of 3D shapes. ACM Transactions on Graphics, 37(6): #217[DOI:10.1145/3272127.3275050]
Wang Q Y, Yan Z K, Wang J Q, Xue F, Ma W and Zha H B. 2020e. Line flow based SLAM[EB/OL]. [2021-02-03]. https://arxiv.org/pdf/2009.09972.pdf
Wang Q Q, Zhou X W, Hariharan B and Snavely N. 2020f. Learning feature descriptors using camera pose supervision//Proceedings of 2020 European Conference on Computer Vision. Glasgow, United Kingdom: Springer: 757-774
Wang S, Clark R, Wen H K and Trigoni N. 2017b. DeepVO: towards end-to-end visual odometry with deep recurrent convolutional neural networks//Proceedings of 2017 IEEE International Conference on Robotics and Automation. Singapore, Singapore: IEEE: 2043-2050[ DOI: 10.1109/ICRA.2017.7989236 http://dx.doi.org/10.1109/ICRA.2017.7989236 ]
Wang S L, Fidler S and Urtasun R. 2015. Lost shopping! Monocular localization in large indoor spaces//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2695-2703[ DOI: 10.1109/ICCV.2015.309 http://dx.doi.org/10.1109/ICCV.2015.309 ]
Wang S L, Suo S M, Ma W C, Pokrovsky A and Urtasun R. 2018b. Deep parametric continuous convolutional neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2589-2597[ DOI: 10.1109/CVPR.2018.00274 http://dx.doi.org/10.1109/CVPR.2018.00274 ]
Wang W Y, Yu R, Huang Q G and Neumann U. 2018c. SGPN: similarity group proposal network for 3D point cloud instance segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2569-2578[ DOI: 10.1109/CVPR.2018.00272 http://dx.doi.org/10.1109/CVPR.2018.00272 ]
Wang X L, Liu S, Shen X Y, Shen C H and Jia J Y. 2019b. Associatively segmenting instances and semantics in point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4091-4100[ DOI: 10.1109/CVPR.2019.00422 http://dx.doi.org/10.1109/CVPR.2019.00422 ]
Wang Y R, Huang Z H, Zhu H, Li W, Cao X and Yang R G. 2020g. Interactive free-viewpoint video generation. Virtual Reality and Intelligent Hardware, 2(3): 247-260[DOI:10.1016/j.vrih.2020.04.004]
Wang Y, Wang P, Yang Z H, Luo C X, Yang Y and Xu W. 2019c. UnOS: unified unsupervised optical-flow and stereo-depth estimation by watching videos//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8063-8073[ DOI: 10.1109/CVPR.2019.00826 http://dx.doi.org/10.1109/CVPR.2019.00826 ]
Wang Y G, Liu Y B, Tong X, Dai Q H and Tan P. 2018d. Outdoor markerless motion capture with sparse handheld video cameras. IEEE Transactions on Visualization and Computer Graphics, 24(5): 1856-1866[DOI:10.1109/TVCG.2017.2693151]
Wang Z H, Liang D T, Liang D, Zhang J C and Liu H J. 2018. A SLAM method based on inertial/magnetic sensors and monocular vision fusion. Robot, 40(6): 933-941[DOI:10.13973/j.cnki.robot.170683]
Wei J C, Lin G S, Yap K H, Hung T Y and Xie L H. 2020. Multi-path region mining for weakly supervised 3D semantic segmentation on point clouds//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4383-4392[ DOI: 10.1109/CVPR42600.2020.00444 http://dx.doi.org/10.1109/CVPR42600.2020.00444 ]
Weiss S, Achtelik M W, Lynen S, Chli M and Siegwart R. 2012. Real-time onboard visual-inertial state estimation and self-calibration of MAVs in unknown environments//Proceedings of 2012 IEEE International Conference on Robotics and Automation. Saint Paul, USA: IEEE: 957-964[ DOI: 10.1109/ICRA.2012.6225147 http://dx.doi.org/10.1109/ICRA.2012.6225147 ]
Wolcott R W and Eustice R M. 2014. Visual localization within lidar maps for automated urban driving//Proceedings of 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. Chicago, USA: IEEE: 176-183
Wong D, Kawanishi Y, Deguchi D, Ide I and Murase H. 2017. Monocular localization within sparse voxel maps//Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV). Los Angeles, USA: IEEE: 499-504[ DOI: 10.1109/IVS.2017.7995767 http://dx.doi.org/10.1109/IVS.2017.7995767 ]
Wu B C, Wan A, Yue X Y and Keutzer K. 2018a. SqueezeSeg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud//Proceedings of 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE: 1887-1893[ DOI: 10.1109/ICRA.2018.8462926 http://dx.doi.org/10.1109/ICRA.2018.8462926 ]
Wu B C, Zhou X Y, Zhao S C, Yue X Y and Keutzer K. 2019a. SqueezeSegV2: improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 4376-4382[ DOI: 10.1109/ICRA.2019.8793495 http://dx.doi.org/10.1109/ICRA.2019.8793495 ]
Wu J J, Zhang C K, Xue T F, Freeman B T and Tenenbaum J B. 2016. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, United States: Curran Associates Inc. : 82-90
Wu L and Wu Y H. 2019. Similarity hierarchy based place recognition by deep supervised hashing for SLAM. IROS
Wu R D, Zhuang Y X, Xu K, Zhang H and Chen B Q. 2019b. PQ-NET: a generative part Seq2Seq network for 3D shapes[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/1911.10949.pdf
Wu Y H and Hu Z Y. 2006. PnP problem revisited. Journal of Mathematical Imaging and Vision, 24(1): 131-141[DOI:10.1007/s10851-005-3617-z]
Wu Y H, Tang F L and Li H P. 2018b. Image-based camera localization: an overview. Visual Computing for Industry, Biomedicine, and Art, 1: #8[DOI:10.1186/s42492-018-0008-z]
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1912-1920[ DOI: 10.1109/cvpr.2015.7298801 http://dx.doi.org/10.1109/cvpr.2015.7298801 ]
Wu Z J, Wang X, Lin D, Lischinski D, Cohen-Or D and Huang H. 2019c. SAGNet: structure-aware generative network for 3D-shape modeling. ACM Transactions on Graphics, 38(4): #91[DOI:10.1145/3306346.3322956]
Xiao L H, Wang J, Qiu X S, Rong Z and Zou X D. 2019. Dynamic-SLAM: semantic monocular visual localization and mapping based on deep learning in dynamic environment. Robotics and Autonomous Systems, 117: 1-16[DOI:10.1016/j.robot.2019.03.012]
Xie J Y, Girshick R and Farhadi A. 2016. Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 842-857[ DOI: 10.1007/978-3-319-46493-0_51 http://dx.doi.org/10.1007/978-3-319-46493-0_51 ]
Xie S N, Gu J T, Guo D M, Qi C R, Guibas L and Litany O. 2020. PointContrast: unsupervised pre-training for 3D point cloud understanding//Proceedings of European Conference on Computer Vision. Glasgow, United Kingdom: Springer: 574-591[ DOI: 10.1007/978-3-030-58580-8_34 http://dx.doi.org/10.1007/978-3-030-58580-8_34 ]
Xu C, Wu B, Wang Z and Tomizuka M. 2020. SqueezeSegV3: spatially-adaptive convolution for efficient point-cloud segmentation//Proceedings of European Conference on Computer Vision. Glasgow, United Kingdom: Springer: 1-19
Xu K, Zhang H, Cohen-Or D and Chen B Q. 2012. Fit and diverse: set evolution for inspiring 3D shape galleries. ACM Transactions on Graphics, 31(4): #57[DOI:10.1145/2185520.2185553]
Xu L, Cheng W, Guo K W, Han L, Liu Y B and Fang L. 2021. FlyFusion: realtime dynamic scene reconstruction using a flying depth camera. IEEE Transactions on Visualization and Computer Graphics, 27(1): 68-82[DOI:10.1109/TVCG.2019.2930691]
Xu L, Liu Y B, Cheng W, Guo K W, Zhou G Y, Dai Q H and Fang L. 2018a. FlyCap: markerless motion capture using multiple autonomous flying cameras. IEEE Transactions on Visualization and Computer Graphics, 24(8): 2284-2297[DOI:10.1109/TVCG.2017.2728660]
Xu L, Su Z, Han L, Yu T, Liu Y B and Fang L. 2020. UnstructuredFusion: realtime 4D geometry and texture reconstruction using commercial RGBD cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10): 2508-2522[DOI:10.1109/TPAMI.2019.2915229]
Xu Y F, Fan T Q, Xu M Y, Zeng L and Qiao Y. 2018b. SpiderCNN: deep learning on point sets with parameterized convolutional filters//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 90-105[ DOI: 10.1007/978-3-030-01237-3_6 http://dx.doi.org/10.1007/978-3-030-01237-3_6 ]
Xu Y, Zhu X, Shi J P, Zhang G F, Bao H J and Li H S. 2019. Depth completion from sparse LiDAR data with depth-normal constraints//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2811-2820[ DOI: 10.1109/ICCV.2019.00290 http://dx.doi.org/10.1109/ICCV.2019.00290 ]
Xue F, Wang X, Li S K, Wang Q Y, Wang J Q and Zha H B. 2019. Beyond tracking: selecting memory and refining poses for deep visual odometry//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8567-8575[ DOI: 10.1109/CVPR.2019.00877 http://dx.doi.org/10.1109/CVPR.2019.00877 ]
Yan M, Wang J Z, Li J and Zhang C. 2017. Loose coupling visual-lidar odometry by combining VISO2 and LOAM//Proceedings of the 36th Chinese Control Conference. Dalian, China: IEEE: 6841-6846[ DOI: 10.23919/ChiCC.2017.8028435 http://dx.doi.org/10.23919/ChiCC.2017.8028435 ]
Yang B, Wang J, Clark R, Hu Q Y, Wang S, Markham A and Trigoni N. 2019. Learning object bounding boxes for 3D instance segmentation on point clouds//Advances in Neural Information Processing Systems. Vancouver, Canada: [s. n.]: 6737-6746
Yang G, Zhao H, Shi J, Deng Z and Jia J. 2018a. SegStereo: exploiting semantic information for disparity estimation//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 660-676[ DOI: 10.1007/978-3-030-01234-2_39 http://dx.doi.org/10.1007/978-3-030-01234-2_39 ]
Yang N, von Stumberg L, Wang R and Cremers D. 2020. D3VO: deep depth, deep pose and deep uncertainty for monocular visual odometry//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1278-1289[ DOI: 10.1109/CVPR42600.2020.00136 http://dx.doi.org/10.1109/CVPR42600.2020.00136 ]
Yang N, Wang R, Stückler J and Cremers D. 2018b. Deep virtual stereo odometry: leveraging deep depth prediction for monocular direct sparse odometry//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 835-852[ DOI: 10.1007/978-3-030-01237-3_50 http://dx.doi.org/10.1007/978-3-030-01237-3_50 ]
Ye H Y, Huang H Y and Liu M. 2020a. Monocular direct sparse localization in a prior 3D surfel map[EB/OL]. [2021-02-03]. https://arxiv.org/pdf/2002.09923.pdf
Ye W L, Zheng R J, Zhang F Q, Ouyang Z Z and Liu Y. 2019. Robust and efficient vehicles motion estimation with low-cost multi-camera and odometer-gyroscope//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE: 4490-4496[ DOI: 10.1109/IROS40897.2019.8968048 http://dx.doi.org/10.1109/IROS40897.2019.8968048 ]
Ye X C, Chen S D and Xu R. 2020b. DPNet: detail-preserving network for high quality monocular depth estimation. Pattern Recognition, 109: #107578[DOI:10.1016/j.patcog.2020.107578]
Ye X Q, Li J M, Huang H X, Du L and Zhang X L. 2018. 3D recurrent neural networks with context fusion for point cloud semantic segmentation//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 415-430[ DOI: 10.1007/978-3-030-01234-2_25 http://dx.doi.org/10.1007/978-3-030-01234-2_25 ]
Yi L, Zhao W, Wang H, Sung M and Guibas L J. 2019. GSPN: generative shape proposal network for 3D instance segmentation in point cloud//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3942-3951[ DOI: 10.1109/CVPR.2019.00407 http://dx.doi.org/10.1109/CVPR.2019.00407 ]
Yin Z C and Shi J P. 2018. GeoNet: unsupervised learning of dense depth, optical flow and camera pose//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1983-1992[ DOI: 10.1109/CVPR.2018.00212 http://dx.doi.org/10.1109/CVPR.2018.00212 ]
Yu H L, Ye W C, Feng Y J, Bao H J and Zhang G F. 2020. Learning bipartite graph matching for robust visual localization//Proceedings of 2020 IEEE International Symposium on Mixed and Augmented Reality. Porto de Galinhas, Brazil: IEEE: 146-155[ DOI: 10.1109/ISMAR50242.2020.00036 http://dx.doi.org/10.1109/ISMAR50242.2020.00036 ]
Yu T, Guo K W, Xu F, Dong Y, Su Z Q, Zhao J H, Li J G, Dai Q H and Liu Y B. 2017. BodyFusion: real-time capture of human motion and surface geometry using a single depth camera//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 910-919[ DOI: 10.1109/ICCV.2017.104 http://dx.doi.org/10.1109/ICCV.2017.104 ]
Yu T, Zhao J H, Huang Y H, Li Y P and Liu Y B. 2019a. Towards robust and accurate single-view fast human motion capture. IEEE Access, 7: 85548-85559[DOI:10.1109/ACCESS.2019.2920633]
Yu T, Zheng Z R, Guo K W, Zhao J H, Dai Q H, Li H, Pons-Moll G and Liu Y B. 2018. DoubleFusion: real-time capture of human performances with inner body shapes from a single depth sensor//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7287-7296[ DOI: 10.1109/CVPR.2018.00761 http://dx.doi.org/10.1109/CVPR.2018.00761 ]
Yu T, Zheng Z R, Zhong Y, Zhao J H, Dai Q H, Pons-Moll G and Liu Y B. 2019b. SimulCap: single-view human performance capture with cloth simulation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5499-5509[ DOI: 10.1109/CVPR.2019.00565 http://dx.doi.org/10.1109/CVPR.2019.00565 ]
Zagoruyko S and Komodakis N. 2015. Learning to compare image patches via convolutional neural networks//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 4353-4361[ DOI: 10.1109/CVPR.2015.7299064 http://dx.doi.org/10.1109/CVPR.2015.7299064 ]
Žbontar J and LeCun Y. 2015. Computing the stereo matching cost with a convolutional neural network//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1592-1599[ DOI: 10.1109/CVPR.2015.7298767 http://dx.doi.org/10.1109/CVPR.2015.7298767 ]
Zeisl B, Sattler T and Pollefeys M. 2015. Camera pose voting for large-scale image-based localization//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2704-2712[ DOI: 10.1109/ICCV.2015.310 http://dx.doi.org/10.1109/ICCV.2015.310 ]
Zhan H Y, Garg R, Weerasekera C S, Li K J, Agarwal H and Reid I M. 2018. Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 340-349[ DOI: 10.1109/CVPR.2018.00043 http://dx.doi.org/10.1109/CVPR.2018.00043 ]
Zhan H Y, Weerasekera C S, Bian J W and Reid I. 2020. Visual odometry revisited: what should be learnt?//Proceedings of 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE: 4203-4210[ DOI: 10.1109/ICRA40945.2020.9197374 http://dx.doi.org/10.1109/ICRA40945.2020.9197374 ]
Zhang J and Singh S. 2018. Laser-visual-inertial odometry and mapping with high robustness and low drift. Journal of Field Robotics, 35(8): 1242-1264[DOI:10.1002/rob.21809]
Zhang L, Chen W H, Hu C, Wu X M and Li Z G. 2019a. S & CNet: monocular depth completion for autonomous systems and 3D reconstruction[EB/OL]. [2021-02-03]. https://arxiv.org/pdf/1907.06071.pdf
Zhang P J, Wu Y H and Liu B X. 2020a. Leveraging local and global descriptors in parallel to search correspondences for visual localization[EB/OL]. [2021-02-03]. https://arxiv.org/pdf/2009.10891.pdf
Zhang Y G and Li Q. 2018. Multi-frame fusion method for point cloud of LiDAR based on IMU. Journal of System Simulation, 30(11): 4334-4339
Zhang Y, Zhou Z X, David P, Yue X Y, Xi Z R, Gong B Q and Foroosh H. 2020b. PolarNet: an improved grid representation for online LiDAR point clouds semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9598-9607[ DOI: 10.1109/CVPR42600.2020.00962 http://dx.doi.org/10.1109/CVPR42600.2020.00962 ]
Zhang Z Y, Hua B S and Yeung S K. 2019b. ShellNet: efficient point cloud convolutional neural networks using concentric shells statistics//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1607-1616[ DOI: 10.1109/ICCV.2019.00169 http://dx.doi.org/10.1109/ICCV.2019.00169 ]
Zhao C, Sun L, Purkait P, Duckett T and Stolkin R. 2018. Learning monocular visual odometry with dense 3D mapping from dense 3D flow//Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE: 6864-6871[ DOI: 10.1109/IROS.2018.8594151 http://dx.doi.org/10.1109/IROS.2018.8594151 ]
Zhao H S, Jiang L, Fu C W and Jia J Y. 2019. PointWeb: enhancing local neighborhood features for point cloud processing//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5560-5568[ DOI: 10.1109/CVPR.2019.00571 http://dx.doi.org/10.1109/CVPR.2019.00571 ]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239[ DOI: 10.1109/CVPR.2017.660 http://dx.doi.org/10.1109/CVPR.2017.660 ]
Zheng B and Zhang Z X. 2019. An improved EKF-SLAM for Mars surface exploration. International Journal of Aerospace Engineering, 2019: #7637469[ DOI: 10.1155/2019/7637469 http://dx.doi.org/10.1155/2019/7637469 ]
Zheng Z R, Yu T, Li H, Guo K W, Dai Q H, Fang L and Liu Y B. 2018. HybridFusion: real-time performance capture using a single depth sensor and sparse IMUs//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 389-406[ DOI: 10.1007/978-3-030-01240-3_24 http://dx.doi.org/10.1007/978-3-030-01240-3_24 ]
Zhi S F, Bloesch M, Leutenegger S and Davison A J. 2019. SceneCode: monocular dense semantic reconstruction using learned encoded scene representations//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 11768-11777[ DOI: 10.1109/CVPR.2019.01205 http://dx.doi.org/10.1109/CVPR.2019.01205 ]
Zhong Y R, Li H D and Dai Y C. 2018. Open-world stereo video matching with deep RNN//Proceedings of 2018 European Conference on Computer Vision. Munich, Germany: Springer: 104-119[ DOI: 10.1007/978-3-030-01216-8_7 http://dx.doi.org/10.1007/978-3-030-01216-8_7 ]
Zhou C, Zhang H, Shen X Y and Jia J Y. 2017a. Unsupervised learning of stereo matching//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 1576-1584[ DOI: 10.1109/ICCV.2017.174 http://dx.doi.org/10.1109/ICCV.2017.174 ]
Zhou H, Zhu X, Song X, Ma Y C, Wang Z, Li H S and Lin D H. 2020a. Cylinder3D: an effective 3D framework for driving-scene LiDAR semantic segmentation[EB/OL]. [2021-02-03]. https://arxiv.org/pdf/2008.01550.pdf
Zhou T H, Brown M, Snavely N and Lowe D G. 2017b. Unsupervised learning of depth and ego-motion from video//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6612-6619[ DOI: 10.1109/CVPR.2017.700 http://dx.doi.org/10.1109/CVPR.2017.700 ]
Zhou Y, Wan G W, Hou S H, Yu L, Wang G, Rui X F and Song S Y. 2020b. DA4AD: end-to-end deep attention-based visual localization for autonomous driving[EB/OL]. [2021-02-03]. https://arxiv.org/pdf/2003.03026.pdf
Zhu C, Giorgi G, Lee Y H and Günther C. 2018a. Enhancing accuracy in visual SLAM by tightly coupling sparse ranging measurements between two rovers//Proceedings of 2018 IEEE/ION Position, Location and Navigation Symposium. Monterey, USA: IEEE: 440-446[ DOI: 10.1109/PLANS.2018.8373412 http://dx.doi.org/10.1109/PLANS.2018.8373412 ]
Zhu C Y, Xu K, Chaudhuri S, Yi R J and Zhang H. 2018b. SCORES: shape composition with recursive substructure priors. ACM Transactions on Graphics, 37(6): #211[DOI:10.1145/3272127.3275008]
Zhu H, Su H, Wang P, Cao X and Yang R G. 2018c. View extrapolation of human body from a single image//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4450-4459[ DOI: 10.1109/CVPR.2018.00468 http://dx.doi.org/10.1109/CVPR.2018.00468 ]
Zhu X, Zhou H, Wang T and Hong F Z. 2020. Cylindrical and asymmetrical 3D convolution networks for LiDAR segmentation[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/2011.10033.pdf
Zhu Z L, Yang S W, Dai H D and Li F. 2018d. Loop detection and correction of 3D laser-based SLAM with visual information//Proceedings of the 31st International Conference on Computer Animation and Social Agents. Beijing, China: ACM: 53-58[ DOI: 10.1145/3205326.3205357 http://dx.doi.org/10.1145/3205326.3205357 ]
Zoph B, Vasudevan V, Shlens J and Le Q V. 2018. Learning transferable architectures for scalable image recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8697-8710[ DOI: 10.1109/CVPR.2018.00907 http://dx.doi.org/10.1109/CVPR.2018.00907 ]
Zubizarreta J, Aguinaga I and Montiel J M M. 2020. Direct sparse mapping. IEEE Transactions on Robotics, 36(4): 1363-1370[DOI:10.1109/TRO.2020.2991614]
Zuo X X, Geneva P, Yang Y L, Ye W L, Liu Y and Huang G Q. 2019. Visual-inertial localization with prior LiDAR map constraints. IEEE Robotics and Automation Letters, 4(4): 3394-3401[DOI:10.1109/LRA.2019.2927123]