Progress in the large-scale outdoor image 3D reconstruction

Shen Yan; Maojun Zhang; Yachun Fan; Xiaohui Tan; Yu Liu; Yang Peng; Yuxiang Liu

doi:10.11834/jig.200842

3D Vision & Graphics Technology


Vol. 26, Issue 6, Pages: 1429-1449 (2021)
Received: 31 December 2020
Revised: 13 January 2021
Accepted: 20 January 2021
Published: 16 June 2021

Shen Yan, Maojun Zhang, Yachun Fan, Xiaohui Tan, Yu Liu, Yang Peng, Yuxiang Liu. Progress in the large-scale outdoor image 3D reconstruction[J]. Journal of Image and Graphics, 2021, 26(6): 1429-1449. DOI: 10.11834/jig.200842.

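Abstract (translated from the Chinese version)

Image-based 3D reconstruction aims to accurately recover the geometry of a real scene from a set of 2D multiview images. It is a fundamental and active research topic in computer vision and photogrammetry, with important theoretical significance and practical value, and it is widely applied in smart cities, virtual tourism, digital heritage preservation, digital mapping, and navigation. With the spread of image acquisition systems (smartphones, consumer-grade digital cameras, civil drones, and the like) and the rapid development of the Internet, large numbers of Internet images of an outdoor scene can be obtained through search engines. Using these images for efficient, robust, and accurate 3D reconstruction that gives users realistic perception and an immersive experience has become a research hotspot, attracting wide attention from academia and industry and giving rise to many methods. The emergence of deep learning offers new opportunities for 3D reconstruction from large-scale outdoor images. This paper first describes the basic serial pipeline of large-scale outdoor image 3D reconstruction, which comprises image retrieval, image feature matching, structure from motion, and multiview stereo. It then reviews, systematically and comprehensively, the development and application of large-scale outdoor image 3D reconstruction in each sub-process from the two perspectives of traditional methods and deep learning-based methods, and summarizes the datasets and evaluation metrics suited to large-scale outdoor scenes in each sub-process. Finally, it introduces the current mainstream open-source and commercial 3D reconstruction systems and the state of the related industry in China.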

Abstract

3D reconstruction aims to accurately recover the geometry of a real scene. It is a fundamental and active research field in computer vision and photogrammetry, with important theoretical significance and application value. Acquiring 3D models is highly relevant to many applications, including smart cities, virtual tourism, digital heritage preservation, mapping, and navigation. Various technologies that enable 3D modeling have been developed, and each has its own benefits and drawbacks for particular applications. These methods fall into two categories: active acquisition methods (e.g., LiDAR and radar) and passive ones (i.e., cameras). As passive sensors, cameras are especially power efficient and need no direct physical contact with the real world, and a 3D model can be effectively rebuilt from a set of 2D multiview images. In addition, with the increasing availability of cameras as commodity sensors in consumer devices, the cost of camera hardware has decreased significantly. Over the last decades, with the popularization of image acquisition systems (including smartphones, consumer-grade digital cameras, and civil drones) and the rapid development of the Internet, ordinary users can easily obtain a large number of Internet images of an outdoor scene through search engines (such as Google, Bing, or Baidu). Organizing and using these extremely rich and diverse data sources to perform efficient, robust, and accurate 3D reconstruction, and thereby to provide users with realistic perception and an immersive experience, has become a research hotspot and has attracted widespread attention from academia and industry.

For a human, building an accurate and complete 3D representation of the real world on the fly is natural, but abstracting the underlying problem into a computer program is extremely hard. Many of the underlying problems in large-scale outdoor 3D reconstruction are now gradually being understood, but many others that the research community has not yet deeply understood remain. 3D modeling becomes feasible in a computer program by decomposing the entire reconstruction into several simpler subproblems. Thus far, a growing number and diversity of methods have been proposed to solve this challenging problem. Some researchers address the overall modeling problem, whereas more approaches deal with individual reconstruction subtasks. In recent years in particular, modern convolutional neural network (CNN) models have achieved the best quality for object recognition, image segmentation, image translation, and other challenging computer vision problems. The emergence of deep learning provides new opportunities for, and has increased interest in, research on large-scale outdoor image 3D reconstruction.

3D reconstruction has developed rapidly from the traditional period into the deep learning era. However, to the best of our knowledge, no previous work has presented a detailed overview of recent progress in large-scale outdoor image 3D reconstruction. To summarize the rapid evolution of this field, traditional image-based 3D reconstruction approaches are presented and a comprehensive survey of recent learning-based developments is provided. Specifically, the basic serial pipeline of large-scale outdoor image 3D reconstruction, which comprises image retrieval, image feature matching, structure from motion, and multiview stereo, is described. Then, traditional methods and deep learning-based methods are distinguished, and the development and application of large-scale outdoor image 3D reconstruction technology in each reconstruction subprocess are systematically and comprehensively reviewed. We show that, although deep learning-based methods have achieved overwhelming advantages in other computer vision and natural language processing tasks, geometry-based methods, which are adopted by some common 3D reconstruction systems, still deliver more robust and accurate performance in 3D reconstruction. This finding indicates that deep learning methods still have considerable room for improvement. Subsequently, the datasets and evaluation metrics applicable to large-scale outdoor scenes in each subprocess are summarized in detail. Furthermore, we introduce the datasets used in each subtask and present a comprehensive dataset specifically for 3D reconstruction. Finally, the current mainstream open-source and commercial 3D reconstruction systems and the development status of the related domestic industries are introduced.

Although image-based 3D reconstruction technology has made great progress in the past 10 years, current methods still have the following problems: 1) For scenes with repeated textures (such as the Temple of Heaven), the structure from motion process fails, resulting in inaccurately registered camera poses and incomplete reconstructed models; for scenes with weak textures (such as lake surfaces and glass curtain walls), the multiview stereo process fails, resulting in holes in the reconstructed model. 2) Current 3D reconstruction systems take considerable time to reconstruct scenes (especially large-scale scenes), which is still far from real-time reconstruction. 3) The price of 3D sensors (such as LiDAR and ToF) has dropped significantly, bringing them closer to consumer applications. Using these sensors to effectively compensate for the shortcomings of image-based 3D reconstruction is still an unsolved problem.
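The statement that a 3D model can be rebuilt from 2D multiview images can be illustrated with a minimal two-view triangulation sketch. It is not taken from the paper: it assumes ideal pinhole cameras with identity rotation, unit focal length, and known centers, and it uses the simple midpoint method as a stand-in for the triangulation step inside structure from motion. All function names and numeric values are hypothetical.

```python
"""Toy two-view triangulation by the midpoint method (pure Python).

Assumptions (not from the paper): both cameras are ideal pinhole cameras
with identity rotation and unit focal length, and their centers are known.
"""


def sub(u, v):
    """Component-wise difference u - v of two 3-vectors."""
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(point, center):
    """Project a 3D point into a camera at `center` looking along +z."""
    x, y, z = sub(point, center)
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (x / z, y / z)


def ray_direction(pixel):
    """Back-project a normalized image point to a viewing-ray direction."""
    return (pixel[0], pixel[1], 1.0)


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the midpoint of the shortest segment between two viewing rays.

    Ray i is c_i + s_i * d_i. We minimize |c1 + s*d1 - c2 - t*d2|^2 over
    (s, t) by solving the 2x2 normal equations in closed form.
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is unobservable")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


if __name__ == "__main__":
    cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)  # hypothetical camera centers
    scene_point = (0.5, 0.2, 4.0)                   # hypothetical ground truth
    obs1, obs2 = project(scene_point, cam1), project(scene_point, cam2)
    estimate = triangulate_midpoint(cam1, ray_direction(obs1),
                                    cam2, ray_direction(obs2))
    print("recovered point:", estimate)
```

In a real pipeline the camera poses are themselves estimated by structure from motion and the observations are noisy, so the two rays generally do not intersect; this is why a least-squares estimate such as the midpoint is used here rather than an exact intersection.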


references

Agarwal S, Furukawa Y, Snavely N, Simon I, Curless B, Seitz S M and Szeliski R. 2011. Building Rome in a day. Communications of the ACM, 54(10): 105-112[DOI:10.1145/2001269.2001293]

Alahi A, Ortiz R and Vandergheynst P. 2012. Freak: fast retina keypoint//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 510-517[ DOI: 10.1109/CVPR.2012.6247715 http://dx.doi.org/10.1109/CVPR.2012.6247715 ]

ArandjelovićR, Gronat P, Torii A, Pajdla T and Sivic J. 2016. NetVLAD: CNN architecture for weakly supervised place recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5297-5307[ DOI: 10.1109/CVPR.2016.572 http://dx.doi.org/10.1109/CVPR.2016.572 ]

ArandjelovićR and Zisserman A. 2012. Three things everyone should know to improve object retrieval//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 2911-2918[ DOI: 10.1109/CVPR.2012.6248018 http://dx.doi.org/10.1109/CVPR.2012.6248018 ]

Balntas V, Riba E, Ponsa D and Mikolajczyk K. 2016. Learning local feature descriptors with triplets and shallow convolutional neural networks//Proceeding of 2016 British Machine Vision Conference. York, UK: BMVA Press: 119.1-119.11[ DOI: 10.5244/C.30.119 http://dx.doi.org/10.5244/C.30.119 ]

Barroso Laguna A, Riba E, Ponsa D and Mikolajczyk K. 2019. Key. Net: keypoint detection by handcrafted and learned cnn filters//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 5835-5843[ DOI: 10.1109/ICCV.2019.00593 http://dx.doi.org/10.1109/ICCV.2019.00593 ]

Beckmann N, Kriegel H P, Schneider R and Seeger B. 1990. The R*-tree: an efficient and robust access method for points and rectangles//Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data. Atlantic City, USA: Association for Computing Machinery: 322-331[ DOI: 10.1145/93597.98741 http://dx.doi.org/10.1145/93597.98741 ]

Bhowmick B, Patra S, Chatterjee A, Govindu V M and Banerjee S. 2015. Divide and conquer: efficient large-scale structure from motion using graph partitioning//Proceedings of the 12th Asian Conference on Computer Vision. Singapore, Singapore: Springer: 273-287[ DOI: 10.1007/978-3-319-16808-1_19 http://dx.doi.org/10.1007/978-3-319-16808-1_19 ]

Bleyer M, Rhemann C and Rother C. 2011. PatchMatch stereo-stereo matching with slanted support windows//Proceedings of 2011 British Machine Vision Conference. Dundee, UK: BMVA Press: #14[ DOI: 10.5244/C.25.14 http://dx.doi.org/10.5244/C.25.14 ]

Brachmann E, Krull A, Nowozin S, Shotton J, Michel F, Gumhold S and Rother C. 2017. DSAC-differentiable RANSAC for camera localization//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2492-2500[ DOI: 10.1109/CVPR.2017.267 http://dx.doi.org/10.1109/CVPR.2017.267 ]

Brachmann E and Rother C. 2019. Neural-guided RANSAC: learning where to sample model hypotheses//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 4321-4330[ DOI: 10.1109/ICCV.2019.00442 http://dx.doi.org/10.1109/ICCV.2019.00442 ]

Bradley D, Boubekeur T and Heidrich W. 2008. Accurate multi-view reconstruction using robust binocular stereo and surface meshing//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8[ DOI: 10.1109/CVPR.2008.4587792 http://dx.doi.org/10.1109/CVPR.2008.4587792 ]

Calonder M, Lepetit V, Strecha C and Fua P. 2010. Brief: binary robust independent elementary features//Proceedings of the 11th European Conference on Computer Vision. Heraklion, Greece: Springer: 778-792[ DOI: 10.1007/978-3-642-15561-1_56 http://dx.doi.org/10.1007/978-3-642-15561-1_56 ]

Chatterjee A and Govindu V M. 2018. Robust relative rotation averaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 958-972[DOI:10.1109/TPAMI.2017.2693984]

Chen R, Han S F, Xu J and Su H. 2019. Point-based multi-view stereo network//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1538-1547[ DOI: 10.1109/ICCV.2019.00162 http://dx.doi.org/10.1109/ICCV.2019.00162 ]

Chen Y, Shen S H, Chen Y S and Wang G P. 2020. Graph-based parallel large scale structure from motion. Pattern Recognition, 107: 107537[DOI:10.1016/j.patcog.2020.107537]

Cheng S, Xu Z X, Zhu S L, Li Z W, Li L E, Ramamoorthi R and Su H. 2020. Deep stereo using adaptive thin volume representation with uncertainty awareness//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2521-2531[ DOI: 10.1109/CVPR42600.2020.00260 http://dx.doi.org/10.1109/CVPR42600.2020.00260 ]

Chopra S, Hadsell R and LeCun Y. 2005. Learning a similarity metric discriminatively, with application to face verification//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 539-546[ DOI: 10.1109/CVPR.2005.202 http://dx.doi.org/10.1109/CVPR.2005.202 ]

Chum O and Matas J. 2010. Large-scale discovery of spatially related images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2): 371-377[DOI:10.1109/TPAMI.2009.166]

Chum O, Mikulik A, Perdoch M and Matas J. 2011. Total recall Ⅱ: query expansion revisited//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA: IEEE: 889-896[ DOI: 10.1109/CVPR.2011.5995601 http://dx.doi.org/10.1109/CVPR.2011.5995601 ]

DeTone D, MalisiewiczT and Rabinovich A. 2018. Superpoint: self-supervised interest point detection and description//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 224-236[ DOI: 10.1109/CVPRW.2018.00060 http://dx.doi.org/10.1109/CVPRW.2018.00060 ]

Dong J M and Soatto S. 2015. Domain-size pooling in local descriptors: DSP-SIFT//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 5097-5106[ DOI: 10.1109/CVPR.2015.7299145 http://dx.doi.org/10.1109/CVPR.2015.7299145 ]

Dusmanu M, Rocco I, Pajdla T, Pollefeys M, Sivic J, Torii A and Sattler T. 2019. D2-Net: a trainable CNN for joint description and detection of local features//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8084-8093[ DOI: 10.1109/CVPR.2019.00828 http://dx.doi.org/10.1109/CVPR.2019.00828 ]

Ebel P, Trulls E, Yi K M, Fua P and Mishchuk A. 2019. Beyond Cartesian representations for local descriptors//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 253-262[ DOI: 10.1109/ICCV.2019.00034 http://dx.doi.org/10.1109/ICCV.2019.00034 ]

Eriksson A, Olsson C, Kahl F and Chin T J. 2018. Rotation averaging and strong duality//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 127-135[ DOI: 10.1109/CVPR.2018.00021 http://dx.doi.org/10.1109/CVPR.2018.00021 ]

Fuhrmann S, Langguth F and Goesele M. 2014. MVE-A Multi-View Reconstruction Environment//Proceedings of the 12th Eurographics Workshop on Graphics and Cultural Heritage. Darmstadt, Germany: Eurographics Association: 11-18[ DOI: 10.2312/gch.20141299 http://dx.doi.org/10.2312/gch.20141299 ]

Frahm J M, Fite-Georgel P, Gallup D, Johnson T, Raguram R, Wu C C, Jen Y H, Dunn E, Clipp B, Lazebnik S and Pollefeys M. 2010. Building Rome on a cloudless day//Proceedings of the 11th European Conference on Computer Vision. Heraklion, Crete, Greece: Springer: 368-381[ DOI: 10.1007/978-3-642-15561-1_27 http://dx.doi.org/10.1007/978-3-642-15561-1_27 ]

Frahm J M and Pollefeys M. 2006. RANSAC for (quasi-) degenerate data (QDEGSAC)//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 453-460[ DOI: 10.1109/CVPR.2006.235 http://dx.doi.org/10.1109/CVPR.2006.235 ]

Gallup D, Frahm J M and Mordohai P. 2007. Real-time plane-sweeping stereo with multiple sweeping directions//Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, USA: IEEE: 1-8[ DOI: 10.1109/CVPR.2007.383245 http://dx.doi.org/10.1109/CVPR.2007.383245 ]

Ge Y X, Wang H B, Zhu F, Zhao R and Li H S. 2020. Self-supervising fine-grained region similarities for large-scale image localization[EB/OL]. [2020-07-09] . https://arxiv.org/pdf/2006.03926.pdf https://arxiv.org/pdf/2006.03926.pdf

Goesele M, Curless B and Seitz S M. 2006. Multi-view stereo revisited//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 2402-2409[ DOI: 10.1109/CVPR.2006.199 http://dx.doi.org/10.1109/CVPR.2006.199 ]

Goldstein T, Hand P, Lee C, Voroninski V and Soatto S. 2016. Shapefit and shapekick for robust, scalable structure from motion//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 289-304[ DOI: 10.1007/978-3-319-46478-7_18 http://dx.doi.org/10.1007/978-3-319-46478-7_18 ]

Gordo A, Almazán J, Revaud J and Larlus D. 2017. End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision, 124(2): 237-254[DOI:10.1007/s11263-017-1016-8]

Gu X D, Fan Z W, Zhu S Y, Dai Z Z, Tan F T and Tan P. 2020. Cascade cost volume for high-resolution multi-view stereo and stereo matching//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2492-2501[ DOI: 10.1109/CVPR42600.2020.00257 http://dx.doi.org/10.1109/CVPR42600.2020.00257 ]

Guo X K. 2019. Research on Key Technologies of Large-Scale 3D Terrain Construction. Shenyang: University of Chinese Academy of Sciences (Shenyang Institute of Computing Technology, Chinese Academy of Sciences

郭向坤. 2019. 大规模三维地形构建的关键技术研究. 沈阳: 中国科学院大学(中国科学院沈阳计算技术研究所

Hadsell R, Chopra S and LeCun Y. 2006. Dimensionality reduction by learning an invariant mapping//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 1735-1742[ DOI: 10.1109/CVPR.2006.100 http://dx.doi.org/10.1109/CVPR.2006.100 ]

Han X F, Laga H and Bennamoun M. 2019. Image-based 3D object reconstruction: state-of-the-art and trends in the deep learning era[EB/OL]. [2020-12-30] . http://arxiv.org/abs/1906.06543.pdf http://arxiv.org/abs/1906.06543.pdf

Hartley R and Zisserman A. 2003. Multiple View Geometry in Computer Vision. 2nd ed. Cambridge: Cambridge University Press

Hartley R I, Aftab K and Trumpf J. 2011. L1 rotation averaging using the Weiszfeld algorithm//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA: IEEE: 3041-3048[ DOI: 10.1109/CVPR.2011.5995745 http://dx.doi.org/10.1109/CVPR.2011.5995745 ]

Havlena M, Torii A and Pajdla T. 2010. Efficient structure from motion by graph optimization//Proceedings of the 11th European Conference on Computer Vision. Heraklion, Greece: Springer: 100-113[ DOI: 10.1007/978-3-642-15552-9_8 http://dx.doi.org/10.1007/978-3-642-15552-9_8 ]

Heinly J, Schönberger J L, Dunn E and Frahm J M. 2015. Reconstructing the world * in six days//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3287-3295[ DOI: 10.1109/CVPR.2015.7298949 http://dx.doi.org/10.1109/CVPR.2015.7298949 ].

Hirschmuller H. 2008. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2): 328-341[DOI:10.1109/TPAMI.2007.1166]

Hoffer E and Ailon N. 2015. Deep metric learning using triplet network//Processing of 3rd International workshop on similarity-based pattern recognition. Copenhagen, Denmark: Springer: 84-92[ DOI: 10.1007/978-3-319-24261-3\-7 http://dx.doi.org/10.1007/978-3-319-24261-3\-7 ]

Jégou H, Douze M and Schmid C. 2008. Hamming embedding and weak geometric consistency for large scale image search//Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer: 304-317[ DOI: 10.1007/978-3-540-88682-2_24 http://dx.doi.org/10.1007/978-3-540-88682-2_24 ]

Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P and Schmid C. 2012. Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9): 1704-1716[DOI:10.1109/TPAMI.2011.235]

Ke Y and Sukthankar R. 2004. PCA-SIFT: a more distinctive representation for local image descriptors//Proceedings of 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington DC, USA: IEEE: 506-513[ DOI: 10.1109/CVPR.2004.1315206 http://dx.doi.org/10.1109/CVPR.2004.1315206 ]

Kluger F, Brachmann E, Ackermann H, Rother C, Yang M Y and Rosenhahn B. 2020. CONSAC: robust multi-model fitting by conditional sample consensus//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4633-4642[ DOI: 10.1109/CVPR42600.2020.00469 http://dx.doi.org/10.1109/CVPR42600.2020.00469 ]

Knapitsch A, Park J, Zhou Q Y and Koltun V. 2017. Tanks and temples: benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4): 78[DOI:10.1145/3072959.3073599]

Kutulakos K N and Seitz S M. 2000. A theory of shape by space carving. International Journal of Computer Vision, 38(3): 199-218[DOI:10.1023/A:1008191222954]

Lawler E L. 1963. The quadratic assignment problem. Management Science, 9(4): 586-599[DOI:10.1287/mnsc.9.4.586]

Lenc K and Vedaldi A. 2016. Learning covariant feature detectors//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 100-117[ DOI: 10.1007/978-3-319-49409-8_11 http://dx.doi.org/10.1007/978-3-319-49409-8_11 ]

Leordeanu M and Hebert M. 2005. A spectral technique for correspondence problems using pairwise constraints//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE: 1482-1489[ DOI: 10.1109/ICCV.2005.20 http://dx.doi.org/10.1109/ICCV.2005.20 ]

Leordeanu M, Sukthankar R and Hebert M. 2012. Unsupervised learning for graph matching. International Journal of Computer Vision, 96(1): 28-45[DOI:10.1007/s11263-011-0442-2]

Leutenegger S, Chli M and Siegwart R. 2011. Brisk: binary robust invariant scalable keypoints//Proceedings of 2011 IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE: 2548-2555[ DOI: 10.1109/ICCV.2011.6126542 http://dx.doi.org/10.1109/ICCV.2011.6126542 ]

Levi G. 1973. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. CALCOLO, 9(4): 341[DOI:10.1007/BF02575586]

Lhuillier M and Quan L. 2005. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3): 418-433[DOI:10.1109/TPAMI.2005.44]

Li H, Sumner R W and Pauly M. 2008a. Global correspondence optimization for non-rigid registration of depth scans. Computer Graphics Forum, 27(5): 1421-1430[DOI:10.1111/j.1467-8659.2008.01282.x]

Li J G, Li E, Chen Y R, Xu L and Zhang Y M. 2010. Bundled depth-map merging for multi-view stereo//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE: 2769-2776[ DOI: 10.1109/CVPR.2010.5540004 http://dx.doi.org/10.1109/CVPR.2010.5540004 ]

Li X W, Wu C C, Zach C, Lazebnik S and Frahm J M. 2008b. Modeling and recognition of landmark image collections using iconic scene graphs//Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer: 427-440[ DOI: 10.1007/978-3-540-88682-2_33 http://dx.doi.org/10.1007/978-3-540-88682-2_33 ]

Li Z X. 2016. Research on Methods of Multi-View Stereo 3D Reconstruction. Harbin: Harbin Institute of Technology

李兆歆. 2016. 多视角立体三维重建方法研究. 哈尔滨: 哈尔滨工业大学

Lindeberg T. 1998. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2): 79-116[DOI:10.1023/A:1008045108935]

Liu L, Li H D and Dai Y C. 2019a. Stochastic attraction-repulsion embedding for large scale image localization//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2570-2579[ DOI: 10.1109/ICCV.2019.00266 http://dx.doi.org/10.1109/ICCV.2019.00266 ]

Liu Y, Shen Z H, Lin Z X, Peng S D, Bao H J and Zhou X W. 2019b. GIFT: learning transformation-invariant dense visual descriptors via group CNNS//Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: MIT Press: 6990-7001

Liu Z H. 2020. Research on Computation and Quality Evaluation Method for Large-Scale Multi-View 3D Reconstruction. Hangzhou: Zhejiang University

刘卓昊. 2020. 大规模多视图三维重建计算与质量评价方法研究. 杭州: 浙江大学

Liu Z Y, Qiao H, Yang X and Hoi S C H. 2014. Graph matching by simplified convex-concave relaxation procedure. International Journal of Computer Vision, 109(3): 169-186[DOI:10.1007/s11263-014-0707-7]

Loiola E M, de Abreu N M M, Boaventura-Netto P O, Hahn P and Querido T. 2007. A survey for the quadratic assignment problem. European Journal of Operational Research, 176(2): 657-690[DOI:10.1016/j.ejor.2005.09.032]

Lowe D G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2): 91-110[DOI:10.1023/B:VISI.0000029664.99615.94]

Luo K Y, Guan T, Ju L L, Huang H P and Luo Y W. 2019a. P-MVSNet: learning patch-wise matching confidence aggregation for multi-view stereo//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 10452-10461[ DOI: 10.1109/ICCV.2019.01055 http://dx.doi.org/10.1109/ICCV.2019.01055 ]

Luo Z, Shen T, Zhou L, Zhang J, Yao Y, Li S, Fang T and Quan L. 2019b. Contextdesc: Local descriptor augmentation with cross-modality context//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2527-2536[ DOI: 10.1109/CVPR.2019.00263 http://dx.doi.org/10.1109/CVPR.2019.00263 ]

Luo Z X, Shen T W, Zhou L, Zhu S Y, Zhang R Z, Yao Y and Quan L. 2018. GeoDesc: learning local descriptors by integrating geometry constraints//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 170-185[ DOI: 10.1007/978-3-030-01240-3_11 http://dx.doi.org/10.1007/978-3-030-01240-3_11 ]

Luo Z X, Zhou L, Bai X Y, Chen H K, Zhang J H, Yao Y, Li S W, Fang T and Quan L. 2020. ASLFeat: learning local features of accurate shape and localization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 6588-6597[ DOI: 10.1109/CVPR42600.2020.00662 http://dx.doi.org/10.1109/CVPR42600.2020.00662 ]

Ma J Y, Jiang X Y, Jiang J J, Zhao J and Guo X J. 2019. LMR: learning a two-class classifier for mismatch removal. IEEE Transactions on Image Processing, 28(8): 4045-4059[DOI:10.1109/TIP.2019.2906490]

Mainali P, Lafruit G, Yang Q, Geelen B, Van Gool L and Lauwereins R. 2013. SIFER: scale-invariant feature detector with error resilience. International Journal of Computer Vision, 104(2): 172-197[DOI:10.1007/s11263-013-0622-3]

Merrell P, Akbarzadeh A, Wang L, Mordohai A, Wang L, Mordohai P, Frahm J M, Yang R G, Nister D and Pollefeys M. 2007. Real-time visibility-based fusion of depth maps//Proceedings of the 11th International Conference on Computer Vision. Rio de Janeiro, Brazil: IEEE: 1-8[ DOI: 10.1109/ICCV.2007.4408984 http://dx.doi.org/10.1109/ICCV.2007.4408984 ]

Mishchuk A, Mishkin D, Radenović F and Matas J. 2017. Working hard to know your neighbor's margins: local descriptor learning loss//Proceedings of the 31 st Conference on Neural Information Processing Systems. Long Beach, USA: MIT press: 4826-4837

Mobahi H, Collobert R and Weston J. 2009. Deep learning from temporal coherence in video//Proceedings of the 26th Annual International Conference on Machine Learning. Quebec, Canada: ACM: 737-744[ DOI: 10.1145/1553374.1553469 http://dx.doi.org/10.1145/1553374.1553469 ]

Moo Yi K, Trulls E, Ono Y, Lepetit V, Salzmann M and Fua P. 2018. Learning to find good correspondences//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2666-2674[ DOI: 10.1109/CVPR.2018.00282 http://dx.doi.org/10.1109/CVPR.2018.00282 ]

Morel J M and Yu G S. 2009. Asift: a new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, 2(2): 438-469[DOI:10.1137/080732730]

Moulon P, Monasse P and Marlet R. 2013. Adaptive structure from motion with a Contrario model estimation//Proceedings of the 11th Asian Conference on Computer Vision. Daejeon, Korea (South): Springer: 257-270[ DOI: 10.1007/978-3-642-37447-0_20 http://dx.doi.org/10.1007/978-3-642-37447-0_20 ].

Nistér D and Stewénius H. 2006. Scalable recognition with a vocabulary tree//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 2161-2168[ DOI: 10.1109/CVPR.2006.264 http://dx.doi.org/10.1109/CVPR.2006.264 ]

Ono Y, Trulls E, Fua P and Yi K M. 2018. LF-Net: learning local features from images//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc. : 6237-6247

Özyeşil O and Singer A. 2015. Robust camera location estimation by convex programming//Proceedings of 2015 IEEE Conference on Computer Visionand Pattern Recognition. Boston, USA: IEEE: 2674-2683[ DOI: 10.1109/CVPR.2015.7298883 http://dx.doi.org/10.1109/CVPR.2015.7298883 ]

Philbin J, Chum O, Isard M, Sivic J and Zisserman A. 2007. Object retrieval with large vocabularies and fast spatial matching//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, Minnesota, USA: IEEE: 1-8[ DOI: 10.1109/CVPR.2007.383172 http://dx.doi.org/10.1109/CVPR.2007.383172 ]

Philbin J, Chum O, Isard M, Sivic J and Zisserman A. 2008. Lost in quantization: improving particular object retrieval in large scale image databases//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8[ DOI: 10.1109/CVPR.2008.4587635 http://dx.doi.org/10.1109/CVPR.2008.4587635 ]

RadenovićF, Tolias G and Chum O. 2016. CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 3-20[ DOI: 10.1007/978-3-319-46448-0_1 http://dx.doi.org/10.1007/978-3-319-46448-0_1 ]

Revaud J, De Souza C, Humenberger M and Weinzaepfel P. 2019. R2 d2: reliable and repeatable detector and descriptor//Proceedings of 2019 Neural Information Processing Systems 32. Vancouver, Canada: MIT press: 12405-12415

Rublee E, Rabaud V, Konolige K and Bradski G R. 2011. Orb: an efficient alternative to SIFT or SURF//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain:IEEE: 2564-2571[ DOI: 10.1109/ICCV.2011.6126544 http://dx.doi.org/10.1109/ICCV.2011.6126544 ]

Sarlin P E, DeTone D, Malisiewicz T and Rabinovich A. 2020. SuperGlue: learning feature matching with graph neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4937-4946[ DOI: 10.1109/CVPR42600.2020.00499 http://dx.doi.org/10.1109/CVPR42600.2020.00499 ]

Schönberger J L and Frahm J M. 2016. Structure-from-motion revisited//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4104-4113[ DOI: 10.1109/CVPR.2016.445 http://dx.doi.org/10.1109/CVPR.2016.445 ]

Schönberger J L, Zheng E L, Frahm J M and Pollefeys M. 2016. Pixelwise view selection for unstructured multi-view stereo//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 501-518[ DOI: 10.1007/978-3-319-46487-9_31 http://dx.doi.org/10.1007/978-3-319-46487-9_31 ]

Schöps T, Schönberger J L, Galliani S, Sattler T, Schindler K, Pollefeys M and Geiger A. 2017. A multi-view stereo benchmark with high-resolution images and multi-camera videos//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2538-2547[ DOI: 10.1109/CVPR.2017.272 http://dx.doi.org/10.1109/CVPR.2017.272 ]

Schroff F, Kalenichenko D and Philbin J. 2015. Facenet: a unified embedding for face recognition and clustering//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 815-823[ DOI: 10.1109/CVPR.2015.7298682 http://dx.doi.org/10.1109/CVPR.2015.7298682 ]

Shen S H. 2013. Accurate multiple view 3D reconstruction using patch-based stereo for large-scale scenes. IEEE Transactions on Image Processing, 22(5): 1901-1914[DOI:10.1109/TIP.2013.2237921]

Shen T W, Luo Z X, Zhou L, Zhang R Z, Zhu S Y, Fang T and Quan L. 2019. Matchable image retrieval by learning from surface reconstruction//Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer: 415-431[ DOI: 10.1007/978-3-030-20887-5_26 http://dx.doi.org/10.1007/978-3-030-20887-5_26 ]

Shen T W, Zhu S Y, Fang T, Zhang R Z and Quan L. 2016. Graph-based consistent matching for structure-from-motion//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 139-155[ DOI: 10.1007/978-3-319-46487-9_9 http://dx.doi.org/10.1007/978-3-319-46487-9_9 ]

Shi J B and Malik J. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 888-905[DOI:10.1109/34.868688]

Simo-Serra E, Trulls E, Ferraz L, Kokkinos I, Fua P and Moreno-Noguer F. 2015. Discriminative learning of deep convolutional feature point descriptors//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 118-126[ DOI: 10.1109/ICCV.2015.22 http://dx.doi.org/10.1109/ICCV.2015.22 ]

Sivic J and Zisserman A. 2003. Video Google: a text retrieval approach to object matching in videos//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France: IEEE: 1470-1477[ DOI: 10.1109/ICCV.2003.1238663 http://dx.doi.org/10.1109/ICCV.2003.1238663 ]

Snavely N, Seitz S M and Szeliski R. 2006. Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics, 25(3): 835-846[DOI:10.1145/1141911.1141964]

Snavely N, Seitz S M and Szeliski R. 2008. Skeletal graphs for efficient structure from motion//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8[ DOI: 10.1109/CVPR.2008.4587678 http://dx.doi.org/10.1109/CVPR.2008.4587678 ]

Steinbrücker F, Sturm J and Cremers D. 2011. Real-time visual odometry from dense RGB-D images//Proceedings of 2011 IEEE International Conference on Computer Vision Workshops. Barcelona, Spain: IEEE: 719-722[ DOI: 10.1109/ICCVW.2011.6130321 http://dx.doi.org/10.1109/ICCVW.2011.6130321 ]

Strecha C, Fransens R and Van Gool L. 2006. Combined depth and outlier estimation in multi-view stereo//Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 2394-2401[ DOI: 10.1109/CVPR.2006.78 http://dx.doi.org/10.1109/CVPR.2006.78 ]

Sun K. 2017. Research on Image Matching and Three Dimension Scene Structure Reconstruction Method. Wuhan: Huazhong University of Science and Technology

孙琨. 2017. 图像匹配与场景三维重建方法研究. 武汉: 华中科技大学

Sweeney C, Fragoso V, Höllerer T and Turk M. 2016. Large scale SFM with the distributed camera model//Proceedings of the 4th International Conference on 3D Vision. Stanford, USA: IEEE: 230-238[ DOI: 10.1109/3DV.2016.31 http://dx.doi.org/10.1109/3DV.2016.31 ]

Sweeney C, Sattler T, Höllerer T, Turk M and Pollefeys M. 2015a. Optimizing the viewing graph for structure-from-motion//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 801-809[ DOI: 10.1109/ICCV.2015.98 http://dx.doi.org/10.1109/ICCV.2015.98 ]

Sweeney C, Höllerer T and Turk M. 2015b. Theia: a fast and scalable structure-from-motion library//Proceedings of the 23rd ACM International Conference on Multimedia. Brisbane, Australia: ACM: 693-696[ DOI: 10.1145/2733373.2807405 http://dx.doi.org/10.1145/2733373.2807405 ]

Tang C Z and Tan P. 2019. BA-Net: dense bundle adjustment network[EB/OL]. [2020-12-30]. https://dblp.uni-trier.de/rec/conf/iclr/TangT19.html?view=bibtex

Thomee B, Shamma D A, Friedland G, Elizalde B, Ni K, Poland D, Borth D and Li L J. 2016. YFCC100M: the new data in multimedia research. Communications of the ACM, 59(2): 64-73[DOI:10.1145/2812802]

Tian Y R, Fan B and Wu F C. 2017. L2-Net: deep learning of discriminative patch descriptor in Euclidean space//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6128-6136[ DOI: 10.1109/CVPR.2017.649 http://dx.doi.org/10.1109/CVPR.2017.649 ]

Tian Y R, Yu X, Fan B, Wu F C, Heijnen H and Balntas V. 2019. SOSNet: second order similarity regularization for local descriptor learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 11008-11017[ DOI: 10.1109/CVPR.2019.01127 http://dx.doi.org/10.1109/CVPR.2019.01127 ]

Toews M and Wells W. 2009. SIFT-rank: ordinal description for invariant feature correspondence//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 172-177[ DOI: 10.1109/CVPR.2009.5206849 http://dx.doi.org/10.1109/CVPR.2009.5206849 ]

Tola E, Lepetit V and Fua P. 2010. DAISY: an efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5): 815-830[DOI:10.1109/TPAMI.2009.77]

Tolias G, Sicre R and Jégou H. 2015. Particular object retrieval with integral max-pooling of CNN activations[EB/OL]. [2020-12-30]. https://arxiv.org/pdf/1511.05879.pdf

Torr P H S. 1997. An assessment of information criteria for motion model selection//Proceedings of 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Juan, USA: IEEE: 47-52[ DOI: 10.1109/CVPR.1997.609296 http://dx.doi.org/10.1109/CVPR.1997.609296 ]

Trajković M and Hedley M. 1998. Fast corner detection. Image and Vision Computing, 16(2): 75-87[DOI:10.1016/S0262-8856(97)00056-5]

Ummenhofer B, Zhou H Z, Uhrig J, Mayer N, Ilg E, Dosovitskiy A and Brox T. 2017. DeMon: depth and motion network for learning monocular stereo//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5622-5631[ DOI: 10.1109/CVPR.2017.596 http://dx.doi.org/10.1109/CVPR.2017.596 ]

Verdie Y, Yi K M, Fua P and Lepetit V. 2015. TILDE: a temporally invariant learned detector//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 5279-5288[ DOI: 10.1109/CVPR.2015.7299165 http://dx.doi.org/10.1109/CVPR.2015.7299165 ]

Wang C Y, Buenaposada J M, Zhu R and Lucey S. 2018. Learning depth from monocular videos using direct methods//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2022-2030[ DOI: 10.1109/CVPR.2018.00216 http://dx.doi.org/10.1109/CVPR.2018.00216 ]

Wang J, Song Y, Leung T, Rosenberg C, Wang J B, Philbin J, Chen B and Wu Y. 2014. Learning fine-grained image similarity with deep ranking//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 1386-1393[ DOI: 10.1109/CVPR.2014.180 http://dx.doi.org/10.1109/CVPR.2014.180 ]

Wilson K, Bindel D and Snavely N. 2016. When is rotations averaging hard?//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 255-270[ DOI: 10.1007/978-3-319-46478-7_16 http://dx.doi.org/10.1007/978-3-319-46478-7_16 ]

Wilson K and Snavely N. 2014. Robust global translations with 1DSfM//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 61-75[ DOI: 10.1007/978-3-319-10578-9_5 http://dx.doi.org/10.1007/978-3-319-10578-9_5 ]

Wu C C. 2013. Towards linear-time incremental structure from motion//Proceedings of 2013 International Conference on 3D Vision. Seattle, USA: IEEE: 127-134[ DOI: 10.1109/3DV.2013.25 http://dx.doi.org/10.1109/3DV.2013.25 ]

Xie L X. 2017. Research on Key Issues of UAV Dense Point Cloud Generation based on Multiple View Geometry. Zhengzhou: Information Engineering University

谢理想. 2017. 基于多视图几何的无人机稠密点云生成关键技术研究. 郑州: 解放军信息工程大学

Xue Y Z, Chen J S, Wan W T, Huang Y Q, Yu C, Li T P and Bao J Y. 2019. MVSCRF: learning multi-view stereo with conditional random fields//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 4311-4320[ DOI: 10.1109/ICCV.2019.00441 http://dx.doi.org/10.1109/ICCV.2019.00441 ]

Yan J F, Wei Z Z, Yi H W, Ding M Y, Zhang R Z, Chen Y S and Tai Y W. 2020. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking[EB/OL]. [2020-12-30]. https://arxiv.org/pdf/2007.10872.pdf

Babenko A and Lempitsky V. 2015. Aggregating local deep features for image retrieval//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1269-1277[ DOI: 10.1109/ICCV.2015.150 http://dx.doi.org/10.1109/ICCV.2015.150 ]

Yang J Y, Mao W, Alvarez J M and Liu M M. 2020. Cost volume pyramid based depth inference for multi-view stereo//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4876-4885[ DOI: 10.1109/CVPR42600.2020.00493 http://dx.doi.org/10.1109/CVPR42600.2020.00493 ]

Yao Y, Luo Z X, Li S W, Fang T and Quan L. 2018. MVSNet: depth inference for unstructured multi-view stereo//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 785-801[ DOI: 10.1007/978-3-030-01237-3_47 http://dx.doi.org/10.1007/978-3-030-01237-3_47 ]

Yao Y, Luo Z X, Li S W, Shen T W, Fang T and Quan L. 2019. Recurrent MVSNet for high-resolution multi-view stereo depth inference//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5520-5529[ DOI: 10.1109/CVPR.2019.00567 http://dx.doi.org/10.1109/CVPR.2019.00567 ]

Yao Y, Luo Z X, Li S W, Zhang J Y, Ren Y F, Zhou L and Quan L. 2020. BlendedMVS: a large-scale dataset for generalized multi-view stereo networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1787-1796[ DOI: 10.1109/CVPR42600.2020.00186 http://dx.doi.org/10.1109/CVPR42600.2020.00186 ]

Yi H W, Wei Z Z, Ding M Y, Zhang R Z, Chen Y S, Wang G P and Tai Y W. 2019. Pyramid multi-view stereo net with self-adaptive view aggregation[EB/OL]. [2020-12-31]. https://arxiv.org/pdf/1912.03001.pdf

Yi K M, Trulls E, Lepetit V and Fua P. 2016. LIFT: learned invariant feature transform//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 467-483[ DOI: 10.1007/978-3-319-46466-4_28 http://dx.doi.org/10.1007/978-3-319-46466-4_28 ]

Yoon K J and Kweon I S. 2005. Locally adaptive support-weight approach for visual correspondence search//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 924-931[ DOI: 10.1109/CVPR.2005.218 http://dx.doi.org/10.1109/CVPR.2005.218 ]

Yu Z H and Gao S H. 2020. Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1946-1955[ DOI: 10.1109/CVPR42600.2020.00202 http://dx.doi.org/10.1109/CVPR42600.2020.00202 ]

Zach C, Klopschitz M and Pollefeys M. 2010. Disambiguating visual relations using loop constraints//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE: 1426-1433[ DOI: 10.1109/CVPR.2010.5539801 http://dx.doi.org/10.1109/CVPR.2010.5539801 ]

Zach C, Pock T and Bischof H. 2007. A globally optimal algorithm for robust TV-L1 range image integration//Proceedings of the 11th International Conference on Computer Vision. Rio de Janeiro, Brazil: IEEE: 1-8[ DOI: 10.1109/ICCV.2007.4408983 http://dx.doi.org/10.1109/ICCV.2007.4408983 ]

Zhang L, Zhang Y D, Tang J H, Lu K and Tian Q. 2013. Binary code ranking with weighted Hamming distance//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE: 1586-1593[ DOI: 10.1109/CVPR.2013.208 http://dx.doi.org/10.1109/CVPR.2013.208 ]

Zhang J H, Sun D W, Luo Z X, Yao A B, Zhou L, Shen T W, Chen Y R, Liao H G and Quan L. 2019. Learning two-view correspondences and geometry using order-aware network//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 5844-5853[ DOI: 10.1109/ICCV.2019.00594 http://dx.doi.org/10.1109/ICCV.2019.00594 ]

Zhang J Y, Yao Y, Li S W, Luo Z X and Fang T. 2020. Visibility-aware multi-view stereo network[EB/OL]. [2020-12-30]. https://arxiv.org/pdf/2008.07928.pdf

Zhang W L. 2019. Research on 3D Reconstruction Method with Local Information Constraint. Wuhan: Wuhan University

张卫龙. 2019. 局部信息约束的三维重建方法研究. 武汉: 武汉大学

Zhou K, Hou Q M, Wang R and Guo B N. 2008. Real-time KD-tree construction on graphics hardware. ACM Transactions on Graphics, 27(5): 126[DOI:10.1145/1409060.1409079]

Zhou T H, Brown M, Snavely N and Lowe D G. 2017. Unsupervised learning of depth and ego-motion from video//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6612-6619[ DOI: 10.1109/CVPR.2017.700 http://dx.doi.org/10.1109/CVPR.2017.700 ]

Zhu S Y, Zhang R Z, Zhou L, Shen T W, Fang T, Tan P and Quan L. 2018. Very large-scale global SfM by distributed motion averaging//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4568-4577[ DOI: 10.1109/CVPR.2018.00480 http://dx.doi.org/10.1109/CVPR.2018.00480 ]

Zhuang B B, Cheong L F and Lee G H. 2018. Baseline desensitizing in translation averaging//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4539-4547[ DOI: 10.1109/CVPR.2018.00477 http://dx.doi.org/10.1109/CVPR.2018.00477 ]
