大规模室外图像3维重建技术研究进展
Progress in the large-scale outdoor image 3D reconstruction
2021, Vol. 26, No. 6, Pages 1429-1449
Print publication date: 2021-06-16
Accepted: 2021-01-20
DOI: 10.11834/jig.200842
颜深, 张茂军, 樊亚春, 谭小慧, 刘煜, 彭杨, 刘宇翔. 大规模室外图像3维重建技术研究进展[J]. 中国图象图形学报, 2021,26(6):1429-1449.
Shen Yan, Maojun Zhang, Yachun Fan, Xiaohui Tan, Yu Liu, Yang Peng, Yuxiang Liu. Progress in the large-scale outdoor image 3D reconstruction[J]. Journal of Image and Graphics, 2021,26(6):1429-1449.
Image-based 3D reconstruction aims to accurately recover the geometry of a real scene from a set of 2D multi-view images. It is a fundamental and active research topic in computer vision and photogrammetry, with important theoretical significance and application value, and is widely applied in fields such as smart cities, virtual tourism, digital heritage preservation, digital mapping, and navigation. With the popularization of image acquisition systems (smartphones, consumer-grade digital cameras, civil drones, etc.) and the rapid development of the Internet, a large number of Internet images of a given outdoor scene can be obtained through search engines. Using these images for efficient, robust, and accurate 3D reconstruction to provide users with realistic perception and immersive experiences has become a research hotspot, attracting broad attention from academia and industry and giving rise to a variety of methods. The emergence of deep learning provides a new opportunity for the 3D reconstruction of large-scale outdoor images. This survey first describes the basic serial pipeline of large-scale outdoor image 3D reconstruction, including image retrieval, image feature point matching, structure from motion, and multi-view stereo. Then, from the perspectives of traditional methods and deep learning-based methods, it systematically and comprehensively reviews the development and application of large-scale outdoor image 3D reconstruction techniques in each reconstruction subprocess, and summarizes the datasets and evaluation metrics applicable to large-scale outdoor scenes in each subprocess. Finally, it introduces the current mainstream open-source and commercial 3D reconstruction systems and the development status of related domestic industries.
3D reconstruction aims to accurately restore the geometry of an actual scene. It is a fundamental and active research field in computer vision and photogrammetry with important theoretical significance and application value. Acquisition of 3D models is highly relevant for various applications, including smart cities, virtual tourism, digital heritage preservation, mapping, and navigation. Various technologies that enable 3D modeling have been developed, and each of them has its own benefits and drawbacks for certain applications. The methods can be classified into two categories, namely, active acquisition methods (e.g., LiDAR and radar) and passive ones (i.e., cameras). As a passive acquisition method, cameras are especially power efficient and do not need direct physical contact with the actual world, and a 3D model can be effectively rebuilt from a set of 2D multiview images. In addition, with the increasing availability of cameras as commodity sensors in consumer devices, the cost of camera hardware has decreased significantly.
Over the last decades, with the popularization of image acquisition systems (including smartphones, consumer-grade digital cameras, and civil drones) and the rapid development of the Internet, ordinary people can easily obtain a large number of Internet images of an outdoor scene through search engines (such as Google, Bing, or Baidu). Organizing and utilizing these extremely rich and diverse data sources to perform efficient, robust, and accurate 3D reconstruction that provides users with realistic perception and immersive experiences has become a research hotspot and has attracted widespread attention from academia and industry. For a human, building an accurate and complete 3D representation of the actual world on the fly is natural, but abstracting the underlying problem into a computer program is extremely hard. Nowadays, many of the underlying problems in large-scale outdoor 3D reconstruction are gradually being understood, but many problems that the research community has not yet deeply understood still exist. 3D modeling becomes computationally feasible by decomposing the entire reconstruction into several simpler subproblems.
Thus far, a growing number and diversity of methods have been proposed to solve this challenging problem. Some researchers focus on solving the overall modeling problem, while more approaches focus on dealing with sub-reconstruction tasks. In particular, in recent years, modern convolutional neural network (CNN) models have achieved the best quality for object recognition, image segmentation, image translation, and some other challenging computer vision problems. The emergence of deep learning provides new opportunities and increasing interest for research on large-scale outdoor image 3D reconstruction. Meanwhile, 3D reconstruction has experienced rapid development from the traditional period to the deep learning era. Interestingly, to the best of our knowledge, no previous work has presented a detailed overview of recent progress in large-scale outdoor image 3D reconstruction. To summarize the rapid evolution of this field, traditional image-based 3D reconstruction approaches are presented, and a comprehensive survey of recent learning-based developments is provided.
Specifically, the basic serial pipeline of large-scale outdoor image 3D reconstruction, including image retrieval, image feature matching, structure from motion, and multiview stereo, is described.
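To make the serial pipeline concrete, the sketch below drives the four stages through COLMAP's command-line tools from Python. It is a minimal illustration rather than the method of any particular work surveyed here: it assumes COLMAP is installed and on the PATH, uses exhaustive matching in place of a retrieval-based matcher for simplicity, and the exact flags should be verified against the installed COLMAP version.

# Minimal pipeline sketch (assumption: COLMAP installed and on PATH).
import subprocess
from pathlib import Path

def run(args):
    # Run one pipeline stage and fail loudly if it does not succeed.
    print("+", " ".join(args))
    subprocess.run(args, check=True)

def reconstruct(image_dir: str, work_dir: str):
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    db = work / "database.db"
    sparse = work / "sparse"; sparse.mkdir(exist_ok=True)
    dense = work / "dense"; dense.mkdir(exist_ok=True)

    # 1) Feature detection and description (SIFT by default in COLMAP).
    run(["colmap", "feature_extractor",
         "--database_path", str(db), "--image_path", image_dir])

    # 2) Image retrieval + feature matching. Exhaustive matching is shown for
    #    simplicity; Internet-scale collections would use a retrieval-based
    #    matcher (e.g., a vocabulary-tree matcher) instead.
    run(["colmap", "exhaustive_matcher", "--database_path", str(db)])

    # 3) Incremental structure from motion: camera poses + sparse points.
    run(["colmap", "mapper",
         "--database_path", str(db), "--image_path", image_dir,
         "--output_path", str(sparse)])

    # 4) Multi-view stereo: per-view depth maps fused into a dense point cloud
    #    (patch_match_stereo requires the CUDA build of COLMAP).
    run(["colmap", "image_undistorter",
         "--image_path", image_dir, "--input_path", str(sparse / "0"),
         "--output_path", str(dense)])
    run(["colmap", "patch_match_stereo", "--workspace_path", str(dense)])
    run(["colmap", "stereo_fusion",
         "--workspace_path", str(dense),
         "--output_path", str(dense / "fused.ply")])

if __name__ == "__main__":
    reconstruct("images", "reconstruction")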
Then, traditional methods and deep learning-based methods are distinguished, and the development and application of large-scale outdoor image 3D reconstruction technology in each reconstruction subprocess are systematically and comprehensively reviewed. We show that, although deep learning-based methods have achieved overwhelming advantages in other computer vision and natural language processing tasks, geometry-based methods, which are adopted by some common 3D reconstruction systems, still exhibit more robust and accurate performance in 3D reconstruction. This finding indicates that deep learning methods still have considerable room for improvement.
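As a concrete illustration of the classical geometry-based toolchain referred to above, the following sketch matches two images with SIFT features, Lowe's ratio test, and RANSAC-based fundamental-matrix estimation. It is an assumed, minimal example using OpenCV (not code from any system surveyed here); the image paths are placeholders.

import cv2
import numpy as np

# Load two views of the same scene in grayscale (placeholder paths).
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# 1) Detect SIFT keypoints and compute descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2) Nearest-neighbor matching with Lowe's ratio test to reject ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in knn if m.distance < 0.8 * n.distance]

# 3) Geometric verification: estimate the fundamental matrix with RANSAC and
#    keep only matches consistent with the epipolar geometry.
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
inliers = [m for m, keep in zip(good, inlier_mask.ravel()) if keep]

print(f"{len(good)} ratio-test matches, {len(inliers)} geometrically verified inliers")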
Subsequently, the datasets and evaluation metrics applicable to large-scale outdoor scenes in each subprocess are summarized in detail. Furthermore, we introduce the datasets used in each subtask and present a comprehensive dataset specifically for 3D reconstruction. Finally, the current mainstream open-source and commercial 3D reconstruction systems and the development status of related domestic industries are introduced. Although image-based 3D reconstruction technology has made great progress in the past 10 years, current methods still have several problems. 1) For scenes with repeated textures (such as the Temple of Heaven), the structure-from-motion process fails, resulting in inaccurately registered camera poses and incomplete reconstruction models; for scenes with weak textures (such as lake surfaces and glass curtain walls), the multiview stereo process fails, thereby leaving holes in the reconstructed model. 2) Current 3D reconstruction systems consume considerable time to reconstruct scenes (especially large-scale scenes), which is still far from real-time reconstruction. 3) The price of 3D sensors (such as LiDAR and ToF) has dropped significantly, bringing them closer to consumer applications; using these sensors to effectively compensate for the shortcomings of image-based 3D reconstruction is still an unsolved problem.
3D reconstruction; image retrieval; image feature matching; structure from motion; multi-view stereo