基于深度学习的单目深度估计技术综述
A review of monocular depth estimation techniques based on deep learning
2022, Vol. 27, No. 2, pp. 292-328
Print publication date: 2022-02-16
Accepted: 2021-10-19
DOI: 10.11834/jig.210554
宋巍, 朱孟飞, 张明华, 赵丹枫, 贺琪. 基于深度学习的单目深度估计技术综述[J]. 中国图象图形学报, 2022,27(2):292-328.
Wei Song, Mengfei Zhu, Minghua Zhang, Danfeng Zhao, Qi He. A review of monocular depth estimation techniques based on deep learning[J]. Journal of Image and Graphics, 2022,27(2):292-328.
场景的深度估计问题是计算机视觉领域中的经典问题之一,也是3维重建和图像合成等应用中的一个重要环节。基于深度学习的单目深度估计技术高速发展,各种网络结构相继提出。本文对基于深度学习的单目深度估计技术最新进展进行了综述,回顾了基于监督学习和基于无监督学习方法的发展历程。重点关注单目深度估计的优化思路及其在深度学习网络结构中的表现,将监督学习方法分为多尺度特征融合的方法、结合条件随机场(conditional random field,CRF)的方法、基于序数关系的方法、结合多元图像信息的方法和其他方法等5类;将无监督学习方法分为基于立体视觉的方法、基于运动恢复结构(structure from motion,SfM)的方法、结合对抗性网络的方法、基于序数关系的方法和结合不确定性的方法等5类。此外,还介绍了单目深度估计任务中常用的数据集和评价指标,并对目前基于深度学习的单目深度估计技术在精确度、泛化性、应用场景和无监督网络中不确定性研究等方面的现状和面临的挑战进行了讨论,为相关领域的研究人员提供一个比较全面的参考。
Scene depth estimation is one of the classic problems in computer vision and a key step in applications such as 3D reconstruction and image synthesis. Monocular depth estimation techniques based on deep learning have developed rapidly in recent years, and a variety of network structures have been proposed. This review surveys the latest progress in deep learning-based monocular depth estimation and categorizes both supervised and unsupervised learning methods according to the characteristics of their network structures. The supervised learning methods are divided into the following categories. 1) Multi-scale feature fusion: images at different scales carry different kinds of information, so fusing multi-scale features extracted from them can effectively improve depth estimation results. 2) Conditional random fields (CRFs): CRFs, a class of probabilistic graphical models, perform well in semantic segmentation, and since depth information has a data distribution similar to that of semantic information, CRFs can also be effective for predicting continuous depth values. A CRF can serve as the loss function at the end of the network or, owing to its ability to fuse features, as a feature fusion module in the middle layers. 3) Ordinal relations: one line of work performs relative depth estimation, using ordinal relations directly to estimate the relative position of two pixels in the image; the other formulates depth estimation as an ordinal regression problem,
which discretizes continuous depth values into discrete depth labels and performs multi-class classification over the whole image. 4) Multiple image information: combining various kinds of image information is beneficial for improving the accuracy of depth estimation, since information along different dimensions (time, space, semantics, etc.) can be implicitly related to the depth of the scene; four types are commonly used, namely semantic information, neighborhood information, temporal information, and object boundary information. 5) Other methods: some supervised approaches cannot easily be placed in the above categories, for example computational-efficiency optimization,
domain adaptation using synthetic data obtained via image style transfer, and hardware-oriented optimization for underwater scene depth estimation. The unsupervised learning methods are classified as follows. 1) Stereo vision: stereo vision infers the depth of each pixel from two or more images; a conventional binocular algorithm reconstructs the 3D geometry of the scene from images captured by two camera sensors based on stereo disparity and the principle of triangulation. By recasting depth estimation as an image reconstruction task, unsupervised depth estimation can be realized from binocular (or multi-view) images and predicted disparity maps. 2) Structure from motion (SfM): SfM automatically recovers camera parameters and the 3D structure of a scene from multiple images or video sequences. SfM-based unsupervised methods resemble stereo-based ones in that they also recast depth estimation as image reconstruction,
but they differ in several details. First, SfM-based reconstruction generally uses successive frames: the current frame is used to reconstruct the previous or the next frame, so such methods take image sequences, typically video, as training data. Second, SfM-based unsupervised methods must introduce a camera pose estimation module during training. 3) Adversarial strategies: generative adversarial networks (GANs) have proven powerful in many imaging tasks,
in which a discriminator judges the outputs of a generator, forcing the generator to produce results indistinguishable from the labels. Adding a discriminator to an unsupervised network can improve depth estimation by refining the image reconstruction results. 4) Ordinal relations: analogous to the ordinal regression approach in supervised learning, discrete disparity estimation is also attractive in unsupervised networks; since discrete depth values yield more robust and sharper estimates than conventional regression predictions, discretization is equally effective here. 5) Uncertainty: because unsupervised learning uses no ground-truth depth values, the reliability of the predicted depth is in doubt. The uncertainty of the predictions has therefore been proposed as a criterion for judging whether they are credible, which in turn can be used to optimize monocular depth estimation results. In addition,
this review introduces the datasets mainly used in monocular depth estimation tasks, namely the NYU, KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago), Make3D and Cityscapes datasets, as well as six commonly-used evaluation metrics, on the basis of which the reviewed methods are compared. Finally, the review discusses the current status of and challenges for deep learning-based monocular depth estimation techniques in terms of accuracy, generalization, application scenarios, and uncertainty research in unsupervised networks.
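For reference, the error and accuracy measures most of the surveyed work reports (popularized by Eigen and colleagues in 2014) are the absolute and squared relative errors, RMSE in linear and log space, and the threshold accuracies δ < 1.25^k; a minimal NumPy sketch of these commonly-used metrics (an illustration, not code from the surveyed papers) is:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth evaluation metrics, computed over
    positive predicted/ground-truth depths at valid pixels."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    thresh = np.maximum(gt / pred, pred / gt)  # per-pixel max ratio
    return {
        "abs_rel":  float(np.mean(np.abs(pred - gt) / gt)),
        "sq_rel":   float(np.mean((pred - gt) ** 2 / gt)),
        "rmse":     float(np.sqrt(np.mean((pred - gt) ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        "delta1":   float(np.mean(thresh < 1.25)),        # accuracy under threshold
        "delta2":   float(np.mean(thresh < 1.25 ** 2)),
        "delta3":   float(np.mean(thresh < 1.25 ** 3)),
    }
```

Lower is better for the error terms; higher is better for the δ accuracies.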
深度学习；单目深度估计；监督学习；无监督学习；多尺度特征融合；序数关系；立体视觉
deep learning; monocular depth estimation; supervised learning; unsupervised learning; multi-scale feature fusion; ordinal relationship; stereo vision
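To make the stereo-vision principle discussed in the abstract concrete, the sketch below (an illustration under assumed rectified-stereo geometry; `focal_px` and `baseline_m` are hypothetical camera parameters) converts a predicted disparity map to metric depth by triangulation, Z = f·B/d, and computes the photometric error that reconstruction-based unsupervised methods minimize:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Triangulate depth from a rectified stereo disparity map (pixels):
    Z = focal_length * baseline / disparity."""
    d = np.maximum(np.asarray(disparity, dtype=float), eps)  # avoid divide-by-zero
    return focal_px * baseline_m / d

def photometric_l1(reconstructed, target):
    """Mean absolute photometric error between a view reconstructed via
    predicted disparity and the actual target view -- the core training
    signal when depth estimation is recast as image reconstruction."""
    r = np.asarray(reconstructed, dtype=float)
    t = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(r - t)))
```

With focal length 100 px and baseline 0.5 m, a disparity of 10 px triangulates to a depth of 5 m.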