Review of monocular depth estimation based on deep learning
Vol. 27, Issue 2, Pages 390-403 (2022)
Published: 16 February 2022
Accepted: 16 March 2021
DOI: 10.11834/jig.200618
Huilan Luo, Yifeng Zhou. Review of monocular depth estimation based on deep learning[J]. Journal of Image and Graphics, 27(2): 390-403 (2022)
Monocular depth estimation is a key technique for recovering scene depth information from a single image. It is widely applied in fields such as intelligent vehicles and robot localization and therefore has significant research value. With the development of deep learning, many deep-learning-based studies of monocular depth estimation have emerged, and estimation performance has improved greatly. According to the type of training data the models use, this paper reviews recent deep-learning-based monocular depth estimation methods from three perspectives: models trained on single images, models trained on multiple images, and models whose training is optimized with auxiliary information. After surveying the datasets and performance metrics commonly used in monocular depth estimation research, the paper compares and analyzes the performance of classic monocular depth estimation models. Models trained on single images have simple network structures but generalize poorly. Depth estimation networks trained on multiple images generalize better, but they have many parameters, converge slowly, and take a long time to train. Introducing auxiliary information further improves depth estimation accuracy, but it also complicates the network structure and slows convergence. Many open problems and challenges remain in monocular depth estimation research. Exploiting the latent information contained in multi-image inputs and domain-specific constraints to improve monocular depth estimation performance is gradually becoming the research trend.
The development of computer technology has driven progress in computer vision. Nowadays, more researchers focus on 3D vision, and monocular depth estimation is one of its basic tasks. Depth estimation from a single image is a critical technology for obtaining scene depth information. It has important research value because of its potential applications in intelligent vehicles, robot positioning, and other fields. Compared with traditional depth acquisition methods, deep-learning-based monocular depth estimation has the advantages of low cost and simple operation. With the development of deep learning technology, many studies on monocular depth estimation have emerged in recent years, and the performance of monocular depth estimation has made great progress.

A monocular depth estimation model needs a large amount of data for training. The commonly used training data types are RGB and depth (RGB-D) image pairs, stereo image pairs, and image sequences. A model trained on RGB-D image pairs first extracts image features with a convolutional neural network and then predicts the depth map by regressing continuous depth values; several models subsequently refine the predicted depth map with conditional random fields or other methods. Unsupervised learning is often used when the training data are stereo image pairs or image sequences. A model trained on stereo image pairs first predicts a disparity map and then estimates depth from the disparity. When an image sequence is used, the model first predicts the depth map of one image in the sequence, and the depth estimation model is then optimized by comparing images reconstructed from that depth map with the other images in the sequence. To improve the accuracy of depth estimation, several researchers use semantic labels, depth range, and other auxiliary information to refine depth maps. Some datasets support multiple computer vision tasks, such as depth estimation and semantic segmentation, and because depth estimation is strongly correlated with semantic segmentation, several researchers improve depth estimation accuracy by learning the two tasks jointly. When a depth estimation dataset is built, a depth camera or light detection and ranging (LiDAR) sensor is used to capture scene depth. Both sensors rely on the principle that light or another propagation medium reflects when it encounters objects. Because the propagation medium dissipates during transmission, the depth range these sensors can measure is fixed, and they cannot measure depth once the medium's energy becomes very small. Several models therefore first divide the depth range into intervals, take the median value of each interval as that interval's depth value, and then predict the depth map by multi-class classification.

Different training data types not only result in different network structures but also affect the accuracy of depth estimation. In this review, current deep-learning-based monocular depth estimation methods are surveyed from the perspective of the training data type used by the model. The single-image training model, the multi-image training model, and the monocular depth estimation model trained with auxiliary information are discussed separately. Furthermore, the latest research status of monocular depth estimation is systematically analyzed, and the advantages and disadvantages of the various methods are discussed. Finally, future research trends of monocular depth estimation are summarized.
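The depth-interval classification scheme summarized above — divide the measurable depth range into intervals, take each interval's median as its representative depth, and predict depth as a multi-class label — can be sketched as follows. The function names and the uniform binning are illustrative assumptions, not code from any specific model in this survey.

```python
def build_depth_bins(d_min, d_max, n_bins):
    """Split [d_min, d_max] into n_bins equal-width intervals and return
    each interval's midpoint, used as that bin's representative depth.
    (For a uniform interval, the median equals the midpoint.)"""
    width = (d_max - d_min) / n_bins
    return [d_min + (i + 0.5) * width for i in range(n_bins)]

def depth_from_class(probabilities, bin_midpoints):
    """Pick the most probable depth bin and return its representative depth.

    `probabilities` stands in for the per-pixel class scores a network
    would output; here it is a plain list for illustration.
    """
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return bin_midpoints[best]
```

For example, a 0-10 m range split into five bins yields representative depths 1, 3, 5, 7, and 9 m, and a pixel whose highest class score falls on the second bin is assigned 3 m.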
Keywords: monocular vision; scene perception; deep learning; 3D reconstruction; depth estimation
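For the stereo-pair-trained models discussed in the abstract, depth is recovered from the predicted disparity map. For a rectified stereo rig this is the standard relation Z = f·B/d (focal length f in pixels, baseline B in metres, disparity d in pixels). The sketch below is a minimal illustration with hypothetical values, not code from any surveyed method.

```python
def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert a predicted disparity map (list of rows) to metric depth.

    Uses Z = f * B / d per pixel; zero or negative disparity (no match)
    maps to infinite depth. Inputs are plain nested lists for illustration.
    """
    depth = []
    for row in disparity:
        depth.append([focal_length_px * baseline_m / d if d > 0 else float("inf")
                      for d in row])
    return depth
```

With a hypothetical 700-pixel focal length and 0.5 m baseline, a 10-pixel disparity corresponds to 35 m and a 20-pixel disparity to 17.5 m — larger disparities mean closer objects.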