Recent progress in 3D vision
Long Xiaoxiao1,2, Cheng Xinjing2, Zhu Hao3,2, Zhang Pengju4,5, Liu Haomin6, Li Jun7, Zheng Lintao7, Hu Qingyong8, Liu Hao9, Cao Xun3, Yang Ruigang2, Wu Yihong4,5, Zhang Guofeng10, Liu Yebin11, Xu Kai7, Guo Yulan7, Chen Baoquan12 (1. The University of Hong Kong, Hong Kong 999077, China; 2. Jiluo Technology (Shanghai) Co., Ltd., Shanghai 200000, China; 3. Nanjing University, Nanjing 210023, China; 4. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; 5. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China; 6. SenseTime Research Institute, Hangzhou 311215, China; 7. National University of Defense Technology, Changsha 410073, China; 8. University of Oxford, Oxford OX13 QR, United Kingdom; 9. Sun Yat-sen University, Guangzhou 510275, China; 10. Zhejiang University, Hangzhou 310058, China; 11. Tsinghua University, Beijing 100085, China; 12. Peking University, Beijing 100871, China)
3D vision has numerous applications in areas such as autonomous vehicles, robotics, digital cities, virtual/mixed reality, human-machine interaction, entertainment, and sports. It covers a broad variety of research topics, ranging from 3D data acquisition, 3D modeling, shape analysis, and rendering to interaction. With the rapid development of 3D acquisition sensors (such as low-cost LiDARs, depth cameras, and 3D scanners), 3D data have become increasingly accessible. Moreover, advances in deep learning have further boosted the development of 3D vision, with a large number of algorithms proposed recently. We provide a comprehensive review of the progress of 3D vision algorithms in recent years, mostly in the last year. This survey covers seven topics: stereo matching, monocular depth estimation, visual localization in large-scale scenes, simultaneous localization and mapping (SLAM), 3D geometric modeling, dynamic human modeling, and point cloud understanding. Although several surveys are already available in the area of 3D vision, this survey differs from them in three respects. First, it covers a wide range of topics in 3D vision and can therefore benefit a broad research community. In contrast, most existing surveys focus on a specific topic, such as depth estimation or point cloud learning. Second, it focuses on progress in very recent years and can therefore provide readers with up-to-date information. Third, it presents a direct comparison between the progress in China and abroad. The recent progress in depth image acquisition, including stereo matching and monocular depth estimation, is reviewed first. The stereo matching algorithms are divided into non-end-to-end, end-to-end, and unsupervised stereo matching algorithms. The monocular depth estimation algorithms are categorized into depth regression networks and depth completion networks.
The depth regression networks are further divided into encoder-decoder networks and composite networks. Then, the recent progress in visual localization, including visual localization in large-scale scenes and SLAM, is reviewed. The visual localization algorithms for large-scale scenes are divided into end-to-end and non-end-to-end algorithms; the non-end-to-end algorithms are further categorized into deep learning-based feature description algorithms, 2D image retrieval-based visual localization algorithms, 2D-3D matching-based visual localization algorithms, and visual localization algorithms based on the fusion of 2D image retrieval and 2D-3D matching. SLAM algorithms are divided into visual SLAM algorithms and multisensor fusion-based SLAM algorithms. Finally, the recent progress in 3D modeling and understanding, including 3D geometric modeling, dynamic human modeling, and point cloud understanding, is reviewed. 3D geometric modeling algorithms cover several components, including deep 3D representation learning, deep 3D generative models, structured representation learning and generative models, and deep learning-based 3D modeling. Dynamic human modeling algorithms are divided into multiview RGB modeling algorithms, single-depth camera-based and multiple-depth camera-based algorithms, and single-view RGB modeling methods. Point cloud understanding algorithms are categorized into semantic segmentation methods and instance segmentation methods for point clouds. The paper is organized as follows. Section 1 presents the progress in 3D vision outside China. Section 2 introduces the progress of 3D vision in China. Section 3 compares and analyzes the 3D vision techniques developed in China and abroad. Section 4 points out several future research directions in the area.