龙霄潇,程新景,朱昊,张朋举,刘浩敏,李俊,郑林涛,胡庆拥,刘浩,曹汛,杨睿刚,吴毅红,章国锋,刘烨斌,徐凯,郭裕兰,陈宝权(香港大学, 香港 999077;际络科技(上海)有限公司, 上海 200000;南京大学, 南京 210023;际络科技(上海)有限公司, 上海 200000;中国科学院自动化研究所, 北京 100190;中国科学院大学人工智能学院, 北京 100190;商汤研究院, 杭州 311215;国防科技大学, 长沙 410073;牛津大学, 牛津 OX13QR;中山大学, 广州 510275;浙江大学, 杭州 310058;清华大学, 北京 100085;北京大学, 北京 100871)
Recent progress in 3D vision
Long Xiaoxiao,Cheng Xinjing,Zhu Hao,Zhang Pengju,Liu Haomin,Li Jun,Zheng Lintao,Hu Qingyong,Liu Hao,Cao Xun,Yang Ruigang,Wu Yihong,Zhang Guofeng,Liu Yebin,Xu Kai,Guo Yulan,Chen Baoquan(The University of Hong Kong, Hong Kong 999077, China;Jiluo Technology(Shanghai) Co., Ltd., Shanghai 200000, China;Nanjing University, Nanjing 210023, China;Jiluo Technology(Shanghai) Co., Ltd., Shanghai 200000, China;Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China;SenseTime Research Institute, Hangzhou 311215, China;National University of Defense Technology, Changsha 410073, China;University of Oxford, Oxford OX13 QR, United Kingdom;Sun Yat-sen University, Guangzhou 510275, China;Zhejiang University, Hangzhou 310058, China;Tsinghua University, Beijing 100085, China;Peking University, Beijing 100871, China)
3D vision has numerous applications in areas such as autonomous vehicles, robotics, digital cities, virtual/mixed reality, human-machine interaction, entertainment, and sports. It covers a broad variety of research topics, ranging from 3D data acquisition, 3D modeling, shape analysis, and rendering to interaction. With the rapid development of 3D acquisition sensors (such as low-cost LiDARs, depth cameras, and 3D scanners), 3D data have become increasingly accessible. Moreover, advances in deep learning techniques have further boosted the development of 3D vision, with a large number of algorithms being proposed recently. We provide a comprehensive review of the progress of 3D vision algorithms in the past few years, mostly in the last year. This survey covers seven topics: stereo matching, monocular depth estimation, visual localization in large-scale scenes, simultaneous localization and mapping (SLAM), 3D geometric modeling, dynamic human modeling, and point cloud understanding. Although several surveys are now available in the area of 3D vision, this survey differs from them in several respects. First, this study covers a wide range of topics in 3D vision and can therefore benefit a broad research community. In contrast, most existing surveys focus on a specific topic, such as depth estimation or point cloud learning. Second, this study focuses mainly on the progress of very recent years and can therefore provide readers with up-to-date information. Third, this paper presents a direct comparison between the progress in China and abroad. The recent progress in depth image acquisition, including stereo matching and monocular depth estimation, is reviewed first. The stereo matching algorithms are divided into non-end-to-end stereo matching, end-to-end stereo matching, and unsupervised stereo matching algorithms. The monocular depth estimation algorithms are categorized into depth regression networks and depth completion networks.
The depth regression networks are further divided into encoder-decoder networks and composite networks. Then, the recent progress in visual localization, including visual localization in large-scale scenes and SLAM, is reviewed. The visual localization algorithms for large-scale scenes are divided into end-to-end and non-end-to-end algorithms, and the non-end-to-end algorithms are further categorized into deep learning-based feature description algorithms, 2D image retrieval-based visual localization algorithms, 2D-3D matching-based visual localization algorithms, and visual localization algorithms based on the fusion of 2D image retrieval and 2D-3D matching. SLAM algorithms are divided into visual SLAM algorithms and multisensor fusion-based SLAM algorithms. Finally, the recent progress in 3D modeling and understanding, including 3D geometric modeling, dynamic human modeling, and point cloud understanding, is reviewed. 3D geometric modeling algorithms consist of several components, including deep 3D representation learning, deep 3D generative models, structured representation learning and generative models, and deep learning-based 3D modeling. Dynamic human modeling algorithms are divided into multiview RGB modeling algorithms, single-depth camera-based and multiple-depth camera-based algorithms, and single-view RGB modeling methods. Point cloud understanding algorithms are categorized into semantic segmentation methods and instance segmentation methods for point clouds. The paper is organized as follows. In Section 1, we present the progress in 3D vision outside China. In Section 2, we introduce the progress in 3D vision in China. In Section 3, the 3D vision techniques developed in China and abroad are compared and analyzed. In Section 4, we point out several future research directions in the area.