

Progress in the large-scale outdoor image 3D reconstruction

Yan Shen, Zhang Maojun, Fan Yachun, Tan Xiaohui, Liu Yu, Peng Yang, Liu Yuxiang (School of System Engineering, National University of Defense Technology, Changsha 410073, China; School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China; Information Engineering College, Capital Normal University, Beijing 100048, China)

Abstract

3D reconstruction aims to accurately recover the geometry of a real scene. It is a fundamental and active research field in computer vision and photogrammetry, with important theoretical significance and application value. 3D models are highly relevant to a wide range of applications, including smart cities, virtual tourism, digital heritage preservation, mapping, and navigation. Various technologies for 3D modeling have been developed, each with its own benefits and drawbacks for particular applications. They can be classified into two categories: active acquisition methods (e.g., LiDAR and radar) and passive ones (i.e., cameras). As passive sensors, cameras are especially power efficient and require no direct physical contact with the real world, and a 3D model can be effectively rebuilt from a set of 2D multiview images. In addition, as cameras have become commodity sensors in consumer devices, the cost of camera hardware has decreased significantly. Over the last decades, with the popularization of image acquisition devices (including smartphones, consumer-grade digital cameras, and civil drones) and the rapid development of the Internet, ordinary users can easily obtain large numbers of Internet images of an outdoor scene through search engines (such as Google, Bing, or Baidu). Organizing and exploiting this extremely rich and diverse data source to perform efficient, robust, and accurate 3D reconstruction, and thereby provide users with realistic perception and immersive experiences, has become a research hotspot that attracts widespread attention from academia and industry. For humans, building an accurate and complete 3D representation of the real world on the fly is natural, but abstracting the underlying problem into a computer program is extremely hard.
Nowadays, many of the underlying problems in large-scale outdoor 3D reconstruction are gradually being understood, but many others are still not deeply understood by the research community. 3D modeling becomes feasible in a computer program by decomposing the entire reconstruction into several simpler subproblems. A growing number and variety of methods have been proposed to solve this challenging problem: some researchers address the overall modeling problem, while more approaches focus on individual reconstruction subtasks. In particular, in recent years, modern convolutional neural network (CNN) models have achieved state-of-the-art quality in object recognition, image segmentation, image translation, and other challenging computer vision problems. The emergence of deep learning brings new opportunities and growing interest to research on large-scale outdoor image-based 3D reconstruction, and the field has developed rapidly from the traditional period into the deep learning era. However, to the best of our knowledge, no previous work has presented a detailed overview of recent progress in large-scale outdoor image-based 3D reconstruction. To summarize the rapid evolution of this field, we present traditional image-based 3D reconstruction approaches and provide a comprehensive survey of recent learning-based developments. Specifically, we first describe the basic serial pipeline of large-scale outdoor image-based 3D reconstruction, which consists of image retrieval, image feature matching, structure from motion, and multiview stereo. Then, we distinguish traditional methods from deep learning-based methods and systematically and comprehensively review the development and application of large-scale outdoor image 3D reconstruction technology in each reconstruction subprocess.
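Structure from motion recovers 3D points from their 2D projections in several registered images. As a minimal illustration of that step, the sketch below performs linear (DLT) two-view triangulation with NumPy, assuming calibrated 3x4 projection matrices and noise-free correspondences. The function names and camera parameters are hypothetical and do not come from any particular reconstruction system.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point seen in two views.

    P1, P2: 3x4 projection matrices (hypothetical calibrated cameras).
    x1, x2: (u, v) pixel coordinates of the same point in each image.
    Returns the 3D point in inhomogeneous coordinates.
    """
    (u1, v1), (u2, v2) = x1, x2
    # Each view gives two linear constraints on the homogeneous point X:
    # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Minimize ||A X|| subject to ||X|| = 1: the solution is the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project an inhomogeneous 3D point with a 3x4 camera matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Illustrative setup: shared intrinsics, second camera shifted 1 unit along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 5.0])
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)  # close to X_true up to floating-point error
```

In a full pipeline, such a linear estimate would typically be refined using all images that observe the point and then by bundle adjustment; DLT is shown only because it is the simplest linear formulation.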
We show that, although deep learning-based methods have achieved overwhelming advantages in other computer vision and natural language processing tasks, the geometry-based methods adopted by common 3D reconstruction systems still deliver more robust and accurate performance in 3D reconstruction. This finding indicates that deep learning methods still have considerable room for improvement. Subsequently, the datasets and evaluation metrics applicable to large-scale outdoor scenes in each subprocess are summarized in detail. Furthermore, we introduce the datasets used in each subtask and present a comprehensive dataset specifically for 3D reconstruction. Finally, the current mainstream open-source and commercial 3D reconstruction systems and the development status of related domestic industries are introduced. Although image-based 3D reconstruction technology has made great progress in the past 10 years, current methods still have the following problems: 1) For scenes with repeated textures (such as the Temple of Heaven), the structure from motion process fails, resulting in inaccurately registered camera poses and incomplete reconstructed models; for scenes with weak textures (such as lake surfaces and glass curtain walls), the multiview stereo process fails, resulting in holes in the reconstructed model. 2) Current 3D reconstruction systems take considerable time to reconstruct scenes (especially large-scale ones), which is far from real-time reconstruction. 3) The price of 3D sensors (such as LiDAR and ToF) has dropped significantly, bringing them closer to consumer applications, but how to use these sensors to effectively compensate for the shortcomings of image-based 3D reconstruction remains an unsolved problem.
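Problem 1) concerns repeated textures. One widely used matching heuristic, Lowe's ratio test, makes the difficulty concrete: a match is kept only when its nearest descriptor is clearly closer than the second nearest, so near-identical descriptors from repetitive structures tend to be rejected as ambiguous, which can leave structure from motion with fewer correspondences. The NumPy sketch below assumes small synthetic descriptors and a ratio threshold of 0.8; the function name and all values are illustrative, not taken from the surveyed systems.

```python
import numpy as np


def ratio_test_match(desc1, desc2, ratio=0.8):
    """Match descriptors of image 1 to image 2 with Lowe's ratio test.

    desc1: (N, D) array; desc2: (M, D) array with M >= 2.
    Returns a list of (i, j) index pairs that pass the test.
    """
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        # Nearest and second-nearest candidates in image 2.
        j1, j2 = np.argsort(row)[:2]
        # Keep the match only if the best candidate is clearly better
        # than the runner-up; otherwise it is treated as ambiguous.
        if row[j1] < ratio * row[j2]:
            matches.append((i, int(j1)))
    return matches


# Synthetic 2D "descriptors": entries 1 and 2 of desc2 are near-duplicates,
# mimicking a repeated texture element.
desc2 = np.array([[0.0, 0.0], [10.0, 0.0], [10.1, 0.0], [0.0, 10.0]])
desc1 = np.array([[0.1, 0.0],     # distinctive feature
                  [10.05, 0.0]])  # lies between the two repeated elements
matches = ratio_test_match(desc1, desc2)
print(matches)  # [(0, 0)]: the ambiguous second feature is rejected
```

Real descriptors such as SIFT are 128-dimensional, and practical systems usually replace the brute-force distance matrix shown here with approximate nearest-neighbour search.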