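A Survey of Image-Based 3D Object Detection for Autonomous Driving: Benchmarks, Constraints and Error Analysis
Wei Shikui, Li Xiying, Ye Zhihui, Chen Ze, Chen Xiaotong, Tian Yonghong, Dang Jianwu, Fu Shujun, Zhao Yao (Beijing Jiaotong University; Sun Yat-sen University; Peking University; Lanzhou Jiaotong University; Shandong University)
Abstract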
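Obtaining accurate 3D position and size information about surrounding objects from high-resolution images is the basis for autonomous driving control and behavior decision-making, so image-based 3D object detection is a research hot spot in autonomous driving. Existing surveys review the methodology and results of this field in some detail, but they do not analyze in depth or systematically the constraining factors behind the unsatisfactory detection accuracy of existing methods. Given the demanding requirements of engineering applications in autonomous driving, and given that existing methods are mainly data-driven, this paper gives a fairly systematic account of academic and industrial research results and industry applications in 3D object detection from the perspectives of common datasets and evaluation benchmarks, the influence of data, methodological constraints, and errors. First, it gives an overview of academic results and of applications in the autonomous driving industry. Then it analyzes and summarizes four general-purpose datasets, including the KITTI dataset, in terms of data acquisition equipment, data accuracy and annotation information, and compares the main evaluation metrics these datasets propose. Next, it analyzes, from the data and methodology sides, the main factors that constrain algorithm performance and the errors they cause. On the data side, the constraints are mainly data accuracy, sample differences, the amount of annotated data, and annotation conventions; on the methodology side, they mainly include prior geometric relationships, depth prediction error, and data modality. Finally, it summarizes the state of research in China and abroad and proposes future research directions that deserve particular attention in datasets, evaluation metrics, object depth prediction and other areas.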
Keywords
3D Object Detection for Autonomous Driving from Images: A Survey on Benchmarks, Constraints and Error Analysis
Wei Shikui, Li Xiying, Ye Zhihui, Chen Ze, Chen Xiaotong, Tian Yonghong, Dang Jianwu, Fu Shujun, Zhao Yao
Abstract
The development of autonomous driving will greatly change how people travel and live. Accurately perceiving and measuring the three-dimensional (3D) position and scale of surrounding objects is the basis for autonomous driving control and behavior decision-making, so 3D object detection is indispensable. With advances in sensing technology, autonomous vehicles are generally equipped with high-precision cameras, lidar, radar, GPS/IMU positioning and other sensors. 3D object detection algorithms based on lidar or multi-modal data currently achieve excellent performance, but the drawbacks of high-precision lidar, such as high price, limited sensing range and sparse point cloud data, limit its wide deployment. In contrast, high-precision cameras are much cheaper, capture high-precision spatial information with rich shape and appearance detail, and are easier to deploy widely. Image-based 3D object detection is therefore a research hot spot in autonomous driving. Existing surveys review the methodology and achievements of this field in some detail, but the constraining factors behind the unsatisfactory detection accuracy of existing methods have not been analyzed thoroughly and systematically. Considering the demanding performance requirements of engineering applications in autonomous driving, and the fact that existing 3D object detection methods are mainly data-driven, this paper systematically summarizes research results and industrial applications from the perspectives of commonly used datasets and evaluation criteria, data impact, methodological constraints and prediction errors. First, a brief introduction is given from the perspective of academic research achievements and applications in the autonomous driving industry.
The latest achievements of industry-leading autonomous driving companies such as Baidu Apollo, Google Waymo and Tesla are summarized, and 3D object detection methods for autonomous driving are classified. Then four general-purpose datasets (the KITTI dataset, the nuScenes dataset, the Waymo Open Dataset and the DAIR-V2X dataset) are analyzed and summarized in detail from three aspects: data acquisition sensors, data accuracy and label information. The main evaluation metrics proposed by these datasets are compared, and their advantages, disadvantages and applicability are summarized. Next, the main factors that restrict the performance of image-based 3D object detection algorithms, and the errors they cause, are analyzed from two sides: data and methodology. In terms of data, the main constraints are data accuracy, sample difference, data volume and data annotation. Data accuracy is mainly limited by equipment performance. Sample difference mainly arises during imaging, from differences in object distance and imaging angle and from occlusion and truncation. Data volume is limited by the variety of 3D data types and the high difficulty of labeling, so 3D object detection datasets are much smaller than 2D object detection datasets. As for data annotation, the image annotation used in image-based 3D object detection is mainly the 3D bounding box, and the labeling details and quality of a dataset directly affect algorithm performance. The annotation error is larger for non-rigid objects such as pedestrians, and some suggestions for improving the labeling method are discussed.
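As a rough illustration of what a 3D bounding box annotation encodes, the sketch below expands a seven-parameter box (centre, size, heading) into its eight corners. The `Box3D` class, its field names and the z-up coordinate frame are assumptions made for this example; the datasets named above each define their own label format and axis conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical 7-parameter box label; not any dataset's exact format."""
    x: float       # box centre, metres
    y: float
    z: float
    length: float  # extent along the heading direction
    width: float   # extent across the heading direction
    height: float  # vertical extent
    yaw: float     # rotation about the vertical (z) axis, radians


def box_corners(box):
    """Return the 8 corners of the box as (x, y, z) tuples.

    Assumes a z-up frame in which yaw rotates the box in the x-y plane.
    """
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    corners = []
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                # Corner offset in the box's own (unrotated) frame.
                lx, ly, lz = dx * box.length, dy * box.width, dz * box.height
                # Rotate the offset by yaw, then translate to the box centre.
                corners.append((box.x + c * lx - s * ly,
                                box.y + s * lx + c * ly,
                                box.z + lz))
    return corners


if __name__ == "__main__":
    car = Box3D(x=10.0, y=2.0, z=0.8, length=4.0, width=1.8, height=1.6, yaw=0.3)
    for corner in box_corners(car):
        print(tuple(round(v, 2) for v in corner))
```

A rigid box of this kind fits a car reasonably well; for a pedestrian whose limbs move, the same seven numbers are a looser fit, which is one way to read the larger annotation error mentioned above.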
In terms of methodology, the general frameworks of image-based 3D object detection can be classified into one-stage and two-stage methods, and the limitations mainly concern the prior geometric relationships, the depth prediction accuracy, and the data modality used by the algorithm. The prior geometric relationships include the 2D-3D geometric constraints that arise when 3D objects are projected onto 2D images, and the positional relationships between objects. Image-based 3D object detection methods must decide how to exploit prior 2D-3D geometric constraints and how to handle occluded and truncated objects. Predicting depth from 2D images is an ill-posed problem: projecting the 3D scene onto the image plane collapses one dimension, so depth information is lost and depth prediction errors follow. On the one hand, depth prediction is often inaccurate because of the projection relationship. On the other hand, continuous depth prediction often performs poorly at depth discontinuities in the image (such as object edges). When the predicted depth is discretized instead, the depth classes are relatively coarse, and the classification cannot be made arbitrarily fine. Some scholars have proposed other ideas for solving the depth prediction problem. The limitation of data modality is mainly reflected in the large depth prediction error of methods based on a single image. Detection performance can be improved by simulating stereo signals or lidar point clouds, by directly using stereo images as auxiliary input, or by using point cloud data with accurate 3D information as a supervision signal. Using video data can also improve detection accuracy to some extent. In Section 3, the state of research in China and abroad is summarized and compared from academic and industrial perspectives.
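The two depth-related points above can be made concrete with a minimal sketch. It assumes an ideal pinhole camera and uniform depth bins. The function names, the intrinsics (`FX`, `FY`, `CX`, `CY`) and the 0 to 80 m range are illustrative values chosen for this example and are not taken from any surveyed method.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z), Z > 0, to a pixel (u, v)."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)


def depth_to_bin(depth, d_min, d_max, num_bins):
    """Map a depth in metres to the index of a uniform depth class."""
    depth = min(max(depth, d_min), d_max)       # clamp into the covered range
    bin_width = (d_max - d_min) / num_bins
    return min(int((depth - d_min) / bin_width), num_bins - 1)


def bin_to_depth(index, d_min, d_max, num_bins):
    """Map a depth class back to a depth in metres (the bin centre)."""
    bin_width = (d_max - d_min) / num_bins
    return d_min + (index + 0.5) * bin_width


# Hypothetical camera intrinsics, in pixels.
FX = FY = 700.0
CX, CY = 600.0, 180.0

if __name__ == "__main__":
    # Two points on the same viewing ray, one twice as far as the other,
    # land on the same pixel: the image alone cannot tell them apart.
    near = (1.0, 0.5, 10.0)
    far = (2.0, 1.0, 20.0)
    print(project(near, FX, FY, CX, CY), project(far, FX, FY, CX, CY))

    # Discretizing depth into 80 classes over 0-80 m gives 1 m bins, so the
    # recovered depth can be off by up to half a bin (0.5 m) even when the
    # predicted class is correct.
    true_depth = 23.4
    index = depth_to_bin(true_depth, 0.0, 80.0, 80)
    recovered = bin_to_depth(index, 0.0, 80.0, 80)
    print(index, recovered, abs(recovered - true_depth))
```

Using more bins shrinks the quantization error but leaves fewer training samples per class, which is one reading of why the classification cannot be refined arbitrarily.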
Finally, research directions that deserve future attention are proposed in terms of datasets, evaluation metrics, depth prediction and other areas.
Keywords