LiDAR point cloud segmentation through scene viewpoint offset

Zheng Yang1, Lin Chunyu1, Liao Kang1, Zhao Yao1, Xue Song2 (1. Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; 2. China Railway Rolling Stock Corporation Qingdao Sifang Rolling Stock Research Institute Co., Ltd., Qingdao 266031, China)

Abstract
Objective Point clouds of outdoor scenes collected by LiDAR are large in scale and contain rich spatial structural detail, but most current point cloud segmentation methods cannot properly balance the extraction of structural detail against computational cost. Some methods transform the point cloud into dense representations such as multi-view images or voxel grids, which greatly reduces computation but ignores the information loss and occlusion caused by the LiDAR imaging characteristics and by the transformation itself, degrading segmentation performance, especially on small-sample data and in scenes with small objects such as pedestrians and cyclists. To address the loss of spatial detail during projection, this paper proposes a scene viewpoint offset method inspired by the human observation mechanism to improve 3D LiDAR point cloud segmentation.

Method Spherical projection converts the 3D point cloud into a 2D spherical front view (SFV). The original viewpoint of the SFV is shifted horizontally to generate a multi-viewpoint sequence, alleviating the information loss and occlusion introduced by the transformation. Considering the redundancy within the multi-view sequence, a scene viewpoint offset prediction module built with convolutional neural networks (CNN) predicts the optimal scene viewpoint offset.

Result After the scene viewpoint offset module is added, segmentation of pedestrians and cyclists on the small-sample dataset improves markedly: their intersection over union (IoU, over different offset distances) increases by up to 6.5% and 15.5%, respectively, compared with the original method. After both the scene viewpoint offset module and the offset prediction module are added, the IoU of each category increases by 1.6% to 3%. Compared with other algorithms on the public KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, segmentation of pedestrians and cyclists improves considerably, with pedestrian IoU increasing by up to 9.1%.

Conclusion The proposed scene viewpoint offset and offset prediction method, which combines the human observation mechanism with the imaging characteristics of LiDAR point clouds, adapts easily to different point cloud segmentation methods and makes segmentation more accurate.
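The abstract gives no implementation details, so the following is a minimal sketch of the kind of spherical projection it describes: each 3D point is mapped to an SFV pixel by its azimuth and elevation angles. The image size, vertical field of view, and the (x, y, z, intensity, range) channel layout are illustrative assumptions for a 64-beam sensor such as the one used for KITTI, not the paper's exact configuration.

```python
import numpy as np

def spherical_front_view(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 4) LiDAR point cloud (x, y, z, intensity) onto a
    2D spherical front view (SFV) grid.

    H, W and the vertical field of view are illustrative values for a
    64-beam sensor; the paper's exact settings are not given here.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)

    yaw = np.arctan2(y, x)                           # azimuth angle
    pitch = np.arcsin(z / np.maximum(depth, 1e-6))   # elevation angle

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Map the angles to pixel coordinates in [0, W) x [0, H).
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (1.0 - (pitch - fov_down_r) / (fov_up_r - fov_down_r)) * H
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # Each SFV pixel stores (x, y, z, intensity, range); when several
    # points land in one pixel the last write wins in this sketch,
    # whereas a real implementation would keep the nearest point.
    sfv = np.zeros((H, W, 5), dtype=np.float32)
    sfv[v, u, :4] = points[:, :4]
    sfv[v, u, 4] = depth
    return sfv
```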
Keywords
LiDAR point cloud segmentation through scene viewpoint offset

Zheng Yang1, Lin Chunyu1, Liao Kang1, Zhao Yao1, Xue Song2 (1. Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; 2. China Railway Rolling Stock Corporation Qingdao Sifang Rolling Stock Research Institute Co., Ltd., Qingdao 266031, China)

Abstract
Objective The point cloud data of outdoor scenes collected by LiDAR is large in scale and contains rich spatial structural detail, yet many current point cloud segmentation methods cannot properly balance the extraction of this detail against computation. Current point cloud learning methods fall into two groups: direct methods and transformation methods. Direct methods extract features from all points and obtain more spatial structure information, but the scale of point clouds they can process is usually limited, so they require auxiliary processing for outdoor scenes with a large data scale. Transformation methods use projection or voxelization to convert the point cloud into a dense representation. Images generated by projecting the point cloud into 2D are denser and more consistent with human cognition, and 2D representations are easier to fuse with mature 2D convolutional neural networks (CNN). However, real spatial structure information is inevitably lost in the transformation, and segmentation performance degrades on small-sample data and small-object scenes (such as pedestrians and cyclists), mainly because of the information loss caused by the imaging characteristics and transformation of LiDAR together with more severe occlusion. A scene viewpoint offset method based on the human observation mechanism is proposed in this paper to improve 3D LiDAR point cloud segmentation and address the loss of spatial detail in projection.

Method First, spherical projection is exploited to transform the 3D point cloud into a 2D spherical front view (SFV). This projection is consistent with the LiDAR imaging process, which minimizes the information loss introduced by generating the new representation; the resulting images are denser, more in line with human cognition, and easy to combine with mature 2D CNNs, and the projection removes part of the point cloud and reduces computation. Then, to address information loss and occlusion, the original viewpoint of the SFV is moved horizontally to generate a multi-viewpoint series. SFV projection alleviates the sparseness and occlusion of raw point clouds, but many spatial details are inevitably lost in the projection. A 3D object can be observed from different angles, each revealing different shape characteristics; based on this, moving the projection center forms a multi-view observation sequence and provides a more reliable sample sequence for point cloud segmentation. In the segmentation network, the SFV features are downsampled with Fire convolution modules and max pooling layers following the SqueezeSeg network, and deconvolution performs the upsampling that recovers full-resolution label features for each point. Skip connections add the upsampled feature maps to low-level feature maps of the same size, better combining the network's low-level features with its high-level semantic features. Although the offset improves segmentation to some extent, blindly increasing it adds unnecessary computation to the system. Considering the redundancy of the multi-viewpoint sequence, finding the optimal offset in practice is important. Finally, a CNN is used to construct the scene viewpoint offset prediction module and predict the optimal scene viewpoint offset.
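As a minimal sketch of the viewpoint offset step under the same assumptions as above, the virtual viewpoint can be shifted and the cloud re-projected once per candidate offset. Shifting along the sensor's lateral y-axis and the particular offset values (in meters) are illustrative choices, since the abstract does not fix them.

```python
import numpy as np

def offset_views(points, project, offsets=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Build a multi-viewpoint SFV sequence by horizontally shifting the
    scene viewpoint before re-projection.

    `project` is any point-cloud-to-SFV function, such as the
    spherical_front_view sketch above; the lateral y-axis and the
    candidate offsets (in meters) are assumptions for illustration.
    """
    views = []
    for d in offsets:
        shifted = points.copy()
        # Moving the viewpoint by +d along y is equivalent to moving
        # every point by -d and projecting from the origin again.
        shifted[:, 1] -= d
        views.append(project(shifted))
    return views
```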
Result The dataset adopted in this paper is the converted Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. To show that the proposed method also suits relatively small datasets, a smaller subset (a training set of 1,182 frames and a validation set of 169 frames) is extracted for ablation experiments. On this small-sample dataset, adding the scene viewpoint offset module improves the segmentation of pedestrians and cyclists: their intersection over union (IoU) at different offset distances increases by up to 6.5% and 15.5%, respectively, compared with the original method. Adding both the scene viewpoint offset module and the offset prediction module increases the IoU of each category by 1.6% to 3%. On KITTI's raw dataset, several categories achieve the best IoU among the compared methods, and pedestrian IoU increases by up to 9.1%.

Conclusion Combining the human observation mechanism with the imaging characteristics of LiDAR point clouds, the method greatly reduces computation while retaining a certain amount of 3D spatial information, realizing high-precision segmentation efficiently; it improves point cloud segmentation results and adapts easily to different point cloud segmentation methods. Although the viewpoint offset and offset prediction method improves LiDAR point cloud segmentation to a certain extent, room for improvement remains, especially in exploiting the strong correlation between the views. Global and local offset fusion architectures for objects of different types and sizes can further be designed to use this correlation and make more accurate, effective predictions for objects in the view.
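The abstract describes the offset prediction module only as a CNN. One plausible minimal form, sketched below under the assumption that it scores a fixed set of candidate offsets from the original SFV, keeps the extra cost bounded because only the SFV at the predicted offset has to be re-projected and segmented; the layer widths, the number of candidates, and the classification formulation are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class OffsetPredictor(nn.Module):
    """Sketch of a scene viewpoint offset prediction module: a small
    CNN that scores a fixed set of candidate offsets from the original
    SFV. All hyperparameters here are illustrative assumptions.
    """

    def __init__(self, in_ch=5, num_offsets=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global summary of the scene
        )
        self.classifier = nn.Linear(64, num_offsets)

    def forward(self, sfv):                    # sfv: (B, in_ch, H, W)
        feat = self.features(sfv).flatten(1)   # (B, 64)
        return self.classifier(feat)           # logits over candidate offsets
```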
Keywords
