LiDAR point cloud segmentation through scene viewpoint offset
2021, Vol. 26, No. 10: 2514-2523
Received: 2020-08-07
Revised: 2021-01-15
Accepted: 2021-01-22
Published in print: 2021-09-16
DOI: 10.11834/jig.200424
Objective
Outdoor-scene point clouds collected by LiDAR are large in scale and rich in spatial structural detail, yet most current point cloud segmentation methods cannot properly balance the extraction of such detail against the computational cost. Some methods convert the point cloud into dense representations such as multi-view images or voxel grids; this greatly reduces computation but ignores the information loss and occlusion caused by LiDAR imaging characteristics and the point cloud transformation, which degrades segmentation performance, especially on small-sample data and for small objects such as pedestrians and cyclists. To address the loss of spatial detail during projection, a scene viewpoint offset method inspired by the human observation mechanism is proposed to improve three-dimensional (3D) LiDAR point cloud segmentation.
Method
The 3D point cloud is first converted into a two-dimensional (2D) spherical front view (SFV) by spherical projection. The original viewpoint of the SFV is then shifted horizontally to generate a multi-viewpoint sequence, which compensates for the information loss and occlusion introduced by the point cloud transformation. Considering the redundancy within the multi-view sequence, a scene viewpoint offset prediction module is built with a convolutional neural network (CNN) to predict the optimal scene viewpoint offset.
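As a point of reference, the spherical projection that maps a 3D point $(x, y, z)$ to SFV pixel coordinates is commonly written as follows in SqueezeSeg-style pipelines; the angular resolutions $\Delta\theta$ and $\Delta\varphi$ depend on the LiDAR sensor and are not specified in this abstract, so they should be read as placeholders.

$$\theta = \arcsin\frac{z}{\sqrt{x^{2}+y^{2}+z^{2}}}, \qquad \varphi = \arcsin\frac{y}{\sqrt{x^{2}+y^{2}}}, \qquad u = \left\lfloor \frac{\theta}{\Delta\theta} \right\rfloor, \quad v = \left\lfloor \frac{\varphi}{\Delta\varphi} \right\rfloor$$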
Result
After the scene viewpoint offset module is added, the segmentation of pedestrians and cyclists on the small-sample dataset improves noticeably: across different offset distances, their intersection over union (IoU) increases by up to 6.5% and 15.5%, respectively, compared with the original method. With both the scene viewpoint offset module and the offset prediction module added, the IoU of every category increases by 1.6%–3%. Compared with other algorithms on the public KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, the segmentation of pedestrians and cyclists improves considerably, with the pedestrian IoU increasing by up to 9.1%.
Conclusion
The scene viewpoint offset and offset prediction method proposed in this paper, which combines the human observation mechanism with the imaging characteristics of LiDAR point clouds, is easy to adapt to different point cloud segmentation methods and yields more accurate segmentation results.
Objective
The point cloud data of outdoor scenes collected by LiDAR are large in scale and contain rich spatial structure details, and many current point cloud segmentation methods cannot balance the extraction of this structural detail against the amount of computation. Existing point cloud learning approaches are mainly divided into direct methods and conversion methods. Direct methods extract features from all points and can obtain more spatial structure information, but the scale of point clouds they can process is usually limited, so they require additional auxiliary processing for outdoor scenes with large data volumes. Conversion methods use projection or voxelization to transform the point cloud into a dense representation. The images generated by projecting the point cloud onto a 2D grid are denser, more consistent with human perception, and easier to combine with mature 2D convolutional neural networks (CNNs). However, part of the real spatial structure information is inevitably lost in the transformation. In addition, segmentation performance drops on small-sample data and for small objects such as pedestrians and cyclists, mainly because of the information loss caused by LiDAR imaging characteristics and the transformation, together with more severe occlusion. To improve 3D LiDAR point cloud segmentation and address the loss of spatial detail during projection, a scene viewpoint offset method based on the human observation mechanism is proposed in this paper.
Method
First, a spherical projection is used to transform the 3D point cloud into a 2D spherical front view (SFV). This representation is consistent with LiDAR imaging, which minimizes the information loss introduced by generating a new representation. The generated images are denser, more in line with human perception, and easy to combine with mature 2D convolutional neural networks. In addition, the projection removes part of the point cloud and reduces the amount of computation.
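A minimal Python sketch of this projection step is given below; the grid size (64 × 512), the vertical field of view, and the choice of SFV channels are illustrative assumptions rather than values reported in the paper.

```python
import numpy as np

def spherical_front_view(points, h=64, w=512, fov_up=2.0, fov_down=-24.8):
    """Project an (N, 3) LiDAR point cloud onto an SFV grid.

    h, w and the vertical field of view (degrees) are illustrative;
    they depend on the sensor and are not specified in the abstract.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1) + 1e-8

    theta = np.arcsin(z / r)        # elevation angle
    phi = np.arctan2(y, x)          # azimuth angle

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    v = (1.0 - (theta - fov_down_r) / (fov_up_r - fov_down_r)) * h   # row index
    u = 0.5 * (1.0 - phi / np.pi) * w                                # column index

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    sfv = np.zeros((h, w, 5), dtype=np.float32)   # channels: x, y, z, range, mask
    sfv[v, u, :3] = points[:, :3]
    sfv[v, u, 3] = r
    sfv[v, u, 4] = 1.0
    return sfv
```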
Then, to address the problems of information loss and occlusion, the original viewpoint of the SFV is moved horizontally to generate a multi-view sequence. The SFV projection alleviates problems such as sparseness and occlusion in the point cloud, but many spatial details are still inevitably lost in the projection. A 3D object can be observed from different angles, and each angle reveals different shape characteristics. Based on this property, a multi-view observation sequence is formed by moving the projection center, which provides a more reliable sample sequence for point cloud segmentation, as sketched below.
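The multi-view generation can be sketched as follows, assuming the spherical_front_view helper defined above; the candidate offsets (in meters) are illustrative, since the abstract does not list the offset distances that were evaluated.

```python
def multi_view_sequence(points, offsets=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Generate SFVs from horizontally shifted viewpoints.

    Shifting the viewpoint by d along the horizontal (y) axis is equivalent
    to translating every point by -d before re-projecting. The offset values
    are illustrative assumptions.
    """
    views = []
    for d in offsets:
        shifted = points.copy()
        shifted[:, 1] -= d          # move the scene opposite to the viewpoint shift
        views.append(spherical_front_view(shifted))
    return views
```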
In the segmentation network, the SFV features are downsampled with Fire convolution layers and max pooling layers, following the SqueezeSeg series of networks. To obtain full-resolution label features for each point, deconvolution is used for upsampling to produce the decoding features, and a skip-layer connection structure adds each upsampled feature map to the low-level feature map of the same size, so that the low-level features and the high-level semantic features of the network are better combined.
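For reference, a minimal PyTorch-style sketch of a Fire module and one skip-connected encoder-decoder stage is shown below; the channel sizes are illustrative, and the actual architecture follows the cited SqueezeSeg papers.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style Fire module: 1x1 squeeze, then parallel 1x1/3x3 expand."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Sequential(nn.Conv2d(in_ch, squeeze_ch, 1), nn.ReLU(inplace=True))
        self.expand1 = nn.Sequential(nn.Conv2d(squeeze_ch, expand_ch, 1), nn.ReLU(inplace=True))
        self.expand3 = nn.Sequential(nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        s = self.squeeze(x)
        return torch.cat([self.expand1(s), self.expand3(s)], dim=1)

class EncoderDecoderStage(nn.Module):
    """One downsample/upsample stage with a skip connection; channel sizes are illustrative."""
    def __init__(self):
        super().__init__()
        self.down = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fire = Fire(64, 16, 32)                      # outputs 32 + 32 = 64 channels
        self.up = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)

    def forward(self, x):
        skip = x                                          # low-level feature map
        y = self.fire(self.down(x))
        return self.up(y) + skip                          # skip-layer addition
```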
Although the offset improves the segmentation results to some extent, blindly increasing it adds unnecessary computation to the system. Considering the redundancy of the multi-viewpoint sequence, finding the optimal offset is important in practice. Finally, a CNN is used to construct the scene viewpoint offset prediction module and predict the optimal scene viewpoint offset.
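A hypothetical sketch of such a prediction module is given below; treating offset selection as scoring a small set of candidate offsets is an assumption made for illustration, and the layer sizes are not taken from the paper.

```python
import torch.nn as nn

class OffsetPredictor(nn.Module):
    """Illustrative CNN that scores candidate viewpoint offsets for one SFV."""
    def __init__(self, in_ch=5, num_candidates=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_candidates)

    def forward(self, sfv):
        f = self.features(sfv).flatten(1)
        return self.classifier(f)        # highest score marks the predicted best offset
```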
Result
The dataset used in this paper is the converted Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. To show that the proposed method is also suitable for relatively small datasets, a smaller subset (a training set of 1 182 frames and a validation set of 169 frames) is extracted for ablation experiments. On this small-sample dataset, after the scene viewpoint offset module is added, the segmentation results of pedestrians and cyclists improve, and their intersection over union (IoU) at different offset distances increases by up to 6.5% and 15.5%, respectively, compared with the original method. After both the scene viewpoint offset module and the offset prediction module are added, the IoU of every category increases by 1.6%–3%. On the KITTI raw dataset, compared with other methods, the proposed method achieves the best IoU in several categories, and the pedestrian IoU increases by up to 9.1%.
Conclusion
By combining the human observation mechanism with the imaging characteristics of LiDAR point clouds, the proposed method greatly reduces computation while retaining certain 3D spatial information, and it efficiently achieves high-precision segmentation; it improves point cloud segmentation results and is easy to adapt to different point cloud segmentation methods. Although the viewpoint offset and offset prediction method improves LiDAR point cloud segmentation to a certain extent, further improvement remains possible, especially where the correlation between views is strong. Moreover, global and local offset fusion architectures for objects of different types and sizes can be designed to exploit the correlation between views and make more accurate and effective predictions for the objects in the view.
Caltagirone L, Scheidegger S, Svensson L and Wahde M. 2017. Fast LIDAR-based road detection using fully convolutional neural networks//The 2017 IEEE Intelligent Vehicles Symposium. Los Angeles, USA: IEEE: 1019-1024 [DOI: 10.1109/IVS.2017.7995848]
Charles R Q, Su H, Kaichun M and Guibas L J. 2017. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 652-660 [DOI: 10.1109/cvpr.2017.16]
Chen X Z, Ma H M, Wan J, Li B and Xia T. 2017. Multi-view 3D object detection network for autonomous driving//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1907-1915 [DOI: 10.1109/cvpr.2017.691]
Engelcke M, Rao D, Wang D Z, Tong C H and Posner I. 2017. Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks//Proceedings of 2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE: 1355-1361 [DOI: 10.1109/ICRA.2017.7989161]
Geiger A, Lenz P and Urtasun R. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 3354-3361 [DOI: 10.1109/cvpr.2012.6248074]
Graham B. 2020. Sparse 3D convolutional neural networks [EB/OL]. [2020-07-26]. https://arxiv.org/pdf/1505.02890.pdf
Hu Q Y, Yang B, Xie L H, Rosa S, Guo Y L, Wang Z H, Trigoni N and Markham A. 2020. RandLA-Net: efficient semantic segmentation of large-scale point clouds [EB/OL]. [2020-07-28]. https://arxiv.org/pdf/1911.11236.pdf
Jiang M Y, Wu Y R, Zhao T Q, Zhao Z L and Lu C W. PointSIFT: a SIFT-like network module for 3D point cloud semantic segmentation [EB/OL]. [2020-07-26]. https://arxiv.org/pdf/1807.00652.pdf
Jing Z W, Guan H Y, Zang Y F, Ni H, Li D L and Yu Y T. 2020. Survey of point cloud semantic segmentation based on deep learning [J/OL]. Journal of Frontiers of Computer Science and Technology: 1-28 [2020-12-23]. http://kns.cnki.net/kcms/detail/11.5602.tp.20200827.1544.004.html
Ku J, Mozifian M, Lee J, Harakeh A and Waslander S L. 2018. Joint 3D proposal generation and object detection from view aggregation//Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE: 1-8 [DOI: 10.1109/iros.2018.8594049]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. PointCNN: convolution on X-transformed points [EB/OL]. [2020-07-28]. https://arxiv.org/pdf/1801.07791.pdf
Qi C R, Liu W, Wu C X, Su H and Guibas L J. 2018. Frustum PointNets for 3D object detection from RGB-D data//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 918-927 [DOI: 10.1109/cvpr.2018.00102]
Qi C R, Yi L, Su H and Guibas L J. 2020. PointNet++: deep hierarchical feature learning on point sets in a metric space [EB/OL]. [2020-07-28]. https://arxiv.org/pdf/1706.02413.pdf
Shi S S, Wang X G and Li H S. 2019. PointRCNN: 3D object proposal generation and detection from point cloud//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 770-779 [DOI: 10.1109/CVPR.2019.00086]
Simon M, Milz S, Amende K and Gross H M. 2018. Complex-YOLO: an Euler-region-proposal for real-time 3D object detection on point clouds//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 197-209 [DOI: 10.1007/978-3-030-11009-3_11]
Simonyan K and Zisserman A. 2020. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2020-07-26]. https://arxiv.org/pdf/1409.1556.pdf
Wang D Z and Posner I. 2015. Voting for voting in online point cloud object detection//Proceedings of the Robotics: Science and Systems. Rome, Italy: IEEE Press: 10-15607 [DOI: 10.15607/rss.2015.xi.035]
Wang Y, Shi T Y, Yun P, Tai L and Liu M. 2020. PointSeg: real-time semantic segmentation based on 3D LiDAR point cloud [EB/OL]. [2020-07-26]. https://arxiv.org/pdf/1807.06288.pdf
Wu B C, Wan A, Yue X Y and Keutzer K. 2018. SqueezeSeg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud//Proceedings of 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE: 1887-1893 [DOI: 10.1109/ICRA.2018.8462926]
Wu B C, Zhou X Y, Zhao S C, Yue X Y and Keutzer K. 2019. SqueezeSegV2: improved model structure and unsupervised domain adaptation for road-object segmentation from a LiDAR point cloud//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 4376-4382 [DOI: 10.1109/ICRA.2019.8793495]
Ye Y Y. 2020. Study on Key Technology of Environmental Perception Oriented to Road Traffic. Beijing: Beijing Jiaotong University
Zhou Y and Tuzel O. 2018. VoxelNet: end-to-end learning for point cloud based 3D object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4490-4499 [DOI: 10.1109/cvpr.2018.00472]