融合稀疏注意力和实例增强的雷达点云分割
LiDAR point cloud semantic segmentation combined with sparse attention and instance enhancement
2023年28卷第2期,页码: 483-494
纸质出版日期: 2023-02-16
录用日期: 2022-03-17
DOI: 10.11834/jig.210787
刘盛, 曹益烽, 黄文豪, 李丁达. 融合稀疏注意力和实例增强的雷达点云分割[J]. 中国图象图形学报, 2023,28(2):483-494.
Sheng Liu, Yifeng Cao, Wenhao Huang, Dingda Li. LiDAR point cloud semantic segmentation combined with sparse attention and instance enhancement[J]. Journal of Image and Graphics, 2023,28(2):483-494.
目的
雷达点云语义分割是3维环境感知的重要环节,准确分割雷达点云对象对无人驾驶汽车和自主移动机器人等应用具有重要意义。由于雷达点云数据具有非结构化特征,为提取有效的语义信息,通常将不规则的点云数据投影成结构化的2维图像,但会造成点云数据中几何信息丢失,不能得到高精度分割效果。此外,真实数据集中存在数据分布不均匀问题,导致小样本物体分割效果较差。为解决这些问题,本文提出一种基于稀疏注意力和实例增强的雷达点云分割方法,有效提高了激光雷达点云语义分割精度。
方法
针对数据集中数据分布不平衡问题,采用实例注入方式增强点云数据。首先,通过提取数据集中的点云实例数据,并在训练中将实例数据注入到每一帧点云中,实现实例增强的效果。由于稀疏卷积网络不能获得较大的感受野,提出Transformer模块扩大网络的感受野。为了提取特征图的关键信息,使用基于稀疏卷积的空间注意力机制,显著提高了网络性能。另外,对不同类别点云对象的边缘,提出新的TVloss用于增强网络的监督能力。
结果
本文提出的模型在SemanticKITTI和nuScenes数据集上进行测试。在SemanticKITTI数据集上,本文方法在线单帧精度在平均交并比(mean intersection over union,mIoU)指标上为64.6%,在nuScenes数据集上为75.6%。消融实验表明,本文方法的精度在baseline的基础上提高了3.1%。
结论
实验结果表明,本文提出的基于稀疏注意力和实例增强的雷达点云分割方法在SemanticKITTI和nuScenes数据集上都取得了较好表现,提高了网络对点云细节的分割能力,使点云分割结果更加准确。
Objective
Outdoor scene perception is essential for mobile robots and autonomous driving vehicles, and LiDAR-based point cloud semantic segmentation has been developed for this purpose. 3D LiDAR captures range information quickly and accurately and is unaffected by illumination, which makes it well suited to outdoor perception. For autonomous driving, semantic segmentation of LiDAR point clouds predicts a label for every point across scene elements such as roads, vehicles, pedestrians, and plants. Deep learning (DL) has advanced rapidly in 2D image-based computer vision. LiDAR point cloud data, however, is unstructured, unordered, and sparse, with non-uniform density, in contrast to structured 2D image data, so extracting semantic information from LiDAR data effectively remains challenging. DL-based methods can be divided into three categories: 1) point-based, 2) projection-based, and 3) voxel-based. To extract effective semantic information despite the unstructured nature of LiDAR point clouds, existing methods often project the irregular point cloud data into structured 2D images. However, the projection loses geometric information, so high-precision segmentation results cannot be obtained. In addition, the uneven data distribution of real datasets degrades segmentation of small-sample objects. To resolve these problems, we develop a LiDAR point cloud segmentation method based on sparse attention and instance enhancement, which effectively improves the accuracy of LiDAR point cloud semantic segmentation.
Method
An end-to-end network based on sparse convolution is presented for LiDAR point cloud semantic segmentation. To counter the uneven data distribution in the training set, instance injection is used to augment the point cloud data: point cloud instances of under-represented categories such as pedestrians, vehicles, and bicycles are extracted from the dataset, and the instance points are injected at appropriate positions in each frame during training.
Recent work on visual semantic segmentation emphasizes enlarged receptive fields and attention mechanisms, but the encoder-decoder backbone alone cannot achieve a sufficiently large receptive field. A lightweight Transformer module is therefore introduced to widen the receptive field of the network: to capture global information, it builds interconnections between all non-empty voxels, and it is applied in the bottleneck layer of the network to keep memory consumption manageable.
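A minimal PyTorch sketch of this idea follows, assuming the sparse bottleneck features have already been gathered into a dense (M, C) tensor with one row per non-empty voxel; the module name, layer sizes, and positional-embedding handling are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BottleneckTransformer(nn.Module):
    """Sketch of a lightweight Transformer over non-empty voxels."""
    def __init__(self, channels=256, num_heads=4, ffn_dim=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.ffn = nn.Sequential(
            nn.Linear(channels, ffn_dim), nn.ReLU(inplace=True),
            nn.Linear(ffn_dim, channels))

    def forward(self, feats, coords_embed):
        # feats: (M, C) features of non-empty voxels; coords_embed: a
        # positional embedding of voxel coordinates with the same shape,
        # so that attention is position-aware.
        x = (feats + coords_embed).unsqueeze(0)   # (1, M, C)
        attn_out, _ = self.attn(x, x, x)          # every non-empty voxel
        x = self.norm1(x + attn_out)              # attends to every other one
        x = self.norm2(x + self.ffn(x))
        return x.squeeze(0)                       # back to (M, C)
```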
To extract the key positions of the feature map, a spatial attention module based on sparse convolution is proposed as well.
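The sketch below conveys the spatial-attention idea on gathered (M, C) voxel features; in the actual sparse network the gating would be computed with a submanifold sparse convolution over the voxel grid (e.g., via spconv or MinkowskiEngine), and the layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class SparseSpatialAttention(nn.Module):
    """Conceptual sketch: per-voxel gate from channel statistics."""
    def __init__(self):
        super().__init__()
        # Maps the per-voxel [mean, max] channel statistics to one gate value.
        self.gate = nn.Sequential(nn.Linear(2, 16), nn.ReLU(inplace=True),
                                  nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, feats):                        # feats: (M, C)
        avg = feats.mean(dim=1, keepdim=True)        # (M, 1) channel mean
        mx, _ = feats.max(dim=1, keepdim=True)       # (M, 1) channel max
        attn = self.gate(torch.cat([avg, mx], dim=1))  # (M, 1) gate in [0, 1]
        return feats * attn                          # re-weight key positions
```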
Additionally, to sharpen the edges between point cloud objects of different classes, a new TVloss is adopted, which emphasizes semantic boundaries and suppresses noise within each predicted region.
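The abstract does not give the exact definition of TVloss; one plausible total-variation-style formulation, sketched here under that assumption, matches the spatial gradients of a predicted map against those of the ground truth on a 2D projection of the point cloud.

```python
import torch

def tv_loss(pred, target):
    """Plausible TV-style boundary term (an assumption, not the paper's
    exact TVloss). pred, target: (B, H, W) maps on a 2D projection, e.g.
    per-pixel probability of one class and its 0/1 ground truth. Matching
    gradient magnitudes sharpens boundaries and damps in-region noise.
    """
    dh_p = (pred[:, 1:, :] - pred[:, :-1, :]).abs()
    dw_p = (pred[:, :, 1:] - pred[:, :, :-1]).abs()
    dh_t = (target[:, 1:, :] - target[:, :-1, :]).abs()
    dw_t = (target[:, :, 1:] - target[:, :, :-1]).abs()
    return (dh_p - dh_t).abs().mean() + (dw_p - dw_t).abs().mean()
```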
Result
The proposed model is evaluated on both the SemanticKITTI and nuScenes datasets. It achieves 64.6% mean intersection over union (mIoU) on the SemanticKITTI single-scan benchmark and 75.6% mIoU on nuScenes. Ablation experiments show that instance injection improves mIoU by 1.2%; the sparse-convolution-based spatial attention module and the Transformer module contribute improvements of 1.0% and 0.7% respectively, and 1.5% when combined; TVloss brings a further 0.2% gain. With all modules enabled, mIoU increases by 3.1% over the baseline.
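For reference, mIoU averages the per-class intersection over union; a minimal computation sketch:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """mIoU = mean over classes of TP / (TP + FP + FN)."""
    ious = []
    for c in range(num_classes):
        tp = np.logical_and(pred == c, gt == c).sum()
        fp = np.logical_and(pred == c, gt != c).sum()
        fn = np.logical_and(pred != c, gt == c).sum()
        if tp + fp + fn > 0:            # skip classes absent from both maps
            ious.append(tp / (tp + fp + fn))
    return float(np.mean(ious))
```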
Conclusion
A new end-to-end network based on sparse convolution is developed for LiDAR point cloud semantic segmentation. Instance injection is used to resolve the unbalanced data distribution; the proposed Transformer module provides a wider receptive field; a sparse-convolution-based spatial attention mechanism is incorporated to extract the key locations of the feature map; and a new TVloss loss function sharpens the edges of objects in the point cloud. Comparative experiments against recent state-of-the-art (SOTA) methods, including projection-based and point-based methods, show that the proposed method improves the network's ability to segment point cloud details and is effective for point cloud segmentation.
激光雷达(LiDAR);语义分割;空间注意力机制;Transformer;深度学习(DL);实例增强
LiDAR; semantic segmentation; spatial attention mechanism; Transformer; deep learning (DL); instance enhancement
Aksoy E E, Baci S and Cavdar S. 2020. SalsaNet: fast road and vehicle segmentation in LiDAR point clouds for autonomous driving//2020 IEEE Intelligent Vehicles Symposium (IV). Las Vegas, USA: IEEE: 926-932 [DOI: 10.1109/iv47402.2020.9304694]
Behley J, Garbade M, Milioto A, Quenzel J, Behnke S, Stachniss C and Gall J. 2019. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9297-9307 [DOI: 10.1109/ICCV.2019.00939]
Berman M, Triki A R and Blaschko M B. 2018. The Lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4413-4421 [DOI: 10.1109/cvpr.2018.00464]
Caesar H, Bankiti V, Lang A H, Vora S, Liong V E, Xu Q, Krishnan A, Pan Y, Baldan G and Beijbom O. 2020. nuScenes: a multimodal dataset for autonomous driving//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11618-11628 [DOI: 10.1109/cvpr42600.2020.01164]
Charles R Q, Su H, Kaichun M and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 652-660 [DOI: 10.1109/CVPR.2017.16]
Charles R Q, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114
Choy C, Gwak J and Savarese S. 2019. 4D spatio-temporal ConvNets: Minkowski convolutional neural networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3070-3079 [DOI: 10.1109/cvpr.2019.00319]
Cortinhal T, Tzelepis G and Aksoy E E. 2020. SalsaNext: fast, uncertainty-aware semantic segmentation of LiDAR point clouds//The 15th International Symposium on Visual Computing. San Diego, USA: Springer: 207-222 [DOI: 10.1007/978-3-030-64559-5_16]
Du J and Cai G R. 2021. Point cloud semantic segmentation method based on multi-feature fusion and residual optimization. Journal of Image and Graphics, 26(5): 1105-1116
杜静, 蔡国榕. 2021. 多特征融合与残差优化的点云语义分割方法. 中国图象图形学报, 26(5): 1105-1116 [DOI: 10.11834/jig.200374]
Gerdzhev M, Razani R, Taghavi E and Liu B B. 2021. TORNADO-Net: mulTiview tOtal vaRiatioN semAntic segmentation with Diamond inceptiOn module//Proceedings of 2021 IEEE International Conference on Robotics and Automation. Xi'an, China: IEEE: #9562041 [DOI: 10.1109/ICRA48506.2021.9562041]
Graham B, Engelcke M and van der Maaten L. 2018. 3D semantic segmentation with submanifold sparse convolutional networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 9224-9232 [DOI: 10.1109/CVPR.2018.00961]
Guo M H, Cai J X, Liu Z N, Mu T J, Martin R R and Hu S M. 2021. PCT: point cloud transformer. Computational Visual Media, 7(2): 187-199 [DOI: 10.1007/s41095-021-0229-5]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/cvpr.2016.90]
Hu Q Y, Yang B, Xie L H, Rosa S, Guo Y L, Wang Z H, Trigoni N and Markham A. 2020. RandLA-Net: efficient semantic segmentation of large-scale point clouds//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11105-11114 [DOI: 10.1109/CVPR42600.2020.01112]
Kochanov D, Nejadasl F K and Booij O. 2020. KPRNet: improving projection-based LiDAR semantic segmentation [EB/OL]. [2021-07-21]. https://arxiv.org/pdf/2007.12668.pdf
Li G H, Yuan Y F, Ben X Y and Zhang J P. 2020. Spatiotemporal attention network for microexpression recognition. Journal of Image and Graphics, 25(11): 2380-2390
李国豪, 袁一帆, 贲晛烨, 张军平. 2020. 采用时空注意力机制的人脸微表情识别. 中国图象图形学报, 25(11): 2380-2390 [DOI: 10.11834/jig.200325]
Liu S, Huang S Y, Cheng H H, Shen J Y and Chen S Y. 2021. A deep residual network with spatial depthwise convolution for large-scale point cloud semantic segmentation. Journal of Image and Graphics, 26(12): 2848-2859
刘盛, 黄圣跃, 程豪豪, 沈家瑜, 陈胜勇. 2021. 结合空间深度卷积和残差的大尺度点云场景分割. 中国图象图形学报, 26(12): 2848-2859 [DOI: 10.11834/jig.200477]
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision transformer using shifted windows [EB/OL]. [2021-03-21]. https://arxiv.org/pdf/2103.14030.pdf
Milioto A, Vizzo I, Behley J and Stachniss C. 2019. RangeNet++: fast and accurate LiDAR semantic segmentation//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macau, China: IEEE: 4213-4220 [DOI: 10.1109/IROS40897.2019.8967762]
Song W, Cai W Y, He S Q and Li W J. 2021. Dynamic graph convolution with spatial attention for point cloud classification and segmentation. Journal of Image and Graphics, 26(11): 2691-2702
宋巍, 蔡万源, 何盛琪, 李文俊. 2021. 结合动态图卷积和空间注意力的点云分类与分割. 中国图象图形学报, 26(11): 2691-2702 [DOI: 10.11834/jig.200550]
Tao S B, Liang C, Jiang T P, Yang Y J and Wang Y J. 2021. Sparse voxel pyramid neighborhood construction and classification of LiDAR point cloud. Journal of Image and Graphics, 26(11): 2703-2712
陶帅兵, 梁冲, 蒋腾平, 杨玉娇, 王永君. 2021. 激光点云的稀疏体素金字塔邻域构建与分类. 中国图象图形学报, 26(11): 2703-2712 [DOI: 10.11834/jig.200262]
Tatarchenko M, Park J, Koltun V and Zhou Q Y. 2018. Tangent convolutions for dense prediction in 3D//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3887-3896 [DOI: 10.1109/cvpr.2018.00409]
Thomas H, Qi C R, Deschaud J E, Marcotegui B, Goulette F and Guibas L. 2019. KPConv: flexible and deformable convolution for point clouds//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6411-6420 [DOI: 10.1109/ICCV.2019.00651]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Wang Y Y, Luo L K and Zhou Z G. 2019. Road scene segmentation based on KSW and FCNN. Journal of Image and Graphics, 24(4): 583-591
王云艳, 罗冷坤, 周志刚. 2019. 结合KSW和FCNN的道路场景分割. 中国图象图形学报, 24(4): 583-591 [DOI: 10.11834/jig.180467]
Xu C F, Wu B C, Wang Z N, Zhan W, Vajda P, Keutzer K and Tomizuka M. 2020. SqueezeSegV3: spatially-adaptive convolution for efficient point-cloud segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 1-19 [DOI: 10.1007/978-3-030-58604-1_1]
Yu S and Wang X L. 2021. Remote sensing building segmentation by CGAN with multilevel channel attention mechanism. Journal of Image and Graphics, 26(3): 686-699
余帅, 汪西莉. 2021. 含多级通道注意力机制的CGAN遥感图像建筑物分割. 中国图象图形学报, 26(3): 686-699 [DOI: 10.11834/jig.200059]
Zhang F H, Fang J, Wah B and Torr P. 2020a. Deep FusionNet for point cloud semantic segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 644-663 [DOI: 10.1007/978-3-030-58586-0_38]
Zhang Y, Zhou Z X, David P, Yue X Y, Xi Z R, Gong B Q and Foroosh H. 2020b. PolarNet: an improved grid representation for online LiDAR point clouds semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9601-9610 [DOI: 10.1109/CVPR42600.2020.00962]
Zheng Y, Lin C Y, Liao K, Zhao Y and Xue S. 2021. LiDAR point cloud segmentation through scene viewpoint offset. Journal of Image and Graphics, 26(10): 2514-2523
郑阳, 林春雨, 廖康, 赵耀, 薛松. 2021. 场景视点偏移的激光雷达点云分割. 中国图象图形学报, 26(10): 2514-2523 [DOI: 10.11834/jig.200424]
Zhou Y and Tuzel O. 2018. VoxelNet: end-to-end learning for point cloud based 3D object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4490-4499 [DOI: 10.1109/cvpr.2018.00472]
Zhu X G, Zhou H, Wang T, Hong F Z, Li W, Ma Y X, Li H S, Yang R G and Lin D H. 2022. Cylindrical and asymmetrical 3D convolution networks for LiDAR-based perception. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10): 6807-6822 [DOI: 10.1109/tpami.2021.3098789]