Double-view feature fusion network for LiDAR semantic segmentation
2024, Vol. 29, No. 1, Pages 205-217
Print publication date: 2024-01-16
DOI: 10.11834/jig.220943
Sun Liujie, Zeng Tengfei, Fan Jingxing, Wang Wenju. 2024. Double-view feature fusion network for LiDAR semantic segmentation. Journal of Image and Graphics, 29(01):0205-0217
Objective
Point cloud semantic segmentation is of great importance in fields such as autonomous driving and urban scene modeling. To improve the efficiency of point cloud feature extraction in large-scale scenes, a double-view feature fusion network for LiDAR semantic segmentation (DVFNet) is proposed.
Method
The proposed method consists of two parts: a double-view point cloud feature fusion module and a point cloud feature integration module based on asymmetric convolution. The double-view feature fusion module combines cylindrical voxel features with the global features of key points to reduce the feature loss caused by downsampling. The feature integration module processes the double-view features with asymmetric convolution and then applies multi-dimensional convolution and multi-scale feature integration to refine local features.
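As a rough illustration of what a cylindrical voxel representation involves, the following minimal Python sketch (written for this summary, not taken from the authors' code; the grid resolution and coordinate ranges are assumed values) maps Cartesian LiDAR points to cylindrical voxel indices:

```python
import numpy as np

def cylindrical_voxel_indices(xyz, grid=(480, 360, 32),
                              rho_range=(0.0, 50.0), z_range=(-4.0, 2.0)):
    """Map Cartesian points (N, 3) to integer (rho, phi, z) voxel indices."""
    rho = np.sqrt(xyz[:, 0] ** 2 + xyz[:, 1] ** 2)   # radial distance from the sensor
    phi = np.arctan2(xyz[:, 1], xyz[:, 0])           # azimuth angle in [-pi, pi]
    z = xyz[:, 2]

    # Normalize each cylindrical coordinate to [0, 1) and scale it to the grid size.
    rho_n = (rho - rho_range[0]) / (rho_range[1] - rho_range[0])
    phi_n = (phi + np.pi) / (2.0 * np.pi)
    z_n = (z - z_range[0]) / (z_range[1] - z_range[0])
    normed = np.clip(np.stack([rho_n, phi_n, z_n], axis=1), 0.0, 1.0 - 1e-6)

    return (normed * np.array(grid)).astype(np.int64)  # (N, 3) voxel indices

# Toy usage: five random points.
points = np.random.uniform(-20.0, 20.0, size=(5, 3))
print(cylindrical_voxel_indices(points))
```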
Result
On the large-scale SemanticKITTI point cloud dataset, the proposed method achieves an accuracy of 63.9%, and its segmentation accuracy is a leading result among open-source segmentation methods.
Conclusion
The proposed double-view point cloud feature fusion method achieves high-precision semantic segmentation of point cloud data in large-scale scenes.
Objective
Point cloud semantic segmentation, as a basic technology for 3D point cloud tasks such as object detection and point cloud classification, is an important part of 3D computer vision. It is also key to scene understanding by computers and has been widely used in fields such as autonomous driving, robotics, and augmented reality. Point cloud semantic segmentation refers to point-by-point classification of a point cloud scene, that is, determining the category of each point and then segmenting and integrating the scene accordingly. Point cloud semantic segmentation can generally be divided into two categories according to the application scenario: small-scale and large-scale. Small-scale point cloud semantic segmentation operates only on indoor or other small point cloud scenes, whereas large-scale point cloud semantic segmentation targets outdoor large-scale point cloud data, typically classifying and integrating point clouds of driving or urban scenes. Compared with small-scene segmentation, semantic segmentation of large-scale point clouds has a wider range of applications and is extensively used in driving scene understanding, urban scene reconstruction, and other fields. However, because of the large data volume and the complexity of point cloud data, semantic segmentation of large-scale point clouds is more difficult. To improve the quality of point feature extraction in large-scale point clouds, a double-view feature fusion network for LiDAR semantic segmentation is proposed.
Method
Our method is composed of two parts: a double-view feature fusion module and a feature integration module based on asymmetric convolution. In the downsampling stage, a double-view feature fusion module, which includes a double-view point cloud feature extraction module and a feature fusion block, is proposed. The double-view feature fusion module combines cylindrical voxel features with the global features of key points to reduce the feature loss caused by downsampling; the features from the two views of the point cloud are combined by feature concatenation. Finally, the combined point cloud features are fed into the feature fusion block for dimensionality reduction and fusion. In the feature integration stage, a point cloud feature integration module based on asymmetric convolution is proposed, which includes asymmetric convolution and multi-scale dimension-decomposition context modeling and achieves the enhancement and reconstruction of point cloud features through asymmetric point cloud feature processing and multi-scale context feature integration. This module processes the double-view features with asymmetric convolution and then uses multi-dimensional convolution and multi-scale feature integration for feature optimization.
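The following PyTorch sketch illustrates the two ideas described above as read from this abstract: a feature fusion block that concatenates per-point voxel-view features with key-point global features and reduces the dimension with an MLP, and an asymmetric 3D convolution block built from 3×1×3 and 1×3×3 kernels with a residual connection. All module names and channel sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FeatureFusionBlock(nn.Module):
    """Fuse voxel-view features with key-point global features per point."""
    def __init__(self, voxel_dim=64, global_dim=64, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(voxel_dim + global_dim, out_dim),
            nn.BatchNorm1d(out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, voxel_feat, global_feat):
        # voxel_feat:  (N, voxel_dim)  features gathered from each point's voxel
        # global_feat: (N, global_dim) global context propagated from key points
        return self.mlp(torch.cat([voxel_feat, global_feat], dim=1))

class AsymmetricConvBlock(nn.Module):
    """Asymmetric 3D convolutions over a dense (rho, phi, z) feature volume."""
    def __init__(self, channels=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=(3, 1, 3), padding=(1, 0, 1)),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1)),
            nn.BatchNorm3d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                     # x: (B, C, R, P, Z) voxel feature volume
        return self.act(x + self.branch(x))   # residual connection

# Toy usage with random tensors.
fuse = FeatureFusionBlock()
fused = fuse(torch.randn(1024, 64), torch.randn(1024, 64))   # -> (1024, 64)
conv = AsymmetricConvBlock()
out = conv(torch.randn(2, 64, 24, 18, 8))                    # -> (2, 64, 24, 18, 8)
```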
Result
In our experimental environment, our algorithm achieves the second-highest frequency-weighted intersection over union (FwIoU) and the highest mean intersection over union (mIoU) among recent algorithms. Our work focuses on improving segmentation accuracy and achieves the highest accuracy in multiple categories. In vehicle categories such as cars and trucks, our method achieves high segmentation accuracy. In categories with small sizes and complex shapes, such as bicycles, motorcycles, and pedestrians, our method performs better than other methods. For buildings, railings, vegetation, and other categories that lie at the edge of the point cloud scene, where the point cloud distribution is relatively sparse, the double-view feature fusion module in our method not only retains the geometric structure of the point cloud but also extracts the global features of the data, thereby achieving high-precision segmentation of these categories. Our method achieves 63.9% mIoU on the SemanticKITTI dataset, leading the open-source segmentation methods. Compared with CA3DCNet, our method also achieves better segmentation results on the nuScenes dataset, improving mIoU by 0.7%.
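The mIoU reported above is the mean of the per-class intersection over union. As a reference, here is a short, generic sketch of how per-class IoU and mIoU are computed from point-wise predictions; this is standard practice, not code from the paper.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: 1-D integer label arrays of equal length."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt, pred), 1)                 # confusion matrix (rows = ground truth)

    tp = np.diag(conf).astype(np.float64)          # true positives per class
    fp = conf.sum(axis=0) - tp                     # false positives per class
    fn = conf.sum(axis=1) - tp                     # false negatives per class
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou), iou                    # mIoU over classes present in the data

# Toy example with 3 classes.
miou, per_class = mean_iou(np.array([0, 1, 2, 2, 1]),
                           np.array([0, 1, 1, 2, 1]), num_classes=3)
print(round(miou, 3), per_class)
```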
Conclusion
Our method achieves high-precision semantic segmentation of large-scale point clouds. A double-view feature fusion network is proposed for LiDAR semantic segmentation, which is suitable for segmenting large-scale point clouds. Experimental results show that the double-view feature fusion module can reduce the loss of edge information in a large-scale point cloud, thereby improving the segmentation accuracy of edge objects in the scene. Experiments also show that the feature integration module based on asymmetric convolution can effectively segment small objects such as pedestrians, bicycles, and motorcycles. Our method is compared with a variety of semantic segmentation methods for large-scale point clouds; in terms of accuracy, it performs better and achieves an average segmentation accuracy of 63.9% on the SemanticKITTI dataset.
deep learning; semantic segmentation; point cloud; cylindrical voxel; context information
Alnaggar Y A, Afifi M, Amer K and ElHelw M. 2021. Multi projection fusion for real-time semantic segmentation of 3D LiDAR point clouds//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE: 1800-1809 [DOI: 10.1109/WACV48630.2021.00184]
Behley J, Garbade M, Milioto A, Quenzel J, Behnke S, Stachniss C and Gall J. 2019. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 9296-9306 [DOI: 10.1109/ICCV.2019.00939]
Caesar H, Bankiti V, Lang A H, Vora S, Liong V E, Xu Q, Krishnan A, Pan Y, Baldan G and Beijbom O. 2020. nuScenes: a multimodal dataset for autonomous driving//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11618-11628 [DOI: 10.1109/CVPR42600.2020.01164]
Chen W L, Zhu X G, Sun R Q, He J J, Li R Y, Shen X Y and Yu B. 2020. Tensor low-rank reconstruction for semantic segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 52-69 [DOI: 10.1007/978-3-030-58520-4_4].
Cheng R, Razani R, Taghavi E, Li E X and Liu B B. 2021. (AF)2-S3Net: attentive feature fusion with adaptive feature selection for sparse semantic segmentation network//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 12542-12551 [DOI: 10.1109/CVPR46437.2021.01236]
Choy C, Gwak J Y and Savarese S. 2019. 4D spatio-temporal convnets: Minkowski convolutional neural networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3070-3079 [DOI: 10.1109/CVPR.2019.00319]
Gan L, Zhang R, Grizzle J W, Eustice R M and Ghaffari M. 2020. Bayesian spatial kernel smoothing for scalable dense semantic mapping. IEEE Robotics and Automation Letters, 5(2): 790-797 [DOI: 10.1109/LRA.2020.2965390]
Gerdzhev M, Razani R, Taghavi E and Liu B B. 2021. TORNADO-Net: multiview total variation semantic segmentation with diamond inception module//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi’an, China: IEEE: 9543-9549 [DOI: 10.1109/ICRA48506.2021.9562041]
Hu Q Y, Yang B, Xie L H, Rosa S, Guo Y L, Wang Z H, Trigoni N and Markham A. 2020. RandLA-Net: efficient semantic segmentation of large-scale point clouds//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11105-11114 [DOI: 10.1109/CVPR42600.2020.01112]
Kochanov D, Nejadasl F K and Booij O. 2020. KPRNet: improving projection-based LiDAR semantic segmentation [EB/OL]. [2022-12-09]. https://arxiv.org/pdf/2007.12668.pdf
Liong V E, Nguyen T N T, Widjaja S, Sharma D and Zhuang J C. 2020. AMVNet: assertion-based multi-view fusion network for LiDAR semantic segmentation [EB/OL]. [2022-12-09]. https://arxiv.org/pdf/2012.04934.pdf
Milioto A, Vizzo I, Behley J and Stachniss C. 2019. RangeNet++: fast and accurate LiDAR semantic segmentation//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macau, China: IEEE: 4213-4220 [DOI: 10.1109/IROS40897.2019.8967762]
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114
Quan T M, Hildebrand D G C and Jeong W K. 2021. FusionNet: a deep fully residual convolutional neural network for image segmentation in connectomics. Frontiers in Computer Science, 3: #613981 [DOI: 10.3389/fcomp.2021.613981]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Tang H T, Liu Z J, Zhao S Y, Lin Y J, Lin J, Wang H R and Han S. 2020. Searching efficient 3D architectures with sparse point-voxel convolution//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 685-702 [DOI: 10.1007/978-3-030-58604-1_41]
Wu B C, Wan A, Yue X Y and Keutzer K. 2018. SqueezeSeg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud//Proceedings of 2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE: 1887-1893 [DOI: 10.1109/ICRA.2018.8462926]
Wu B C, Zhou X Y, Zhao S C, Yue X Y and Keutzer K. 2019. SqueezeSegV2: improved model structure and unsupervised domain adaptation for road-object segmentation from a LiDAR point cloud//Proceedings of 2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE: 4376-4382 [DOI: 10.1109/ICRA.2019.8793495]
Xu C F, Wu B C, Wang Z N, Zhan W, Vajda P, Keutzer K and Tomizuka M. 2020. SqueezeSegV3: spatially-adaptive convolution for efficient point-cloud segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 1-19 [DOI: 10.1007/978-3-030-58604-1_1]
Xu J Y, Zhang R X, Dou J, Zhu Y S, Sun J and Pu S L. 2021. RPVNet: a deep and efficient range-point-voxel fusion network for LiDAR point cloud segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 16004-16013 [DOI: 10.1109/ICCV48922.2021.01572]
Yan X, Zheng C D, Li Z, Wang S and Cui S G. 2020. PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 5588-5597 [DOI: 10.1109/CVPR42600.2020.00563]
Zhang Y, Zhou Z X, David P, Yue X Y, Xi Z R, Gong B Q and Foroosh H. 2020. PolarNet: an improved grid representation for online LiDAR point clouds semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 9598-9607 [DOI: 10.1109/CVPR42600.2020.00962]
Zhou Y and Tuzel O. 2018. VoxelNet: end-to-end learning for point cloud based 3D object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 4490-4499 [DOI: 10.1109/CVPR.2018.00472]
Zhu X G, Zhou H, Wang T, Hong F Z, Ma Y X, Li W, Li H S and Lin D H. 2021. Cylindrical and asymmetrical 3D convolution networks for LiDAR segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 9934-9943 [DOI: 10.1109/CVPR46437.2021.00981]