多特征融合与残差优化的点云语义分割方法
Point cloud semantic segmentation method based on multi-feature fusion and residual optimization
2021, Vol. 26, No. 5: 1105-1116
Print publication date: 2021-05-16
Accepted: 2020-08-19
DOI: 10.11834/jig.200374
Jing Du, Guorong Cai. Point cloud semantic segmentation method based on multi-feature fusion and residual optimization[J]. Journal of Image and Graphics, 2021,26(5):1105-1116.
Objective
Current semantic segmentation methods for large-scale 3D point clouds generally cut the point cloud into blocks before processing. In practice, however, the geometric features at the cutting boundaries are easily destroyed, leaving obvious boundary artifacts in the segmentation results. An efficient deep learning network model that takes the original point cloud as input is therefore urgently needed for point cloud semantic segmentation.
Method
To address this problem, a point cloud semantic segmentation method based on multi-feature fusion and residual optimization is proposed. The network extracts the geometric structure feature and the semantic feature of each point through a multi-feature extraction module, and obtains a feature set by weighting these features. On this basis, an attention mechanism is introduced to optimize the feature set, and a feature aggregation module is constructed to aggregate the most discriminative features in the point cloud. Finally, a residual block is added to the feature aggregation module to optimize network training. The output of the network is the confidence of each point for each category in the dataset.
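The weighted feature fusion and attention step can be illustrated with a minimal PyTorch-style sketch (the module layout, dimensions, and the single learnable fusion weight are our own illustrative assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class MultiFeatureFusion(nn.Module):
    """Illustrative sketch: fuse per-point geometric and semantic features
    by a learned weighting, then re-weight channels with attention."""

    def __init__(self, d_geo: int, d_sem: int, d_out: int):
        super().__init__()
        self.geo_mlp = nn.Linear(d_geo, d_out)        # embed geometric structure feature
        self.sem_mlp = nn.Linear(d_sem, d_out)        # embed semantic feature
        self.alpha = nn.Parameter(torch.tensor(0.5))  # assumed learnable fusion weight
        self.att = nn.Sequential(nn.Linear(d_out, d_out), nn.Softmax(dim=-1))

    def forward(self, f_geo: torch.Tensor, f_sem: torch.Tensor) -> torch.Tensor:
        # (B, N, d_geo), (B, N, d_sem) -> (B, N, d_out): weighted sum of embeddings
        fused = self.alpha * self.geo_mlp(f_geo) + (1 - self.alpha) * self.sem_mlp(f_sem)
        return fused * self.att(fused)                # attention scores gate each channel
```

A residual connection around such a module (adding its input back to the attended output) would then play the role the paper assigns to the residual block.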
Result
The proposed residual network model was compared with current mainstream algorithms in terms of segmentation accuracy on two datasets: S3DIS (Stanford Large-scale 3D Indoor Spaces Dataset) and the outdoor-scene point cloud segmentation dataset Semantic3D. On S3DIS, the proposed algorithm achieves high overall accuracy and mean accuracy of 87.2% and 81.7%, respectively. On Semantic3D, it achieves an overall accuracy of 93.5% and a mean intersection over union of 74.0%, which are 1.6% and 3.2% higher, respectively, than GACNet (graph attention convolution network).
Conclusion
Experimental results verify that, when applied to large-scale point cloud semantic segmentation, the proposed residual optimization network alleviates gradient vanishing and network overfitting during deep feature extraction while maintaining good segmentation performance.
Objective
The semantic segmentation of a 3D point cloud takes the point cloud as input and outputs the semantic label of each point according to its category. However, existing semantic segmentation methods for 3D point clouds are mostly limited to processing small-scale point cloud blocks. When a large-scale point cloud is cut into small blocks, the geometric features along the cutting boundary are easily destroyed, which results in obvious boundary artifacts. In addition, traditional deep networks for semantic segmentation have difficulty meeting the computational efficiency requirements of large-scale data. Therefore, an efficient deep learning network model that directly takes the original point cloud as input for semantic segmentation is urgently needed. Nevertheless, most networks still train on small point cloud blocks, mainly because directly handling point clouds of large scenes poses several difficulties. First, the spatial extent and the number of points of 3D point cloud data collected by sensor scanning are uncertain, which requires a network that does not assume a specific number of input points and is insensitive to the point count. Second, the geometric structure of large scenes is more complicated than that of small-scale point cloud blocks, which increases the difficulty of segmentation. Third, directly processing point clouds of large scenes involves a large amount of computation, which poses a huge challenge to existing graphics processing unit (GPU) memory. The main obstacle our framework must overcome is therefore to deal directly with large-scale 3D point clouds: point clouds with different spatial structures and point counts should be fed into the network for training while keeping time and space complexity under control.
Method
In this study, a residual optimization network for large-scale point cloud semantic segmentation is proposed. First, we choose random sampling as the down-sampling strategy, whose computation time is independent of the number of points. Each encoding layer has a random sampling module, a design that gradually increases the dimension of each point feature while gradually reducing the size of the point cloud; the input to the network is an entire large-scale 3D point cloud scene. At the same time, a local feature extraction module is designed to capture the neighbor feature, geometric feature, and semantic feature of each point, and the final feature set is obtained as a weighted sum of the three types of features. The network introduces an attention mechanism to optimize the feature set, on top of which a feature aggregation module is built to aggregate the most discriminative features in the point cloud. Finally, a residual block is added to the feature aggregation module to improve network training. The network framework adopts an encoder-decoder structure. Different from the traditional encoder-decoder design, this study adjusts the internal structure of each encoder layer for the particular scenario of large-scene point clouds, including the down-sampling ratio of the point cloud and the dimension of the feature output. The output of the network is the confidence score of each point for every category in the dataset. In summary, the network first passes the input point cloud through a multilayer perceptron (MLP) layer to extract the features of the center points; then five encoding and five decoding layers learn the features of each point; finally, three fully connected layers predict the semantic label of each point.
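Under our own assumptions about layer widths, the down-sampling ratio, and nearest-neighbor up-sampling (the paper fixes only the overall topology: one MLP layer, five encoding layers, five decoding layers, and three fully connected layers), the pipeline might be organized as follows:

```python
import torch
import torch.nn as nn

def random_sample(xyz, feats, ratio=4):
    """Random down-sampling: keep N // ratio points. Unlike farthest point
    sampling, no neighborhood search is needed, so it scales to large scenes."""
    n = xyz.shape[1]
    idx = torch.randperm(n, device=xyz.device)[: n // ratio]
    return xyz[:, idx], feats[:, idx]

def upsample(fine_xyz, coarse_xyz, coarse_feats):
    """Nearest-neighbor interpolation: each fine point copies the feature of
    its closest coarse point (an assumed decoder choice, not from the paper)."""
    idx = torch.cdist(fine_xyz, coarse_xyz).argmin(dim=-1)          # (B, N_fine)
    idx = idx.unsqueeze(-1).expand(-1, -1, coarse_feats.shape[-1])  # (B, N_fine, C)
    return torch.gather(coarse_feats, 1, idx)

class SegNet(nn.Module):
    """Skeleton only: 1 MLP layer, 5 encoders with random sampling,
    5 decoders with skip connections, and a 3-layer fully connected head."""

    def __init__(self, d_in, n_classes, widths=(16, 64, 128, 256, 512, 1024)):
        super().__init__()
        self.stem = nn.Linear(d_in, widths[0])
        self.encoders = nn.ModuleList(
            nn.Linear(widths[i], widths[i + 1]) for i in range(5))
        self.decoders = nn.ModuleList(
            nn.Linear(widths[i + 1] + widths[i], widths[i]) for i in reversed(range(5)))
        self.head = nn.Sequential(nn.Linear(widths[0], 64), nn.ReLU(),
                                  nn.Linear(64, 32), nn.ReLU(),
                                  nn.Linear(32, n_classes))          # per-class scores

    def forward(self, xyz, feats):
        f = torch.relu(self.stem(feats))
        skips = []
        for enc in self.encoders:
            skips.append((xyz, f))                    # remember the finer level
            xyz, f = random_sample(xyz, f)            # shrink the point set
            f = torch.relu(enc(f))                    # grow the feature dimension
        for dec in self.decoders:
            fine_xyz, skip = skips.pop()
            f = upsample(fine_xyz, xyz, f)            # back to finer resolution
            f = torch.relu(dec(torch.cat([f, skip], dim=-1)))
            xyz = fine_xyz
        return self.head(f)                           # (B, N, n_classes)
```

For example, `SegNet(d_in=6, n_classes=8)` applied to `xyz` of shape `(1, 4096, 3)` and `feats` of shape `(1, 4096, 6)` returns per-point class scores of shape `(1, 4096, 8)`.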
Result
The proposed method was compared with the latest methods on two datasets: the Stanford Large-scale 3D Indoor Spaces Dataset (S3DIS) and the Semantic3D dataset. The S3DIS dataset contains 12 semantic elements and is more fine-grained and challenging than many indoor segmentation datasets. Four criteria were evaluated: intersection over union (IoU), mean IoU (mIoU), mean accuracy (mAcc), and overall accuracy (OA). We set the k value of the k-nearest neighbor algorithm to 16. The batch size is 8 for training and 24 for evaluation. The number of training and validation steps per epoch is 500 and 100, respectively, and the maximum number of epochs is 150. The experiments were conducted on an NVIDIA GTX 1080 Ti GPU.
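Collected in one place, the S3DIS training settings stated above might look like the following plain Python configuration (the values are from the text; the key names are our own):

```python
# Training settings for S3DIS as reported above; key names are illustrative.
S3DIS_CONFIG = {
    "knn_k": 16,                  # k of the k-nearest-neighbor search
    "train_batch_size": 8,
    "eval_batch_size": 24,
    "train_steps_per_epoch": 500,
    "val_steps_per_epoch": 100,
    "max_epochs": 150,
    "gpu": "NVIDIA GTX 1080 Ti",
}
```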
On the S3DIS dataset, our algorithm achieves the best accuracy in OA and mAcc. Compared with the superpoint graphs (SPG) network, which also takes the entire large scene as input, the proposed algorithm improves OA, mAcc, and mIoU by 1.7%, 8.7%, and 3.8%, respectively. The Semantic3D dataset contains eight semantic classes covering a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, football fields, castles, etc. It provides about 4 billion manually labeled points over various natural and artificial scenes, which helps prevent overfitting of the classifier. We set the batch size to 4 for training and 16 for evaluation; other settings are the same as for S3DIS. On the Semantic3D dataset, mIoU is increased by 3.2% and OA by 1.6% compared with the latest algorithm, GACNet. Our network also achieves the best accuracy in several categories, such as high vegetation, buildings, and hard scape. These results verify the outstanding performance of the proposed residual optimization network in large-scale point cloud semantic segmentation, alleviating gradient vanishing and network degradation during feature extraction.
Conclusion
We propose a new semantic segmentation framework that introduces the residual network into the semantic segmentation of large-scale point clouds, thereby allowing deeper networks and the extraction of more discriminative features. Our network shows excellent performance on multiple datasets, yielding more accurate semantic segmentation results.
computer vision; three-dimensional point cloud; large scene; semantic segmentation; multi-feature fusion; residual network
Armeni I, Sener O, Zamir A R, Jiang H, Brilakis I, Fischer M and Savarese S. 2016. 3D semantic parsing of large-scale indoor spaces//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1534-1543 [DOI: 10.1109/CVPR.2016.170]
Biasutti P, Lepetit V, Aujol J F, Brédif M and Bugeau A. 2019. LU-Net: an efficient network for 3D LiDAR point cloud semantic segmentation based on end-to-end-learned 3D features and U-Net//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshops. Seoul, South Korea: IEEE: 942-950 [DOI: 10.1109/ICCVW.2019.00123]
Boulch A, Le Saux B and Audebert N. 2017. Unstructured point cloud semantic labeling using deep segmentation networks//Eurographics Workshop on 3D Object Retrieval. [s.l.]: The Eurographics Association: 17-24 [DOI: 10.2312/3dor.20171047]
Engelmann F, Kontogianni T, Hermans A and Leibe B. 2017. Exploring spatial context for 3D semantic segmentation of point clouds//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy: IEEE: 716-724 [DOI: 10.1109/ICCVW.2017.90]
Hackel T, Savinov N, Ladicky L, Wegner J D, Schindler K and Pollefeys M. 2017. Semantic3D.net: a new large-scale point cloud classification benchmark//ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Hannover, Germany: ISPRS: 91-98 [DOI: 10.5194/isprs-annals-IV-1-W1-91-2017]
Hu Q Y, Yang B, Xie L H, Rosa S, Guo Y L, Wang Z H, Trigoni N and Markham A. 2020. RandLA-Net: efficient semantic segmentation of large-scale point clouds//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11108-11117 [DOI: 10.1109/CVPR42600.2020.01112]
Huang Q G, Wang W Y and Neumann U. 2018. Recurrent slice networks for 3D segmentation of point clouds//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2626-2635 [DOI: 10.1109/CVPR.2018.00278]
Kong X, Zhai G Y, Zhong B Q and Liu Y. 2019. PASS3D: precise and accelerated semantic segmentation for 3D point cloud//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE: 3467-3473 [DOI: 10.1109/IROS40897.2019.8968296]
Landrieu L and Simonovsky M. 2018. Large-scale point cloud semantic segmentation with superpoint graphs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4558-4567 [DOI: 10.1109/CVPR.2018.00479]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018. PointCNN: convolution on X-transformed points//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc.: 820-830
Liu F Y, Li S P, Zhang L Q, Zhou C G, Ye R T, Wang Y B and Lu J W. 2017. 3DCNN-DQN-RNN: a deep reinforcement learning framework for semantic parsing of large-scale 3D point clouds//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5679-5688 [DOI: 10.1109/ICCV.2017.605]
Liu Y C, Fan B, Xiang S M and Pan C H. 2019. Relation-shape convolutional neural network for point cloud analysis//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8887-8896 [DOI: 10.1109/CVPR.2019.00910]
Milioto A, Vizzo I, Behley J and Stachniss C. 2019. RangeNet++: fast and accurate LiDAR semantic segmentation//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE: 4213-4220 [DOI: 10.1109/IROS40897.2019.8967762]
Qi C R, Su H, Mo K and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5099-5108
Qing C, Yu J, Xiao C B and Duan J. 2020. Deep convolutional neural network for semantic image segmentation. Journal of Image and Graphics, 25(6): 1069-1090 [DOI: 10.11834/jig.190355]
Rethage D, Wald J, Sturm J, Navab N and Tombari F. 2018. Fully-convolutional point networks for large-scale point clouds//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 625-640 [DOI: 10.1007/978-3-030-01225-0_37]
Tang Y J. 2019. 3D Point Clouds Semantic Segmentation in Large Scale Outdoor Scenes. Dalian: Dalian University of Technology
Tchapmi L, Choy C, Armeni I, Gwak J Y and Savarese S. 2017. SEGCloud: semantic segmentation of 3D point clouds//Proceedings of 2017 International Conference on 3D Vision. Qingdao, China: IEEE: 537-547 [DOI: 10.1109/3DV.2017.00067]
Thomas H, Goulette F, Deschaud J E, Marcotegui B and LeGall Y. 2018. Semantic classification of 3D point clouds with multiscale spherical neighborhoods//Proceedings of 2018 International Conference on 3D Vision. Verona, Italy: IEEE: 390-398 [DOI: 10.1109/3DV.2018.00052]
Thomas H, Qi C R, Deschaud J E, Marcotegui B, Goulette F and Guibas L J. 2019. KPConv: flexible and deformable convolution for point clouds//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE: 6410-6419 [DOI: 10.1109/ICCV.2019.00651]
Wang F. 2019. Real-time Semantic Segmentation of Laser Point Clouds in Large-scale Outdoor Scenes. Dalian: Dalian University of Technology
Wang L, Huang Y C, Hou Y L, Zhang S M and Shan J. 2019a. Graph attention convolution for point cloud semantic segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10288-10297 [DOI: 10.1109/CVPR.2019.01054]
Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M and Solomon J M. 2019b. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): 146 [DOI: 10.1145/3326362]
Wu W X, Qi Z G and Li F X. 2019. PointConv: deep convolutional networks on 3D point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9621-9630 [DOI: 10.1109/CVPR.2019.00985]
Xu Q G, Sun X D, Wu C Y, Wang P Q and Neumann U. 2020. Grid-GCN for fast and scalable point cloud learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5660-5669 [DOI: 10.1109/CVPR42600.2020.00570]
Ye X Q, Li J M, Huang H X, Du L and Zhang X L. 2018. 3D recurrent neural networks with context fusion for point cloud semantic segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 415-430 [DOI: 10.1007/978-3-030-01234-2_25]
Zhang Z Y, Hua B S and Yeung S K. 2019. ShellNet: efficient point cloud convolutional neural networks using concentric shells statistics//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE: 1607-1616 [DOI: 10.1109/ICCV.2019.00169]
Zhao H S, Jiang L, Fu C W and Jia J Y. 2019. PointWeb: enhancing local neighborhood features for point cloud processing//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5560-5568 [DOI: 10.1109/CVPR.2019.00571]
Zhong Z B, Zhang C, Liu Y H and Wu Y. 2019. VIASEG: visual information assisted lightweight point cloud segmentation//Proceedings of 2019 IEEE International Conference on Image Processing. Taipei, China: IEEE: 1500-1504 [DOI: 10.1109/ICIP.2019.8803061]