Semantic and instance segmentation of 3D point clouds based on recurrent slice networks

Liu Suyi; Chi Jianning; Wu Chengdong; Xu Fang

doi:10.11834/jig.220154

Image Understanding and Computer Vision

Recurrent slice networks-based 3D point cloud-relevant integrated segmentation of semantic and instances
2023, Vol. 28, No. 7, pp. 2135-2150
Print publication date: 2023-07-16
Liu Suyi, Chi Jianning, Wu Chengdong, Xu Fang. 2023. Recurrent slice networks-based 3D point cloud-relevant integrated segmentation of semantic and instances. Journal of Image and Graphics, 28(07): 2135-2150. DOI: 10.11834/jig.220154.

Abstract (translated from the Chinese)

Objective

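Semantic and instance segmentation of 3D point clouds suffers from several problems: feature points are extracted with low accuracy, instance segmentation accuracy depends heavily on semantic segmentation performance, and in dense scenes or for small segmentation targets semantic categories are misclassified and instance edges are blurred. To address these problems, a 3D point cloud semantic and instance segmentation network based on recurrent slice networks is proposed.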

Method

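The network slices the input point cloud and maps the unordered points onto an ordered sequence. A bidirectional long short-term memory (BiLSTM) network then produces an encoded feature matrix that carries both local and global features. The encoded feature matrix is decoded into two parallel branches with multi-scale feature fusion, and the semantic and instance features are fused to yield parallel semantic and instance segmentation networks.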
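The slicing step can be illustrated with a minimal sketch in plain Python. It assumes fixed-thickness slices along one axis and element-wise max pooling inside each slice; the function name `slice_pool` and the parameters `axis` and `resolution` are hypothetical and are not taken from the paper.

```python
from math import floor


def slice_pool(points, features, axis=2, resolution=0.5):
    """Group points into slices along one axis and max-pool their features.

    points     -- list of (x, y, z) tuples
    features   -- list of equal-length feature lists, one per point
    axis       -- hypothetical parameter: coordinate index to slice along
    resolution -- hypothetical parameter: slice thickness
    Returns (pooled, slice_index): one pooled feature vector per slice, in
    spatial order, and the slice id assigned to every point.
    """
    lo = min(p[axis] for p in points)
    hi = max(p[axis] for p in points)
    n_slices = floor((hi - lo) / resolution) + 1
    dim = len(features[0])
    pooled = [[float("-inf")] * dim for _ in range(n_slices)]
    slice_index = []
    for p, f in zip(points, features):
        s = floor((p[axis] - lo) / resolution)  # slice this point falls into
        slice_index.append(s)
        pooled[s] = [max(a, b) for a, b in zip(pooled[s], f)]
    # Slices that received no point are filled with zeros (an assumption).
    pooled = [[0.0 if v == float("-inf") else v for v in row] for row in pooled]
    return pooled, slice_index


pts = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.25), (0.0, 0.0, 1.0)]
feats = [[1, 5], [3, 2], [0, 7]]
pooled, idx = slice_pool(pts, feats, axis=2, resolution=0.5)
print(idx)     # [0, 0, 2]
print(pooled)  # [[3, 5], [0.0, 0.0], [0, 7]]
```

The pooled list is ordered by position along the axis, which is what lets a sequence model such as a BiLSTM consume it; `slice_index` keeps the mapping needed to copy slice features back to individual points afterwards.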

Result

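The method is compared with recent point cloud segmentation methods on the Stanford large-scale 3D indoor spaces dataset (S3DIS) and the ShapeNet dataset. On S3DIS, the proposed algorithm reaches a mean intersection over union (mIoU) of 73% for semantic segmentation, 7.4% higher than the dynamic-kernel convolution method PAConv (position adaptive convolution), and achieves the best result in 8 of the 13 categories; for instance segmentation, the mean instance coverage is 67.7%. On ShapeNet, the semantic segmentation mIoU is 89.2%, 4.6% higher than PAConv and 1.6% higher than 3DCFS (fast and robust joint semantic-instance segmentation).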
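For readers unfamiliar with the instance coverage metric, the sketch below computes mean coverage and its size-weighted variant as they are commonly defined in the joint semantic-instance segmentation literature: each ground-truth instance is matched to the predicted instance with the highest IoU. It is an assumption that the paper uses exactly this definition; the function names are hypothetical.

```python
def iou(a, b):
    """IoU of two sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def coverage(gt_instances, pred_instances):
    """Return (mean coverage, size-weighted mean coverage).

    gt_instances, pred_instances -- lists of sets of point indices.
    """
    # Best-matching predicted instance for every ground-truth instance.
    best = [max((iou(g, p) for p in pred_instances), default=0.0)
            for g in gt_instances]
    total_points = sum(len(g) for g in gt_instances)
    m_cov = sum(best) / len(gt_instances)
    # Weight each ground-truth instance by its share of the points.
    mw_cov = sum(len(g) / total_points * b for g, b in zip(gt_instances, best))
    return m_cov, mw_cov


gt = [{0, 1, 2, 3}, {4, 5}]
pred = [{0, 1, 2}, {4, 5, 6}]
m_cov, mw_cov = coverage(gt, pred)
print(round(m_cov, 4), round(mw_cov, 4))  # 0.7083 0.7222
```

The weighted variant rewards getting large instances right, so the two numbers diverge most in scenes that mix large structures with small objects.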

Conclusion

The proposed fusion network combines the strengths of semantic segmentation and instance segmentation and effectively improves the accuracy of both.

Abstract

Objective

Growth in GPU computing power has benefited 3D computer vision, and 3D point cloud segmentation now supports applications such as robots and manipulators. Point cloud segmentation comprises two tasks, semantic segmentation and instance segmentation. Both identify regions of the scene that carry specific information, each region being represented as a set of minimal units (points): a sampled scene point cloud is parsed into groups of points, and each group is recognized as a separate object instance or as an object class. Although each task can be optimized on its own, integrating the two so that they benefit each other remains challenging. Existing methods still extract 3D point cloud features with limited accuracy. In addition, the accuracy of instance segmentation depends heavily on the performance of semantic segmentation, and incorrect instance predictions in turn degrade semantic segmentation and classification, which leads to problems such as misclassified semantic categories and blurred instance edges. We develop a recurrent slice network for joint semantic and instance segmentation of 3D point clouds.

Method

The backbone consists of two parts: an improved recurrent slice feature extraction network and a feature fusion network. First, the slice pooling layer of the feature extraction network slices the input point cloud along each of the three spatial directions, and max pooling turns the unordered points into an ordered sequence of slice features. Second, a plain bidirectional recurrent neural network (RNN) cannot effectively update earlier input information and suffers from insufficient learning ability, vanishing gradients and related problems. A bidirectional long short-term memory (BiLSTM) network is therefore used to exchange local information between slices and to produce an encoded feature matrix that carries both local and global features. Third, the extracted features are decoded into two parallel branches, one for semantic segmentation and one for instance segmentation. Features from multiple receptive fields are fused within each branch before the semantic and instance features are fused with each other. To obtain a semantics-aware instance segmentation model, semantic-aware information is extracted from the high-dimensional semantic features and combined with the instance features. To obtain an instance-embedded semantic segmentation model, a k-nearest neighbor (KNN) search in the instance embedding space selects a fixed number of neighboring points for each point. In that space, points of the same class are drawn together and points of different classes are kept apart. Hyperparameters are additionally used to filter out outliers and preserve the generalization performance of the model.
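The KNN step in the instance embedding space can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the aggregation is taken to be an element-wise max over the neighbors' semantic features, the outlier filter is modeled as a single distance threshold, and the names `knn_aggregate`, `k` and `max_dist` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_aggregate(embeddings, sem_features, k=3, max_dist=1.0):
    """Fuse semantic features over each point's neighbors in embedding space.

    embeddings   -- instance embedding vector of every point
    sem_features -- semantic feature vector of every point
    k            -- hypothetical hyperparameter: fixed number of neighbors
    max_dist     -- hypothetical hyperparameter: neighbors farther than this
                    are treated as outliers and ignored
    """
    fused = []
    for e in embeddings:
        # Rank all points (the query included) by embedding distance.
        ranked = sorted((sq_dist(e, other), j)
                        for j, other in enumerate(embeddings))
        # Keep the k nearest, then drop those beyond the threshold. A point
        # at distance zero always survives, so the list is never empty.
        neighbors = [j for d, j in ranked[:k] if d <= max_dist ** 2]
        # Element-wise max over the surviving neighbors' semantic features.
        fused.append([max(col)
                      for col in zip(*(sem_features[j] for j in neighbors))])
    return fused


emb = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
sem = [[1, 0], [0, 2], [3, 1], [9, 9]]
print(knn_aggregate(emb, sem, k=3, max_dist=1.0))
# [[3, 2], [3, 2], [3, 2], [9, 9]]
```

The three points that lie close together in embedding space end up sharing one fused feature, while the isolated fourth point keeps its own because every candidate neighbor is rejected by the threshold. The brute-force sort is quadratic in the number of points and is used here only for clarity.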

Result

To verify segmentation performance, the method is evaluated on two public datasets, the Stanford large-scale 3D indoor spaces dataset (S3DIS) and ShapeNet, and compared with state-of-the-art methods for semantic segmentation, instance segmentation and joint segmentation. In the 6-fold cross-validation experiment on S3DIS, the proposed algorithm reaches a mean intersection over union (mIoU) of 73%, a mean accuracy (mAcc) of 82.3% and an overall accuracy (oAcc) of 89.3% for semantic segmentation, which are 4.4%, 10.2% and 1.9% higher than the position adaptive convolution (PAConv) algorithm. For instance segmentation, the mean instance coverage (m-Cov) and mean instance weighted coverage (mw-Cov) reach 64.1% and 65.3% when testing on Area 5, which are 0.6% and 0.7% higher than PointGroup. In the S3DIS semantic segmentation experiment, the algorithm also achieves the best result in 8 of the 13 categories. On ShapeNet, the proposed algorithm reaches an mIoU of 89.2%, 4.6% higher than PAConv.
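The three semantic segmentation scores quoted above follow standard definitions, sketched here in plain Python from per-point labels. The function name is hypothetical, and skipping classes that never occur is an assumption about how empty classes are handled.

```python
def segmentation_scores(gt, pred, num_classes):
    """Return (mIoU, mAcc, oAcc) for per-point ground-truth and predicted labels."""
    tp = [0] * num_classes          # correctly labeled points per class
    gt_count = [0] * num_classes    # ground-truth points per class
    pred_count = [0] * num_classes  # predicted points per class
    for g, p in zip(gt, pred):
        gt_count[g] += 1
        pred_count[p] += 1
        if g == p:
            tp[g] += 1
    ious, accs = [], []
    for c in range(num_classes):
        union = gt_count[c] + pred_count[c] - tp[c]
        if union:            # skip classes absent from both label sets
            ious.append(tp[c] / union)
        if gt_count[c]:      # per-class accuracy needs ground-truth points
            accs.append(tp[c] / gt_count[c])
    miou = sum(ious) / len(ious)
    macc = sum(accs) / len(accs)
    oacc = sum(tp) / len(gt)
    return miou, macc, oacc


gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, macc, oacc = segmentation_scores(gt, pred, num_classes=3)
print(round(miou, 4), round(macc, 4), round(oacc, 4))  # 0.5 0.6667 0.6667
```

mIoU penalizes false positives as well as misses, which is why it sits below both accuracy figures in this toy example and in the reported results.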

Conclusion

This work addresses semantic segmentation and instance segmentation of 3D point clouds and develops a fusion algorithm for the two tasks built on a feature slice network. Instance features are fused into the semantic branch to obtain instance-embedded semantic segmentation, and semantic features are passed to the instance segmentation branch. Comparisons with other point cloud segmentation algorithms on the S3DIS and ShapeNet datasets show that integrating semantic segmentation and instance segmentation improves the accuracy of both.

Keywords

3D point cloud; semantic segmentation; instance segmentation; recurrent slice network (RSNet); semantic feature; instance feature; feature fusion
