Point cloud segmentation algorithm fusing few-shot meta-learning and prototype alignment
2023, Vol. 28, No. 12, Pages 3884-3896
Print publication date: 2023-12-16
DOI: 10.11834/jig.220942
Qiu Yunfei, Niu Jialu. 2023. Point cloud segmentation algorithm fusing few-shot meta-learning and prototype alignment. Journal of Image and Graphics, 28(12): 3884-3896
Objective
To address the high time cost and low computational efficiency caused by the large amount of supervised information that point cloud segmentation requires, a few-shot meta-learning algorithm fused with prototype alignment is used to semantically segment point clouds, so that the model can complete the segmentation task with very little supervised information.
Method
First, to avoid the overfitting that few-shot training easily causes, a DGCNN (dynamic graph convolutional neural network) is built by interleaving two edge convolution (EdgeConv) layers with six MLPs (multilayer perceptrons), which also ensures that the point cloud information is fully learned. Then, the dataset is fed into this network in N-way K-shot form to learn the features of the support set and the query set; class prototypes are obtained by average pooling the features, and the prototype alignment algorithm is fused in to obtain more robust support-set prototypes. Finally, point cloud segmentation is implemented by computing the Euclidean distance between the query-set point cloud features and the support-set prototypes.
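To make the feature extractor concrete: the following is a minimal sketch of one EdgeConv layer of the kind DGCNN uses (kNN graph construction, an MLP over edge features, and average-pooled aggregation as described above), assuming PyTorch. The tensor layout, layer sizes, and helper names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

def knn_indices(x, k):
    """Indices of the k nearest neighbors of each point. x: (B, N, C)."""
    dist = torch.cdist(x, x)                                  # (B, N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]  # drop self-match

class EdgeConv(nn.Module):
    """EdgeConv sketch: build a kNN graph, run an MLP over edge
    features [x_i, x_j - x_i], and aggregate over neighbors.
    The Method above aggregates edge features by average pooling."""
    def __init__(self, in_dim, out_dim, k=20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, x):                                     # x: (B, N, C)
        B, N, C = x.shape
        idx = knn_indices(x, self.k)                          # (B, N, k)
        neighbors = torch.gather(                             # gather neighbor features
            x.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C))        # (B, N, k, C)
        center = x.unsqueeze(2).expand_as(neighbors)
        edge = torch.cat([center, neighbors - center], dim=-1)
        return self.mlp(edge).mean(dim=2)                     # (B, N, out_dim)

# Usage: per-point 64-dim features from raw xyz coordinates.
# feat = EdgeConv(3, 64)(torch.randn(2, 1024, 3))  # -> (2, 1024, 64)
```

In the paper's configuration, two such layers are interleaved with six MLPs, and the resulting per-point features feed the prototype computation described in the Method.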
Result
Point cloud semantic segmentation experiments are conducted on the S3DIS (Stanford large-scale 3D indoor spaces) dataset, ScanNet, and a Minnan ancient buildings dataset, with comparisons against the prototypical network and the matching network on S3DIS. For 1-way segmentation, the mean intersection over union (mIoU) improves on the prototypical network and the matching network by 0.06 and 0.33, respectively, and the best class reaches an mIoU of 0.95. For 2-way segmentation, the mIoU improves on the prototypical network by 0.04. When DGCNN is compared with PointNet++ as the feature extractor, the mIoU for segmenting ceiling and floor improves by 0.05 and 0.30, respectively. Applied to the ScanNet and Minnan ancient buildings datasets, the method achieves segmentation mIoUs of 0.63 and 0.51, respectively.
Conclusion
The proposed method achieves good point cloud segmentation with only a small amount of labeled data. Compared with models that previously had to be trained on large amounts of labeled data, it can segment a new class with very little supervised information, improving the generalization ability of the model. The proposed method plays a particularly key role when labeled data for some samples are difficult to obtain.
Objective
With the application of 3D point clouds in many fields, such as autonomous driving, navigation and positioning, AR house viewing, and model reconstruction, point cloud research and applications have attracted increasing attention. However, given their disordered and unorganized nature, irregular point clouds are difficult to process or feed directly into network training, because standard deep neural network models require regular input data. To address this, the pioneering PointNet network was proposed; it learns per-point features with a shared multilayer perceptron (MLP) and global features with a symmetric pooling function. PointNet, however, focuses on global information and ignores the local information and neighborhood features of the point cloud. As an improvement, PointNet++ adds sampled feature extraction of local neighborhood information. This improved model has three main parts, namely, the sampling, grouping, and PointNet layers, so it not only extracts local information but also retains PointNet's strength in extracting global information. PointNet++ also has drawbacks: it ignores the geometric relationships between points and cannot fully capture local features. To solve these drawbacks, the dynamic graph convolutional neural network (DGCNN) introduces EdgeConv, which enhances the data representation ability by establishing topological relationships between points. EdgeConv not only maintains invariance to the arrangement of point clouds but also captures local geometric features. Most related research relies on large amounts of supervised data, which is too time-consuming and labor-intensive to annotate. Given the limited application of few-shot learning to 3D data, this paper proposes a few-shot meta-learning algorithm to semantically segment 3D point cloud data. The prototype alignment algorithm, which can efficiently learn the information of the support set, is also used to segment the query set, and the learning ability is adjusted so that the model can complete the segmentation task even with minimal supervised data.
Method
This paper proposes a method for the semantic segmentation of 3D point clouds that differs from traditional deep learning segmentation methods based on large amounts of supervised information: it segments point clouds in a few-shot learning mode. To avoid training with large amounts of labeled data, this paper adopts a few-shot meta-learning algorithm. Specifically, the dataset is input into the network in the form of multiple N-way K-shot meta-tasks to learn meta-knowledge during the meta-training stage. Episodes of support-set training followed by query-set validation are repeated until the model learns to recognize new classes; final point cloud segmentation is then performed in the meta-test stage on new classes that were never seen during training. To avoid overfitting, after several experiments, we use two EdgeConv layers and six MLPs to construct the DGCNN network as our feature extractor. Point clouds have uneven density, with closer distances corresponding to higher density, so farthest point sampling would require a large amount of computation. We therefore use EdgeConv in the DGCNN network: k-nearest neighbors (KNN) search constructs a graph structure, an MLP extracts the features of each edge, and average pooling aggregates the edge features to dynamically update the features of the center point. Given that the comprehensively learned information can express the corresponding category, and combining the idea of the prototypical network, the features obtained after the support set and the query set pass through the network are average-pooled to obtain the prototype of each category; one prototype represents one class. Fusing the prototype alignment algorithm efficiently exploits the support set by reversing the process of support-set training and query-set verification: the query-set features and the predicted segmentation mask form a new "support set", whose prototypes are learned and used to segment the original support-set data. This lets the model learn the information of the support set and extract a robust prototype. Finally, the Euclidean distance between the query-set point cloud features and the support-set prototypes is computed to implement point cloud segmentation.
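The following is a minimal sketch of the prototype computation, Euclidean-distance segmentation, and reversed support/query alignment step described above, assuming PyTorch and illustrative tensor layouts (K support samples of N points with D-dimensional DGCNN features, C classes). Function names such as masked_avg_prototypes are hypothetical; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def masked_avg_prototypes(feats, masks):
    """feats: (K, N, D) point features; masks: (K, N, C) one-hot labels.
    Average-pool features per class to get one prototype per class: (C, D)."""
    num = torch.einsum('knd,knc->cd', feats, masks.float())
    den = masks.float().sum(dim=(0, 1)).clamp(min=1).unsqueeze(-1)  # (C, 1)
    return num / den

def segment_by_distance(points, prototypes):
    """points: (M, D); prototypes: (C, D). Each point takes the class of the
    nearest prototype; negated distances serve as logits."""
    dist = torch.cdist(points, prototypes)        # (M, C) Euclidean distances
    return dist.argmin(dim=1), -dist

def alignment_loss(support_feats, support_masks, query_feats, num_classes):
    """Prototype alignment: query predictions become a new 'support set'
    whose prototypes must segment the original support set."""
    protos = masked_avg_prototypes(support_feats, support_masks)
    q_pred, _ = segment_by_distance(query_feats, protos)
    q_masks = F.one_hot(q_pred, num_classes)                       # (M, C)
    new_protos = masked_avg_prototypes(query_feats.unsqueeze(0),
                                       q_masks.unsqueeze(0))
    flat_feats = support_feats.reshape(-1, support_feats.shape[-1])
    _, logits = segment_by_distance(flat_feats, new_protos)
    labels = support_masks.reshape(-1, num_classes).argmax(dim=1)
    return F.cross_entropy(logits, labels)
```

In this sketch, the alignment term would be added to the ordinary query-set segmentation loss during meta-training, mirroring the reversed support/query process the Method describes for extracting a robust support-set prototype.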
Result
Point cloud semantic segmentation is performed on the S3DIS, ScanNet, and Minnan ancient buildings (collected by the researchers) datasets to verify the segmentation performance of the proposed model. Compared with the prototypical network and the matching network, both classical few-shot learning networks, the mean intersection over union (mIoU) of the proposed method improves by 6% overall. For a single 1-way category, the highest mIoU of the proposed method reaches 95%, which is 12% higher than that of the prototypical network. For 2-way, the mIoU of the proposed method is 4% higher than that of the prototypical network, and it is 6% higher overall than that of the matching network. Comparative experiments that use DGCNN and PointNet++ as feature extractors also confirm that DGCNN learns more effectively: when segmenting the ceiling and floor categories, DGCNN improves the segmentation mIoU by 5% and 30%, respectively, over PointNet++, an overall increase of 17%. The segmentation mIoUs on the ScanNet and Minnan ancient buildings datasets are 63% and 51%, respectively. These results show that, even with a small amount of labeled data, the proposed algorithm outperforms traditional prototypical network algorithms in point cloud segmentation.
Conclusion
Compared with previous models trained on large amounts of labeled data, the proposed point cloud segmentation algorithm can segment a new class with little supervision information, improving the generalization of the model and saving manpower, material resources, and time in practical applications. When the labeled data of some samples are difficult to obtain, few-shot learning can play a key role.
Keywords: point cloud segmentation; few-shot learning (FSL); meta-learning; prototype alignment; Minnan ancient buildings
Armeni I, Sax S, Zamir A R and Savarese S. 2017. Joint 2D-3D-semantic data for indoor scene understanding [EB/OL]. [2022-09-30]. http://arxiv.org/pdf/1702.01105.pdf
Cheraghian A, Rahman S, Campbell D and Petersson L. 2020. Transductive zero-shot learning for 3D point cloud classification//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass, USA: IEEE: 912-922 [DOI: 10.1109/WACV45572.2020.9093545]
Dai A, Chang A X, Savva M, Halber M, Funkhouser T and Nießner M. 2017. ScanNet: richly-annotated 3D reconstructions of indoor scenes//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2432-2443 [DOI: 10.1109/CVPR.2017.261]
Dai M F, Xing S, Xu Q, Li P C and Chen K. 2022. Semantic segmentation of airborne LiDAR point cloud based on multi-feature fusion and geometric convolution. Journal of Image and Graphics, 27(2): 574-585 [DOI: 10.11834/jig.210555]
Guerry J, Boulch A, Le Saux B, Moras J, Plyer A and Filliat D. 2017. SnapNet-R: consistent 3D multi-view semantic labeling for robotics//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy: IEEE: 669-678 [DOI: 10.1109/ICCVW.2017.85]
Hospedales T, Antoniou A, Micaelli P and Storkey A. 2022. Meta-learning in neural networks: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9): 5149-5169 [DOI: 10.1109/TPAMI.2021.3079209]
Huang Z T, Yu Y K, Xu J W, Ni F and Le X Y. 2020. PF-Net: point fractal network for 3D point cloud completion//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 7659-7667 [DOI: 10.1109/CVPR42600.2020.00768]
Fei-Fei L, Fergus R and Perona P. 2003. A Bayesian approach to unsupervised one-shot learning of object categories//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France: IEEE: 1134-1141 [DOI: 10.1109/ICCV.2003.1238476]
Li Y, Ma L F, Zhong Z L, Liu F, Chapman M A, Cao D P and Li J. 2021. Deep learning for lidar point clouds in autonomous driving: a review. IEEE Transactions on Neural Networks and Learning Systems, 32(8): 3412-3432 [DOI: 10.1109/TNNLS.2020.3015992]
Liu M and Siegwart R. 2014. Navigation on point-cloud—A Riemannian metric approach//Proceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE: 4088-4093 [DOI: 10.1109/ICRA.2014.6907453]
Liu Z, Li Q Y, Chen X F, Wu C, Ishihara S, Li J and Ji Y S. 2021. Point cloud video streaming: challenges and solutions. IEEE Network, 35(5): 202-209 [DOI: 10.1109/MNET.101.2000364]
Mandikal P and Radhakrishnan V B. 2019. Dense 3D point cloud reconstruction using a deep pyramid network//Proceedings of 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE: 1052-1060 [DOI: 10.1109/WACV.2019.00117]
Phan A V, Le Nguyen M, Nguyen Y L H and Bui L T. 2018. DGCNN: a convolutional neural network over large-scale labeled graphs. Neural Networks, 108: 533-543 [DOI: 10.1016/j.neunet.2018.09.001]
Puri R, Zakhor A and Puri R. 2020. Few shot learning for point cloud data using model agnostic meta learning//Proceedings of 2020 IEEE International Conference on Image Processing (ICIP). Abu Dhabi, United Arab Emirates: IEEE: 1906-1910 [DOI: 10.1109/ICIP40778.2020.9190819]
Qi C R, Su H, Kaichun M and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114
Qiu Y F and Zhu M Y. 2021. Point cloud analysis model combining inverse density function and relation-shape convolution neural network. Journal of Image and Graphics, 26(4): 898-909 [DOI: 10.11834/jig.200159]
Riegler G, Ulusoy A O and Geiger A. 2017. OctNet: learning deep 3D representations at high resolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6620-6629 [DOI: 10.1109/CVPR.2017.701]
Snell J, Swersky K and Zemel R. 2017. Prototypical networks for few-shot learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 4080-4090
Socher R, Ganjoo M, Manning C D and Ng A Y. 2013. Zero-shot learning through cross-modal transfer//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc.: 935-943
Tchapmi L, Choy C, Armeni I, Gwak J Y and Savarese S. 2017. SEGCloud: semantic segmentation of 3D point clouds//Proceedings of 2017 International Conference on 3D Vision (3DV). Qingdao, China: IEEE: 537-547 [DOI: 10.1109/3DV.2017.00067]
Uy M A, Pham Q H, Hua B S, Nguyen T and Yeung S K. 2019. Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1588-1597 [DOI: 10.1109/ICCV.2019.00167]
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K and Wierstra D. 2016. Matching networks for one shot learning//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 3637-3645
Wang K X, Liew J H, Zou Y T, Zhou D Q and Feng J S. 2019a. PANet: few-shot image semantic segmentation with prototype alignment//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9196-9205 [DOI: 10.1109/ICCV.2019.00929]
Wang R G, Zheng Y, Yang J and Xue L X. 2019. Representative feature networks for few-shot learning. Journal of Image and Graphics, 24(9): 1514-1527 [DOI: 10.11834/jig.180629]
Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M and Solomon J M. 2019b. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): #146 [DOI: 10.1145/3326362]
Wang Y Q, Yao Q M, Kwok J T and Ni L M. 2020. Generalizing from a few examples: a survey on few-shot learning. ACM Computing Surveys, 53(3): #63 [DOI: 10.1145/3386252]
Wu B, Yu B L, Wu Q S, Yao S J, Zhao F, Mao W Q and Wu J P. 2017. A graph-based approach for 3D building model reconstruction from airborne LiDAR point clouds. Remote Sensing, 9(1): #92 [DOI: 10.3390/rs9010092]
Wu B C, Wan A, Yue X Y and Keutzer K. 2018. SqueezeSeg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud//Proceedings of 2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE: 1887-1893 [DOI: 10.1109/ICRA.2018.8462926]
Yang H, Shi J N and Carlone L. 2021. TEASER: fast and certifiable point cloud registration. IEEE Transactions on Robotics, 37(2): 314-333 [DOI: 10.1109/TRO.2020.3033695]