Point cloud segmentation algorithm fusing few-shot meta-learning and prototype alignment

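Qiu Yunfei1, Niu Jialu1,2 (1. College of Software, Liaoning Technical University, Huludao 125100, China; 2. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362216, China)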

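Abstract
Objective Point cloud segmentation typically requires large amounts of supervised information, which leads to high time cost and low computational efficiency. To address this problem, a few-shot meta-learning algorithm fused with prototype alignment is adopted for semantic segmentation of point clouds, so that the model can complete the segmentation task with very little supervised information. Method First, to avoid the overfitting that few-shot training is prone to, a DGCNN (dynamic graph convolutional neural network) is constructed by interleaving 2 edge convolution layers (EdgeConv) with 6 MLPs (multilayer perceptrons), which still allows point cloud information to be learned adequately. Then, the dataset is fed into this network in N-way K-shot form to learn the features of the support set and the query set; class prototypes are obtained by average pooling of the features, and the prototype alignment algorithm is fused in to obtain more robust support-set prototypes. Finally, point cloud segmentation is achieved by computing the Euclidean distance between the query-set point features and the support-set prototypes. Result Point cloud semantic segmentation experiments are conducted on the S3DIS (Stanford large-scale 3D indoor spaces dataset), ScanNet, and Minnan ancient buildings datasets, and the method is compared with the prototype network and the matching network on S3DIS. For 1-way segmentation, the mean intersection over union (mIoU) is 0.06 and 0.33 higher than that of the prototype network and the matching network, respectively, and the mIoU of the best class reaches 0.95. For 2-way segmentation, the mIoU is 0.04 higher than that of the prototype network. When DGCNN is compared with PointNet++ as the feature extractor, the mIoU for ceiling and floor increases by 0.05 and 0.30, respectively. Applied to the ScanNet and Minnan ancient buildings datasets, the method achieves segmentation mIoUs of 0.63 and 0.51, respectively. Conclusion The proposed method achieves good point cloud segmentation results with only a small amount of labeled data. Compared with earlier models that must be trained on large amounts of labeled data, it needs very little supervised information to segment a new class, which improves the generalization ability of the model. The proposed method is particularly valuable when labeled samples are difficult to obtain.
Keywords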
Point cloud segmentation algorithm fusing few-shot meta-learning and prototype alignment

Qiu Yunfei1, Niu Jialu1,2 (1. College of Software, Liaoning Technical University, Huludao 125100, China; 2. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362216, China)

Abstract
Objective With the application of 3D point clouds in many fields, such as autonomous driving, navigation and positioning, AR house viewing, and model reconstruction, point cloud research and applications have attracted growing attention. However, because point clouds are unordered and unstructured, processing them or feeding them directly into network training is challenging: standard deep neural network models require regularly structured input data. To this end, the PointNet network was proposed. This pioneering network learns per-point features with a shared multilayer perceptron (MLP) and global features with a symmetric pooling function. PointNet focuses on global information but ignores the local information of the point cloud and its neighborhood features. As an improvement, PointNet++ adds sampling-based feature extraction of local neighborhood information. This improved model has three main parts, namely, the sampling, grouping, and PointNet layers, which extract local information while retaining PointNet's ability to extract global information. PointNet++ also has drawbacks. For instance, it ignores the geometric relationships between points and therefore cannot fully capture local geometric features. To address these drawbacks, the dynamic graph convolutional neural network (DGCNN) introduces EdgeConv, which enhances data representation ability by establishing topological relationships between points. EdgeConv not only maintains invariance to the ordering of the points but also captures local geometric features. Most related research relies on a large amount of supervised data, which is time consuming and labor intensive to obtain. Given the limited application of few-shot learning to 3D data, this paper proposes a few-shot meta-learning algorithm to semantically segment 3D point cloud data.
The prototype alignment algorithm, which can efficiently learn the information of the support set, is also used to segment the query set, and the learning ability is adjusted so that the model can complete the segmentation task even with minimal supervised data. Method This paper proposes a method for semantic segmentation of 3D point clouds that differs from traditional deep learning segmentation methods, which rely on a large amount of supervised information. The proposed method segments point clouds in a few-shot learning mode. To avoid using a large amount of labeled data for training, a few-shot meta-learning algorithm is adopted. Specifically, the dataset is fed into the network as multiple N-way K-shot meta-tasks to learn meta-knowledge during the meta-training stage. The cycle of training on the support set and validating on the query set is repeated until the model learns to recognize new classes, and final point cloud segmentation is then performed during the meta-test stage on new classes that were not seen during training. To avoid overfitting, after several experiments, we use 2 EdgeConv layers and 6 multilayer perceptrons to construct the DGCNN network as our feature extractor. Point clouds have an uneven density, with closer distances corresponding to higher density, so using farthest point sampling would lead to a large amount of computation. Therefore, we use EdgeConv in the DGCNN network: k-nearest neighbors (KNN) search is applied to construct a graph structure, the features of each edge are extracted using an MLP, and the edge features are aggregated via average pooling to dynamically update the features of the center point.
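The EdgeConv step described above (KNN graph, per-edge MLP, average-pooling aggregation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `knn_indices` and `edge_conv`, the edge input `[x_i, x_j - x_i]` (the form commonly used with DGCNN), the single linear layer with ReLU standing in for the MLP, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(features, k):
    """Indices of the k nearest neighbors of every point (self excluded)."""
    diff = features[:, None, :] - features[None, :, :]   # (N, N, C)
    dist2 = np.sum(diff ** 2, axis=-1)                    # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)                       # a point is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]               # (N, k)

def edge_conv(features, k, weight, bias):
    """One EdgeConv-style layer with average-pooling aggregation (assumed form)."""
    idx = knn_indices(features, k)                        # graph built from current features
    center = np.repeat(features[:, None, :], k, axis=1)   # (N, k, C)
    neighbors = features[idx]                             # (N, k, C)
    # Edge input: center feature and offset to each neighbor.
    edge_input = np.concatenate([center, neighbors - center], axis=-1)  # (N, k, 2C)
    # Hypothetical one-layer "MLP": linear map followed by ReLU.
    edge_feat = np.maximum(edge_input @ weight + bias, 0.0)             # (N, k, C_out)
    # Average pooling over the k edges updates the center point's feature.
    return edge_feat.mean(axis=1)                         # (N, C_out)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))          # toy point cloud, xyz only
w = rng.normal(size=(6, 16)) * 0.1         # random weights, illustration only
b = np.zeros(16)
out = edge_conv(cloud, k=8, weight=w, bias=b)

# Reordering the input points only reorders the output rows.
perm = rng.permutation(cloud.shape[0])
out_perm = edge_conv(cloud[perm], k=8, weight=w, bias=b)
```

Because the graph is rebuilt from whatever features are passed in, stacking two such layers recomputes the neighborhoods in feature space, which is what makes the graph "dynamic".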
Because comprehensively learned features can represent the corresponding category, and following the idea of the prototype network, the features obtained after the support set and the query set pass through the network are average-pooled to obtain the prototype of each category. A prototype represents a class, and fusing in the prototype alignment algorithm makes it possible to obtain the support-set prototype efficiently by reversing the process of support-set training and query-set verification. In this reversed process, the query-set features and the predicted segmentation mask constitute a new "support set" whose prototype is learned and used to segment the original support-set data. This allows the model to learn the information of the support set and extract a robust prototype. Point cloud segmentation is then implemented by calculating the Euclidean distance between the query-set point features and the support-set prototypes. Result Point cloud semantic segmentation is performed on the S3DIS, ScanNet, and Minnan ancient buildings (collected by the researchers) datasets to verify the segmentation performance of the proposed model. Compared with the prototype network and the matching network, both classical few-shot learning networks, the proposed method improves the mean intersection over union (mIoU) by 6% overall. For a single category in the 1-way setting, the highest mIoU of the proposed method reaches 95%, which is 12% higher than that reached by the prototype network. Meanwhile, in the 2-way setting, the mIoU of the proposed method is 4% higher than that of the prototype network. Compared with the matching network, the mIoU of the proposed method is improved by 6% overall. The comparative experiments that use DGCNN and PointNet++ as feature extractors also confirm that DGCNN has a superior learning effect as a feature extractor.
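The prototype step from the Method part (average-pooled class prototypes, Euclidean nearest-prototype labeling, and the reversed support/query pass used for alignment) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's code: the helper names are hypothetical, the embeddings are synthetic rather than DGCNN outputs, a background prototype is assumed for the 1-way case, and the alignment pass is shown as a consistency check, whereas a trained model would more likely use it as an additional loss term.

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Average the features of the points selected by a boolean mask."""
    mask = mask.astype(features.dtype)
    return (features * mask[:, None]).sum(axis=0) / max(mask.sum(), 1.0)

def segment_by_nearest_prototype(features, prototypes):
    """Label each point with the index of its closest prototype (Euclidean)."""
    dist = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    return dist.argmin(axis=1)

rng = np.random.default_rng(0)
dim = 8

def toy_features(labels):
    """Synthetic embeddings: background near 0, foreground near 5 (illustration only)."""
    return rng.normal(scale=0.5, size=(labels.size, dim)) + 5.0 * labels[:, None]

# One 1-way 1-shot episode: label 1 is the target class, label 0 is background.
support_labels = rng.integers(0, 2, size=200)
query_labels = rng.integers(0, 2, size=200)
support_feat = toy_features(support_labels)
query_feat = toy_features(query_labels)

# Forward pass: support prototypes segment the query points.
support_protos = np.stack([
    masked_average_prototype(support_feat, support_labels == 0),
    masked_average_prototype(support_feat, support_labels == 1),
])
query_pred = segment_by_nearest_prototype(query_feat, support_protos)

# Alignment pass: the query features and predicted mask act as a new
# "support set" whose prototypes must segment the original support points.
query_protos = np.stack([
    masked_average_prototype(query_feat, query_pred == 0),
    masked_average_prototype(query_feat, query_pred == 1),
])
support_pred = segment_by_nearest_prototype(support_feat, query_protos)

query_acc = float((query_pred == query_labels).mean())
align_acc = float((support_pred == support_labels).mean())
```

If the embedding is good, the prototypes computed in either direction land close to each other, so the support set is recovered from the query prototypes about as well as the query set is recovered from the support prototypes.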
When segmenting the ceiling and floor categories, DGCNN improves the segmentation mIoU by 5% and 30%, respectively, compared with PointNet++, an overall increase of 17%. Meanwhile, the segmentation mIoUs obtained with DGCNN on the ScanNet and Minnan ancient buildings datasets are 63% and 51%, respectively. These experimental results show that the proposed algorithm achieves better point cloud segmentation results than traditional prototype network algorithms even with a small amount of labeled data. Conclusion Compared with previous models that are trained with a large amount of labeled data, the proposed point cloud segmentation algorithm can segment a new class with little supervision information, which improves the generalization of the model. The algorithm therefore saves manpower, material resources, and time in practical applications. When the labeled data of some samples are difficult to obtain, few-shot learning can play a key role.
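For readers unfamiliar with the metric quoted in the results, mean intersection over union (mIoU) averages, over classes, the ratio of correctly labeled points to the union of predicted and ground-truth points for that class. The sketch below uses a hypothetical `mean_iou` helper and made-up labels; the convention of skipping classes absent from both prediction and ground truth is an assumption, and the paper's evaluation protocol may differ.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both; skipped by assumption
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

# Made-up per-point labels for six points and two classes.
pred = np.array([0, 0, 1, 1, 1, 0])
target = np.array([0, 1, 1, 1, 0, 0])
score = mean_iou(pred, target, num_classes=2)  # each class: 2 shared / 4 in union
```

Here both classes have an IoU of 2/4, so the mean is 0.5, even though four of the six points (about 67%) are labeled correctly; mIoU penalizes errors more heavily than plain accuracy does.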
Keywords
