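Song Wei, Cai Wanyuan, He Shengqi, Li Wenjun (College of Information Technology, Shanghai Ocean University, Shanghai 201306; Polar Research Institute of China, Shanghai 200136)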
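Objective With the rapid development of 3D acquisition technologies, point clouds have broad application prospects in computer vision, autonomous driving, robotics, and other fields. Deep learning, the mainstream technique in artificial intelligence, has shown great potential for solving a variety of 3D vision problems. Existing deep learning-based methods for 3D point cloud classification and segmentation usually keep only the maximum-valued feature when aggregating local neighborhood features, ignoring useful information in the other neighborhood features. Method This paper proposes a point cloud classification and segmentation method that combines dynamic graph convolution with spatial attention (dynamic graph convolution spatial attention neural networks, DGCSA). Combining a dynamic graph convolution module with a spatial attention module yields more accurate classification and segmentation. Dynamic graph convolution builds a K-nearest-neighbor graph over the point cloud and extracts its edge features. On this basis, to address the information loss that tends to occur during local neighborhood aggregation, a point-based spatial attention (SA) module is designed; it uses an attention mechanism to automatically learn local features that are more representative than the maximum-valued feature, thereby improving classification and segmentation accuracy. Result Classification, instance segmentation, and semantic scene segmentation experiments are conducted on the ModelNet40, ShapeNetPart, and S3DIS (Stanford Large-scale 3D Indoor Spaces Dataset) datasets, respectively, to verify the model's performance. The results show that the method reaches an overall accuracy of 93.4% on classification, a mean intersection over union of 85.3% on instance segmentation, and a six-fold cross-validation mean intersection over union of 59.1% on indoor scene segmentation, improvements of 0.8%, 0.2%, and 3.0%, respectively, over the baseline dynamic graph convolutional network, which effectively improves model performance. Conclusion Extracting point cloud features with the dynamic graph convolution module and introducing a spatial attention mechanism into local neighborhood aggregation aggregates neighborhood features better than max pooling and effectively improves accuracy on point cloud classification, instance segmentation, and indoor scene semantic segmentation.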
Dynamic graph convolution with spatial attention for point cloud classification and segmentation
Song Wei,Cai Wanyuan,He Shengqi,Li Wenjun(College of Information Technology, Shanghai Ocean University, Shanghai 201306, China;Polar Research Institute of China, Shanghai 200136, China)
Objective With the rapid development of 3D acquisition technologies, point clouds have found wide application in many areas, such as medicine, autonomous driving, and robotics. As a dominant technique in artificial intelligence (AI), deep learning has been successfully used to solve various 2D vision problems and has shown great potential in solving 3D vision problems. However, applying regular-grid convolutional neural networks (CNNs) to point cloud data, which lie in a non-Euclidean space, and capturing the hidden shapes of irregular points remain challenging. In recent years, deep learning-based methods have proven more effective than traditional methods in point cloud classification and segmentation. Deep learning-based methods can be divided into three groups: pointwise methods, convolution-based methods, and graph convolution-based methods. These methods involve two important processes: feature extraction and feature aggregation. Most of them focus on the design of feature extraction and pay less attention to feature aggregation. At present, most deep learning-based point cloud classification and segmentation methods use max pooling for feature aggregation. However, keeping only the maximum-valued feature of a local neighborhood discards the information carried by the other neighborhood features. Method This paper proposes a deep learning-based point cloud classification and segmentation method that combines dynamic graph convolution with spatial attention: the dynamic graph convolution spatial attention (DGCSA) neural network. The key idea of the network is to learn from the relationship between the neighbor points and the center point, which avoids the information loss caused by max pooling layers in feature aggregation. The network is composed of a dynamic graph convolution module and a spatial attention (SA) module.
The dynamic graph convolution module mainly consists of a K-nearest neighbor (KNN) search and a multilayer perceptron. For each point in the point cloud, it first uses the KNN algorithm to find the neighbor points. Then, it extracts the features of the neighbor points and the center point with convolutional layers. The K nearest neighbors of each point vary across network layers, so the graph structure is dynamic and is updated layer by layer. After feature extraction, a point-based SA module is applied to automatically learn local features that are more representative than the maximum feature. The key of the SA module is to use the attention mechanism to calculate the weights of the K neighbor points of the center point. It consists of four units: 1) an attention activation unit, 2) an attention score unit, 3) a weighted feature unit, and 4) a multilayer perceptron unit. First, the attention activation of each latent feature is learned through a fully connected layer. Second, the attention score of the corresponding feature is calculated by applying the SoftMax function to the attention activation value. The learned attention score can be regarded as a mask that automatically selects useful latent features. Third, the attention score is multiplied element-wise by the corresponding local neighborhood features to generate a set of weighted features. Finally, the weighted features are summed to obtain the representative local feature, followed by another fully connected layer that controls the output dimension of the SA module. The SA module has strong learning ability, thereby improving the classification and segmentation accuracy of the model. DGCSA achieves high-performance classification and segmentation of point clouds by stacking several dynamic graph convolution modules and SA modules.
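The KNN graph construction and the attention-weighted aggregation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. Several details are assumptions made only for illustration: the edge feature is taken to be the concatenation of the center feature with the neighbor-minus-center offset (as in DGCNN), the SoftMax is taken over the K neighbors separately for each channel, the fully connected layer has no bias, the weights are random rather than learned, and the final output layer is omitted. All function names are hypothetical.

```python
import math
import random


def knn_indices(points, k):
    """Return, for every point, the indices of its k nearest neighbors (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def edge_features(feats, neighbors):
    """Assumed DGCNN-style edge feature: concat(x_i, x_j - x_i) for each neighbor j of i."""
    return [
        [feats[i] + [b - a for a, b in zip(feats[i], feats[j])] for j in nbrs]
        for i, nbrs in enumerate(neighbors)
    ]


def linear(x, weight):
    """Bias-free fully connected layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(neigh_feats, weight):
    """Aggregate K x C neighbor features into one C-dim feature by attention."""
    activations = [linear(f, weight) for f in neigh_feats]  # attention activations, K x C
    out = []
    for c in range(len(neigh_feats[0])):
        scores = softmax([a[c] for a in activations])  # scores over the K neighbors
        out.append(sum(s * f[c] for s, f in zip(scores, neigh_feats)))  # weighted sum
    return out


def max_aggregate(neigh_feats):
    """Max pooling over the K neighbors, shown for comparison."""
    return [max(col) for col in zip(*neigh_feats)]


random.seed(0)
points = [[random.random() for _ in range(3)] for _ in range(8)]
neighbors = knn_indices(points, k=3)
edges = edge_features(points, neighbors)  # 8 points x 3 neighbors x 6 channels
weight = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(6)]
attended = [attention_aggregate(e, weight) for e in edges]
pooled = [max_aggregate(e) for e in edges]
```

Because the attention scores of each channel are non-negative and sum to one, every attended value is a convex combination of the neighbor values in that channel. It can therefore never exceed the max-pooled value, but it can draw on all K neighbors instead of a single one.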
Moreover, feature fusion combines the output features of different spatial attention layers, which effectively captures both the global and local characteristics of the point cloud data and leads to better classification and segmentation results. Result To evaluate the performance of the proposed DGCSA model, experiments on classification, instance segmentation, and semantic scene segmentation are carried out on the ModelNet40, ShapeNetPart, and Stanford Large-scale 3D Indoor Spaces Dataset (S3DIS) datasets, respectively. Experimental results show that the overall accuracy (OA) of our method reaches 93.4%, which is 0.8% higher than that of the baseline network, dynamic graph CNN (DGCNN). The mean intersection over union (mIoU) of instance segmentation reaches 85.3%, which is 0.2% higher than that of DGCNN; for indoor scene segmentation, the mIoU under six-fold cross-validation reaches 59.1%, which is 3.0% higher than that of DGCNN. Overall, the classification accuracy of our method on the ModelNet40 dataset surpasses that of most existing point cloud classification methods, such as PointNet, PointNet++, and PointCNN. The accuracy of DGCSA in instance segmentation and indoor scene segmentation is on par with that of current leading point cloud segmentation networks. Furthermore, the validity of the SA module is verified by an ablation study in which the max pooling operations in PointNet and linked dynamic graph CNN (LDGCNN) are replaced by the SA module. The classification results on the ModelNet40 dataset show that the SA module raises classification accuracy by more than 0.5% for PointNet and LDGCNN. Conclusion DGCSA can effectively aggregate local features of point cloud data and achieve better classification and segmentation results. Through the design of the SA module, the network addresses the partial information loss that occurs when aggregating local neighborhood information.
The SA module fully considers the contributions of all neighbors, selectively strengthens the features that contain useful information, and suppresses useless features. By combining the spatial attention module with the dynamic graph convolution module, our network improves the accuracy of classification, instance segmentation, and indoor scene segmentation. In addition, the spatial attention module can be integrated into other point cloud classification models and substantially improves their performance. Our future work will improve the accuracy of DGCSA on segmentation tasks with unbalanced datasets.
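For readers unfamiliar with the two metrics reported in the Result part, the following minimal sketch shows how overall accuracy (OA) and mean intersection over union (mIoU) are commonly computed from per-point labels. It is an illustration under assumptions, not the evaluation code used in the paper: the function names are hypothetical, the labels are toy values, classes absent from both prediction and ground truth are skipped, and benchmark-specific conventions (such as per-shape averaging on ShapeNetPart) are not reproduced.

```python
def overall_accuracy(pred, true):
    """Fraction of items whose predicted label equals the ground-truth label."""
    correct = sum(p == t for p, t in zip(pred, true))
    return correct / len(true)


def mean_iou(pred, true, num_classes):
    """Average of per-class intersection over union, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, true))
        union = sum(p == c or t == c for p, t in zip(pred, true))
        if union > 0:  # class c appears in the prediction or the ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy labels for six points and three classes (illustrative values only).
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]

oa = overall_accuracy(pred_labels, true_labels)  # 4 of 6 points are correct
miou = mean_iou(pred_labels, true_labels, 3)     # mean of 1/3, 2/3, and 1/2
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mIoU is 0.5 while the OA is about 0.667. The two metrics can diverge noticeably, which is one reason segmentation results are reported as mIoU rather than as accuracy alone.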