Dynamic graph convolution with spatial attention for point cloud classification and segmentation
2021, Vol. 26, No. 11, Pages 2691-2702
Received: 2020-09-15; Revised: 2020-12-24; Accepted: 2020-12-31; Published in print: 2021-11-16
DOI: 10.11834/jig.200550
Objective
With the rapid development of 3D acquisition technology, point clouds have broad application prospects in fields such as computer vision, autonomous driving, and robotics. Deep learning, the dominant technique in artificial intelligence, has shown great potential for solving various 3D vision problems. However, when aggregating local neighborhood features, existing deep learning-based 3D point cloud classification and segmentation methods usually keep only the maximum value of the neighborhood features, ignoring the useful information carried by the other neighbors.
Method
This paper proposes a point cloud classification and segmentation method that combines dynamic graph convolution with spatial attention (dynamic graph convolution spatial attention neural network, DGCSA). By combining a dynamic graph convolution module with a spatial attention module, more accurate point cloud classification and segmentation are achieved. Dynamic graph convolution builds a K-nearest-neighbor graph over the point cloud data and extracts its edge features. On this basis, to address the information loss that easily arises during local neighborhood aggregation, a point-based spatial attention (SA) module is designed, which uses the attention mechanism to automatically learn local features that are more representative than the maximum feature, thereby improving the classification and segmentation accuracy of the model.
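As a concrete contrast (written in our own notation, not taken from the paper), max pooling keeps only the channel-wise maximum over the K neighborhood features, whereas an attention-weighted aggregation of the kind the SA module performs retains a learned contribution from every neighbor:

$$
\hat{h}_i=\max_{j=1,\dots,K} h_{ij}
\qquad \text{versus} \qquad
s_{ij}=\frac{\exp\!\big(g(h_{ij})\big)}{\sum_{l=1}^{K}\exp\!\big(g(h_{il})\big)},\quad
\tilde{h}_i=\sum_{j=1}^{K} s_{ij}\,h_{ij},
$$

where $h_{ij}$ is the feature of the $j$-th neighbor of point $i$, the max is taken per channel, and $g$ is a learned fully connected scoring function.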
Result
Classification, instance segmentation, and semantic scene segmentation experiments are conducted on the ModelNet40, ShapeNetPart, and S3DIS (Stanford Large-Scale 3D Indoor Spaces) datasets, respectively, to verify the classification and segmentation performance of the model. The experimental results show that the method achieves an overall classification accuracy of 93.4% on the classification task, a mean intersection over union (mIoU) of 85.3% for instance segmentation, and an mIoU of 59.1% under six-fold cross-validation for indoor scene segmentation, improvements of 0.8%, 0.2%, and 3.0%, respectively, over the baseline dynamic graph convolutional network, effectively enhancing model performance.
Conclusion
Extracting point cloud features with the dynamic graph convolution module and introducing a spatial attention mechanism into the aggregation of local neighborhood features aggregates neighborhood features better than max feature pooling does, effectively improving the model's accuracy in point cloud classification, instance segmentation, and indoor scene semantic segmentation.
Objective
With the rapid development of 3D acquisition technologies, point clouds have found wide application in areas such as medicine, autonomous driving, and robotics. As a dominant technique in artificial intelligence (AI), deep learning has been used successfully to solve various 2D vision problems and has shown great potential for 3D vision problems. However, applying regular grid convolutional neural networks (CNNs) to the non-Euclidean structure of point cloud data and capturing the shapes hidden in irregular points remain challenging. In recent years, deep learning-based methods have proven more effective for point cloud classification and segmentation than traditional methods. They can be divided into three groups: pointwise methods, convolution-based methods, and graph convolution-based methods. All of these methods involve two important processes: feature extraction and feature aggregation. Most focus on the design of feature extraction and pay less attention to feature aggregation. At present, most deep learning-based point cloud classification and segmentation methods use max pooling for feature aggregation. However, keeping only the maximum value of the local neighborhood features discards the useful information carried by the other neighbors, causing information loss.
Method
This paper proposes a deep learning method for point cloud classification and segmentation that combines dynamic graph convolution with spatial attention: the dynamic graph convolution spatial attention (DGCSA) neural network. The key idea is to learn from the relationship between the neighbor points and the center point, which avoids the information loss caused by aggregating features with max pooling layers. The network is composed of a dynamic graph convolution module and a spatial attention (SA) module. The dynamic graph convolution module mainly performs a K-nearest neighbor (KNN) search and multilayer perceptron operations. For each point, it first uses the KNN algorithm to find the neighboring points and then extracts the features of the neighbor and center points with convolutional layers. The K nearest neighbors of each point vary across network layers, yielding a dynamic graph structure that is updated layer by layer.
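To illustrate this step, the following is a minimal PyTorch-style sketch of KNN graph construction in feature space with EdgeConv-style edge features [x_i, x_j - x_i]; the (B, N, C) tensor layout and the name knn_edge_features are our assumptions for illustration, not the authors' code.

```python
import torch

def knn_edge_features(x: torch.Tensor, k: int) -> torch.Tensor:
    """Build a k-NN graph in the current feature space and form
    EdgeConv-style edge features [x_i, x_j - x_i].

    x: (B, N, C) point features; returns (B, N, k, 2C) edge features.
    Because the graph is rebuilt from the *current* features at every
    layer, the neighborhoods (and hence the graph) change dynamically.
    """
    # pairwise Euclidean distances in feature space: (B, N, N)
    dist = torch.cdist(x, x)
    # indices of the k nearest points per point: (B, N, k)
    # (the point itself, at distance 0, is included; implementations
    # often exclude it by taking k + 1 neighbors and dropping the first)
    idx = dist.topk(k, dim=-1, largest=False).indices
    B = x.shape[0]
    batch = torch.arange(B, device=x.device).view(B, 1, 1)
    neighbors = x[batch, idx]                     # (B, N, k, C)
    center = x.unsqueeze(2).expand_as(neighbors)  # (B, N, k, C)
    # edge feature per (center, neighbor) pair: (B, N, k, 2C)
    return torch.cat([center, neighbors - center], dim=-1)
```

Applying a shared MLP over the last dimension of this output corresponds to the per-edge feature extraction described above; aggregation over the k neighbors comes next.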
After feature extraction, a point-based SA module automatically learns local features that are more representative than the maximum feature. The key of the SA module is to use the attention mechanism to compute weights for the K neighbors of each center point. It consists of four units: 1) an attention activation unit, 2) an attention score unit, 3) a weighted feature unit, and 4) a multilayer perceptron unit. First, the attention activation of each latent feature is learned through a fully connected layer. Second, the attention score of the corresponding feature is computed by applying the softmax function to the attention activation values; the learned attention scores can be regarded as a mask that automatically selects useful latent features. Third, the attention scores are multiplied element-wise with the local neighborhood features to generate a set of weighted features. Finally, the weighted features are summed to obtain a representative local feature, followed by another fully connected layer that controls the output dimension of the SA module. The strong learning ability of the SA module improves the classification and segmentation accuracy of the model.
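The four units map naturally onto a few lines of code. Below is a hedged sketch of such a point-based SA module under the same assumptions as above (PyTorch, (B, N, K, C) neighborhood features); the layer choices are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Point-based SA module sketch: attention activation -> softmax
    scores -> weighted features -> output MLP.

    Input:  (B, N, K, C) local neighborhood features.
    Output: (B, N, C_out) aggregated local features.
    """
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # 1) attention activation unit: fully connected layer mapping
        #    each neighbor feature to a scalar activation
        self.activation_fc = nn.Linear(in_channels, 1)
        # 4) MLP unit that controls the module's output dimension
        self.out_fc = nn.Linear(in_channels, out_channels)

    def forward(self, neigh: torch.Tensor) -> torch.Tensor:
        # attention activation per neighbor: (B, N, K, 1)
        act = self.activation_fc(neigh)
        # 2) attention scores: softmax over the K neighbors, acting as a
        #    soft mask that selects useful latent features
        scores = F.softmax(act, dim=2)
        # 3) weighted features: element-wise product with the neighborhood
        #    features, then a sum over K replaces max pooling
        agg = (scores * neigh).sum(dim=2)  # (B, N, C)
        return self.out_fc(agg)
```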
DGCSA achieves high-performance point cloud classification and segmentation by stacking several dynamic graph convolution modules and SA modules. Moreover, feature fusion combines the output features of the different spatial attention layers, which effectively captures both the global and local characteristics of the point cloud data and yields better classification and segmentation results.
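To show how the pieces could fit together, here is a speculative sketch that stacks the two modules and fuses the per-layer SA outputs by channel concatenation; it reuses knn_edge_features and SpatialAttention from the sketches above, and the widths and k value are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class DGCSABackbone(nn.Module):
    """Stacks dynamic graph convolution + SA modules and fuses the
    per-layer outputs by concatenation (widths and k are assumptions)."""
    def __init__(self, k: int = 20, widths=(64, 64, 128)):
        super().__init__()
        self.k = k
        self.edge_mlps = nn.ModuleList()
        self.sa_layers = nn.ModuleList()
        in_c = 3  # raw xyz coordinates
        for w in widths:
            # shared MLP over edge features [x_i, x_j - x_i] (2 * in_c wide)
            self.edge_mlps.append(
                nn.Sequential(nn.Linear(2 * in_c, w), nn.ReLU()))
            self.sa_layers.append(SpatialAttention(w, w))
            in_c = w

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        x, per_layer = xyz, []
        for mlp, sa in zip(self.edge_mlps, self.sa_layers):
            # rebuild the graph from current features: dynamic structure
            edges = mlp(knn_edge_features(x, self.k))  # (B, N, k, w)
            x = sa(edges)                              # SA aggregates over k
            per_layer.append(x)
        # feature fusion: concatenate all SA outputs so both shallow (local)
        # and deep (more global) features reach the task-specific heads
        return torch.cat(per_layer, dim=-1)            # (B, N, sum(widths))
```

A classification head would then pool the fused features over the N points into a global descriptor and apply an MLP, while a segmentation head would predict per point from the fused features directly.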
Result
To evaluate the performance of the proposed DGCSA model, experiments on classification, instance segmentation, and semantic scene segmentation are carried out on the ModelNet40, ShapeNetPart, and Stanford Large-Scale 3D Indoor Spaces (S3DIS) datasets, respectively. The experimental results show that the overall accuracy (OA) of our method reaches 93.4%, which is 0.8% higher than that of the baseline network, dynamic graph CNN (DGCNN). The mean intersection over union (mIoU) of instance segmentation reaches 85.3%, 0.2% higher than DGCNN; for indoor scene segmentation, the mIoU under six-fold cross-validation reaches 59.1%, 3.0% higher than DGCNN. Overall, the classification accuracy of our method on the ModelNet40 dataset surpasses that of most existing point cloud classification methods, such as PointNet, PointNet++, and PointCNN, and the accuracy of DGCSA in instance segmentation and indoor scene segmentation matches that of current high-performing point cloud segmentation networks. Furthermore, the effectiveness of the SA module is verified by an ablation study in which the max pooling operations in PointNet and linked dynamic graph CNN (LDGCNN) are replaced by the SA module: the classification results on ModelNet40 show that the SA module raises the classification accuracy of PointNet and LDGCNN by more than 0.5%.
Conclusion
DGCSA can effectively aggregate the local features of point cloud data and achieve better classification and segmentation results. Through the design of the SA module, the network solves the problem of partial information loss in aggregating local neighborhood information: the SA module fully considers the contribution of every neighbor, selectively strengthens features that carry useful information, and suppresses useless ones. By combining the spatial attention module with the dynamic graph convolution module, our network improves the accuracy of classification, instance segmentation, and indoor scene segmentation. In addition, the spatial attention module can be integrated into other point cloud classification models and substantially improve their performance. Our future work will improve the segmentation accuracy of DGCSA on unbalanced datasets.
References
Armeni I, Sener O, Zamir A R, Jiang H L, Brilakis I, Fischer M and Savarese S. 2016. 3D semantic parsing of large-scale indoor spaces//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1534-1543 [DOI: 10.1109/CVPR.2016.170]
Besl P J and Jain R C. 1988. Segmentation through variable-order surface fitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(2): 167-192 [DOI: 10.1109/34.3881]
Bruna J, Zaremba W, Szlam A and LeCun Y. 2013. Spectral networks and locally connected networks on graphs [EB/OL]. [2020-08-15]. https://arxiv.org/pdf/1312.6203.pdf
Chen X Z, Ma H M, Wan J, Li B and Xia T. 2017. Multi-view 3D object detection network for autonomous driving//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6526-6534 [DOI: 10.1109/CVPR.2017.691]
Defferrard M, Bresson X and Vandergheynst P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS: 3844-3852 [DOI: 10.5555/3157382.3157527]
Filin S and Pfeifer N. 2006. Segmentation of airborne laser scanning data using a slope adaptive neighborhood. ISPRS Journal of Photogrammetry and Remote Sensing, 60(2): 71-80 [DOI: 10.1016/j.isprsjprs.2005.10.005]
Fischler M A and Bolles R C. 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6): 381-395 [DOI: 10.1145/358669.358692]
Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L and Bennamoun M. 2020. Deep learning for 3D point clouds: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence [DOI: 10.1109/TPAMI.2020.3005434]
Kipf T N and Welling M. 2016. Semi-supervised classification with graph convolutional networks [EB/OL]. [2020-08-15]. https://arxiv.org/pdf/1609.02907.pdf
LeCun Y, Bengio Y and Hinton G. 2015. Deep learning. Nature, 521(7553): 436-444 [DOI: 10.1038/nature14539]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018. PointCNN: convolution on Χ-transformed points//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: NeurIPS: 828-838
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: NIPS: 5105-5114 [DOI: 10.5555/3295222.3295263]
Simonovsky M and Komodakis N. 2017. Dynamic edge-conditioned filters in convolutional neural networks on graphs//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 29-38 [DOI: 10.1109/CVPR.2017.11]
Su H, Maji S, Kalogerakis E and Learned-Miller E. 2015. Multi-view convolutional neural networks for 3D shape recognition//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 945-953 [DOI: 10.1109/ICCV.2015.114]
Te G S, Hu W, Zheng A M and Guo Z M. 2018. RGCNN: regularized graph CNN for point cloud segmentation//Proceedings of the 26th ACM International Conference on Multimedia. Seoul, Korea (South): ACM: 746-754 [DOI: 10.1145/3240508.3240621]
Thomas H, Qi C R, Deschaud J E, Marcotegui B, Goulette F and Guibas L. 2019. KPConv: flexible and deformable convolution for point clouds//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6410-6419 [DOI: 10.1109/ICCV.2019.00651]
Wang C, Samari B and Siddiqi K. 2018. Local spectral graph convolution for point set feature learning//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 56-71 [DOI: 10.1007/978-3-030-01225-0_4]
Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M and Solomon J M. 2019. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): 146, 1-12 [DOI: 10.1145/3326362]
Wu W X, Qi Z G and Li F X. 2019. PointConv: deep convolutional networks on 3D point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9613-9622 [DOI: 10.1109/CVPR.2019.00985]
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1912-1920 [DOI: 10.1109/CVPR.2015.7298801]
Xu Y F, Fan T Q, Xu M Y, Zeng L and Qiao Y. 2018. SpiderCNN: deep learning on point sets with parameterized convolutional filters//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 90-105 [DOI: 10.1007/978-3-030-01237-3_6]
Yi L, Kim V G, Ceylan D, Shen I C, Yan M Y, Su H, Lu C W, Huang Q X, Sheffer A and Guibas L. 2016. A scalable active framework for region annotation in 3D shape collections. ACM Transactions on Graphics, 35(6): 210, 1-12 [DOI: 10.1145/2980179.2980238]
Zhang K G, Hao M, Wang J, de Silva C W and Fu C L. 2019. Linked dynamic graph CNN: learning on point cloud via linking hierarchical features [EB/OL]. [2020-08-15]. https://arxiv.org/pdf/1904.10014.pdf
Zhang Y X and Rabbat M. 2018. A graph-CNN for 3D point cloud classification//Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Calgary, Canada: IEEE: 6279-6283 [DOI: 10.1109/ICASSP.2018.8462291]
Zhao H S, Jiang L, Fu C W and Jia J Y. 2019. PointWeb: enhancing local neighborhood features for point cloud processing//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5560-5568 [DOI: 10.1109/CVPR.2019.00571]
Zhou Y and Tuzel O. 2018. VoxelNet: end-to-end learning for point cloud based 3D object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4490-4499 [DOI: 10.1109/CVPR.2018.00472]