Multi-dimensional multi-layer point cloud analysis for shape features
2022, Vol. 27, No. 2, pp. 562-573
Received: 2021-07-23; Revised: 2021-09-12; Accepted: 2021-09-19; Published in print: 2022-02-16
DOI: 10.11834/jig.210592
Objective
3D point clouds are the primary data structure for encoding geometric information. Unlike 2D visual data, point clouds conceal shape features that are essential to 3D objects. To better mine shape features from disordered point clouds, this paper proposes a multi-dimensional multi-layer neural network (MM-Net) that processes point cloud data end to end and robustly.
Method
The multi-dimensional feature correction and fusion (MDCF) module adaptively corrects local features and point-wise features across multiple dimensions and integrates them into a high-dimensional space to obtain rich regional shapes. Meanwhile, the multi-layer feature articulation (MLFA) module exploits the long-range dependencies between multiple layers to infer the global shape the network requires. In addition, two network architectures, MM-Net-C (multi-dimensional multi-layer feature classification network) and MM-Net-S (multi-dimensional multi-layer feature segmentation network), are designed for the point cloud classification and segmentation tasks, respectively.
Result
Tests are conducted on the public ModelNet40 and ShapeNet datasets, with comparisons against a variety of methods. On ModelNet40, the classification accuracy of MM-Net-C is 2.2% and 1.9% higher than that of PointNet++ and DGCNN (dynamic graph convolutional neural network), respectively; on ShapeNet, the segmentation accuracy of MM-Net-S is 1.2% and 0.4% higher than that of ELM (extreme learning machine) and A-CNN (annularly convolutional neural networks), respectively. Moreover, ablation experiments on ModelNet40 verify the reliability of the MM-Net architecture, and their results also demonstrate the necessity of the MDCF and MLFA module designs.
Conclusion
The proposed MM-Net achieves excellent performance on both classification and segmentation tasks.
Objective
With the widespread use of depth cameras and 3D scanning equipment, 3D data with point clouds as the main structure have become readily available. As a result, 3D point clouds are widely used in practical applications such as self-driving cars, location recognition, robot localization, and remote sensing. In recent years, the great success of convolutional neural networks (CNNs) has changed the landscape of 2D computer vision. However, CNNs cannot directly process unstructured data such as point clouds because of their disorderly, irregular characteristics. Therefore, mining shape features from disordered point clouds has become a viable research direction in point cloud analysis.
Method
An end-to-end multi-dimensional multi-layer neural network (MM-Net), which can directly process point cloud data, is presented in this paper. The multi-dimensional feature correction and fusion (MDCF) module corrects local features across different dimensions in a principled manner. First, the local area division unit, using farthest point sampling and ball query, constructs local areas at different radii, from which the 10D geometric relations and local features required by the module are obtained. Inspired by related research, the module then uses the geometric relations to modify the point-wise features, enhance the interaction between points, and encode useful local features, which are supplemented by the point-wise features. Finally, the shape features of the different region ranges are fused and mapped to a higher-dimensional space.
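To make the local area division unit concrete, the following is a minimal PyTorch-style sketch. It assumes the common 10D relation [||c - p||, c, p, c - p] between a region centre c and a neighbour p (the encoding used by RS-CNN); the paper's exact relation, radii, and neighbourhood sizes are not specified here, so every name and shape below is illustrative rather than the authors' implementation.

```python
import torch

def farthest_point_sample(xyz, m):
    # Greedy farthest point sampling: pick m well-spread centroid indices.
    # xyz: (B, N, 3) coordinates -> (B, m) sampled indices.
    B, N, _ = xyz.shape
    idx = torch.zeros(B, m, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, N), float("inf"), device=xyz.device)
    farthest = torch.zeros(B, dtype=torch.long, device=xyz.device)
    batch = torch.arange(B, device=xyz.device)
    for i in range(m):
        idx[:, i] = farthest
        centroid = xyz[batch, farthest].unsqueeze(1)           # (B, 1, 3)
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))
        farthest = dist.argmax(-1)                             # next-farthest point
    return idx

def ball_query(xyz, centers, radius, k):
    # Gather up to k neighbours of each centre that lie within `radius`;
    # slots outside the ball are padded with the nearest neighbour's index.
    d2 = torch.cdist(centers, xyz) ** 2                        # (B, m, N)
    idx = d2.argsort(dim=-1)[..., :k]                          # k nearest candidates
    nearest = idx[..., :1].expand_as(idx)
    outside = torch.gather(d2, -1, idx) > radius ** 2
    return torch.where(outside, nearest, idx)                  # (B, m, k)

def geometric_relation(xyz, centers, group_idx):
    # 10D relation per (centre, neighbour) pair: [||c - p||, c, p, c - p].
    B, m, k = group_idx.shape
    grouped = torch.gather(xyz.unsqueeze(1).expand(B, m, -1, 3), 2,
                           group_idx.unsqueeze(-1).expand(B, m, k, 3))
    diff = centers.unsqueeze(2) - grouped                      # (B, m, k, 3)
    dist = diff.norm(dim=-1, keepdim=True)                     # (B, m, k, 1)
    c = centers.unsqueeze(2).expand_as(grouped)
    return torch.cat([dist, c, grouped, diff], dim=-1)         # (B, m, k, 10)
```

Grouping the same points at two or three radii with this unit yields the multi-scale local areas whose features the MDCF module then corrects and fuses.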
At the same time, the multi-layer feature articulation (MLFA) module focuses on integrating the contextual relationships between local regions to extract global features. In particular, these local regions are treated as distinct nodes, and global features are acquired through convolution and jump fusion. The MLFA module uses the long-range dependencies between multiple layers to reason about the global shape required by the network.
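As one way to picture how such nodes from several layers might be convolved and jump-fused into a single global vector, here is a hedged sketch; the channel widths, the max-pooling over regions, and the additive form of the jump fusion are assumptions for illustration, not the paper's verified design.

```python
import torch
import torch.nn as nn

class MLFASketch(nn.Module):
    # Hedged sketch of multi-layer feature articulation: region features from
    # successive layers are projected to a common width, pooled, accumulated
    # through jump (skip) fusion, and mapped to one global descriptor.
    def __init__(self, in_dims=(128, 256, 512), width=512, out_dim=1024):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv1d(d, width, 1) for d in in_dims])
        self.fuse = nn.Conv1d(width, width, 1)
        self.head = nn.Sequential(nn.Conv1d(width, out_dim, 1),
                                  nn.BatchNorm1d(out_dim), nn.ReLU())

    def forward(self, layers):
        # layers: list of (B, C_l, M_l) tensors, one per MDCF stage.
        fused = None
        for feat, proj in zip(layers, self.proj):
            x = proj(feat).max(dim=-1, keepdim=True).values       # (B, width, 1)
            fused = x if fused is None else self.fuse(fused) + x  # jump fusion
        return self.head(fused).squeeze(-1)                       # (B, out_dim)
```

Because every layer contributes through its own projection before fusion, a gradient path runs from the global descriptor back to each layer, which is what allows long-range dependencies between layers to be exploited.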
Furthermore, two network architectures, MM-Net-C (multi-dimensional multi-layer feature classification network) and MM-Net-S (multi-dimensional multi-layer feature segmentation network), are designed in this paper for the point cloud classification and segmentation tasks. In detail, MM-Net-C passes through three tandem MDCF modules, producing three layers of interlinked local shape features; the global features are then obtained by connecting and integrating the correlations between local regions through the MLFA module. In MM-Net-S, after processing by the MLFA module, the object data are encoded into a 1 024-dimensional global feature vector. The features are then summed to obtain shapes that fuse local and global information, so that they are linked to the labels of the objects (e.g., motorbikes, cars). This process is followed by feature propagation, in which successive upsampling operations recover the details of the original object data and produce a robust point-wise vector. Finally, the outputs of the different feature propagation layers are integrated and fed into a convolution operation, and the features are transformed to obtain an accurate prediction for each point within the object.
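The feature propagation step can be sketched as PointNet++-style inverse-distance-weighted interpolation, which is assumed here; the function name and the k = 3 neighbourhood are illustrative choices, not details taken from the paper.

```python
import torch

def feature_propagation(xyz_dense, xyz_sparse, feat_sparse, k=3, eps=1e-8):
    # Upsample per-region features back onto the dense point set by
    # inverse-distance-weighted interpolation over the k nearest regions.
    # xyz_dense: (B, N, 3); xyz_sparse: (B, M, 3); feat_sparse: (B, M, C).
    d = torch.cdist(xyz_dense, xyz_sparse)                 # (B, N, M)
    dist, idx = d.topk(k, dim=-1, largest=False)           # k nearest regions
    w = 1.0 / (dist + eps)
    w = w / w.sum(dim=-1, keepdim=True)                    # normalised weights
    B, N, _ = idx.shape
    C = feat_sparse.shape[-1]
    gathered = torch.gather(feat_sparse.unsqueeze(1).expand(B, N, -1, C), 2,
                            idx.unsqueeze(-1).expand(B, N, k, C))
    return (w.unsqueeze(-1) * gathered).sum(dim=2)         # (B, N, C)
```

Applied stage by stage, with the interpolated features combined with the features saved at matching resolutions, this is one typical way to restore a per-point vector at the original resolution for the final segmentation head.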
Result
The method in this paper is thoroughly tested on the publicly available ModelNet40 and ShapeNet datasets, and the experimental results are compared with those of a variety of methods. On ModelNet40, MM-Net-C is compared with several pnt-based methods (which take only point cloud coordinates as input): it improves accuracy by 1.9% over the dynamic graph convolutional neural network (DGCNN, 92.2%) and by 0.5% over the relation-shape convolutional neural network (RS-CNN, 93.6%). MM-Net-C is also compared with several pnt-nor-based methods (which take both coordinates and normal vectors as input): it improves accuracy by 2.4% over point attention transformers (PAT, 91.7%), by 1.6% over PointConv (92.5%), and by 0.9% over PointASNL (93.2%). Even when other studies train with more input points, MM-Net-C still outperforms them; for example, it improves accuracy by 2.2% over PointNet++ (5 k points, 91.9%) and by 0.7% over the self-organizing network (SO-Net, 5 k points, 93.4%). In addition, MM-Net-C achieves higher accuracy than other methods at lower complexity. Compared with PointCNN (8.20 M parameters, 91.7%), MM-Net-C has less than one-eighth as many parameters while improving accuracy by 2.4%; compared with RS-CNN (1.41 M, 93.6%), it has 0.33 M fewer parameters while improving accuracy by 0.5%. On ShapeNet, MM-Net-S improves accuracy by 1.4% over DGCNN (85.1%), by 0.8% over the shape-oriented convolutional neural network (SO-CNN, 85.7%), and by 0.4% over annularly convolutional neural networks (A-CNN, 86.1%). Ablation experiments on ModelNet40 confirm the effectiveness of the MM-Net architecture and validate the need for the MDCF and MLFA module designs. The results further confirm that the MDCF module, which modifies and fuses rich point-wise features with potential local features, effectively improves the network's mining of shape information within a local region, whereas the MLFA module captures contextual information at the global scale and reinforces the long-range dependencies between different layers, effectively enhancing the robustness of the model on complex shapes. Ablation experiments are also conducted on whether the MDCF module needs to be designed with different dimensions; the results demonstrate that MM-Net performs better than RS-CNN at the same dimensionality.
Conclusion
In this paper, an MM-Net with the MDCF and MLFA modules as its core components is proposed. Sufficient experiments and thorough comparisons verify that MM-Net achieves higher accuracy while retaining the advantage of fewer parameters.
Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q X, Li Z M, Savarese S, Savva M, Song S R, Su H, Xiao J X, Yi L and Yu F. 2015. ShapeNet: an information-rich 3D model repository [EB/OL]. [2021-06-23]. https://arxiv.org/pdf/1512.03012.pdf
Fujiwara K and Hashimoto T. 2020. Neural implicit embedding for point cloud analysis//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11731-11740 [DOI: 10.1109/CVPR42600.2020.01175]
Hua B S, Tran M K and Yeung S K. 2018. Pointwise convolutional neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 984-993 [DOI: 10.1109/CVPR.2018.00109]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2264-2269 [DOI: 10.1109/CVPR.2017.243]
Kamousi P, Lazard S, Maheshwari A and Wuhrer S. 2016. Analysis of farthest point sampling for approximating geodesics in a graph. Computational Geometry, 57: 1-7 [DOI: 10.1016/j.comgeo.2016.05.005]
Komarichev A, Zhong Z C and Hua J. 2019. A-CNN: annularly convolutional neural networks on point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7413-7422 [DOI: 10.1109/CVPR.2019.00760]
Li J X, Chen B M and Lee G H. 2018a. SO-Net: self-organizing network for point cloud analysis//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 9397-9406 [DOI: 10.1109/CVPR.2018.00979]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018b. PointCNN: convolution on X-transformed points//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 820-830
Liu X H, Han Z Z, Liu Y S and Zwicker M. 2019a. Point2Sequence: learning the shape representation of 3D point clouds with an attention-based sequence to sequence network. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1): 8778-8785 [DOI: 10.1609/aaai.v33i01.33018778]
Liu Y C, Fan B, Meng G F, Lu J W, Xiang S M and Pan C H. 2019b. DensePoint: learning densely contextual representation for efficient point cloud processing//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 5238-5247 [DOI: 10.1109/ICCV.2019.00534]
Liu Y C, Fan B, Xiang S M and Pan C H. 2019c. Relation-shape convolutional neural network for point cloud analysis//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8887-8896 [DOI: 10.1109/CVPR.2019.00910]
Liu Z, Zhou S B, Suo C Z, Yin P, Chen W, Wang H S, Li H A and Liu Y H. 2019d. LPD-Net: 3D point cloud learning for large-scale place recognition and environment analysis//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2831-2840 [DOI: 10.1109/ICCV.2019.00292]
Qi C R, Liu W, Wu C X, Su H and Guibas L J. 2018. Frustum PointNets for 3D object detection from RGB-D data//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 918-927 [DOI: 10.1109/CVPR.2018.00102]
Qi C R, Su H, Mo K and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114
Shi S S, Wang X G and Li H S. 2019. PointRCNN: 3D object proposal generation and detection from point cloud//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 770-779 [DOI: 10.1109/CVPR.2019.00086]
Simonovsky M and Komodakis N. 2017. Dynamic edge-conditioned filters in convolutional neural networks on graphs//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 29-38 [DOI: 10.1109/CVPR.2017.11]
Wang C, Samari B and Siddiqi K. 2018a. Local spectral graph convolution for point set feature learning//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 56-71 [DOI: 10.1007/978-3-030-01225-0_4]
Wang X L, Girshick R, Gupta A and He K M. 2018b. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M and Solomon J M. 2019. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): #146 [DOI: 10.1145/3326362]
Wu W X, Qi Z G and Li F X. 2019. PointConv: deep convolutional networks on 3D point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9613-9622 [DOI: 10.1109/CVPR.2019.00985]
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1912-1920 [DOI: 10.1109/CVPR.2015.7298801]
Xu Y F, Fan T Q, Xu M Y, Zeng L and Qiao Y. 2018. SpiderCNN: deep learning on point sets with parameterized convolutional filters//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 90-105 [DOI: 10.1007/978-3-030-01237-3_6]
Yan X, Zheng C D, Li Z, Wang S and Cui S G. 2020. PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5588-5597 [DOI: 10.1109/CVPR42600.2020.00563]
Yang J C, Zhang Q, Ni B B, Li L G, Liu J X, Zhou M D and Tian Q. 2019. Modeling point clouds with self-attention and gumbel subset sampling//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3318-3327 [DOI: 10.1109/CVPR.2019.00344]
Zeiler M D and Fergus R. 2014. Visualizing and understanding convolutional networks//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 818-833 [DOI: 10.1007/978-3-319-10590-1_53]
Zhang C Y, Song Y, Yao L N and Cai W D. 2020. Shape-oriented convolution neural network for point cloud analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7): 12773-12780 [DOI: 10.1609/aaai.v34i07.6972]
Zhu Y K, Mottaghi R, Kolve E, Lim J J, Gupta A, Li F F and Farhadi A. 2017. Target-driven visual navigation in indoor scenes using deep reinforcement learning//Proceedings of 2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE: 3357-3364 [DOI: 10.1109/ICRA.2017.7989381]