Point cloud analysis combining inverse density function and relation-shape convolution neural network

Qiu Yunfei1, Zhu Mengying1,2 (1. College of Software, Liaoning Technical University, Huludao 125100, China; 2. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362216, China)

Abstract
Objective Unlike images represented on a regular dense grid, 3D point clouds are irregular and unordered; their input size and point order vary, their density is uneven, and their shapes and scales differ. To address this, we propose a convolution method for 3D point clouds that combines the relation-shape convolution neural network (RSCNN) with an inverse density function and adds deconvolution layers to the network, achieving more accurate point cloud classification and segmentation. Method In the relation-shape convolution neural network, the convolution kernel is regarded as a nonlinear function of the local coordinates of 3D points, composed of a weight function and an inverse density function. For a given point, the weight function is learned with a multilayer perceptron network and the inverse density function is learned with kernel density estimation (KDE); the inverse density function compensates for non-uniform point cloud sampling. For the point cloud segmentation task, a deconvolution layer consisting of an interpolation part and a relation-shape convolution layer is introduced to propagate features from the subsampled point cloud back to the original resolution. Result Classification, part segmentation, and semantic scene segmentation experiments were conducted on the ModelNet40, ShapeNet, and ScanNet datasets to verify the classification and segmentation performance of the model. In the classification experiment, the overall accuracy is 3.1% higher than that of PointNet++, and still 1.9% higher even when PointNet++ also takes normals as input. In the part segmentation experiment, the class mean intersection over union (mIoU) is 6.0% higher than PointNet++ with normals as input, and the instance mIoU is 1.4% higher than PointNet++. In the semantic scene segmentation experiment, the mIoU is 13.7% higher than that of PointNet++. A comparison experiment on the ScanNet dataset with different step sizes, with and without the inverse density function, shows that the inverse density function improves segmentation accuracy by about 0.8% and effectively improves model performance. Conclusion The relation-shape convolution neural network fused with the inverse density function can effectively capture local and global features of point cloud data and compensate to a certain extent for non-uniform point cloud sampling, achieving better classification and segmentation results.
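To make the convolution concrete, the following PyTorch sketch illustrates one plausible form of an inverse-density-weighted relation-shape convolution, in the spirit of the method summarized above. The module name RSConvInvDensity, the 10-dimensional relation vector, the MLP sizes, and the max aggregation are assumptions for illustration, not the authors' exact implementation.

# Hypothetical sketch of an inverse-density-weighted relation-shape convolution.
import torch
import torch.nn as nn

class RSConvInvDensity(nn.Module):
    """For each sampled centre point, aggregate neighbour features whose
    weights are a learned nonlinear function of the local geometric relation
    (relative position, distance, ...) scaled by an estimated inverse density."""

    def __init__(self, in_channels, out_channels, relation_dim=10):
        super().__init__()
        # Weight function: an MLP mapping the low-level relation vector
        # to a per-channel weight (layer sizes are illustrative).
        self.weight_mlp = nn.Sequential(
            nn.Linear(relation_dim, out_channels),
            nn.ReLU(inplace=True),
            nn.Linear(out_channels, in_channels),
        )
        self.lift = nn.Linear(in_channels, out_channels)

    def forward(self, centre_xyz, neigh_xyz, neigh_feat, inv_density):
        # centre_xyz:  (B, M, 3)    sampled centre coordinates
        # neigh_xyz:   (B, M, K, 3) K neighbours of each centre
        # neigh_feat:  (B, M, K, C) neighbour features
        # inv_density: (B, M, K, 1) inverse density estimate per neighbour
        rel_pos = neigh_xyz - centre_xyz.unsqueeze(2)             # (B, M, K, 3)
        dist = rel_pos.norm(dim=-1, keepdim=True)                 # (B, M, K, 1)
        relation = torch.cat(
            [rel_pos, dist, centre_xyz.unsqueeze(2).expand_as(neigh_xyz),
             neigh_xyz], dim=-1)                                  # (B, M, K, 10)
        w = self.weight_mlp(relation)                             # (B, M, K, C)
        # The inverse density re-weights sparsely sampled neighbours
        # before aggregation (aggregation choice is an assumption).
        out = (w * neigh_feat * inv_density).max(dim=2).values    # (B, M, C)
        return self.lift(out)                                     # (B, M, C_out)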
Keywords
Point cloud analysis model combining inverse density function and relation-shape convolution neural network

Qiu Yunfei1, Zhu Mengying1,2 (1. College of Software, Liaoning Technical University, Huludao 125100, China; 2. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362216, China)

Abstract
Objective 3D point clouds have received widespread attention for their wide range of applications, such as robotics, autonomous driving, and virtual reality. However, point cloud processing is very challenging because the input size and point order vary, the density is uneven, and shapes and scales differ; as a result, the various shapes formed by irregular points are often difficult to distinguish. To address this, sufficient contextual semantic information must be captured to grasp these elusive shapes. On 2D images, convolutional neural networks (CNNs) have fundamentally changed the landscape of computer vision by greatly improving the results of almost all vision tasks. CNNs succeed by exploiting translation invariance, so the same set of convolution filters can be applied at every position in the image, reducing the number of parameters and improving generalization. We hope to transfer these successes to 3D point cloud analysis. However, a 3D point cloud is an unordered set of 3D points, each with or without additional features (such as RGB), which does not conform to the regular lattice grid of a 2D image. Applying conventional CNNs to such unordered inputs is difficult. Some works convert point clouds into regular voxels or multi-view images, but these conversions usually lose a large amount of inherent geometric information, and the huge amount of data in a 3D point cloud increases the complexity of the conversion. Another solution is to learn directly from the irregular point cloud. PointNet learns each point independently and then applies a symmetric function to accumulate features, which achieves invariance to point permutation. Although impressive, it ignores local patterns, which image CNNs have shown to be important for high-level visual semantics. To remedy this, KCNet mines local patterns by building a K-nearest neighbor (KNN) graph at each point of PointNet, but it lacks a pooling layer that can explicitly raise the semantic level. PointNet++ hierarchically groups the point cloud into local subsets and learns them with PointNet. This design does work similarly to a CNN, but the basic PointNet operation requires high complexity to be sufficiently effective, which usually results in huge computational costs. Method To solve this problem, some works divide the point cloud into several subsets by sampling and then construct a hierarchical structure to learn contextual representations from local to global. However, this depends heavily on effective inductive learning on local subsets, which is difficult to achieve for point clouds with uneven density and irregular shape. Inspired by the use of inverse density functions in point cloud networks, we propose a relation-shape convolution neural network (RSCNN) fused with an inverse density function. The key of the network is to learn from relations, that is, from the geometric topology constraints between points, which encode meaningful shape information in the 3D point cloud; the convolution kernel is regarded as a nonlinear function of the local coordinates of the 3D points, composed of a weight function and an inverse density function. For a given point, the convolution weights are learned from predefined geometric priors, i.e., high-level relation expressions, and the inverse density function is learned with kernel density estimation (KDE). The inverse density function compensates for non-uniform sampling of the point cloud.
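As a minimal sketch of the KDE step just described, the function below estimates a per-point inverse density with a Gaussian kernel; the bandwidth value and the all-pairs formulation are assumptions for illustration, since the abstract does not fix these details.

# Minimal sketch of Gaussian-kernel density estimation and its inverse.
import torch

def inverse_density(xyz, bandwidth=0.1, eps=1e-8):
    # xyz: (B, N, 3) point coordinates.
    # Pairwise squared distances between all points in each cloud.
    sq_dist = torch.cdist(xyz, xyz) ** 2                    # (B, N, N)
    # Gaussian KDE: the density at a point is the mean kernel
    # response of all points in the cloud.
    kernel = torch.exp(-sq_dist / (2 * bandwidth ** 2))     # (B, N, N)
    density = kernel.mean(dim=-1)                           # (B, N)
    # The inverse density re-weights features so that sparsely
    # sampled regions are not under-represented.
    return 1.0 / (density + eps)                            # (B, N)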
In addition, for the segmentation task, we propose a deconvolution operator. The relation-shape deconvolution layer (RSDeconv) consists of two parts: interpolation and a relation-shape convolution layer (RSConv). Features are propagated from the subsampled point cloud back to the original resolution. The convolution exploits the spatial layout of the points for inductive learning and can discriminatively reflect the underlying shape formed by irregular points, realizing contextual shape-aware learning for point cloud analysis. Benefiting from the geometric priors, the network is invariant to point permutation and robust to rigid transformations such as translation and rotation. Result Classification, part segmentation, and semantic scene segmentation experiments were conducted on the ModelNet40, ShapeNet, and ScanNet datasets to verify the classification and segmentation performance of the model. In the classification experiment on ModelNet40, the overall accuracy is 3.1% higher than that of PointNet++, and still 1.9% higher even when PointNet++ also takes normals as input. On the ShapeNet part segmentation dataset, the class mean intersection over union (mIoU) is 6.0% higher than that of PointNet++ with normals as input, and the instance mIoU is 1.4% higher. On the ScanNet indoor scene dataset, the mIoU is 13.7% higher than that of PointNet++. A comparison experiment on the ScanNet dataset evaluates different step sizes with and without the inverse density function; it shows that the inverse density function improves segmentation accuracy by about 0.8% and effectively improves model performance. Conclusion The experimental results show that introducing the inverse density function into the relation-shape CNN compensates to a certain extent for non-uniform sampling of point clouds. In addition, the deconvolution layer propagates features from the subsampled point cloud back to the original resolution, which improves segmentation accuracy. In general, the proposed network can effectively capture the global and local features of point cloud data, achieving better classification and segmentation results.
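The interpolation half of RSDeconv can be illustrated with inverse-distance-weighted feature propagation from the subsampled points back to the original resolution. The helper name propagate_features and the choice of three nearest neighbours (common practice in PointNet++-style networks) are assumptions; the subsequent RSConv refinement is omitted.

# Sketch of the interpolation part of a deconvolution (feature propagation) layer.
import torch

def propagate_features(dense_xyz, sparse_xyz, sparse_feat, k=3, eps=1e-8):
    # dense_xyz:   (B, N, 3)  original-resolution coordinates
    # sparse_xyz:  (B, M, 3)  subsampled coordinates (M < N)
    # sparse_feat: (B, M, C)  features on the subsampled points
    dist = torch.cdist(dense_xyz, sparse_xyz)                    # (B, N, M)
    knn_dist, knn_idx = dist.topk(k, dim=-1, largest=False)      # (B, N, k)
    # Inverse-distance weights, normalised over the k neighbours.
    weight = 1.0 / (knn_dist + eps)
    weight = weight / weight.sum(dim=-1, keepdim=True)
    # Gather the k nearest sparse features for every dense point.
    B, N, _ = dense_xyz.shape
    idx = knn_idx.unsqueeze(-1).expand(-1, -1, -1, sparse_feat.shape[-1])
    gathered = torch.gather(
        sparse_feat.unsqueeze(1).expand(-1, N, -1, -1), 2, idx)  # (B, N, k, C)
    # Interpolated features are then refined by an RSConv layer (not shown).
    return (weight.unsqueeze(-1) * gathered).sum(dim=2)          # (B, N, C)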
Keywords
