发布时间: 2021-04-16
图像分析和识别
收稿日期: 2020-06-16; 修回日期: 2020-08-13; 预印本日期: 2020-08-20
作者简介:
邱云飞, 1976年生, 男, 教授, 主要研究方向为数据挖掘、机器学习、智能数据处理。E-mail: 7415575@qq.com
朱梦影, 通信作者, 女, 硕士研究生, 主要研究方向为点云数据处理。E-mail: 3066964438@qq.com
中图法分类号: TP391
文献标识码: A
文章编号: 1006-8961(2021)04-0898-12
摘要
目的 3D点云与以规则的密集网格表示的图像不同,不仅不规则且无序,而且由于输入输出大小和顺序差异,具有密度不均匀以及形状和缩放比例存在差异的特性。为此,提出一种对3D点云进行卷积的方法,将关系形状卷积神经网络(relation-shape convolution neural network,RSCNN)与逆密度函数相结合,并在卷积网络中增添反卷积层,实现了点云更精确的分类分割效果。方法 在关系形状卷积神经网络中,将卷积核视为由权重函数和逆密度函数组成的3D点局部坐标的非线性函数。对给定的点,权重函数通过多层感知器网络学习,逆密度函数通过核密度估计(kernel density estimation,KDE)学习,逆密度函数的引入对点云采样率不均匀的情况进行弥补。在点云分割任务中,引入由插值和关系形状卷积层两部分组成的反卷积层,将特征从子采样点云传播回原始分辨率。结果 在ModelNet40、ShapeNet、ScanNet数据集上进行分类、部分分割和语义场景分割实验,验证模型的分类分割性能。在分类实验中,与PointNet++相比,整体精度提升3.1%,在PointNet++将法线也作为输入的情况下,精度依然提升了1.9%;在部分分割实验中,类平均交并比(mean intersection over union,mIoU)比PointNet++在法线作为输入情况下高6.0%,实例mIoU比PointNet++高1.4%;在语义场景分割实验中,mIoU比PointNet++高13.7%。在ScanNet数据集上进行不同步长有无逆密度函数的对比实验,实验证明逆密度函数将分割精度提升0.8%左右,有效提升了模型性能。结论 融合逆密度函数的关系形状卷积神经网络可以有效获取点云数据中的局部和全局特征,并对点云采样不均匀的情况实现一定程度的补偿,实现更优的分类和分割效果。
关键词
关系形状卷积神经网络(RSCNN); 逆密度函数; 非均匀采样; 反卷积层; 点云的分类与分割
Abstract
Objective 3D point clouds have received widespread attention for their wide range of applications, such as robotics, autonomous driving, and virtual reality. However, due to their variable input size and order, uneven density, and differences in shape and scale, point cloud processing is very challenging, and the various shapes formed by irregular points are often difficult to distinguish. For this problem, sufficient contextual semantic information must be captured to grasp the elusive shapes. In 2D images, convolutional neural networks (CNNs) have fundamentally changed the landscape of computer vision by greatly improving the results of almost all vision tasks. CNNs succeed by exploiting translation invariance, so the same set of convolution filters can be applied to all positions in an image, thereby reducing the number of parameters and improving generalization. We hope to transfer these successes to 3D point cloud analysis. However, a 3D point cloud is an unordered set of 3D points, each with or without additional features (such as RGB), which does not conform to the regular lattice grid of a 2D image, so applying conventional CNNs to such unordered inputs is difficult. Some works convert point clouds into regular voxels or multi-view images, but these conversions usually lose a large amount of inherent geometric information, and the huge amount of data in a 3D point cloud increases the complexity of the conversion. Another solution is to learn directly from the irregular point cloud. PointNet learns on each point independently and then applies a symmetric function to accumulate features, which realizes invariance to point permutation. Although impressive, it ignores local patterns, which have proven important for abstracting high-level visual semantics in image CNNs.
To correct this, KCNet mines local patterns by building K-nearest-neighbor (KNN) graphs at each point of PointNet; however, it has no pooling layer that can explicitly raise the semantic level. PointNet++ hierarchically groups point clouds into local subsets and learns them through PointNet. This design does work similarly to a CNN, but the basic operation of PointNet requires high complexity to be sufficiently effective, which usually results in huge computational cost. Method To solve this problem, some works divide the point cloud into several subsets by sampling and then construct a hierarchical structure to learn context representations from local to global. However, this greatly depends on effective inductive learning of local subsets, which is difficult for point clouds with uneven density and irregular shape. Inspired by the inverse density function in point cloud networks, we propose a relation-shape convolutional neural network (RSCNN) fused with an inverse density function. The key of the network is to learn from relations, that is, from the geometric topology constraints between points, which encode meaningful shape information in the 3D point cloud; the convolution kernel is regarded as a nonlinear function of the local coordinates of 3D points, composed of a weight function and an inverse density function. For a given point, the weight function is learned from high-level relation expressions with geometric priors through a multilayer perceptron, and the inverse density function is learned by kernel density estimation. The inverse density function compensates for non-uniformly sampled point clouds. In addition, for the segmentation task, we propose a deconvolution operator: the relation-shape deconvolution layer (RSDeconv) consists of two parts, interpolation and the relation-shape convolution layer (RSConv), and propagates features from the subsampled point cloud back to the original resolution.
In this way, convolution can reason about the spatial layout of points and discriminatively reflect the underlying shapes formed by irregular points, realizing contextual shape-aware learning for point cloud analysis. Benefiting from the geometric priors, invariance to point permutation and robustness to rigid transformations (such as translation and rotation) are achieved. Result Classification, part segmentation, and semantic scene segmentation experiments were conducted on the ModelNet40, ShapeNet, and ScanNet datasets to verify the classification and segmentation performance of the model. In the classification experiment on ModelNet40, the overall accuracy is improved by 3.1% compared with PointNet++; even when PointNet++ takes normals as input, the accuracy is still improved by 1.9%. On the ShapeNet part segmentation dataset, the class mean intersection over union (mIoU) is 6.0% higher than that of PointNet++ with normals as input, and the instance mIoU is 1.4% higher. On the ScanNet indoor scene dataset, the mIoU is 13.7% higher than that of PointNet++. A comparison experiment on the ScanNet dataset with different step sizes, with and without the inverse density function, shows that the inverse density function improves the segmentation accuracy by about 0.8% and effectively improves model performance. Conclusion The experimental results show that introducing the inverse density function into the relation-shape CNN compensates for non-uniform sampling of point clouds. In addition, the deconvolution layer propagates features from the subsampled point cloud back to the original resolution, which improves segmentation accuracy. Overall, the proposed network can effectively capture the global and local characteristics of point cloud data, achieving better classification and segmentation results.
Key words
relation-shape convolutional neural network (RSCNN); inverse density function; non-uniform sampling; deconvolution layer; classification and segmentation of point cloud
0 引言
随着3D点云在自动驾驶(Qi等,2018)和导航定位(Zhu等,2017;文威威等,2019)等诸多领域的应用,3D点云引起了越来越多的关注。由于很难推断不规则点云形成的基本形状,此项工作非常具有挑战性,为此提出了很多方法,主要有基于体素和基于视图的方法。
鉴于卷积神经网络(convolution neural network,CNN)在图像识别领域强大的语义信息抽象能力(Chen等,2018;黄凯奇等,2014),很多方法将其在图像分析即规则网格数据上的处理用于不规则点云处理。为使点数据适用于卷积,一种简单的方法是将其体素化为3D网格结构(Maturana和Scherer,2015;Wu等,2015),但是由于大多数体素通常未被占用,因此体素的表示方式效率不高。为此,Riegler等人(2017)提出OctNet,探索了体素数据的稀疏性并缓解了这个问题,但是当涉及更深的神经网络时,内存占用仍然很高,而且由于体素是空间的离散表示,因此该方法需要高分辨率的网格,对存储器容量提出了较高要求。多视图(Qi等,2016;Su等,2015)是另一种常见的3D表示形式,多视图将点数据投影到3D空间中的各种特定图像平面上形成2D图像,通过这种方式可以在2D图像上使用常规卷积处理点数据,但是这种方法忽略了3D点的内在几何关系,投影导致的3D数据中的遮挡部分未得到处理,并且由于点云数据密度分布的不均匀性,这些方法无法充分识别点云细粒度的局部结构。Qi等人(2017a)提出的PointNet在每个点上独立学习,并通过最大池化收集最终特征,然而这种设计忽略了局部结构信息。事实证明,局部结构信息对于CNN的成功至关重要。为了解决这个问题,Qi等人(2017b)提出PointNet++将PointNet分层应用到点云的多个子集,使其具有可扩展性,然而该方法是在点周围半径内选择固定数量的随机样本,在计算中没有考虑点密度,因此导致误差。Shen等人(2018)提出使用内核相关层来开发局部几何结构,并提出了一个基于图的池化层,以利用局部特征结构来增强网络的鲁棒性,在小规模点集上表现出良好的性能,但对底层的图结构仍然特别敏感。大多数卷积方法无论是否使用层次结构,都是基于点的局部邻域进行归纳学习。
以上方法很大程度上解决的只是点云的不规则性问题。对此,有些研究在解决点云不规则性的同时考虑了点云采样密度不均匀问题。Hermosilla等人(2018)利用蒙特卡罗卷积解决点云上的采样不均匀问题,但包含多个点的接收域计算速度较慢,效率低。Wu等人(2019)提出的PointConv将卷积核视为由权重函数和密度函数组成的3维点的局部坐标的非线性函数。Li等人(2019)设计了一个密度感知卷积模块,利用每个点的逐点密度加权卷积核的可学习权重。
关系卷积神经网络是从关系中学习,关系即点之间的几何拓扑约束。受密度函数的启发,本文提出将关系形状卷积神经网络(relation-shape convolution neural network,RSCNN)(Liu等,2019)与逆密度函数相融合,得到点之间的高级关系表达式,以此捕获点的空间布局,并逐步执行上下文局部到全局形状的特征提取。通过学习点之间几何关系的方法,对刚性变换实现了很好的鲁棒性。逆密度函数通过核密度估计(kernel density estimation,KDE)来学习,目的是对点云采样率不均匀情况进行补偿,实现对点云进行非均匀采样。同时在点云分割任务中,引入关系卷积神经网络的反卷积层(Deconv),该层更好地将特征从子采样点云传播回原始分辨率。实验证明,本文方法对点云分割精确度的提升起到了很好的作用。
1 算法流程
本文提出一种对3D点云进行卷积的方法,能够对点云采样率不均匀的情况进行一定程度的补偿,并且使用逆密度函数对多层感知器学习的连续函数进行加权。在分割任务中,添加一个基于关系形状卷积神经网络的反卷积层,将特征信息从较粗层传播到较细的层,提升了语义分割任务的性能。本文算法流程如图 1所示。
2 原理与方法
融合逆密度函数的关系形状卷积神经网络从3D空间中的几何先验学习高级关系表达式,并执行上下文局部到全局的形状学习。
2.1 关系学习
从关系中获取与数据相关权重的方法在图像和视频分析领域已经有所应用。空间变换器(Jaderberg等,2015)通过学习过渡矩阵对齐2D图像,非局域网(Wang等,2018)学习跨视频帧的长期关系,关系网络(Hu等,2018)学习跨对象的位置关系。还有一些研究专注于对3D点云中的关系进行学习,DGCNN(Wang等,2019)通过学习高维特征空间中的点关系来捕获相似的局部形状,但这种关系在某些情况下可能不可靠。
2.2 关系形状卷积
卷积运算可以看作是连续卷积算子的离散逼近。在3D空间中,可以将此卷积运算符的权重视为相对于参考3D点的局部点坐标的连续函数,该连续函数可以通过多层感知器进行近似。在关系形状卷积神经网络中,将采样点周围球形邻域内的点作为局部点子集进行学习。
关系形状卷积(RSConv)将规则网格CNN扩展到不规则配置,并最终学习点云的上下文形状感知表示,但是非常依赖从不规则点子集中进行形状感知的归纳学习。对此,关系形状卷积网络将局部点子集的归纳学习形式化为
$$\boldsymbol{f}_{p_{\mathrm{sub}}} = \sigma\big(A(\{T(\boldsymbol{f}_{x_j}),\ \forall x_j\})\big), \quad d_{ij} < r,\ \forall x_j \in N(x_i) \tag{1}$$
式中,$\sigma$为非线性激活函数,$A$为聚合函数,$T$为特征变换函数,$\boldsymbol{f}_{x_j}$为点$x_j$的特征,$d_{ij}$为采样点$x_i$与邻域点$x_j$之间的欧氏距离,$r$为邻域球半径,$N(x_i)$为以$x_i$为中心的邻域点集。
2.3 权重函数
在经典CNN中,变换函数中的权重仅是与位置绑定的可学习参数,与点之间的几何关系无关,难以显式利用点的空间布局。
在3D空间的邻域中,权重$w_{ij}$由低层关系$\boldsymbol{h}_{ij}$(如采样点与邻域点之间的3D欧氏距离)经映射$M$学习得到,即
$$T(\boldsymbol{f}_{x_j}) = w_{ij} \cdot \boldsymbol{f}_{x_j} = M(\boldsymbol{h}_{ij}) \cdot \boldsymbol{f}_{x_j} \tag{2}$$
映射$M$通过共享的多层感知器实现,将低层关系$\boldsymbol{h}_{ij}$映射为点之间的高层关系表达式。由此,卷积可表示为
$$\boldsymbol{f}_{p_{\mathrm{sub}}} = \sigma\big(A(\{M(\boldsymbol{h}_{ij}) \cdot \boldsymbol{f}_{x_j},\ \forall x_j\})\big) \tag{3}$$
这种卷积表示包含对点的空间布局的显式推理,能够实现上下文形状感知的学习。
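式(1)至式(3)所述的"低层关系、映射$M$、加权聚合"流程,可用如下NumPy示意代码概括(仅为示意性草图:关系向量取$[d_{ij}, x_i-x_j, x_i, x_j]$共10维,两层感知器权重W1、W2的尺寸为本示例的假设,并非原实现):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rsconv(xyz_i, xyz_neigh, feat_neigh, W1, W2):
    """单个采样点上的关系形状卷积示意(对应式(1)至式(3))。

    xyz_i:      (3,)   采样点坐标
    xyz_neigh:  (K, 3) 球形邻域内各点坐标
    feat_neigh: (K, C) 各邻域点特征 f_{x_j}
    W1, W2:     共享两层感知器 M 的权重(示例假设的尺寸)
    """
    diff = xyz_i - xyz_neigh                            # x_i - x_j
    dist = np.linalg.norm(diff, axis=1, keepdims=True)  # d_ij
    # 低层关系 h_ij: [d_ij, x_i - x_j, x_i, x_j],共 10 维
    h = np.concatenate(
        [dist, diff, np.tile(xyz_i, (len(xyz_neigh), 1)), xyz_neigh], axis=1)
    w = relu(h @ W1) @ W2          # M(h_ij):逐通道权重 (K, C)
    transformed = w * feat_neigh   # T(f_{x_j}) = M(h_ij) · f_{x_j}
    return relu(transformed.max(axis=0))  # σ(A({...})),A 取最大池化
```

其中聚合函数$A$取具有对称性的最大池化,与正文实验配置一致。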
3 逆密度函数
从点云空间分布的角度出发,3维点的不规则结构会导致某些区域点集的过采样或欠采样,从而导致卷积核上的密度不同。在2D图像中,这种差异可类比为1个像素接收多个值,当更新内核时就会产生偏差,这是因为在过采样区域设置了过多的权重,在欠采样区域设置了较少的权重。在数字化图像中,像素均匀地放置在2维网格上,相邻像素之间的距离是恒定的,网格中的每个单元格都只有1个值,并且没有空的单元格,整个栅格密度均匀,但这并不适用于不规则的3维点云。为了克服这种由于点云不规则带来的不平衡,同时考虑到本文的基础网络——关系形状卷积神经网络更加适用于均匀采样的点云而没有考虑采样率不均匀的情况,在关系形状卷积神经网络中引入逆密度函数,利用局部区域的密度估计来重新缩放核函数,对非均匀采样的点进行补偿。由此,卷积公式变换为
$$\boldsymbol{f}_{p_{\mathrm{sub}}} = \sigma\big(A(\{S(x_i, x_j) \cdot M(\boldsymbol{h}_{ij}) \cdot \boldsymbol{f}_{x_j},\ \forall x_j\})\big) \tag{4}$$
式中,$S(x_i, x_j)$为逆密度函数,用于对非均匀采样的点进行补偿。
本文的主要思想是:将多层感知器对3D点坐标的高层映射,与通过核密度估计得到的逆密度函数相点乘,以此近似权重函数。
为了计算逆密度函数$S(x_i, x_j)$,首先使用核密度估计对邻域内每个点的密度进行估计,即
$$S(x_i, x_j)_{\mathrm{KDE}} = \frac{1}{Nh}\sum_{x_j \in N(x_i)} G\left(\frac{x_i - x_j}{h}\right) \tag{5}$$
式中,$N$为邻域$N(x_i)$内点的数量,$h$为带宽,$G(\cdot)$为核函数。将密度估计送入多层感知器进行非线性变换后取倒数,得到逆密度函数,即
$$S(x_i, x_j) = \frac{1}{MLP\big(S(x_i, x_j)_{\mathrm{KDE}}\big)} \tag{6}$$
式中,$MLP(\cdot)$为对密度估计进行非线性变换的多层感知器。
本文没有使用绝对距离,而是使用从相邻点$x_j$指向采样点$x_i$的相对坐标作为输入,以保证平移不变性。
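式(5)与式(6)的逆密度计算可概括为如下示意(示意性草图:核函数$G$取3维高斯核,mlp为代替式(6)中多层感知器的假设可调用对象,均非原实现):

```python
import numpy as np

def gaussian_kernel(u):
    # 3 维高斯核 G(u),对每个相对偏移 u = (x_i - x_j)/h 求核值
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2 * np.pi) ** 1.5

def inverse_density(xyz_i, xyz_neigh, h, mlp):
    """式(5):核密度估计;式(6):经 MLP 非线性变换后取倒数。"""
    n = len(xyz_neigh)
    kde = gaussian_kernel((xyz_i - xyz_neigh) / h).sum() / (n * h)  # 式(5)
    return 1.0 / mlp(kde)                                           # 式(6)
```

邻域越密,核密度估计越大,逆密度权重越小,从而压低过采样区域对卷积的贡献。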
4 反卷积的特征传播
对于分割任务需要进行逐点预测,为了获得所有输入点的特征,需要一种将特征从子采样点云传播到更密集点云的方法。PointNet++使用基于距离的插值来传播特征,主要用于从粗糙级别捕获传播信息的局部相关性,但是并没有充分利用反卷积的操作。对此,本文在关系形状卷积神经网络的最后一层添加一个由插值和关系形状卷积两部分组成的反卷积层,如图 6所示。
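反卷积层中基于距离的插值步骤可示意如下(示意性实现:取k个最近的粗层点做反距离加权,这是PointNet++式插值的常见形式;k=3为假设的默认值):

```python
import numpy as np

def interpolate_features(dense_xyz, sparse_xyz, sparse_feat, k=3, eps=1e-8):
    """将子采样点云(sparse)的特征插值回稠密点云(dense)。

    对每个稠密点,取 k 个最近的粗层点,按反距离加权平均其特征。
    """
    out = np.empty((len(dense_xyz), sparse_feat.shape[1]))
    for i, p in enumerate(dense_xyz):
        d = np.linalg.norm(sparse_xyz - p, axis=1)
        idx = np.argsort(d)[:k]        # k 个最近粗层点
        w = 1.0 / (d[idx] + eps)       # 反距离权重
        w /= w.sum()
        out[i] = w @ sparse_feat[idx]
    return out
```

插值得到的特征再经过一层关系形状卷积,即构成完整的反卷积层。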
5 点云模型分析
融合逆密度函数的关系形状卷积神经网络可以开发出类似经典CNN的分层形状感知学习架构用于点云分析,表示为
$$\boldsymbol{F}^{l}_{P_{N_l}} = RSCONV\big(\boldsymbol{F}^{l-1}_{P_{N_{l-1}}}\big) \tag{7}$$
式中,$\boldsymbol{F}^{l}_{P_{N_l}}$为第$l$层点集$P_{N_l}$上的特征,$N_l$为第$l$层采样点的数量,$RSCONV$为融合逆密度函数的关系形状卷积操作。
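式(7)中逐层降采样所需的采样点集,在RSCNN一类网络中通常用最远点采样选取(正文未明示具体采样方式,以下按该常见假设给出示意):

```python
import numpy as np

def farthest_point_sampling(xyz, m, rng):
    """从 n 个点中选出 m 个彼此相距尽量远的点,作为下一层的采样点。"""
    n = len(xyz)
    chosen = [int(rng.integers(n))]          # 随机起始点
    dist = np.full(n, np.inf)
    for _ in range(m - 1):
        # 维护每个点到已选集合的最小距离,取最大者为下一个采样点
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dist)))
    return np.array(chosen)
```

这样每层卷积都在更稀疏、分布更均匀的采样点上进行,实现由局部到全局的分层学习。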
6 实验结果与分析
6.1 实验配置
在实验中,本文提出的融合逆密度函数的关系形状卷积神经网络的聚合函数使用具有对称性质的最大池化层,ReLU(rectified linear unit)用作非线性激活函数,映射函数通过共享的多层感知器实现。
为验证本文模型的有效性,在ModelNet40(Wu等,2015)、ShapeNet(Yi等,2016)和ScanNet(Dai等,2017)数据集上进行实验。实验环境为Ubuntu 14.04操作系统、GTX 1080Ti GPU、Python 3,基于PyTorch框架。
6.2 点云分类
6.2.1 数据集
ModelNet40数据集由40个种类的12 311个CAD模型组成,包含9 843个训练模型和2 468个测试模型。在采样率不均匀的情况下随机采样1 024个点并将其标准化为单位球体。在训练过程中,使用[0.66,1.5]范围内的随机缩放和[-0.2,0.2]范围内的平移来扩充输入数据。为防止过拟合,在全连接层中使用丢弃技术,dropout = 0.5。在测试过程中,执行10次具有随机缩放比例的投票测试,并对预测结果取平均。
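上述训练时的随机缩放、平移增广与测试时的10次投票流程可示意如下(示意性草图:缩放区间按正值区间[0.66, 1.5]理解,predict_fn为假设的分类打分函数):

```python
import numpy as np

def augment(points, rng, scale_range=(0.66, 1.5), shift_range=0.2):
    """训练增广:随机缩放加随机平移(区间为示例假设)。"""
    scale = rng.uniform(*scale_range)
    shift = rng.uniform(-shift_range, shift_range, size=(1, 3))
    return points * scale + shift

def vote_predict(points, predict_fn, rng, n_votes=10):
    """测试投票:对随机缩放后的输入预测 n_votes 次并取平均。"""
    scores = [predict_fn(points * rng.uniform(0.66, 1.5))
              for _ in range(n_votes)]
    return np.mean(np.stack(scores), axis=0)
```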
6.2.2 分类结果
表 1是本文方法与PointNet(Qi等,2017a)、PointNet++(Qi等,2017b)、PointCNN(Li等,2018)、DGCNN(Wang等,2019)、PCNN(Atzmon等,2018)、RSCNN(Liu等,2019)和SpiderCNN(Xu等,2018)方法在ModelNet40数据集上点云分类性能的定量比较。可以看出,本文提出的融合逆密度函数的关系形状卷积神经网络优于所有对比方法,整体分类精确度达到93.2%。
表 1
不同方法在ModelNet40数据集上的分类精确度对比
Table 1
Comparison of classification accuracy on ModelNet40 dataset among different methods
| 方法 | 输入 | 采样点数 | 精确度/% |
| --- | --- | --- | --- |
| PointNet(Qi等,2017a) | 点坐标 | 1 024 | 86.3 |
| PointNet++(Qi等,2017b) | 点坐标 | 1 024 | 90.1 |
| PointCNN(Li等,2018) | 点坐标 | 1 024 | 91.1 |
| DGCNN(Wang等,2019) | 点坐标 | 1 024 | 91.8 |
| PCNN(Atzmon等,2018) | 点坐标 | 1 024 | 91.5 |
| RSCNN(Liu等,2019) | 点坐标 | 1 024 | 92.6 |
| PointNet++(Qi等,2017b) | 点坐标+法线 | 5 120 | 91.3 |
| SpiderCNN(Xu等,2018) | 点坐标+法线 | 5 120 | 91.7 |
| 本文 | 点坐标 | 1 024 | **93.2** |

注:加粗字体表示最优结果。
6.3 点云分割
3D对象零件分割是一项具有挑战性的细粒度3D识别任务。给定3D网格模型后,任务是为每个点或每个面分配零件类别(例如椅子腿、杯柄等)标签。
6.3.1 数据集
ShapeNet数据集包含来自16个类别的50个零件的16 881个形状。任务的输入是由点云表示的形状,本文将零件细分问题转化为按点分类问题,评估指标按点计算。将网络模型应用在ShapeNet数据集的部件基准上评估零件分割任务,不均匀地随机选择2 048个点作为输入,并将对象标签的独热(one-hot)编码连接到最后一个特征层。在测试期间,依然使用随机缩放比例进行10次投票测试,最后对预测结果取平均。
6.3.2 分割结果
表 2是本文方法与PointNet、PCNN、DGCNN、RSCNN和PointNet++方法在ShapeNet数据集上点云分割性能的定量比较。实验使用平均交并比(mean intersection over union,mIoU)作为评估指标,分别计算了所有类别和所有实例两种类型的平均交并比。可以看出,本文方法的类mIoU和实例mIoU分别为87.9%和86.5%,实现了最佳性能,不仅超过了RSCNN的验证结果,而且相较以法线作为额外输入的PointNet++,结果也有较大提升。实验结果表明,逆密度函数能够较好地实现对不均匀采样的点云密度补偿,反卷积层的引入也进一步提升了点云分割对各种形状结构的鲁棒性。
表 2
不同方法在ShapeNet数据集上的分割结果对比
Table 2
Comparison of segmentation results on ShapeNet dataset among different methods
| 方法 | 输入(采样点数) | 类平均交并比/% | 实例平均交并比/% |
| --- | --- | --- | --- |
| PointNet | 点坐标(2 048) | 80.4 | 83.7 |
| PCNN | 点坐标(2 048) | 81.8 | 85.1 |
| DGCNN | 点坐标(2 048) | 82.3 | 85.1 |
| RSCNN | 点坐标(2 048) | 84.0 | 86.2 |
| PointNet++ | 点坐标+法线(2 048) | 81.9 | 85.1 |
| 本文 | 点坐标(2 048) | **87.9** | **86.5** |

注:加粗字体表示各列最优结果。
图 9显示了本文方法在ShapeNet数据集上的一些分割示例。可以看到,尽管隐含在不规则点中的零件形状多种多样,并且可能很难识别,但是本文方法依然可以对其以相当不错的准确率进行点云分割。
6.4 法线估计
点云中的法线估计对于许多应用(例如曲面重建和渲染)是至关重要的一步,由于需要超越潜在形状识别的更高层次推理,此任务非常具有挑战性。本文将法线估计(袁小翠等,2017)作为有监督的回归任务并使用分割网络实现。将归一化输出与真值法线之间的余弦损失用于回归训练,使用ModelNet40数据集进行评估,以均匀采样的1 024个点作为输入。
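法线回归所用的余弦损失可示意如下(示意性草图:是否对余弦取绝对值以消除法线朝向二义性为本示例的假设):

```python
import numpy as np

def cosine_normal_loss(pred, gt, eps=1e-8):
    """归一化预测法线与真值法线之间的余弦损失,逐点求平均。"""
    pred = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    gt = gt / (np.linalg.norm(gt, axis=1, keepdims=True) + eps)
    cos = np.sum(pred * gt, axis=1)
    # 取 |cos| 使法线方向翻转不受惩罚(假设法线朝向存在二义性)
    return float(np.mean(1.0 - np.abs(cos)))
```

预测与真值完全同向时损失趋于0,完全正交时损失为1。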
表 3是本文方法与PointNet、PointNet++和PCNN方法在ModelNet40数据集上法线估算的定量比较。可以看出,本文方法的误差为0.14,表现优于其他方法,将PointNet++的误差减少了0.15。实验结果表明,本文方法能从点云的几何拓扑关系学习中以更低的误差实现对法线的估计,更好地对点云的法线进行预测。
表 3
不同方法在ModelNet40数据集上的法线估计误差对比
Table 3
Comparison of normal estimation errors on ModelNet40 dataset among different methods
| 方法 | 采样点数 | 误差 |
| --- | --- | --- |
| PointNet | 1 024 | 0.47 |
| PointNet++ | 1 024 | 0.29 |
| PCNN | 1 024 | 0.19 |
| 本文 | 1 024 | **0.14** |

注:加粗字体表示最优结果。
6.5 语义场景标签
为了评估本文方法处理包含大量嘈杂数据的现实点云的能力,在ScanNet数据集上进行语义场景分割。任务是在给定由点云表示的室内场景的情况下,预测每个3D点上的语义对象标签。
ScanNet数据集的最新版本包含所有1 513个ScanNet扫描和100个新测试扫描的更新注释,其中所有语义标签均不公开,算法的输入仅使用3D坐标数据和RGB信息。在实验中,通过从室内房间随机采样3 m × 1.5 m × 1.5 m立方体来生成训练样本,并在整个扫描过程中使用滑动窗口进行评估,使用平均交并比(mIoU)作为主要衡量标准。
表 4是本文方法与SPLATNet、PointNet++和ScanNet方法在ScanNet数据集上所有类别IoU的平均值(mIoU)的定量比较。可以看出,本文模型的语义分割精度比PointNet++方法提升了13.7%,相较其他算法均有较大幅度的提升。实验结果表明,本文方法能够较好地实现具有大量噪声的真实点云场景数据的语义分割,体现了处理真实点云的鲁棒性和良好的泛化能力。
表 4
不同方法在ScanNet数据集上的语义场景分割结果对比
Table 4
Comparison of semantic scene segmentation results on ScanNet dataset among different methods
| 方法 | 类平均交并比/% |
| --- | --- |
| ScanNet | 30.6 |
| PointNet++ | 33.9 |
| SPLATNet | 39.3 |
| 本文 | **47.6** |

注:加粗字体表示最优结果。
6.6 逆密度函数评估
为验证逆密度函数的有效性,在ScanNet数据集上进行不同步长下有无逆密度函数的对比实验,结果如表 5所示。可以看出,逆密度函数将分割精度提升0.8%左右,有效提升了模型性能。
表 5
在ScanNet数据集上有无逆密度函数的实验结果对比
Table 5
Comparison of experimental results with and without inverse density function on ScanNet dataset
| 输入 | 步长 | mIoU/% | mIoU1/% | mIoU2/% |
| --- | --- | --- | --- | --- |
| 点坐标 | 0.5 | **61.0** | **60.2** | **60.1** |
| 点坐标 | 1.0 | 59.0 | 58.2 | 57.7 |
| 点坐标 | 1.5 | 58.2 | 56.9 | 57.3 |
| 点坐标+RGB | 0.5 | 60.8 | 58.9 | - |
| 点坐标+RGB | 1.0 | 58.6 | 56.7 | - |
| 点坐标+RGB | 1.5 | 57.5 | 56.1 | - |

注:加粗字体表示各列最优结果,"-"表示无对应结果。mIoU1表示无逆密度函数的情况,mIoU2表示有逆密度函数、无线性层的情况。
6.7 鲁棒性及复杂性测试
为验证点置换和刚性变换的鲁棒性,将本文方法与PointNet和PointNet++方法进行点云的鲁棒性比较。在实验过程中,所有模型都经过训练,没有相关的数据扩充(例如平移或旋转),以避免在测试中造成混淆。此外,尽管融合逆密度函数的关系卷积神经网络中的关系学习对旋转具有鲁棒性,但3D坐标的初始输入特征对旋转的鲁棒性仍有一定影响,对此,通过将每个采样点的子集归一化为相应的局部坐标系来解决,该局部坐标系由每个采样点及其法线确定。为了公平比较,对PointNet++执行此归一化,因为这样可以从局部子集中学习。在测试中,将3D欧氏距离作为本文方法中的几何关系,测试结果如表 6所示。可以看出,所有方法对于排列都是不变的。但是,PointNet易受平移和旋转的影响,而PointNet++对旋转敏感。相比之下,本文方法对这些扰动是不变的,表明本文方法对点云平移或旋转变换之后的形状识别仍然具有较大潜力。
表 6
不同模型在点云平移或旋转后的精确度对比
Table 6
Comparison of accuracy after point cloud translation or rotation among different models
| 方法 | 平移0.2/% | 平移-0.2/% | 旋转90°/% | 旋转180°/% |
| --- | --- | --- | --- | --- |
| PointNet | 70.8 | 70.6 | 42.5 | 38.6 |
| PointNet++ | 88.2 | 88.2 | 47.9 | 39.7 |
| 本文 | **90.3** | **90.3** | **90.3** | **90.3** |

注:加粗字体表示各列最优结果;旋转为沿坐标轴旋转90°和180°后的测试结果。
6.8 复杂度分析
表 7是在输入为1 024个点时本文方法与PointNet和PointNet++方法分类的空间(参数数量)和运行时间(浮点运算/采样)复杂度对比。可以看出,与PointNet相比,本文模型的参数减少了59.4%,采样复杂度减少了21.1%。实验结果表明,本文模型在实时应用(例如自动驾驶中的场景解析)中具有巨大的潜力。
表 7
不同模型在点云分类中的复杂度对比
Table 7
Comparison of complexity of point cloud classification among different models
| 方法 | 参数/MB | 浮点运算/样本 |
| --- | --- | --- |
| PointNet | 3.5 | 440 |
| PointNet++ | 1.48 | 1 684 |
| 本文 | **1.42** | **347** |

注:加粗字体表示各列最优结果。
7 结论
本文通过引入逆密度函数对关系形状卷积神经网络(RSCNN)进行改进,以使关系形状卷积神经网络对采样率不均匀的点云更加鲁棒,并在原网络架构的基础上添加了关系形状反卷积层,用于从粗糙级别捕获传播信息的局部相关性,同时保持关系形状卷积神经网络本身具备的排列不变性、刚性变换的鲁棒性和权重共享等关键特性,将规则网格CNN扩展到不规则配置以进行点云分析,可以从关系(即点之间的几何拓扑约束)中学习。通过对点的空间布局进行明确推理,获得有区别的形状意识。通过点云分类、部分分割和语义分割3项任务的具有挑战性的基准测试的广泛实验,以及详尽的经验和理论分析,证明融合逆密度函数的关系形状卷积神经网络相对于其他网络模型都有一定程度的提升。
但是,每个局部点子集的强制归一化在一定程度上会给形状识别带来困难。另一方面,融合逆密度函数的关系形状卷积网络在采样率相对均匀情况下的分类精度与原始关系形状卷积神经网络相当,这主要是因为ModelNet40数据集在整体上分布均匀,仅局部存在非均匀情况;而实际应用中的数据集出现采样不均匀是比较常见的,这也是本文网络今后的改进方向。下一步工作是将原始坐标和几何特征融合到同一特征空间,进一步提升点云识别精度。
参考文献
-
Atzmon M, Maron H, Lipman Y. 2018. Point convolutional neural networks by extension operators. ACM Transactions on Graphics, 37(4): #71
-
Chen L C, Papandreou G, Kokkinos I, Murphy K, Yuille A L. 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI:10.1109/TPAMI.2017.2699184]
-
Dai A, Chang A X, Savva M, Halber M, Funkhouser T and Nießner M. 2017. ScanNet: richly-annotated 3D reconstructions of indoor scenes//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2432-2443[DOI: 10.1109/CVPR.2017.261]
-
Hermosilla P, Ritschel T, Vázquez P P, Vinacua À, Ropinski T. 2018. Monte Carlo convolution for learning on non-uniformly sampled point clouds. ACM Transactions on Graphics, 37(6): #235 [DOI:10.1145/3272127.3275110]
-
Hu H, Gu J Y, Zhang Z, Dai J F and Wei Y C. 2018. Relation networks for object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3588-3597[DOI: 10.1109/CVPR.2018.00378]
-
Huang K Q, Ren W Q, Tan T N. 2014. A review on image object classification and detection. Chinese Journal of Computers, 37(6): 1225-1240 (黄凯奇, 任伟强, 谭铁牛. 2014. 图像物体分类与检测算法综述. 计算机学报, 37(6): 1225-1240) [DOI:10.3724/SP.J.1016.2014.01225]
-
Jaderberg M, Simonyan K, Zisserman A and Kavukcuoglu K. 2015. Spatial transformer networks//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2017-2025
-
Jampani V, Kiefel M and Gehler P V. 2016. Learning sparse high dimensional filters: image filtering, dense CRFs and bilateral neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 4452-4461[DOI: 10.1109/CVPR.2016.482]
-
Klokov R and Lempitsky V. 2017. Escape from cells: deep Kd-networks for the recognition of 3D point cloud models//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 863-872[DOI: 10.1109/ICCV.2017.99]
-
Li X, Wang M Y, Wen C C, Wang L J, Zhou N and Fang Y. 2019. Density-aware convolutional networks with context encoding for airborne LiDAR point cloud classification[EB/OL].[2020-08-10]. https://arxiv.org/pdf/1910.05909.pdf
-
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018. PointCNN: convolution on X-transformed points[EB/OL].[2020-08-13]. https://arxiv.org/pdf/1801.07791.pdf
-
Liu Y C, Fan B, Xiang S M and Pan C H. 2019. Relation-shape convolutional neural network for point cloud analysis//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 8887-8896[DOI: 10.1109/CVPR.2019.00910]
-
Maturana D and Scherer S. 2015. VoxNet: a 3D convolutional neural network for real-time object recognition//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE: 922-928[DOI: 10.1109/IROS.2015.7353481]
-
Noh H, Hong S and Han B. 2015. Learning deconvolution network for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1520-1528[DOI: 10.1109/ICCV.2015.178]
-
Parzen E. 1962. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3): 1065-1076 [DOI:10.1214/aoms/1177704472]
-
Qi C R, Liu W, Wu C X, Su H and Guibas L J. 2018. Frustum PointNets for 3D object detection from RGB-D data//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 918-927[DOI: 10.1109/CVPR.2018.00102]
-
Qi C R, Su H, Nießner M, Dai A, Yan M Y and Guibas L J. 2016. Volumetric and multi-view CNNs for object classification on 3D data//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 5648-5656[DOI: 10.1109/CVPR.2016.609]
-
Qi C R, Su H, Kaichun M and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 77-85[DOI: 10.1109/CVPR.2017.16]
-
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space[EB/OL].[2020-07-25]. https://arxiv.org/pdf/1706.02413.pdf
-
Riegler G, Ulusoy A O and Geiger A. 2017. OctNet: learning deep 3D representations at high resolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 6620-6629[DOI: 10.1109/CVPR.2017.701]
-
Shen Y R, Feng C, Yang Y Q and Tian D. 2018. Mining point cloud local structures by kernel correlation and graph pooling//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4548-4557[DOI: 10.1109/CVPR.2018.00478]
-
Su H, Jampani V, Sun D Q, Maji S, Kalogerakis E, Yang M H and Kautz J. 2018. SPLATNet: sparse lattice networks for point cloud processing//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2530-2539[DOI: 10.1109/CVPR.2018.00268]
-
Su H, Maji S, Kalogerakis E and Learned-Miller E. 2015. Multi-view convolutional neural networks for 3D shape recognition//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 945-953[DOI: 10.1109/ICCV.2015.114]
-
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803[DOI: 10.1109/CVPR.2018.00813]
-
Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M, Solomon J M. 2019. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): #146
-
Wen W W, Wen G J, Hui B W, Chen D X. 2019. Model library construction by combining global and local surfaces for 3D object recognition. Journal of Image and Graphics, 24(2): 248-257 (文威威, 文贡坚, 回丙伟, 陈鼎新. 2019. 结合全局与局部信息的点云目标识别模型库构建. 中国图象图形学报, 24(2): 248-257) [DOI:10.11834/jig.180270]
-
Wu W X, Qi Z G and Li F X. 2019. PointConv: deep convolutional networks on 3D point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 9613-9622[DOI: 10.1109/CVPR.2019.00985]
-
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1912-1920[DOI: 10.1109/CVPR.2015.7298801]
-
Xu Y F, Fan T Q, Xu M Y, Zeng L and Qiao Y. 2018. SpiderCNN: deep learning on point sets with parameterized convolutional filters[EB/OL].[2020-08-06]. https://arxiv.org/pdf/1803.11527.pdf
-
Yi L, Kim V G, Ceylan D, Shen I C, Yan M Y, Su H, Lu C W, Huang Q X, Sheffer A, Guibas L. 2016. A scalable active framework for region annotation in 3D shape collections. ACM Transactions on Graphics, 35(6): #210 [DOI:10.1145/2980179.2980238]
-
Yuan X C, Chen H W, Li Y W. 2017. Normal estimation method for regular point cloud surface with sharp feature. Journal of Image and Graphics, 22(3): 334-341 (袁小翠, 陈华伟, 李彧雯. 2017. 规则特征曲面点云法向估计. 中国图象图形学报, 22(3): 334-341) [DOI:10.11834/jig.20170307]
-
Zhu Y K, Mottaghi R, Kolve E, Lim J J, Gupta A, Li F F and Farhadi A. 2017. Target-driven visual navigation in indoor scenes using deep reinforcement learning//Proceedings of 2017 IEEE International Conference on Robotics and Automation (ICRA). Singapore, Singapore: IEEE: 3357-3364[DOI: 10.1109/ICRA.2017.7989381]