
Published: 2021-11-16
DOI: 10.11834/jig.200262
2021 | Volume 26 | Number 11




Image Understanding and Computer Vision










Sparse voxel pyramid neighborhood construction and classification of LiDAR point cloud
expand article info Tao Shuaibing1,2,3, Liang Chong1,2,3, Jiang Tengping4, Yang Yujiao1,2,3, Wang Yongjun1,2,3
1. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China;
2. Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China;
3. State Key Laboratory Cultivation Base of Geographical Environment Evolution, Nanjing Normal University, Nanjing 210023, China;
4. State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430072, China
Supported by: National Natural Science Foundation of China (41771439); National Key Research and Development Program of China (2016YFB0502304); Key Laboratory Project of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2018-03-070)

Abstract

Objective Point cloud classification is one of the hotspots of computer vision research. Among the various processing stages, accurately describing the local neighborhood structure of the point cloud and extracting point cloud feature sets with strong expressive ability are the keys to point cloud classification. Traditionally, two methods can be used for modeling the neighborhood structure of point clouds: single-scale description and multiscale description. The former has limited expressive ability, whereas the latter has strong descriptive ability but comes with high computational complexity. To solve these problems, this paper proposes a sparse voxel pyramid structure to express the local neighborhood structure of the point cloud and provides the corresponding point cloud classification and optimization method. Method First, after a comparative analysis of related point cloud classification methods, the paper describes in detail the structure of the proposed sparse voxel pyramid, analyzes its advantages in expressing the neighborhood structure of the point cloud, and provides the method to express the local neighborhood of the point cloud with this structure. When calculating point features, the influence of candidate points on the local feature calculation results gradually decreases as their distance from the center increases. Thus, a fixed number of neighbors is used to construct each layer of the sparse voxel pyramid. For each voxel, a sparse voxel pyramid of N layers is constructed, and the voxel radius of the 0th layer is set to R. The value of N can be set according to the computing power of hardware resources. The R value is the smallest voxel size in the entire voxel pyramid, and it can be set according to the point cloud density and the extent of the scene. The voxel radius of each subsequent layer of the pyramid is twice that of the previous layer.
The voxel radius of the Nth layer is 2^N R, and each layer contains the same number of K voxels. A spatial K-nearest-neighbor index is built for each point of the original point cloud within the voxel point clouds of different scales, which form a sparse voxel pyramid obtained by downsampling the original point cloud according to the above proportions. This method can determine the multiscale neighborhood of the center point from a single fixed K value. Near the center point, the point cloud keeps its original density; as the distance from the center point increases, the neighboring points become sparser. Based on the sparse voxel pyramid structure, the local neighborhoods constructed at different scales are used to extract eigenvalue-based features, neighborhood geometric features, projection features, and fast point feature histogram features of the corresponding points. The single-point features are then aggregated, the random forest method is used for supervised classification, and the multilabel graph cut method is used to optimize the classification results. After the fast point feature histogram of each point is calculated, the histogram intersection kernel is used to compute the edge potential between neighboring points. Result This paper selects three public datasets for experiments, namely, the ground-based Semantic3D dataset, airborne LiDAR scanning data obtained by the ALTM 2050 airborne laser scanning system in different regions, and a mobile LiDAR point cloud dataset at the main entrance of the Technical University of Munich campus on Arcisstrasse, to verify the effectiveness of this method. The evaluation indicators used in the experiments are precision, recall, and F1-score. Sparse voxel pyramids of different scales are used for feature extraction and feature vector aggregation owing to differences in point cloud density and coverage.
Using the proposed method, the overall classification accuracy on the ground-based Semantic3D dataset reaches 89%, that on the airborne LiDAR scan dataset reaches 96%, and that on the mobile LiDAR scan dataset reaches 89%. Experimental results show that, compared with other methods, the proposed multiscale features based on sparse voxel pyramids express the local structure of point clouds more effectively and robustly. When the receptive field of the voxel pyramid increases, the density of neighboring points decreases as the distance from the center point increases, which effectively reduces the amount of computation. In addition, the fast point feature histogram is used to compute the difference between adjacent points through the histogram intersection kernel, which serves as the edge weight in the graph model and improves the optimization effect; this is more accurate than the traditional use of Euclidean distance as the weight. The multilabel graph cut method further optimizes the single-point classification results and handles errors such as vegetation misclassified as buildings and vice versa. In areas with large natural terrain undulations, the classification accuracy of terrain and low vegetation is strongly affected by the undulations, and misclassification easily occurs; by contrast, the accuracy on taller objects such as high vegetation, buildings, and pedestrians is less affected by terrain and is higher.
Conclusion Overall, compared with similar state-of-the-art methods, the multiscale features extracted by the proposed method maintain local structure information while considering point cloud structure over a wider range, thereby improving point cloud classification accuracy.

Key words

point cloud classification; sparse voxel pyramid; multi-scale feature; multi-label graph cut; histogram intersection kernel

0 Introduction

With increasingly clear application prospects of LiDAR in autonomous driving (Ali et al., 2018), road facility surveys (Zhi et al., 2017), object recognition (Cheng et al., 2017; Grilli et al., 2017), and building extraction and modeling (Chen et al., 2014), point cloud classification, as a key step of point cloud data processing, has become an important research direction in photogrammetry, remote sensing, and computer vision. However, the complexity of scene objects, uneven sampling density, occlusion, and missing data make point cloud classification highly challenging and hinder downstream applications of LiDAR point cloud data. The classification of LiDAR point clouds has therefore received wide attention.

Point cloud classification pipelines fall into two categories: classification followed by segmentation, and segmentation followed by classification. The former generally first recovers the local neighborhood structure of each point in the raw point cloud and extracts a feature set for every point based on it, then performs supervised classification of the extracted features with machine learning methods (Dittrich et al., 2017), and finally obtains individual target objects from the classified semantic categories through clustering or segmentation algorithms. The latter usually first segments the raw data with algorithms such as random sample consensus (RANSAC), the 3D Hough transform, connected-component analysis, or clustering (Wang et al., 2018a), computes a feature set on each resulting segment, and finally classifies the segments with machine learning methods or semantic-rule constraints (Yang and Dong, 2013; Yang et al., 2015). Because the final accuracy of segmentation-then-classification depends heavily on the earlier segmentation result and is strongly affected by over- or under-segmentation, this paper adopts the classification-then-segmentation approach.

Three definitions of the local neighborhood of a laser point are commonly used: the 3D spherical neighborhood (Lee and Schenk, 2002), the 3D cylindrical neighborhood (Filin and Pfeifer, 2005), and the $K$-neighborhood (Linsen and Prautzsch, 2001). Because point cloud density varies with the distance between the target and the sensor, the point density is unevenly distributed; a spherical or cylindrical neighborhood of fixed size therefore contains a widely varying number of points, so the $K$-neighborhood yields a more robust local neighborhood structure. To determine the optimal $K$, Mitra et al. (2004) adaptively computed the neighborhood size of each point from local information, given a noise estimate, the local sampling density, and a bound on local curvature. Weinmann et al. (2014) obtained the optimal neighborhood of a point by minimizing an energy function based on feature entropy. Demantké et al. (2011) determined the optimal neighborhood size from dimensionality-based features. Entropy-based methods can obtain a robust local neighborhood size; however, computing the eigenvalues of every point consumes considerable resources, and finding the optimal size requires traversing many candidate neighborhood counts, e.g., all $K$ values in the interval [10, 100], which adds redundant computation (Weinmann et al., 2014). Multiscale methods, in contrast, do not need to recompute point eigenvalues repeatedly: they capture detailed local information at small scales and the overall skeleton of the point cloud at large scales (Lim and Suter, 2009).

After the local neighborhood structure is obtained, a point cloud feature set with strong expressiveness and good robustness must be extracted. Rusu et al. (2008, 2009) proposed the point feature histogram and the fast point feature histogram as robust feature representations. Tombari et al. (2010) combined signed histograms with oriented histogram descriptors to balance feature robustness and expressiveness. Guo et al. (2013) proposed a 3D local feature descriptor based on rotational projection statistics, with high descriptiveness and strong robustness to noise and mesh resolution. West et al. (2004) described object shape and geometry by extracting covariance eigenvalues, applied principal component analysis to the local neighborhood, and classified points according to their likelihood of belonging to a feature. Weinmann et al. (2015b) analyzed the influence of different neighborhood types and sizes on point cloud feature extraction and computed 21 features via eigenvalue analysis. With the obtained features, point clouds can be classified using support vector machines (Laube et al., 2017; Liu et al., 2016), random forest classifiers (Weinmann et al., 2017; Ni et al., 2017; Bassier et al., 2019), JointBoost classifiers (Guo et al., 2015), and Bayesian networks (Kang et al., 2017); different machine learning methods have their respective strengths and weaknesses for point cloud classification (Weinmann et al., 2015a). However, the point features extracted by these methods consider only local properties and ignore adjacency information between points, so some researchers have used conditional random fields and Markov random fields to optimize the initial classification results. Lim and Suter (2007) classified point cloud data with conditional random fields. Niemeyer et al. (2012) integrated a random forest classifier into a conditional random field framework for contextual classification of urban laser point clouds, improving the overall accuracy by 2%. Najafi et al. (2014) classified point clouds with Markov random fields. Luo et al. (2018) improved the labeling results by using a higher-order Markov random field model that considers a wider context.

In addition to traditional feature extraction algorithms, deep neural network methods such as the PointNet series (Qi et al., 2017; Charles et al., 2017), PointCNN (point convolutional neural network) (Li et al., 2018), and PointSIFT (point scale invariant feature transform) (Landrieu and Simonovsky, 2017) have achieved remarkable results. As the pioneer of 3D deep learning, PointNet takes raw point cloud data directly as the network input, extracts point cloud features with multilayer perceptrons and a symmetric function, and obtains classification results in an end-to-end manner. However, deep learning requires large amounts of training data, and limited training sets restrict its accuracy; it is also better suited to small-extent data, and for classification of large scenes its performance still needs improvement.

In summary, accurately describing the local neighborhood structure of the point cloud and extracting feature sets with strong expressive ability are key to point cloud classification, and most of the feature extraction algorithms above do not strike a good balance between expressiveness and computational efficiency. This paper therefore proposes a multiscale feature extraction algorithm based on a sparse voxel pyramid that, while remaining computationally efficient, preserves detailed local structure and captures the overall skeleton of the point cloud at larger scales, yielding a more robust feature set.

1 Single-point multiscale feature extraction and classification based on the sparse voxel pyramid

1.1 Multiscale point feature extraction based on the sparse voxel pyramid

Eigenvalue-based features describe the local geometry of a point and are widely used in LiDAR point cloud classification. Weinmann et al. (2015b) argued that different neighborhood sizes $K$ affect the final classification result differently; they set $K$ in [10, 100] and solved for the optimal value by minimizing an energy function based on feature entropy. However, the iterative solution must traverse all $K$ values in [10, 100], adding substantial redundant computation.

When computing point features, points near the center must describe the local geometric structure in detail, whereas points far from the center only need to describe the overall skeleton of the point cloud at a coarse level. We therefore propose a multiscale neighborhood construction method with a sparse voxel pyramid structure. For each voxel, an $N$-layer sparse voxel pyramid is built, with layers $0, 1, \cdots, N$ from top to bottom. The topmost layer 0 has the smallest voxel radius, denoted $R$; layer 1 has voxel radius $2R$, layer 2 has $2^2R$, layer $d$ has $2^dR$, and layer $N$ has $2^NR$. Every layer contains the same number $K$ of voxels. Fig. 1 shows the voxel sizes of a five-layer sparse voxel pyramid, with layers 0, 1, 2, 3, and 4 from left to right.

Fig. 1 Schematic of voxel pyramid

Unlike the multiscale features proposed by Yang and Kang (2018), our method determines the multiscale neighborhood of the center point from a single fixed $K$ value. Fig. 2 compares the multiscale spatial neighborhood built by the method of Yang and Kang (2018) with the neighborhood built by the proposed sparse voxel pyramid structure.

Fig. 2 Multi-scale spatial neighborhood ((a) Yang and Kang (2018); (b) ours)

Fig. 2(a) shows the neighborhood built by the multiscale neighborhood method, with $K$ set to 15, 30, and 60, giving 105 neighborhood points in total; the neighborhood points corresponding to different $K$ values are drawn with different radii. Fig. 2(b) overlays on Fig. 2(a) the neighborhood built with the proposed five-layer sparse voxel pyramid, with $K=15$ per layer and 75 neighborhood points in total for the whole pyramid; these points correspond to the voxel centers obtained by downsampling at the radius of each pyramid layer and are drawn as squares of different sizes according to voxel size. The sparse voxel pyramid keeps relatively dense neighborhood points near the center while also covering a wider neighborhood: it captures detailed local structure at small scales and overall geometric information at large scales, improving the overall descriptive ability while reducing the number of voxels.

To generate the sparse voxel pyramid, the raw point cloud is first voxel-filtered at different scales; if the smallest voxel radius is 0.1 m, the voxel radii of the pyramid layers are 0.1 m, 0.2 m, 0.4 m, 0.8 m, and 1.6 m. A spatial $K$-nearest-neighbor index is then built for every point of the raw cloud in the voxel clouds of the different scales, and local structure information is finally extracted from the local neighborhoods constructed at each scale. As shown by the center point neighborhood in Fig. 2, the point cloud keeps its original density near the center, and the neighborhood becomes sparser with distance from the center; compared with other multiscale features, this reduces the total number of neighborhood points and improves efficiency without losing either the local or the global information of the neighborhood.
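A minimal sketch of the pyramid construction described above, assuming a grid-average voxel filter and a brute-force neighbor search (a production implementation would use an octree or k-d tree); the function names here are illustrative, not the paper's code:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Average all points falling into the same voxel cell (one center per occupied cell)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)
    n_vox = inv.max() + 1
    counts = np.bincount(inv, minlength=n_vox)
    centers = np.zeros((n_vox, 3))
    for dim in range(3):
        centers[:, dim] = np.bincount(inv, weights=points[:, dim], minlength=n_vox) / counts
    return centers

def knn(query, cloud, k):
    """Brute-force K nearest neighbors of one query point (returns index array)."""
    d = np.linalg.norm(cloud - query, axis=1)
    return np.argsort(d)[:k]

def sparse_voxel_pyramid(points, base_radius=0.1, n_layers=5):
    """Layer i is the cloud downsampled with voxel radius 2^i * base_radius."""
    return [voxel_downsample(points, base_radius * 2 ** i) for i in range(n_layers)]

def multiscale_neighborhood(point, pyramid, k=15):
    """Fixed-K neighbors in every layer: dense near the center, sparse far away."""
    return [layer[knn(point, layer, min(k, len(layer)))] for layer in pyramid]
```

Because each layer doubles the voxel radius, the coarser layers contain no more voxels than the finer ones, which is what keeps the total neighborhood size at `n_layers * k` points.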

After the spatial neighborhood index of the point cloud is built, features are extracted in the voxel clouds at each scale. The extracted features include eigenvalue-based features, neighborhood geometric features (Weinmann et al., 2015b), projection features, and fast point feature histogram (FPFH) features.

Eigenvalue-based features are geometric features computed from the eigenvalues of the local covariance matrix. The local covariance matrix of a point is first computed from the constructed local neighborhood and decomposed to obtain three eigenvalues $\lambda_1, \lambda_2, \lambda_3$ with $\lambda_1 > \lambda_2 > \lambda_3$. Further features are then derived from the eigenvalues, such as linearity $(L_\lambda)$, planarity $(P_\lambda)$, scattering $(S_\lambda)$, anisotropy $(A_\lambda)$, eigenentropy $(E_\lambda)$, sum of eigenvalues $(\mathit{\Sigma}_\lambda)$, change of curvature $(C_\lambda)$, and omnivariance $(O_\lambda)$. Neighborhood geometric features are the maximum height difference, height standard deviation, neighborhood radius, and local density within the local neighborhood. Projection features are obtained by projecting the neighborhood points onto the $XOY$ plane and computing the 2D eigenvalues, the change of curvature, and the area of the convex hull of the projected points. The fast point feature histogram describes the geometry around a specific point in a high-dimensional space by characterizing the surface variation of its local neighborhood.
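The eigenvalue-based features above can be sketched for a single neighborhood as follows, using the common closed forms from the literature (Weinmann et al., 2015b); the exact variants used in the paper are an assumption:

```python
import numpy as np

def eigen_features(neighborhood):
    """Covariance eigen-features of one local neighborhood (n x 3 array of points)."""
    cov = np.cov(neighborhood.T)                  # 3x3 local covariance matrix
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]  # eigenvalues, λ1 >= λ2 >= λ3
    l1, l2, l3 = np.maximum(lam, 1e-12)           # guard against zero / tiny negatives
    e = np.array([l1, l2, l3]) / (l1 + l2 + l3)   # normalized eigenvalues for entropy
    return {
        "linearity":    (l1 - l2) / l1,
        "planarity":    (l2 - l3) / l1,
        "scattering":   l3 / l1,
        "anisotropy":   (l1 - l3) / l1,
        "eigenentropy": -np.sum(e * np.log(e)),
        "sum":          l1 + l2 + l3,
        "curvature":    l3 / (l1 + l2 + l3),      # change of curvature
        "omnivariance": (l1 * l2 * l3) ** (1.0 / 3.0),
    }
```

Note that linearity, planarity, and scattering sum to 1 by construction, so a nearly planar neighborhood (e.g. a building facade) scores high planarity and near-zero scattering.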

To illustrate the features extracted with the sparse voxel pyramid, Fig. 3 shows a simple example of the multiscale feature results, extracted at voxel sizes of 0.1 m and 1.6 m. Compared with the 0.1 m scale, the eigen-features of sharp objects are more prominent at the 1.6 m scale.

Fig. 3 Feature visualization examples of voxel extraction ((a) 0.1 m; (b) 1.6 m)

1.2 Aggregating single-point features and supervised classification with a random forest

The extracted multiscale features are aggregated into one feature vector as the input of the supervised classifier, and the random forest algorithm is used for supervised classification. The random forest, proposed by Breiman (2001), consists of multiple decision trees; each tree randomly draws samples, and the features used at each tree node are randomly drawn from all features at a fixed proportion. Using grid tuning, we build a random forest classifier with 50 decision trees, a maximum depth of 25, and a maximum of 20 features.
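A sketch of the classifier setup with the tuned hyperparameters stated above (50 trees, depth 25, at most 20 features per split), assuming a scikit-learn implementation; the feature matrix here is synthetic stand-in data, not the paper's aggregated multiscale features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder for the aggregated multiscale feature vectors:
# one row per point, one column per feature (e.g. 5 scales x several features each).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 40))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # dummy labels for illustration only

# Hyperparameters from the paper's grid tuning.
clf = RandomForestClassifier(
    n_estimators=50, max_depth=25, max_features=20, random_state=0
)
clf.fit(X, y)
soft_labels = clf.predict_proba(X)   # class posteriors, later used as the data term
hard_labels = clf.predict(X)         # initial labels for the graph-cut optimization
```

The soft labels (per-class posteriors) are kept alongside the hard labels because Section 1.3 reuses them as the fidelity term of the energy function.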

1.3 Optimizing the initial classification results with multi-label graph cuts

The soft labels produced by supervised classification are not spatially smooth, so spatial neighborhood information is needed to further optimize the classification results. We optimize the initial classification with the multi-label graph cut method and minimize the energy with the $\alpha$-expansion algorithm (Boykov et al., 2001). The graph model is denoted $\boldsymbol{G}=\langle V, E, w\rangle$, where $V$ is the set of vertices (the points of the point cloud), $E$ is the set of undirected edges between adjacent vertices (the relations between a point and its neighbors), and $w$ is the weight of the edge connecting adjacent points. To obtain the optimal labeling, the classification problem is cast as energy minimization with the multi-label energy function

$E(L)=E_{\text{data}}(L)+\lambda E_{\text{smooth}}(L)$ (1)

where $E_{\mathrm{data}}(L)$ is the data (fidelity) term, measuring the inconsistency between the labels and the observed data; $E_{\mathrm{smooth}}(L)$ is the smoothness term, measuring label inconsistency within local neighborhoods; and $\lambda$ weights the two terms. The soft labels generated from the posterior probability estimates of the random forest classifier represent the data term.
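As a sketch, the data term can be taken as the negative log of the random forest posteriors, so that confidently classified points are cheap to keep and expensive to relabel; the exact cost form is an assumption, since the paper only states that the soft labels represent the fidelity term:

```python
import numpy as np

def data_term(soft_labels, eps=1e-6):
    """Unary cost per (point, label): low where the classifier posterior is high.

    soft_labels: (n_points, n_classes) posterior probabilities from the classifier.
    Returns an (n_points, n_classes) cost matrix for the graph-cut solver.
    """
    return -np.log(np.clip(soft_labels, eps, 1.0))
```

The clip keeps the cost finite even when a class receives zero posterior probability.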

For the edge potential, we propose a new computation. The fast point feature histogram of each point is computed first, and the histogram intersection kernel is then used to compute the edge potential between neighboring points. Because the FPFH accurately describes the local structure of a point and can distinguish adjacent points with different structures, it is chosen to compute the similarity between adjacent points. Grauman and Darrell (2005) proposed a fast kernel that maps unordered feature sets into multi-resolution histograms and computes weighted histogram intersections on them, as shown in Fig. 4, where $H(y)$ and $H(z)$ denote the histograms along the $y$ and $z$ directions. We apply it, for the first time, to fast point feature histograms to compute the similarity between adjacent points as the weight of the connecting edge.

Fig. 4 Histogram intersection kernel (Grauman and Darrell, 2005) ((a) point sets; (b) histogram pyramids; (c) intersections)

With the data and smoothness terms computed, the smoothness coefficient $\lambda$ was set in turn to 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, and 2.0 to determine it more reliably; experiments showed that 1.5 best balances the data and smoothness terms. With the energy function determined, the $\alpha$-expansion algorithm minimizes it to obtain the optimal labels.

2 Experiments and analysis

To verify the effectiveness and generality of the algorithm, experiments were conducted on three public datasets. Dataset A is Semantic3D (Hackel et al., 2017), currently the largest publicly available LiDAR dataset; it was acquired with fixed-station scanners and contains 15 training sets and 15 test sets with more than 3 billion points over various urban and rural scenes. Each point carries attributes such as $X, Y, Z$, intensity, and $R, G, B$, with semantic labels over 8 classes: man-made terrain, natural terrain, high vegetation, low vegetation, buildings, hardscape, pedestrians, and cars. Dataset B consists of two sets of data acquired in different regions by the airborne laser scanning system ALTM 2050 (Shapovalov and Velizhev, 2011), with 5 semantic labels: ground, buildings, cars, high vegetation, and low vegetation. Dataset C is an MLS (mobile laser scanning) point cloud dataset of the main entrance of the Technical University of Munich campus on Arcisstrasse, Germany, with 8 semantic classes. The experimental environment is a 64-bit Windows 10 system with an Intel Core i7-4770 CPU @ 3.40 GHz and 8 GB RAM; the development languages are C++ and Python. Precision, recall, and F1-score are used as evaluation metrics.

2.1 Random forest classification results with multiscale features and analysis

To verify the effectiveness of the proposed sparse voxel pyramid multiscale feature extraction method, features were extracted on each of the three datasets.

For dataset A, because the point density is high and the data volume is huge, the dataset was first uniformly downsampled at a resolution of 0.1 m before feature extraction. Features were then extracted at five scales: the original resolution (here, 0.1 m after downsampling), 0.2 m, 0.4 m, 0.8 m, and 1.6 m, and aggregated into one feature vector as the input of the random forest classifier. Finally, grid search determined a random forest with 50 decision trees, a maximum depth of 25, and at most 20 features as the optimal classifier. The classification results are shown in Fig. 5. The corners at the tops of buildings are easily misclassified as vegetation, and vegetation is likewise easily misclassified as buildings.

Fig. 5 Classification results on Semantic3D dataset ((a) single-scale features; (b) multi-scale features)

For dataset B, because the density is low and the coverage is large, the voxel filter sizes were enlarged accordingly. Through experiments, they were set to the original resolution, 1.0 m, 2.0 m, 4.0 m, and 8.0 m, with the other parameters the same as for dataset A. The classification results are shown in Fig. 6.

Fig. 6 Classification results of airborne point cloud ((a) single-scale features; (b) multi-scale features)

Dataset C is mobile vehicle-borne point cloud data; the raw data and classification results are shown in Fig. 7. The proposed method classifies the point cloud of the campus scene accurately.

Fig. 7 Munich Technical University dataset classification results ((a) original data; (b) classification results)

Table 1 shows the classification results of the proposed method at different scales on the three datasets. Dataset A contains 8 semantic classes with considerable overlap between classes and large density variation, which greatly affects classification accuracy; single-scale features give lower accuracy, while aggregating features of several scales into multiscale features improves precision by 11%. Dataset B contains 5 semantic classes with an even density distribution; although the single-scale result is already high, multiscale features still improve it by 3%. Dataset C is urban vehicle-borne point cloud data, and the results on it show that the proposed method can provide a basis for scene parsing in autonomous driving. All three dataset types demonstrate the effectiveness and robustness of the proposed multiscale features for improving classification results.

Table 1 Comparison of classification results at different scales on different datasets by our method

Dataset | Precision (single / multi) | Recall (single / multi) | F1-score (single / multi)
A | 0.75 / 0.86 | 0.76 / 0.86 | 0.714 / 0.85
B | 0.93 / 0.96 | 0.94 / 0.96 | 0.94 / 0.96
C | 0.82 / 0.89 | 0.81 / 0.88 | 0.81 / 0.88

2.2 Optimization results of the multi-label graph cut and analysis

Single-point classification results are spatially discontinuous and need to be optimized with local spatial context information to obtain smoother results. We optimize the classification with the multi-label graph cut method: the hard labels of the initial random forest classification serve as the initial labels, the soft labels serve as the data fidelity term, and the similarity between adjacent points, computed from their fast point feature histograms, serves as the weights of the connecting edges in the energy function, which is minimized with $\alpha$-expansion. Fig. 8 shows the optimization results of the multi-label graph cut on the Semantic3D dataset.

Fig. 8 The optimization results by multi-label graph-cut on Semantic3D dataset

Comparing Fig. 8 with Fig. 5 shows that the classification results are smoother after multi-label graph cut optimization. In Fig. 5, some points at the top of the building in the upper left are misclassified as high vegetation; after optimization, they are corrected to buildings. Likewise, car points mixed with low vegetation after the initial classification are corrected from low vegetation back to cars. Table 2 gives the precision, recall, and F1-score of each class after optimization. High vegetation, pedestrians, and buildings are classified with high accuracy, while natural terrain and low vegetation are lower, because the natural terrain in this area undulates strongly and low vegetation has features similar to those of natural terrain.

Table 2 The optimization results by multi-label graph-cut on Semantic3D dataset

Label | Precision | Recall | F1-score
Man-made terrain | 0.81 | 0.90 | 0.86
Natural terrain | 0.45 | 0.16 | 0.24
High vegetation | 0.99 | 0.80 | 0.88
Low vegetation | 0.68 | 0.91 | 0.78
Buildings | 0.92 | 0.99 | 0.95
Hardscape | 0.79 | 0.42 | 0.55
Pedestrians | 0.98 | 0.59 | 0.73
Cars | 0.90 | 0.90 | 0.90
Average | 0.88 | 0.88 | 0.87

To further verify the effectiveness of the proposed multiscale feature method, it was compared in accuracy with other methods on dataset A, with results shown in Table 3. The proposed method is more accurate than the method of Weinmann et al. (2015b) and close to the international state of the art (Wang et al., 2018b); the extracted multiscale features preserve local structure information while considering point cloud structure over a wider range, thereby improving classification accuracy.

Table 3 Comparison of classification results on Semantic3D dataset among different methods

Method | Accuracy
Neighborhood optimization (Weinmann et al., 2015b) | 0.742
DNNSP (Wang et al., 2018b) | 0.893
PointNet++ (Qi et al., 2017) | 0.857
Ours | 0.880

3 Conclusion

To address the limited expressiveness of single-scale point cloud features, we proposed multiscale feature extraction based on a sparse voxel pyramid, aggregated features of different scales for classification, and optimized the initial classification results with the multi-label graph cut algorithm. Experimental results show that, compared with other methods, the proposed multiscale features express the local structure of point clouds more efficiently and robustly. Using fast point feature histogram features, the method computes the dissimilarity between adjacent points through the histogram intersection kernel as the edge weights of the graph model, improving the optimization results. Experiments on ground-based scanner and airborne datasets demonstrate the robustness of the method.

However, because of scene complexity and the uneven density of point cloud data, the final classification accuracy still needs improvement. Moreover, because the multi-label graph cut optimization considers only local information, it can hardly recover the correct result when the initial classification is wrong. In future work, building on the proposed framework, we plan to optimize the classification results by adding semantic-rule constraints.

References

  • Ali W, Abdelkarim S, Zahran M, Zidan M and Sallab A E. 2018. YOLO3D: end-to-end real-time 3D oriented object bounding box detection from LiDAR point cloud//Proceedings of 2018 European Conference on Computer Vision. Munich, Germany: Springer: #11131[DOI: doi.org/10.1007/978-3-030-11015-4_54]
  • Bassier M, Van Genechten B, Vergauwen M. 2019. Classification of sensor independent point cloud data of building objects using random forests. Journal of Building Engineering, 21: 468-477 [DOI:10.1016/j.jobe.2018.04.027]
  • Boykov Y, Veksler O, Zabih R. 2001. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11): 1222-1239 [DOI:10.1109/34.969114]
  • Breiman L. 2001. Random Forests. Machine learning, 45(1): 5-32
  • Charles R Q, Su H, Kaichun M and Guibas L J. 2017. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85[DOI: 10.1109/CVPR.2017.16]
  • Chen D, Zhang L Q, Mathiopoulos T, Huang X F. 2014. A methodology for automated segmentation and reconstruction of urban 3-D buildings from ALS point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(10): 4199-4217 [DOI:10.1109/JSTARS.2014.2349003]
  • Cheng M, Zhang H C, Wang C, Li J. 2017. Extraction and classification of road markings using mobile laser scanning point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(3): 1182-1196 [DOI:10.1109/JSTARS.2016.2606507]
  • Demantké J, Mallet C, David N and Vallet B. 2011. Dimensionality based scale selection in 3D LiDAR point clouds//Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Calgary, Canada: ISPRS: 97-102[DOI: 10.5194/isprsarchives-XXXVIII-5-W12-97-2011]
  • Dittrich A, Weinmann M, Hinz S. 2017. Analytical and numerical investigations on the accuracy and robustness of geometric features extracted from 3D point cloud data. ISPRS Journal of Photogrammetry and Remote Sensing, 126: 195-208 [DOI:10.1016/j.isprsjprs.2017.02.012]
  • Filin S, Pfeifer N. 2005. Neighborhood systems for airborne laser data. Photogrammetric Engineering and Remote Sensing, 71(6): 743-755 [DOI:10.14358/PERS.71.6.743]
  • Grauman K and Darrell T. 2005. The pyramid match kernel: discriminative classification with sets of image features//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE: 1458-1465[DOI: 10.1109/ICCV.2005.239]
  • Grilli E, Menna F and Remondino F. 2017. A review of point clouds segmentation and classification algorithms//Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Nafplio, Greece: ISPRS: 339-344[DOI: 10.5194/isprs-archives-XLII-2-W3-339-2017]
  • Guo B, Huang X F, Zhang F, Sohn G. 2015. Classification of airborne laser scanning data using JointBoost. ISPRS Journal of Photogrammetry and Remote Sensing, 100: 71-83 [DOI:10.1016/j.isprsjprs.2014.04.015]
  • Guo Y L, Sohel F A, Bennamoun M, Wan J W and Lu M. 2013. RoPS: a local feature descriptor for 3D rigid objects based on rotational projection statistics//Proceedings of the 1st International Conference on Communications, Signal Processing, and Their Applications. Sharjah, United Arab Emirates: IEEE: 1-6[DOI: 10.1109/ICCSPA.2013.6487310]
  • Hackel T, Savinov N, Ladicky L, Wegner J D, Schindler K and Pollefeys M. 2017. Semantic3D. net: a new large-scale point cloud classification benchmark//Proceedings of the ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Hannover, Germany: ISPRS: 91-98[DOI: 10.5194/isprs-annals-IV-1-W1-91-2017]
  • Kang Z Z, Yang J T, Zhong R F. 2017. A bayesian-network-based classification method integrating airborne LiDAR data with optical images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(4): 1651-1661 [DOI:10.1109/JSTARS.2016.2628775]
  • Landrieu L and Simonovsky M. 2017. Large-scale point cloud semantic segmentation with superpoint graphs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4558-4567[DOI: 10.1109/CVPR.2018.00479]
  • Laube P, Franz M O and Umlauf G. 2017. Evaluation of features for SVM-based classification of geometric primitives in point clouds//Proceedings of the 15th IAPR International Conference on Machine Vision Applications (MVA). Nagoya, Japan: IEEE: 59-62[DOI: 10.23919/MVA.2017.7986776]
  • Lee I and Schenk T. 2002. Perceptual organization of 3D surface points//International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 34(3/A): 193-198
  • Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018. PointCNN: convolution on X-transformed points//Proceedings of Advances in Neural Information Processing Systems. Montréal, Canada: 820-830
  • Lim E H and Suter D. 2007. Conditional random field for 3D point clouds with adaptive data reduction//Proceedings of 2007 International Conference on Cyberworlds. Hannover, Germany: IEEE: 404-408[DOI: 10.1109/CW.2007.30]
  • Lim E H, Suter D. 2009. 3D terrestrial LIDAR classifications with super-voxels and multi-scale Conditional Random Fields. Computer-Aided Design, 41(10): 701-710 [DOI:10.1016/j.cad.2009.02.010]
  • Linsen L and Prautzsch H. 2001. Local versus global triangulations//Chalmers A and Rhyne T M, eds. Eurographics 2001. Oxford: Eurographics Association
  • Liu Z Q, Li P C, Chen X W, Zhang B M, Guo H T. 2016. Classification of airborne LiDAR point cloud data based on information vector machine. Optics and Precision Engineering, 24(1): 210-219 (刘志青, 李鹏程, 陈小卫, 张保明, 郭海涛. 2016. 基于信息向量机的机载激光雷达点云数据分类. 光学精密工程, 24(1): 210-219) [DOI:10.3788/OPE.20162401.0210]
  • Luo H, Wang C, Wen C L, Chen Z Y, Zai D W, Yu Y T, Li J. 2018. Semantic labeling of mobile LiDAR point clouds via active learning and higher order MRF. IEEE Transactions on Geoscience and Remote Sensing, 56(7): 3631-3644 [DOI:10.1109/TGRS.2018.2802935]
  • Mitra N J, Nguyen A, Guibas L. 2004. Estimating surface normals in noisy point cloud data. International Journal of Computational Geometry and Applications, 14(4/5): 261-276 [DOI:10.1142/S0218195904001470]
  • Najafi M, Namin S T, Salzmann M and Petersson L. 2014. Non-associative higher-order Markov networks for point cloud classification//Proceedings of 2014 European Conference on Computer Vision. Zurich, Switzerland: Springer: 500-515[DOI: 10.1007/978-3-319-10602-1_33]
  • Ni H, Lin X G, Zhang J X. 2017. Classification of ALS point cloud with improved point cloud segmentation and random forests. Remote Sensing, 9(3): 288 [DOI:10.3390/rs9030288]
  • Niemeyer J, Rottensteiner F and Soergel U. 2012. Conditional random fields for LIDAR point cloud classification in complex urban areas//Proceedings of the ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Melbourne, Australia: ISPRS: 263-268[DOI: 10.5194/isprsannals-I-3-263-2012]
  • Qi C R, Yi L, Su H and Guibas L J. 2017. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc: 5105-5114
  • Rusu R B, Marton Z C, Blodow N and Beetz M. 2008. Persistent point feature histograms for 3D point clouds//Burgard W, et al., eds. Intelligent Autonomous Systems 10. IOS Press[DOI: 10.3233/978-1-58603-887-8-119]
  • Rusu R B, Blodow N and Beetz M. 2009. Fast Point Feature Histograms (FPFH) for 3D registration//Proceedings of 2009 IEEE International Conference on Robotics and Automation. Kobe, Japan: IEEE: 3212-3217[DOI: 10.1109/ROBOT.2009.5152473]
  • Shapovalov R and Velizhev A. 2011. Cutting-plane training of non-associative Markov network for 3D point cloud segmentation//Proceedings of 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission. Hangzhou, China: IEEE: 1-8[DOI: 10.1109/3DIMPVT.2011.10]
  • Tombari F, Salti S and Di Stefano L. 2010. Unique signatures of histograms for local surface description//Proceedings of the 11th European Conference on Computer Vision. Heraklion, Greece: Springer: 356-369[DOI: 10.1007/978-3-642-15558-1_26]
  • Wang C, Hou S W, Wen C L, Gong Z, Li Q, Sun X T, Li J. 2018a. Semantic line framework-based indoor building modeling using backpacked laser scanning point cloud. ISPRS Journal of Photogrammetry and Remote Sensing, 143: 150-166 [DOI:10.1016/j.isprsjprs.2018.03.025]
  • Wang Z, Zhang L Q, Zhang L, Li R J, Zheng Y B, Zhu Z D. 2018b. A deep neural network with spatial pooling (DNNSP) for 3-D point cloud classification. IEEE Transactions on Geoscience and Remote Sensing, 56(8): 4594-4604 [DOI:10.1109/TGRS.2018.2829625]
  • Weinmann M, Jutzi B and Mallet C. 2014. Semantic 3D scene interpretation: a framework combining optimal neighborhood size selection with relevant features//Proceedings of SPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Zurich, Switzerland: ISPRS: 181-188[DOI: 10.5194/isprsannals-II-3-181-2014]
  • Weinmann M, Jutzi B, Hinz S, Mallet C. 2015b. Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS Journal of Photogrammetry and Remote Sensing, 105: 286-304 [DOI:10.1016/j.isprsjprs.2015.01.016]
  • Weinmann Ma, Jutzi B, Mallet C, Weinmann Mi. 2017. Geometric features and their relevance for 3D point cloud classification. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017(4): 157-164 [DOI:10.5194/isprs-annals-IV-1-W1-157-2017]
  • Weinmann M, Urban S, Hinz S, Jutzi B, Mallet C. 2015a. Distinctive 2D and 3D features for automated large-scale scene analysis in urban areas. Computers and Graphics, 49: 47-57 [DOI:10.1016/j.cag.2015.01.006]
  • West K F, Webb B N, Lersch J R, Pothier S, Triscari J M and Iverson A E. 2004. Context-driven automated target detection in 3D data//Proceedings of SPIE 5426, Automatic Target Recognition XIV. Orlando, USA: SPIE: 133-143[DOI: 10.1117/12.542536]
  • Yang B S, Dong Z. 2013. A shape-based segmentation method for mobile laser scanning point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 81: 19-30 [DOI:10.1016/j.isprsjprs.2013.04.002]
  • Yang B S, Dong Z, Zhao G, Dai W X. 2015. Hierarchical extraction of urban objects from mobile laser scanning data. ISPRS Journal of Photogrammetry and Remote Sensing, 99: 45-57 [DOI:10.1016/j.isprsjprs.2014.10.005]
  • Yang J T, Kang Z Z. 2018. Multi-scale feature and Markov random field model for power line scene point cloud classification. Acta Geodaetica et Cartographica Sinica, 2: 188-197 (杨俊涛, 康志忠. 2018. 多尺度特征和马尔可夫随机场模型的电力线场景点云分类法. 测绘学报, 2: 188-197) [DOI:10.11947/j.AGCS.2018.20170556]
  • Zhi S F, Liu Y X, Li X and Guo Y L. 2017. LightNet: a lightweight 3D convolutional neural network for real-time 3D object recognition//Pratikakis I, Dupont F and Ovsjanikov M, eds. Eurographics Workshop on 3D Object Retrieval. Lyon: the Eurographics Association: 9-16[DOI:10.2312/3dor.20171046]