Published: 2022-02-16
DOI: 10.11834/jig.210510
2022 | Volume 27 | Number 2

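3D shape analysis

Multi-scale point cloud completion with an embedded Transformer structure

Liu Xinpu¹, Ma Yanxin², Xu Ke¹, Wan Jianwei¹, Guo Yulan¹

1. College of Electronic Science and Technology, National University of Defense Technology, Changsha 410005, China;

2. College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410005, China

Received: 2021-07-05; Revised: 2021-10-26; Preprint: 2021-11-02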

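About the authors: Liu Xinpu, born in 1997, male, master's student; his research interest is 3D point cloud completion. E-mail: liuxinpu@nudt.edu.cn
Ma Yanxin, corresponding author, male, lecturer; his research interest is 3D computer vision. E-mail: mayanxin@nudt.edu.cn
Xu Ke, male, associate professor; his research interest is signal processing techniques and systems. E-mail: xuke@nudt.edu.cn
Wan Jianwei, male, professor and doctoral supervisor; his research interests include signal processing techniques and systems, radar technology and image compression. E-mail: kermitwjw@139.com
Guo Yulan, male, associate professor; his research interests are 3D vision and machine learning. E-mail: yulan.guo@nudt.edu.cn
*Corresponding author: Ma Yanxin mayanxin@nudt.edu.cn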

CLC number: TP242

Document code: A

Article ID: 1006-8961(2022)02-0538-12

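Abstract

Objective Most current deep-learning algorithms for point cloud completion adopt an autoencoder structure. However, the multilayer perceptron (MLP) networks commonly used in the encoder tend to focus only on the overall shape of the point cloud and struggle to extract the detailed features of the object, so missing structures are poorly completed. An accurate local feature extraction algorithm for point cloud completion is therefore needed. Method To address this problem, this paper proposes a multi-scale point cloud completion algorithm with embedded attention modules. The network adopts an encoder-decoder structure: feature embedding layers and Transformer layers in the encoder extract and fuse features of incomplete point clouds at three different resolutions, which are fed into a fully connected decoder that outputs the missing point cloud stage by stage. Finally, an attention discriminator is added after the decoder, following the idea of generative adversarial networks (GAN), to improve completion performance. Result Using the Chamfer distance (CD) as the evaluation metric, the proposed algorithm was compared with four related methods on two datasets. On ShapeNet, its category-average CD is 3.73% lower than that of the second-best model, PF-Net (point fractal network); on ModelNet10, its category-average CD is 12.75% lower than that of PF-Net. Visual comparisons of the completion results of different algorithms verify that the proposed algorithm completes detailed structures more accurately and generalizes better to atypical samples within a category. Conclusion The proposed Transformer-based multi-scale point cloud completion algorithm better extracts local feature information from incomplete point clouds and makes the completion results more accurate.

Key words

3D point cloud; point cloud completion; autoencoder; attention mechanism; generative adversarial network (GAN)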

Multi-scale Transformer based point cloud completion network

Liu Xinpu¹, Ma Yanxin², Xu Ke¹, Wan Jianwei¹, Guo Yulan¹

1. College of Electronic Science and Technology, National University of Defense Technology, Changsha 410005, China;

2. College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410005, China

Abstract

Objective Three-dimensional vision analysis is a key topic in computer vision. The point cloud representation preserves the original geometric information in 3D space without any discretization. Unfortunately, scanned 3D point clouds are often incomplete because of occlusion, limited sensor resolution and narrow viewing angles, so shape completion is required before downstream 3D vision applications. Most deep-learning-based point cloud completion algorithms adopt an encoder-decoder structure and use multilayer perceptrons (MLPs) to extract point cloud features in the encoder. However, MLP networks tend to focus on the overall shape of the point cloud and struggle to extract the local structural features of the object. In addition, MLPs generalize poorly to new objects, which makes it difficult to complete objects with few training samples. An efficient and accurate local structural feature extraction algorithm for point cloud completion is therefore needed. Method We propose a multi-scale Transformer-based point cloud completion network (MSTCN). The network adopts an encoder-decoder structure composed of a multi-scale feature extractor, a pyramid point generator and a Transformer-based discriminator. The encoder of MSTCN uses Transformer modules to extract and aggregate the features of incomplete point clouds at three resolutions and feeds them into a decoder built from fully connected layers, which outputs the missing point clouds progressively. A feature embedding layer (FEL) and an attention layer are integrated into the encoder. The former improves the encoder's ability to extract local structural features through sampling and neighborhood grouping; the latter captures the correlations among points with an improved self-attention module.
The pyramid point generator in the decoder consists mainly of fully connected layers and reshape operations. Overall, the network processes point clouds at three resolutions in parallel; the lower resolutions are generated by farthest point sampling. Likewise, completion in the pyramid point generator is divided into three stages to achieve coarse-to-fine processing. Following the idea of generative adversarial networks (GAN), MSTCN appends a Transformer-based discriminator after the decoder, so that the discriminator and the generator promote each other during joint training and improve completion performance. The loss function of MSTCN consists of two parts: a generation loss and an adversarial loss. The generation loss is the weighted sum of the Chamfer distances (CD) between the generated point clouds and their ground truths at three scales, and the adversarial loss is the cross-entropy computed by the Transformer-based discriminator on the generated point cloud and its ground truth. Result MSTCN was compared with recent methods on the ShapeNet and ModelNet10 datasets. On ShapeNet, all 16 categories were used for training, and the category-average CD of MSTCN was 3.73% lower than that of the second-best model. Specifically, the CD values for cap, car, chair, earphone, lamp, pistol and table are better than those of the point fractal network (PF-Net). On ModelNet10, the category-average CD of MSTCN was 12.75% lower than that of the second-best model. Specifically, the CD values for bathtub, chair, desk, dresser, monitor, night stand, sofa, table and toilet are better than those of PF-Net. Visualizations on six categories (airplane, cap, chair, earphone, motorbike and table) show that MSTCN can accurately complete special structures and generalize to atypical samples within a category.
Ablation studies were also conducted on ShapeNet. The full MSTCN performs better than three variants that remove the feature embedding layer, the attention layer and the discriminator, respectively. This shows that the feature embedding layer strengthens the model's ability to extract local structural information, that the attention layer lets the model selectively refer to the local structure of the input point cloud during completion, and that the discriminator improves the completion results. In addition, three completion sub-models for different missing ratios were trained on ShapeNet to verify the robustness of MSTCN to inputs with different missing ratios, and the completion results for the chair category were visualized. Although the number of input points decreases, MSTCN consistently maintains good completion quality, and the results for the 25% and 50% missing ratios have similar CD values. Even when the missing ratio reaches 75%, the CD of the chair category remains at a low 2.074/2.456, and the whole chair can be identified and completed from the incomplete chair legs alone. This verifies that MSTCN is robust to inputs with different missing ratios. Conclusion We have presented a multi-scale Transformer-based point cloud completion network (MSTCN). MSTCN better extracts local feature information from incomplete point clouds, which makes the completion results more accurate. Current point cloud completion algorithms already achieve good results on single objects.
Future research can focus on completing large-scale scenes, where incomplete point clouds show various kinds of incompleteness, such as view missing, spherical missing and occlusion missing; completing large scenes is both more challenging and more practical. Moreover, point clouds of real scanned scenes have no ground-truth point cloud for reference, so unsupervised completion algorithms have an advantage over supervised ones.

Key words

three-dimensional point cloud; point cloud completion; autoencoder; attention mechanism; generative adversarial networks(GAN)

0 Introduction

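3D vision has become one of the research hotspots in computer vision and, more broadly, artificial intelligence. Because point clouds preserve the original geometric information of 3D space, they have become the preferred data format for many 3D scene understanding tasks, such as object classification, object detection and segmentation (Guo et al., 2021b). In practice, however, owing to occlusion, differences in surface reflectance, and the limited resolution and viewing angles of vision sensors, acquired point clouds are often incomplete and lack part of the geometric and semantic information, which degrades subsequent 3D scene understanding (Long et al., 2021). Completing incomplete point clouds is therefore an important basic task and a prerequisite for efficient 3D scene understanding.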

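Traditional shape completion methods fall into two main categories: geometry-based completion and template-matching completion. Among geometry-based methods, Zhao et al. (2007) used smooth interpolation to fill the missing parts of 3D shapes, and Mitra et al. (2013) identified symmetry axes and repeated structures in the input shape so as to complete it using symmetry and repetition. Among template-matching methods, Li et al. (2015) completed shapes by matching the input against models in a shape database. These traditional methods require the input to be as complete as possible, that is, they impose a high lower bound on input completeness, and they are not robust to new objects or environmental noise.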

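Most 3D point cloud completion work relies on deep learning. PointNet (Charles et al., 2017a) is the pioneering work that applies neural networks directly to point cloud data; it uses a permutation-invariant pooling operation so that the output does not depend on the ordering of the input points. PointNet++ (Charles et al., 2017b) adds sampling and neighborhood aggregation to PointNet so that the network better extracts local information about objects. Achlioptas et al. (2018) were the first to apply deep learning to point cloud completion, building the LGAN-AE (latent-space GAN autoencoder) model, which adds a generative adversarial network (GAN) module to an autoencoder. It completes point clouds effectively, but its decoder often cannot recover rare geometric structures, such as chair backs with gaps. Yuan et al. (2018) designed PCN (point completion network), which builds on PointNet and FoldingNet (Yang et al., 2018) to propose a coarse-to-fine completion method, but it cannot extract fine structural features of objects. Huang et al. (2020) proposed PF-Net (point fractal network), which introduced a multi-resolution encoder and a pyramid point decoder and added an adversarial loss, making completion more fine-grained. In 2020, Transformer structures were introduced into point cloud processing because they capture correlations between points. Guo et al. (2021a) proposed PCT (point cloud transformer), which optimized the self-attention module to make the Transformer better suited to point cloud learning and achieved strong performance in shape classification and part segmentation.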

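Motivated by the above problems and ideas, this paper proposes a multi-scale point cloud completion algorithm based on attention modules. Its main contributions are as follows:

1) A feature embedding module is proposed. On top of the conventional MLP-based lifting of point features, it adds repeated farthest point sampling and neighborhood aggregation and densely connects the features before and after aggregation, which effectively improves the model's ability to extract local features of point clouds.

2) The point cloud Transformer structure is optimized and introduced into point cloud completion. Multi-scale attention-layer feature extraction and pyramid point cloud generation form a novel end-to-end completion network, and attention layers are also added to the discriminator that computes the adversarial loss, further improving performance.

3) The proposed method achieves better completion results on the 3D object datasets ShapeNet (Chang et al., 2015) and ModelNet (Wu et al., 2015).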

1 Related work

1.1 Deep-learning-based point cloud completion

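Deep-learning-based point cloud completion algorithms can be roughly divided into two categories: voxel-grid-based methods and point-based (point convolution) methods.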

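Voxel-grid-based methods were originally motivated by the convenience of applying 3D convolutional neural networks (3D-CNNs). The 3D encoder-predictor CNN (3D-EPN) proposed by Dai et al. (2017), built from 3D convolutional layers, is representative of this family; it completes 3D point clouds by combining a volumetric deep neural network with 3D shape synthesis. Han et al. (2017) proposed a network combining a global structure inference network and a local geometry refinement network. The global structure inference network incorporates a long short-term memory (LSTM) context fusion module and infers the global structure of the shape from multi-view depth information; the local geometry refinement network then uses a volumetric encoder-decoder structure to progressively generate a high-resolution, complete shape. Nguyen et al. (2016) proposed a Markov random field model that uses local priors to capture the local structure of the target shape and then uses a convolutional deep belief network to learn the missing structure from 3D shape templates. To address the inability of multilayer perceptrons to capture the geometric structure and context of point clouds, Xie et al. (2020) proposed GRNet (gridding residual network), whose gridding operation captures the local structure of a point cloud and puts it into a form suitable for 3D convolution, so that a convolutional neural network can extract local features.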

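Point-based completion began with PCN, proposed by Yuan et al. (2018). PCN operates directly on raw point clouds without any prior information about the structure of the data. It adopts an encoder-decoder design: the encoder stacks two PointNet layers that take the input point cloud and output a feature vector, and the decoder converts the feature vector into coarse and fine point clouds. PF-Net, proposed by Huang et al. (2020), differs from earlier networks in that it takes the incomplete point cloud as input and outputs only the missing part. The network adopts an autoencoder structure and extracts local information well through multi-scale encoding and decoding. Wen et al. (2020) proposed SA-Net (skip-attention network) for point cloud completion. Its skip-attention mechanism exploits the local structural details of the incomplete point cloud by selectively transferring geometric information from its local regions to generate complete point clouds at different resolutions.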

1.2 Transformer attention mechanisms

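The Transformer originated in natural language processing, where Vaswani et al. (2017) proposed it for machine translation. It abandons recurrent and convolutional structures entirely and relies solely on attention for encoding and decoding, with excellent results, and attention mechanisms have since become a research hotspot. The Transformer was subsequently carried over to vision: the ViT (vision transformer) model (Dosovitskiy et al., 2021) showed that, for 2D images, attention layers need not rely on convolutional neural networks, and a pure Transformer applied directly to sequences of image patches can perform image classification and segmentation well. Guo et al. (2021a) and Zhao et al. (2021) applied the Transformer to 3D point cloud processing, designing an offset-attention module and a self-attention module, respectively, for point feature extraction, and achieved the best results of the time in point cloud classification and scene segmentation.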

2 Proposed method

2.1 Network framework

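Fig. 1 shows the overall framework of the proposed multi-scale Transformer-based point cloud completion network (MSTCN). The network adopts an encoder-decoder structure with three parts: a multi-scale feature extractor $E$, a pyramid point generator $G$ and an attention discriminator $D$. The input of MSTCN is the original point cloud at three different scales; each scale is fed into the feature extractor $E$ of that scale, which produces the corresponding feature vector $\mathit{\boldsymbol{V}}$. The feature vector $\mathit{\boldsymbol{V}}$ is then fed into the pyramid point generator $G$, which outputs generated point clouds at three scales. The loss function consists of a generation loss and an adversarial loss: the generation loss is the Chamfer distance (CD) between the generated point cloud at each of the three scales and the corresponding ground truth, and the adversarial loss is computed by the attention discriminator $D$. Following the idea of GAN (Goodfellow et al., 2014), the attention discriminator $D$ takes the first-scale generated point cloud as input and outputs a binary real/fake prediction. It is trained jointly with the pyramid point generator $G$ and thereby indirectly improves the generator's completion performance.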

Fig. 1 The framework of the multi-scale Transformer-based point cloud completion network (MSTCN)

2.2 Multi-scale feature extractor $E$

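The input of the multi-scale feature extractor $E$ is the original point cloud at three scales with ${\mathit{\boldsymbol{p}}_{\rm num}} = [N, N/4, N/8] = [2\,048, 512, 256]$ points, where $N$ is the number of points in the original point cloud; the two smaller scales are obtained from the original point cloud by farthest point sampling (FPS). Each scale passes through a feature embedding layer (FEL) and attention layers in turn, which output the attention vector ${\mathit{\boldsymbol{V}}_{\rm T}}$ of that scale; the three attention vectors are concatenated and passed through a multilayer perceptron (MLP) to obtain the feature vector $\mathit{\boldsymbol{V}}$.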
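The multi-scale input can be illustrated with a short plain-Python sketch of greedy farthest point sampling. It is a minimal, unoptimized illustration that assumes points are `(x, y, z)` tuples; the helper names `farthest_point_sampling` and `multi_scale_inputs` are hypothetical, and seeding FPS with a random first point is an assumption, since the text does not say how FPS is initialized.

```python
import math
import random


def farthest_point_sampling(points, m, seed=0):
    """Greedy farthest point sampling: return m indices into `points`."""
    n = len(points)
    if not 0 < m <= n:
        raise ValueError("m must be in 1..len(points)")
    rng = random.Random(seed)
    selected = [rng.randrange(n)]  # assumed: random first point
    # Squared distance from each point to its nearest already-selected point.
    nearest = [math.dist(p, points[selected[0]]) ** 2 for p in points]
    while len(selected) < m:
        nxt = max(range(n), key=nearest.__getitem__)  # farthest remaining point
        selected.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]) ** 2)
    return selected


def multi_scale_inputs(points, divisors=(1, 4, 8)):
    """Build the [N, N/4, N/8] encoder inputs; the smaller scales come from FPS."""
    scales = []
    for d in divisors:
        if d == 1:
            scales.append(list(points))
        else:
            idx = farthest_point_sampling(points, len(points) // d)
            scales.append([points[i] for i in idx])
    return scales
```

With a 2 048-point input, `multi_scale_inputs` returns clouds of 2 048, 512 and 256 points. Each FPS iteration scans all points, so this sketch costs O(N·m); practical implementations apply the same greedy rule in batched form on the GPU.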

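The structure of the feature embedding layer (FEL) is shown in Fig. 2. The original point cloud [Batchsize, ${\mathit{\boldsymbol{p}}_{{\rm{num}}}}$, 3] passes through two MLPs that lift the per-point features from 3 to 64 dimensions, giving [$B, {\mathit{\boldsymbol{p}}_{{\rm{num}}}}$, 64], where $B$ is the batch size. Following PointNet++ (Charles et al., 2017b) and DGCNN (dynamic graph CNN) (Wang et al., 2019), the network then performs sampling and K-nearest-neighbor (KNN) aggregation in turn; the numbers of sampled points and neighbors are matched to the number of points at that scale, so as to strengthen the network's ability to extract local features of objects. To avoid the adverse effect of absolute point coordinates on neighborhood aggregation, the difference between the aggregated neighbor features and the feature of the neighborhood center is used as the initial feature. This difference passes through an MLP and is concatenated with the point features before neighborhood aggregation; max pooling then yields the aggregated feature ${\mathit{\boldsymbol{V}}_{\rm E}} = \left[ {B, {\mathit{\boldsymbol{p}}_{{\rm{num}}}}/4, 128} \right]$. The computation is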

$ \begin{array}{l} {{\tilde{\mathit{\boldsymbol{V}}}}_P} = MLP({\mathit{\boldsymbol{V}}_P})\\ {\mathit{\boldsymbol{V}}_E} = MP\Big(C\big(MLP(SG({{\tilde{\mathit{\boldsymbol{V}}}}_P}) - {{\tilde{\mathit{\boldsymbol{V}}}}_P}),\ RP({{\tilde{\mathit{\boldsymbol{V}}}}_P}, k)\big)\Big) \end{array} $

(1)

Fig. 2 The framework of the feature embedding layer (FEL)

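where ${{\mathit{\boldsymbol{V}}_P}}$ denotes the features of the input point cloud, ${{\mathit{\boldsymbol{\tilde V}}}_P}$ the point features lifted by the $MLP$, ${\mathit{\boldsymbol{V}}_E}$ the aggregated point features output by the FEL, $SG(\cdot)$ the sampling-and-grouping operation, $RP(x, k)$ the operation that replicates $x$ $k$ times, $C$ concatenation, and $MP$ max pooling.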
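Eq. (1) can be read as the following grouping step, sketched in plain Python for a set of already-sampled centers. This is an illustration under assumptions: neighbors are found by Euclidean distance on coordinates (the text cites both PointNet++ and DGCNN and does not say which space KNN uses), the learnable $MLP$ is replaced by a hypothetical `mlp` argument that defaults to the identity, and the names `knn_indices` and `fel_group` are ours.

```python
import math


def knn_indices(coords, center, k):
    """Indices of the k points closest (Euclidean) to `center`, center included."""
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[i], center))
    return order[:k]


def fel_group(coords, feats, center_ids, k, mlp=lambda v: v):
    """One FEL sample-and-group step following Eq. (1).

    For each sampled center: gather k neighbors (SG), subtract the center feature,
    pass the differences through `mlp` (identity placeholder), concatenate with the
    center feature repeated for every neighbor (RP), then max-pool over the k
    neighbors (MP).
    """
    out = []
    for c in center_ids:
        rows = []
        for j in knn_indices(coords, coords[c], k):
            diff = [a - b for a, b in zip(feats[j], feats[c])]
            rows.append(mlp(diff) + list(feats[c]))  # C(MLP(SG - V), RP(V, k))
        out.append([max(col) for col in zip(*rows)])  # MP over the k neighbors
    return out
```

For example, with features of dimension 2 each pooled output has dimension 4, because the relative features and the center features are concatenated before pooling; in the real layer the `mlp` would change the first half's width.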

The structure of the attention layer is shown in Fig. 3. It optimizes the offset-attention of PCT (Guo et al., 2021a). First, the input ${\mathit{\boldsymbol{V}}_E}$ is linearly transformed to obtain the ${\rm{Query}}(\mathit{\boldsymbol{Q}})$, ${\rm{Key}}(\mathit{\boldsymbol{K}})$ and ${\rm{Value}}(\mathit{\boldsymbol{V}})$ matrices:

$ (\mathit{\boldsymbol{Q}}, \mathit{\boldsymbol{K}}, \mathit{\boldsymbol{V}}) = {\mathit{\boldsymbol{V}}_{\rm{E}}} \times ({\mathit{\boldsymbol{W}}_{\rm{q}}}, {\mathit{\boldsymbol{W}}_{\rm{k}}}, {\mathit{\boldsymbol{W}}_{\rm{v}}}), \quad \mathit{\boldsymbol{Q}}, \mathit{\boldsymbol{K}} \in {{\bf{R}}^{N \times {d_a}}}, \ \mathit{\boldsymbol{V}} \in {{\bf{R}}^{N \times {d_e}}} $

(2)

Fig. 3 The framework of the attention layer

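where ${\mathit{\boldsymbol{W}}_{\rm{q}}}$, ${\mathit{\boldsymbol{W}}_{\rm{k}}}$ and ${\mathit{\boldsymbol{W}}_{\rm{v}}}$ are learnable linear transformation weights, $d_a$ is the dimension of the Query and Key matrices, and $d_e$ is the dimension of the Value matrix. The attention weights $\mathit{\boldsymbol{\tilde A}}$ are then computed as the matrix product of the Query and Key matrices, and their elements ${\tilde \alpha _{i, j}}$ are normalized with softmax and the ${\rm L}_1$ norm. The intermediate feature ${\mathit{\boldsymbol{V}}_{{\rm{sa}}}}$ is the sum of the Value vectors weighted by the corresponding attention weights.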

$ \mathit{\boldsymbol{\tilde A}} = {\tilde \alpha _{i, j}} = \mathit{\boldsymbol{Q}} \cdot {\mathit{\boldsymbol{K}}^{\rm{T}}} $

(3)

$ \begin{array}{l} {\bar \alpha}_{i, j} = {\mathop{\rm softmax}\nolimits}({\tilde \alpha}_{i, j}) = \dfrac{\exp({\tilde \alpha}_{i, j})}{\sum\limits_k \exp({\tilde \alpha}_{k, j})}\\ \alpha_{i, j} = \dfrac{{\bar \alpha}_{i, j}}{\sum\limits_k {\bar \alpha}_{i, k}} \end{array} $

(4)

$ {\mathit{\boldsymbol{V}}_{{\rm{sa}}}} = \mathit{\boldsymbol{A}} \cdot \mathit{\boldsymbol{V}} $

(5)

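Inspired by graph convolutional networks (Kipf and Welling, 2017), replacing the original self-attention module with an offset-attention (OA) module gives better performance when the Transformer is applied to point clouds. Specifically, the offset-attention module computes the offset between the ${\mathit{\boldsymbol{V}}_{{\rm{sa}}}}$ and ${\mathit{\boldsymbol{V}}_{{\rm{E}}}}$ features by element-wise subtraction. The offset is fed into a multilayer perceptron (MLP) and then added back to ${\mathit{\boldsymbol{V}}_{{\rm{E}}}}$ through a residual connection, which avoids vanishing gradients during training; after max pooling, one attention operation is complete. After several attention operations, the attention vector ${\mathit{\boldsymbol{V}}_{{\rm{T}}}}$ is output: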

$ {\mathit{\boldsymbol{V}}_{\rm{T}}} = OA({\mathit{\boldsymbol{V}}_{\rm{E}}}) = MLP({\mathit{\boldsymbol{V}}_{\rm{E}}} - {\mathit{\boldsymbol{V}}_{{\rm{sa}}}}) + {\mathit{\boldsymbol{V}}_{\rm{E}}} $

(6)
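A plain-Python sketch of Eqs. (2)-(6) may make the double normalization concrete: softmax runs down each column (over the first index $i$), then each row is rescaled to sum to 1. Assumptions: features are lists of rows, `w_v` is square so that ${\mathit{\boldsymbol{V}}_{\rm E}} - {\mathit{\boldsymbol{V}}_{\rm sa}}$ is defined, the learnable MLP is a hypothetical `mlp` argument (identity by default), and the max-pooling step mentioned in the text is omitted. This is an illustration, not the authors' implementation.

```python
import math


def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def offset_attention(v_e, w_q, w_k, w_v, mlp=lambda m: m):
    """One offset-attention step following Eqs. (2)-(6); v_e is an N x C list of rows."""
    q, k, v = matmul(v_e, w_q), matmul(v_e, w_k), matmul(v_e, w_v)  # Eq. (2)
    a_tilde = matmul(q, [list(col) for col in zip(*k)])            # Eq. (3): Q K^T
    n = len(a_tilde)
    # Eq. (4), step 1: softmax over the first index i, i.e. down each column j.
    a_bar = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col = [a_tilde[i][j] for i in range(n)]
        peak = max(col)  # subtract the column max for numerical stability
        exps = [math.exp(x - peak) for x in col]
        total = sum(exps)
        for i in range(n):
            a_bar[i][j] = exps[i] / total
    # Eq. (4), step 2: L1-normalize each row over the second index k.
    a = []
    for row in a_bar:
        row_sum = sum(row)
        a.append([x / row_sum for x in row])
    v_sa = matmul(a, v)                                             # Eq. (5)
    offset = [[e - s for e, s in zip(r_e, r_sa)] for r_e, r_sa in zip(v_e, v_sa)]
    # Eq. (6): MLP on the offset, then a residual connection back to V_E.
    return [[t + e for t, e in zip(r_t, r_e)] for r_t, r_e in zip(mlp(offset), v_e)]
```

With a zero Query projection every attention weight becomes $1/N$, so ${\mathit{\boldsymbol{V}}_{\rm sa}}$ is the mean row and, with an identity MLP, the output is $2{\mathit{\boldsymbol{V}}_{\rm E}}$ minus that mean; this is a convenient sanity check of the normalization order.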

2.3 Pyramid point generator $G$

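The pyramid point generator $G$ takes the final feature vector $\mathit{\boldsymbol{V}}$ as input and outputs generated point clouds at the three corresponding scales. Its architecture consists mainly of fully connected (FC) layers and reshape operations. Previous work (Yuan et al., 2018) shows that fully connected decoders are good at predicting the overall geometry of a point cloud, but because they complete the shape only from the last-layer feature, they struggle to capture local structural features. This paper therefore follows the pyramid-style, layer-by-layer feature extraction of the feature pyramid network (FPN) (Lin et al., 2017) and PF-Net (Huang et al., 2020), completing the point cloud step by step from coarse to fine.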

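As shown in Fig. 4, the feature vector $\mathit{\boldsymbol{V}}$ is passed through fully connected layers to obtain three sub-feature vectors ${\mathit{\boldsymbol{V}}_i}$ ($i = 1, 2, 3$) with dimensions 1 024, 512 and 256, each responsible for completing the point cloud at one resolution. First, ${\mathit{\boldsymbol{V}}_3}$ predicts the primary point cloud ${P_3}$ (${M_3} \times 3$). Next, ${\mathit{\boldsymbol{V}}_2}$ predicts the coordinates of the secondary point cloud $P_2$ relative to the center points in $P_3$, and reshape and addition operations generate the secondary point cloud ${P_2}$ (${M_2} \times 3$) from $P_3$. Similarly, ${\mathit{\boldsymbol{V}}_1}$ and $P_2$ predict the coordinates of the final point cloud $P_1$ relative to the center points in $P_2$, which yields the final completed point cloud ${P_1}$ (${M_1} \times 3$).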
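The reshape-and-add step can be sketched as follows, assuming (hypothetically) that the FC layer for a given scale outputs a flat vector of offsets whose length is 3 times the number of points at that scale, reshaped so that each coarser point receives an equal number of children. With $[M_1, M_2, M_3] = [512, 128, 64]$, each point of $P_3$ then spawns 2 points of $P_2$ and each point of $P_2$ spawns 4 points of $P_1$. The function name is hypothetical and the FC layers themselves are not shown.

```python
def expand_with_offsets(centers, offsets):
    """Reshape a flat offset vector to [len(centers), r, 3] and add each center.

    centers: list of (x, y, z) points of the coarser scale.
    offsets: flat list of predicted relative coordinates (length 3 * r * len(centers)).
    """
    m_prev = len(centers)
    if len(offsets) % (3 * m_prev):
        raise ValueError("offset length must be a multiple of 3 * number of centers")
    r = len(offsets) // (3 * m_prev)  # children generated per center point
    points = []
    for c_idx, (cx, cy, cz) in enumerate(centers):
        for child in range(r):
            base = 3 * (c_idx * r + child)
            dx, dy, dz = offsets[base:base + 3]
            points.append((cx + dx, cy + dy, cz + dz))
    return points
```

Usage: `expand_with_offsets(p3, fc2_out)` turns 64 points into 128, and applying it again with a 1 536-value vector turns those 128 points into 512. Predicting offsets relative to coarse centers keeps each fine point anchored near the coarse shape.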

Fig. 4 The coarse-to-fine point cloud completion process

2.4 Attention discriminator $D$

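Point cloud completion is a generative task. Following the idea in GAN that the generator and the discriminator improve each other during training, this paper introduces an attention discriminator to improve completion performance. The attention discriminator $D$ also adopts an autoencoder-like structure: its encoder is a single-scale feature extractor made of a feature embedding layer and attention layers. The generated point cloud $P_1$ and its corresponding ground truth are fed into $D$; the feature extractor outputs a 640-dimensional feature vector, which passes through successive fully connected layers (FC) [640-512-256-128-16-1] to produce a binary real/fake output.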

3 Experiments

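To test the effectiveness of the network on different types of data, MSTCN was evaluated on two public datasets: ShapeNet with 16 categories and ModelNet10 with 10 categories. The algorithm was also compared with related deep-learning completion methods, and ablation experiments were conducted to verify the effectiveness of each module.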

3.1 Datasets and experimental settings

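To train the model and comprehensively evaluate its completion performance, we use the ShapeNet dataset with 16 categories and the ModelNet10 dataset with 10 categories. All input point clouds are centered at the origin, with their $x, y, z$ coordinates normalized to the interval [-1, 1]. Each complete point cloud is obtained by uniformly sampling 2 048 points on each sample. Each incomplete point cloud is generated by randomly selecting one of 5 preset reference points as the center and deleting, at a given ratio, the points of the complete point cloud that lie within a certain radius of that center.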
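A minimal sketch of the cropping procedure, under stated assumptions: the five reference points below are hypothetical placeholders (the text does not list them), and instead of a fixed radius the sketch removes the given fraction of points nearest to the chosen center, which amounts to choosing the radius that removes exactly that fraction. The function name `crop_incomplete` is ours.

```python
import math
import random

# Hypothetical reference points; the paper does not give their coordinates.
HYPOTHETICAL_VIEWPOINTS = [(1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0), (-1, 1, 0)]


def crop_incomplete(points, viewpoints, missing_ratio=0.25, rng=None):
    """Split a complete cloud into (partial input, missing part) around a random viewpoint."""
    rng = rng or random.Random()
    center = rng.choice(viewpoints)
    n_missing = round(len(points) * missing_ratio)
    # Rank points by distance to the chosen center; the closest ones are removed.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    removed = set(order[:n_missing])
    partial = [p for i, p in enumerate(points) if i not in removed]
    missing = [points[i] for i in order[:n_missing]]
    return partial, missing
```

For a 2 048-point cloud and a 25% ratio, this yields a 1 536-point partial input and a 512-point missing part; the missing part can then serve as the ground truth for the generated $P_1$.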

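The model is implemented in PyTorch and optimized with ADAM (adaptive moment estimation). The initial learning rate is 0.000 1 and is multiplied by 0.2 every 40 epochs; the batch size is 8 and training runs for 200 epochs. Batch normalization and ReLU activation are used in the multi-scale feature extractor $E$ and the attention discriminator $D$, whereas the pyramid point generator $G$ uses ReLU only. In $E$, the sampling and neighborhood aggregation of the feature embedding layer (FEL) is repeated twice and the attention layer four times. In $G$, the numbers of generated points are $[{M_1}, {M_2}, {M_3}] = [512, 128, 64]$.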
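The step-decay schedule described above fits in a one-line function; the name and signature are illustrative.

```python
def learning_rate(epoch, base_lr=1e-4, gamma=0.2, step_size=40):
    """Step decay: the rate is multiplied by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)
```

Over 200 epochs this gives five plateaus: 1e-4, 2e-5, 4e-6, 8e-7 and 1.6e-7. In PyTorch the same schedule is usually obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.2)`, stepped once per epoch; the text does not say which scheduler the authors used.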

3.2 Loss function and evaluation metric

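The loss function of MSTCN is a weighted combination of a generation loss and an adversarial loss. The generation loss measures the discrepancy between the generated and ground-truth point clouds, and the adversarial loss refines the completion so that the generated point cloud looks more realistic.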

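Fan et al. (2017) proposed two metrics for comparing point clouds: CD and EMD (earth mover's distance). Because EMD is computationally expensive, this paper uses CD as the evaluation metric for completion performance: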

$ \begin{array}{l} {d_{\rm{CD}}}({\mathit{\boldsymbol{S}}_1}, {\mathit{\boldsymbol{S}}_2}) = {d_{\rm{CD}}}({\mathit{\boldsymbol{S}}_1} \to {\mathit{\boldsymbol{S}}_2}) + {d_{\rm{CD}}}({\mathit{\boldsymbol{S}}_2} \to {\mathit{\boldsymbol{S}}_1}) = \\ \dfrac{1}{|{\mathit{\boldsymbol{S}}_1}|}\sum\limits_{\mathit{\boldsymbol{x}} \in {\mathit{\boldsymbol{S}}_1}} \min\limits_{\mathit{\boldsymbol{y}} \in {\mathit{\boldsymbol{S}}_2}} \left\| \mathit{\boldsymbol{x}} - \mathit{\boldsymbol{y}} \right\|_2^2 + \dfrac{1}{|{\mathit{\boldsymbol{S}}_2}|}\sum\limits_{\mathit{\boldsymbol{y}} \in {\mathit{\boldsymbol{S}}_2}} \min\limits_{\mathit{\boldsymbol{x}} \in {\mathit{\boldsymbol{S}}_1}} \left\| \mathit{\boldsymbol{y}} - \mathit{\boldsymbol{x}} \right\|_2^2 \end{array} $

(7)
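Eq. (7) can be checked with a brute-force plain-Python sketch (the helper name is hypothetical). Returning the two directional terms separately mirrors the $d_{\rm CD}(S_1 \to S_2)/d_{\rm CD}(S_2 \to S_1)$ pairs reported in the tables, and the small example shows that the two directions can differ.

```python
import math


def chamfer_distance(s1, s2):
    """Eq. (7): returns (d(S1->S2), d(S2->S1)); their sum is d_CD(S1, S2)."""
    def one_way(a, b):
        # Mean over a of the squared distance to the nearest point in b.
        return sum(min(math.dist(x, y) ** 2 for y in b) for x in a) / len(a)
    return one_way(s1, s2), one_way(s2, s1)


generated = [(0.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(chamfer_distance(generated, ground_truth))  # prints: (0.0, 2.0)
```

Here the single generated point lies exactly on the ground truth (first term 0), but half of the ground truth is not covered (second term 2.0). The double loop costs O(|S_1|·|S_2|); training code normally computes the same nearest-neighbor distances in batched form on the GPU.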

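As Eq. (7) shows, CD describes the average nearest squared distance between the generated point cloud ${{\mathit{\boldsymbol{S}}_1}}$ and the ground-truth point cloud ${{\mathit{\boldsymbol{S}}_2}}$. Because the pyramid point generator $G$ has three scales, the total generation loss also has three parts: ${d_{{\rm{CD1}}}}, {d_{{\rm{CD2}}}}, {d_{{\rm{CD3}}}}$ are the CD values between the generated point clouds ${\mathit{\boldsymbol{P}}_1}, {\mathit{\boldsymbol{P}}_2}, {\mathit{\boldsymbol{P}}_3}$ and their corresponding ground truths. The total generation loss is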

$ {\mathit{\boldsymbol{L}}_{\rm{com}}} = {d_{\rm{CD}1}}({\mathit{\boldsymbol{P}}_1}, {\mathit{\boldsymbol{P}}_{1{\rm{GT}}}}) + \alpha\, {d_{\rm{CD}2}}({\mathit{\boldsymbol{P}}_2}, {\mathit{\boldsymbol{P}}_{2{\rm{GT}}}}) + 2\alpha\, {d_{\rm{CD}3}}({\mathit{\boldsymbol{P}}_3}, {\mathit{\boldsymbol{P}}_{3{\rm{GT}}}}) $

(8)

The adversarial loss is

$ {\mathit{\boldsymbol{L}}_{\rm{adv}}} = \sum \log\big(D({\mathit{\boldsymbol{P}}_{\rm{GT}}})\big) + \sum \log\big(1 - D(G(E({\mathit{\boldsymbol{P}}_{\rm{inc}}})))\big) $

(9)

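where ${{\mathit{\boldsymbol{P}}_{{\rm{inc}}}}}$ and ${{\mathit{\boldsymbol{P}}_{{\rm{GT}}}}}$ denote the original incomplete point cloud and the ground-truth point cloud, respectively, and $E$, $G$ and $D$ denote the multi-scale feature extractor, the pyramid point generator and the attention discriminator.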
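For concreteness, Eq. (9) can be evaluated from discriminator output probabilities as below. This is a sketch: the function name and the `eps` clamp are assumptions. The discriminator is trained to increase this value; the value equals the negative of the summed binary cross-entropy with label 1 for ground-truth clouds and label 0 for generated ones, which matches the cross-entropy description in the abstract.

```python
import math


def adversarial_objective(d_real, d_fake, eps=1e-12):
    """Eq. (9) from discriminator probabilities.

    d_real: D(P_GT) for each ground-truth cloud.
    d_fake: D(G(E(P_inc))) for each generated cloud.
    eps guards against log(0).
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake)
    return real_term + fake_term
```

A perfect discriminator (`d_real = [1.0]`, `d_fake = [0.0]`) reaches the maximum value 0, while an undecided one (0.5 everywhere) gives 2·log 0.5 per pair.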

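The total loss is the weighted sum of the generation loss and the adversarial loss. Here $\alpha$ is the weight within the generation loss. As training proceeds, the demands on completing fine structures increase and the training weight of the smaller-scale point clouds grows, so we set $\alpha = 0.01$, $0.05$ and $0.1$ for epoch < 30, 30 ≤ epoch < 80 and epoch ≥ 80, respectively. $\beta$ is the weight in the total loss; because the generation loss plays the decisive role in completion, we set $\beta = 0.95$.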

$ {\mathit{\boldsymbol{L}}_{\rm{total}}} = \beta {\mathit{\boldsymbol{L}}_{\rm{com}}} + (1 - \beta){\mathit{\boldsymbol{L}}_{\rm{adv}}} $

(10)
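The epoch-dependent weighting of Eqs. (8) and (10) can be sketched as follows. The function names are hypothetical; the three CD terms and the adversarial term are passed in as plain numbers, and the text does not specify how the adversarial term is formed for the generator update as opposed to the discriminator update.

```python
def alpha_for_epoch(epoch):
    """Piecewise alpha schedule: 0.01 before epoch 30, 0.05 until epoch 80, then 0.1."""
    if epoch < 30:
        return 0.01
    if epoch < 80:
        return 0.05
    return 0.1


def total_loss(cd1, cd2, cd3, l_adv, epoch, beta=0.95):
    """Combine the three-scale generation loss (Eq. 8) with the adversarial term (Eq. 10)."""
    alpha = alpha_for_epoch(epoch)
    l_com = cd1 + alpha * cd2 + 2 * alpha * cd3  # Eq. (8)
    return beta * l_com + (1 - beta) * l_adv     # Eq. (10)
```

Because $\beta = 0.95$, the adversarial term contributes only 5% of the total, consistent with the statement that the generation loss plays the decisive role.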

3.3 Results and analysis

3.3.1 ShapeNet dataset

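Table 1 compares MSTCN with existing point cloud completion models. The metric has two parts, ${d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_1} \to {\mathit{\boldsymbol{S}}_2}} \right)$ and ${d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_2} \to {\mathit{\boldsymbol{S}}_1}} \right)$. The former is the mean squared distance from each point of the generated point cloud to its nearest point in the ground truth and measures how far the generated point cloud deviates from the ground truth; the latter is the mean squared distance from each point of the ground truth to its nearest point in the generated point cloud and measures how well the generated point cloud covers the ground truth.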

Table 1 Experimental results of different algorithms on the ShapeNet dataset


Values: ${d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_1} \to {\mathit{\boldsymbol{S}}_2}} \right)/{d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_2} \to {\mathit{\boldsymbol{S}}_1}} \right)$
Category	LGAN-AE	PCN	3D-Capsule	PF-Net	MSTCN
Airplane	4.075/3.828	3.808/4.242	3.932/4.672	1.252/1.262	1.246/1.299
Bag	13.996/15.009	13.328/15.355	14.564/13.645	4.178/3.870	4.296/3.949
Cap	14.88/12.632	14.61/11.966	14.497/12.726	5.168/5.231	4.157/5.397
Car	8.172/11.212	8.508/11.551	9.164/12.714	2.193/2.818	2.480/2.914
Chair	6.56/7.696	6.777/8.036	7.143/8.166	2.073/2.231	2.001/2.226
Earphone	25.674/17.566	26.525/18.341	27.957/18.639	8.114/5.092	4.826/4.871
Guitar	1.919/1.887	1.787/2.164	1.451/2.457	0.526/0.485	0.519/0.575
Knife	2.269/2.135	2.114/2.449	1.716/2.781	0.622/0.549	0.649/0.615
Lamp	11.365/14.808	9.85/15.464	11.687/14.761	3.705/4.941	3.248/5.244
Laptop	4.74/5.692	4.681/6.383	5.015/6.93	1.183/1.354	1.327/1.509
Motorbike	7.133/8.79	6.63/8.814	6.209/10.052	2.037/2.350	2.060/2.336
Mug	13.268/12.984	14.049/12.433	14.987/13.05	3.618/3.257	3.419/3.584
Pistol	5.397/5.774	4.694/5.721	5.281/6.485	1.222/1.457	1.071/1.532
Rocket	3.851/4.466	3.349/4.424	3.768/5.016	0.872/1.127	0.781/1.344
Skateboard	5.834/7.17	5.367/8.477	5.9/8.871	1.480/1.209	1.663/1.531
Table	7.299/10.006	6.911/11.188	8.057/10.931	2.262/2.525	2.240/2.362
Average	8.527/8.854	8.312/9.188	8.833/9.494	2.532/2.485	2.249/2.581
Note: bold indicates the best value in each row.

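As Table 1 shows, on ShapeNet MSTCN outperforms PF-Net and the other algorithms on the average CD over the 16 categories, ${\bar d_{{\rm{CD}}}}$, reducing it by about 3.73% relative to PF-Net. (The ShapeNet data used here contain 16 categories, whereas PF-Net was evaluated on a 13-category subset, so the CD values measured here differ slightly from those reported in the PF-Net paper.) Specifically, MSTCN achieves better ${d_{{\rm{CD}}}}$ values than the other networks on 7 categories, namely cap, chair, car, earphone, lamp, pistol and table, and better ${d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_1} \to {\mathit{\boldsymbol{S}}_2}} \right)$ values on 10 categories, including airplane, cap and guitar. This is because the Transformer structure in MSTCN is better suited than the CMLP (combined multi-layer perceptron) structure (Huang et al., 2020) to capturing point-to-point correlations, so completion concentrates on generating key structures rather than merely reducing the CD to the ground truth.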
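The 3.73% figure can be reproduced from Table 1 under one assumption: that the category-average CD is compared as the sum of its two directional components. The sketch below also counts the categories in which MSTCN has the lower ${d_{\rm CD}}(S_1 \to S_2)$; all values are copied from Table 1, and the helper name is hypothetical.

```python
# Category averages from Table 1: (S1->S2, S2->S1)
pfnet_mean = (2.532, 2.485)
mstcn_mean = (2.249, 2.581)


def relative_reduction(baseline, ours):
    """Percentage drop of the summed two-directional CD (assumed comparison rule)."""
    b, o = sum(baseline), sum(ours)
    return 100.0 * (b - o) / b


# S1->S2 values per category, in Table 1 row order (Airplane ... Table)
pfnet_s1s2 = [1.252, 4.178, 5.168, 2.193, 2.073, 8.114, 0.526, 0.622,
              3.705, 1.183, 2.037, 3.618, 1.222, 0.872, 1.480, 2.262]
mstcn_s1s2 = [1.246, 4.296, 4.157, 2.480, 2.001, 4.826, 0.519, 0.649,
              3.248, 1.327, 2.060, 3.419, 1.071, 0.781, 1.663, 2.240]
wins = sum(m < p for m, p in zip(mstcn_s1s2, pfnet_s1s2))
print(round(relative_reduction(pfnet_mean, mstcn_mean), 2), wins)  # prints: 3.73 10
```

Under this reading, the reported improvement comes mainly from the $S_1 \to S_2$ direction (2.532 to 2.249), while the $S_2 \to S_1$ average rises slightly (2.485 to 2.581).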

Fig. 5 compares the completion results of MSTCN and PF-Net on several ShapeNet categories. It shows the following advantages of MSTCN.

Fig. 5 Comparison of the completion results achieved by MSTCN and PF-Net on some categories of the ShapeNet dataset

((a) input point cloud; (b) PF-Net; (c) MSTCN; (d) ground truth)

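1) More accurate completion of special structures. As the table example in Fig. 5 shows, this table differs from an ordinary desk in having a footrest crossbar underneath, of which the incomplete point cloud shows only a short segment on the left. PF-Net fills in the missing long edge of the tabletop well but ignores the footrest crossbar. MSTCN completes the tabletop and also detects the short protruding segment at the lower left of the incomplete point cloud, completing it as a footrest crossbar. The same happens for some samples of the motorbike category. In the example shown, the incomplete point cloud lacks the front wheel. PF-Net correctly recognizes that the front wheel is missing, but the completed front wheel is smaller than the rear one and lacks the nested ring structure formed by the tire and the bearing, so it looks more like a car tire. MSTCN recognizes the missing front wheel and also recovers the nested ring structure of the motorbike tire.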

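2) Stronger generalization to atypical samples within a category. Most samples of a category in the training set share similar structures, but some have special structures; because few training samples resemble them, they are often completed poorly. Consider the cap example: most caps are baseball caps with a flat or spherical top. Faced with the officer's peaked cap in the figure, whose back brim is missing, PF-Net still completes a flat top as for an ordinary cap. MSTCN, although also influenced by the many baseball-cap samples and likewise completing a flat top, notices the missing sloped rear part of the peaked cap's top and attempts to complete it. A similar example appears in the earphone category, where MSTCN makes a more deliberate effort than PF-Net to complete the headband of a wide headset.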

3.3.2 ModelNet10 dataset

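ModelNet10 has 10 categories whose samples are more structurally diverse. We ran completion experiments with MSTCN and PF-Net on this dataset; as Table 2 shows, the advantage of MSTCN is even more pronounced. Overall, MSTCN outperforms PF-Net on the average CD over the 10 categories, ${\bar d_{{\rm{CD}}}}$, reducing it by about 12.75% relative to PF-Net. Specifically, MSTCN achieves better ${d_{{\rm{CD}}}}$ values than PF-Net on all 9 categories other than bed, which shows that it copes better with scarce training samples and generalizes better to objects with novel structures.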
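Assuming the comparison sums the two directional CD terms, Table 2 reproduces both statements in this paragraph. The dictionary below simply copies the table; the variable names are ours.

```python
# Table 2 rows: (PF-Net S1->S2, PF-Net S2->S1, MSTCN S1->S2, MSTCN S2->S1)
TABLE2 = {
    "Bathtub": (3.93, 8.48, 4.36, 7.13),
    "Bed": (3.05, 3.02, 3.32, 3.07),
    "Chair": (3.83, 3.83, 3.51, 3.78),
    "Desk": (5.25, 7.53, 4.14, 4.96),
    "Dresser": (3.97, 4.30, 4.01, 3.72),
    "Monitor": (6.04, 5.39, 4.19, 4.11),
    "NightStand": (4.19, 4.75, 4.24, 4.56),
    "Sofa": (3.72, 5.02, 3.51, 3.58),
    "Table": (3.14, 3.40, 2.34, 3.16),
    "Toilet": (5.04, 6.18, 5.23, 5.20),
}
# Categories where MSTCN's summed CD is lower than PF-Net's.
better = [c for c, (p1, p2, m1, m2) in TABLE2.items() if m1 + m2 < p1 + p2]
pf_avg = 4.22 + 5.19   # PF-Net average row
ms_avg = 3.88 + 4.33   # MSTCN average row
reduction = 100 * (pf_avg - ms_avg) / pf_avg
print(len(better), round(reduction, 2))  # prints: 9 12.75
```

Several categories (for example Bathtub and Toilet) are wins only on the summed metric: MSTCN is worse in the $S_1 \to S_2$ direction there but better in the coverage direction.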

Table 2 Comparative results of MSTCN and PF-Net on ModelNet10


Values: ${d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_1} \to {\mathit{\boldsymbol{S}}_2}} \right)/{d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_2} \to {\mathit{\boldsymbol{S}}_1}} \right)$
Category	PF-Net	MSTCN
Bathtub	3.93/8.48	4.36/7.13
Bed	3.05/3.02	3.32/3.07
Chair	3.83/3.83	3.51/3.78
Desk	5.25/7.53	4.14/4.96
Dresser	3.97/4.30	4.01/3.72
Monitor	6.04/5.39	4.19/4.11
NightStand	4.19/4.75	4.24/4.56
Sofa	3.72/5.02	3.51/3.58
Table	3.14/3.40	2.34/3.16
Toilet	5.04/6.18	5.23/5.20
Average	4.22/5.19	3.88/4.33
Note: bold indicates the best value in each row.

3.3.3 Ablation studies

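The model has three main modules: the feature embedding layer (FEL), the attention layer and the attention discriminator (discriminator, $D$). To verify the effectiveness of each module, four groups of ablation experiments were run on ShapeNet: the full MSTCN and three variants that each remove one of these modules.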

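Table 3 lists the ablation results, and Fig. 6 visualizes them for the airplane category. Experiment [M] achieves the lowest category-average CD and is the only one that completes both the airplane's smooth fuselage and the well-defined engine on the left wing, showing that the full MSTCN gives the best completion among the four models.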

Table 3 Comparison results of ablation studies


Experiment	Model	${d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_1} \to {\mathit{\boldsymbol{S}}_2}} \right)/{d_{{\rm{CD}}}}\left({{\mathit{\boldsymbol{S}}_2} \to {\mathit{\boldsymbol{S}}_1}} \right)$
[A]	MSTCN w/o FEL	2.615/2.878
[B]	MSTCN w/o attention	2.896/3.121
[C]	MSTCN w/o discriminator	2.321/2.594
[M]	MSTCN	2.249/2.581
Note: w/o X denotes the model without module X.

Fig. 6 Visualization of the ablation studies on the airplane category

((a) input point cloud; (b) experiment [A]; (c) experiment [B]; (d) experiment [C]; (e) experiment [M]; (f) ground truth)

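Specifically, experiment [A] removes the feature embedding layer from MSTCN: the category-average CD rises to 2.615/2.878 and the outline of the left-wing engine is completed less clearly, showing that the feature embedding layer strengthens the model's extraction of local structural information. Experiment [B] removes the attention layer: the category-average CD jumps to 2.896/3.121, stray "floating" points appear around the left wing, and the fuselage is less smooth. This shows that the attention layer lets the model selectively refer to local structures of the input point cloud during completion; for example, when completing the left wing, the complete and well-defined right wing serves as a good reference. Experiment [C] removes the attention discriminator: the category-average CD rises slightly to 2.321/2.594, and although the left-wing engine is completed well, the fuselage shows a discontinuity, indicating that the attention discriminator improves completion. In summary, the four ablation experiments verify the effectiveness of the feature embedding layer, the attention layer and the attention discriminator.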

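To verify the robustness of MSTCN to inputs with different missing ratios, three completion sub-models for different missing ratios were trained on ShapeNet by changing the numbers of input and generated points. Fig. 7 shows MSTCN's completion results for the chair category under each missing ratio, where 25%, 50% and 75% mean that the input point cloud has lost 25%, 50% and 75% of the original points relative to the ground truth. As Fig. 7 shows, although the number of input points decreases, MSTCN consistently maintains good completion quality, and the results for the 25% and 50% missing ratios have similar CD values. Even at a 75% missing ratio, the CD of the chair category stays at a low 2.074/2.456, and the complete chair can be identified and completed from the incomplete chair legs alone. This experiment verifies that MSTCN is robust to inputs with different degrees of incompleteness.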

Fig. 7 Robustness test for point clouds with different missing ratios

((a) ground truth point cloud; (b) 25 % missing ratio; (c) 50 % missing ratio; (d) 75 % missing ratio)

4 Conclusion

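This paper proposes a new point cloud completion framework that introduces Transformer attention modules into an autoencoder-based completion network, capturing point-to-point correlations within each sample and thus better extracting local detail features of objects. The network uses a multi-scale parallel structure that processes point clouds at three resolutions in both the encoder and the decoder, which improves the completion of fine structures. Following the idea of GAN, an attention discriminator is added that trains against the point cloud generator and improves completion performance. Compared with several related completion algorithms, the proposed algorithm significantly improves the ${d_{{\rm{CD}}}}$ metric on ShapeNet and ModelNet10, by 3.73% and 12.75%, respectively, relative to PF-Net. Notably, however, the completed point clouds sometimes contain too many points in local regions, so the local point density after completion is inconsistent. Designing a new, efficient metric of point cloud discrepancy that keeps the loss function from falling into local optima during training and makes completed point clouds more regular and smooth will therefore be one of the main directions for future research on point cloud completion.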

References

Achlioptas P, Diamanti O, Mitliagkas I and Guibas L. 2018. Learning representations and generative models for 3D point clouds//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR: 40-49

Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q X, Li Z M, Savarese S, Savva M, Song S R, Su H, Xiao J X, Yi L and Yu F. 2015. ShapeNet: an information-rich 3D model repository[EB/OL]. [2021-06-20]. https://arxiv.org/pdf/1512.03012.pdf

Charles R Q, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114

Charles R Q, Su H, Mo K and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85[DOI: 10.1109/CVPR.2017.16]

Dai A, Qi C R and Nießner M. 2017. Shape completion using 3D-encoder-predictor CNNs and shape synthesis//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2463-2471[DOI: 10.1109/CVPR.2017.693]

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2021-06-05]. https://arxiv.org/pdf/2010.11929v1.pdf

Fan H Q, Su H and Guibas L. 2017. A point set generation network for 3D object reconstruction from a single image//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 605-613[DOI: 10.1109/CVPR.2017.264]

Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: 2672-2680

Guo M H, Cai J X, Liu Z N, Mu T J, Martin R R, Hu S M. 2021a. PCT: point cloud transformer. Computational Visual Media, 7(2): 187-199 [DOI:10.1007/s41095-021-0229-5]

Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L, Bennamoun M. 2021b. Deep learning for 3D point clouds: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12): 4338-4364 [DOI:10.1109/TPAMI.2020.3005434]

Han X G, Li Z, Huang H B, Kalogerakis E and Yu Y Z. 2017. High-resolution shape completion using deep neural networks for global structure and local geometry inference//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 85-93[DOI: 10.1109/ICCV.2017.19]

Huang Z T, Yu Y K, Xu J W, Ni F and Le X Y. 2020. PF-Net: point fractal network for 3D point cloud completion//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 7659-7667[DOI: 10.1109/CVPR42600.2020.00768]

Kipf T N and Welling M. 2017. Semi-supervised classification with graph convolutional networks[EB/OL]. [2021-06-20]. https://arxiv.org/pdf/1609.02907.pdf

Li Y Y, Dai A, Guibas L, Nießner M. 2015. Database-assisted object retrieval for real-time 3D reconstruction. Computer Graphics Forum, 34(2): 435-446 [DOI:10.1111/cgf.12573]

Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944[DOI: 10.1109/CVPR.2017.106]

Long X X, Cheng X J, Zhu H, Zhang P J, Liu H M, Li J, Zheng L T, Hu Q Y, Liu H, Cao X, Yang R G, Wu Y H, Zhang G F, Liu Y B, Xu K, Guo Y L, Chen B Q. 2021. Recent progress in 3D vision. Journal of Image and Graphics, 26(6): 1389-1428 [DOI:10.11834/jig.210043]

Mitra N J, Pauly M, Wand M, Ceylan D. 2013. Symmetry in 3D geometry: extraction and applications. Computer Graphics Forum, 32(6): 1-23 [DOI:10.1111/cgf.12010]

Nguyen D T, Hua B S, Tran M K, Pham Q H and Yeung S K. 2016. A field model for repairing 3D shapes//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5676-5684[DOI: 10.1109/CVPR.2016.612]

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin L. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: NIPS: 6000-6010

Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M, Solomon J M. 2019. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): #146 [DOI:10.1145/3326362]

Wen X, Li T Y, Han Z Z and Liu Y S. 2020. Point cloud completion by skip-attention network with hierarchical folding//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1936-1945[DOI: 10.1109/CVPR42600.2020.00201]

Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1912-1920[DOI: 10.1109/CVPR.2015.7298801]

Xie H Z, Yao H X, Zhou S C, Mao J G, Zhang S P and Sun W X. 2020. GRNet: gridding residual network for dense point cloud completion//Proceedings of the 16th European Conference on Computer Vision. Glasgow, Scotland: Springer: 365-381[DOI: 10.1007/978-3-030-58545-7_21]

Yang Y Q, Chen F, Shen Y R and Tian D. 2018. FoldingNet: point cloud auto-encoder via deep grid deformation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 206-215[DOI: 10.1109/CVPR.2018.00029]

Yuan W T, Khot T, Held D, Mertz C and Hebert M. 2018. PCN: point completion network//Proceedings of 2018 International Conference on 3D Vision (3DV). Verona, Italy: IEEE: 728-737[DOI: 10.1109/3DV.2018.00088]

Zhao H S, Jiang L, Jia J Y, Torr P and Koltun V. 2021. Point transformer[EB/OL]. [2021-06-20]. https://arxiv.org/pdf/2012.09164.pdf

Zhao W, Gao S M, Lin H W. 2007. A robust hole-filling algorithm for triangular mesh. The Visual Computer, 23(12): 987-997 [DOI:10.1007/s00371-007-0167-y]