Published: 2022-02-16
DOI: 10.11834/jig.210550
2022 | Volume 27 | Number 2

3D Shape Analysis

边收缩池化的网格变分自编码器

袁宇杰^1,2, 来煜坤³, 杨洁^1,2, 段琦⁴, 傅红波⁵, 高林^1,2

1. 中国科学院计算技术研究所, 北京 100190;

2. 中国科学院大学, 北京 100049;

3. 英国卡迪夫大学, 卡迪夫 CF24 4AG, 英国;

4. 商汤科技, 上海 200233;

5. 香港城市大学, 香港 999077

Received: 2021-07-14; Revised: 2021-11-10; Preprint: 2021-11-17

基金项目: 国家自然科学基金项目（62061136007，61872440）；北京市自然科学基金项目（L182016）；英国皇家学会牛顿高级学者基金项目（NAFR2192151）；中国科学院青年创新促进会基金项目（2019108）；之江实验室开放课题基金项目（2021KE0AB06）

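About the authors: Yuan Yujie, born in 1996, male, Ph.D. candidate. His research interests include computer graphics and geometry processing. E-mail: yuanyujie@ict.ac.cn
Lai Yukun, male, professor. His research interests include computer graphics, geometry processing, image processing and computer vision. E-mail: LaiY4@cardiff.ac.uk
Yang Jie, male, Ph.D. candidate. His research interests include computer graphics and geometry processing. E-mail: yangjie01@ict.ac.cn
Duan Qi, male, Ph.D. His research interests include computer vision, computer graphics and medical image visualization. E-mail: duanqi@sensetime.com
Fu Hongbo, male, professor. His research interests include computer graphics and human-computer interaction. E-mail: fuplus@gmail.com
Gao Lin, corresponding author, male, associate researcher. His research interests include computer graphics and geometry processing. E-mail: gaolin@ict.ac.cn
*Corresponding author: Gao Lin, gaolin@ict.ac.cn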

CLC number: TP391

Document code: A

Article ID: 1006-8961(2022)02-0511-14

摘要

目的 3D形状分析是计算机视觉和图形学的一个重要研究课题。虽然现有方法使用基于图的卷积将基于图像的深度学习推广到3维网格，但缺乏有效的池化操作限制了其网络的学习能力。针对具有相同连通性，但几何形状不同的网格模型数据集，本文利用网格简化的边收缩操作建立网格层次结构，提出了一种新的网格池化操作。方法本文改进了传统的网格简化方法，以避免生成高度不规则的三角形，利用改进的网格简化方法定义了新的网格池化操作。网格简化的边收缩操作建立的网格层次结构之间存在对应关系，有利于网格池化的定义。新定义的池化操作有效地编码了层次结构中较粗糙和较稠密网格之间的对应关系。最后提出了一种带有边收缩池化和图卷积的变分自编码器（variational auto-encoder，VAE）结构，以探索3D形状的隐空间并用于3D形状的生成。结果由于引入了新定义的池化操作和图卷积操作，提出的网络结构比原始MeshVAE需要的参数更少，因此可以处理更稠密的网格模型。结论实验表明提出的方法具有更好的泛化能力，并且在各种应用中更可靠，包括形状生成、形状插值和形状嵌入。

关键词

网格生成; 网格插值; 变分自编码器(VAE); 网格池化; 边收缩

Mesh variational auto-encoders with edge contraction pooling

Yuan Yujie^1,2, Lai Yukun³, Yang Jie^1,2, Duan Qi⁴, Fu Hongbo⁵, Gao Lin^1,2

1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;

2. University of Chinese Academy of Sciences, Beijing 100049, China;

3. School of Computer Science and Informatics, Cardiff University, Cardiff CF24 4AG, UK;

4. SenseTime Research, Shanghai 200233, China;

5. City University of Hong Kong, Hong Kong 999077, China

Supported by: National Natural Science Foundation of China(62061136007, 61872440);Beijing Municipal Natural Science Foundation(L182016); Royal Society Newton Advanced Fellowship, Britain (NAF\R2\192151); Youth Innovation Promotion Association, Chinese Academy of Sciences (2019108);Open Research Projects of Zhejiang Laboratory(2021KE0AB06)

Abstract

Objective The number of 3D shape datasets has grown rapidly in recent years, and data-driven 3D shape analysis has become an active research topic in computer vision and graphics. Beyond traditional data-driven methods, recent work generalizes deep neural networks from images to 3D shapes such as triangular meshes, point clouds and voxel data. This paper focuses on deep neural networks for triangular meshes. Unlike images, 3D meshes have complex and irregular connectivity. Most existing methods keep the mesh connectivity unchanged from layer to layer and therefore lose the enlarged receptive fields that pooling operations provide. The variational auto-encoder (VAE) has been widely used in generative tasks, including generation, interpolation and exploration of triangular meshes. The original MeshVAE is based on a fully connected network, requires a very large number of parameters and often generalizes poorly. Although fully connected layers allow the mesh connectivity to change across layers, the changes are irregular, so convolutional layers cannot be applied directly after them. Some methods adopt convolutional layers in the VAE structure, but such convolutions cannot change the connectivity of the mesh. A sampling operation has also been introduced into convolutional neural networks (CNNs) on meshes, but the sampling strategy does not aggregate all the local neighborhood information when it reduces the number of vertices. It is therefore necessary to design a pooling operation for meshes, analogous to pooling for images, that reduces the number of network parameters so that the network can handle denser models and generalize better. The pooling should also support further convolutions and allow recovery through a corresponding de-pooling operation.
Method A novel mesh pooling operation based on edge contraction is proposed, together with a VAE architecture built on it. Mesh simplification is used to build a mesh hierarchy with different levels of detail, and effective pooling is achieved by keeping track of the mapping between coarser and finer meshes. To avoid generating highly irregular triangles during simplification, the classical edge contraction algorithm is modified. Edge length is an important indicator in the edge contraction process, so it is used as one of the criteria for ordering the vertex pairs to be contracted: a new edge-length term is added directly to the original quadric error metric. For average pooling, the feature of a new vertex is defined as the average of the features of the contracted vertices; other pooling operations can be defined similarly. In the decoder, the inverse operation, de-pooling, assigns the feature of each vertex on the simplified mesh equally to the corresponding contracted vertices on the dense mesh. The input to the network is a vertex-based deformation feature representation which, unlike 3D coordinates, encodes deformations using deformation gradients defined on vertices. The framework is trained on a collection of 3D shapes with the same connectivity; such meshes can be obtained easily via consistent re-meshing. The network follows a VAE architecture in which pooling operations and graph convolutions are applied. It generalizes better and can handle meshes of much higher resolution, which benefits applications such as shape generation and interpolation.
Result The architecture is evaluated on four datasets: shape completion and animation of people (SCAPE), Swing, Fat and Hand. The ability of the network to reconstruct unseen shapes is tested and measured by the average root mean square (RMS) error. The network is first compared with and without the proposed pooling: the RMS error is lower by 6.92% on average with pooling, which shows the benefit of the pooling and de-pooling operations. The proposed pooling is then compared with other pooling and sampling methods. Its RMS error on unseen data is lower on average by 9.34% than pooling based on the original simplification, 9.07% than the uniform remeshing method, 8.06% than graph pooling, and 9.64% than mesh sampling. This indicates that the modified simplification algorithm is more effective for pooling and that the proposed pooling performs better across multiple datasets, demonstrating its generalization capability. The framework is also compared with related mesh-based auto-encoder architectures. Thanks to spectral graph convolutions and the proposed pooling, the method consistently reduces the reconstruction error on unseen data. For instance, compared with a method that uses the same per-vertex features, the proposed network achieves 29% and 32% lower average RMS reconstruction errors on the SCAPE and Face datasets, respectively. The proposed network also achieves better results than MeshCNN. Moreover, the capability of the framework is demonstrated in shape generation, shape interpolation and shape embedding.
Conclusion A newly defined pooling operation, based on a modified mesh simplification algorithm, is integrated into a mesh variational auto-encoder architecture. The resulting generative model generalizes well. Compared with the original MeshVAE, the method can generate high-quality deformable models with richer details.

Key words

mesh generation; mesh interpolation; variational auto-encoder(VAE); mesh pooling; edge contraction

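0 Introduction

In recent years, 3D model datasets on the Internet have grown explosively. Data-driven 3D shape analysis has long been an active research topic in computer vision and computer graphics. Beyond traditional data-driven work (Gao et al., 2017), a growing body of work extends deep neural networks from 2D images to 3D models such as triangular meshes (Tan et al., 2018a, b; Litany et al., 2018), point clouds (Qi et al., 2017a) and voxels (Wu et al., 2016; Maturana and Scherer, 2015). This paper focuses on deep neural networks for triangular meshes. Unlike images, triangular meshes have complex and irregular connectivity. Most existing methods keep the mesh connectivity unchanged from layer to layer and thus lose the ability to enlarge the receptive field that pooling provides.

The variational auto-encoder (VAE) (Kingma and Welling, 2014) is a generative network that has been widely applied to many tasks, including 2D face image inpainting (Zhang et al., 2020) and the generation, interpolation and exploration of 3D triangular meshes (Tan et al., 2018b). Zhai et al. (2019) survey the VAE and its derived models. The original MeshVAE (Tan et al., 2018b) uses fully connected layers, requires a large number of parameters and often generalizes poorly. Although fully connected layers allow the mesh connectivity to change between layers, the changes are irregular, so convolutional layers cannot be applied directly after them. Some methods (Litany et al., 2018; Gao et al., 2018) adopt convolutional layers in the VAE structure, but such convolutions cannot change the connectivity of the mesh. Ranjan et al. (2018) introduce a sampling operation into convolutional networks on meshes, but their sampling strategy does not aggregate all the local neighborhood information when reducing the number of vertices. Therefore, to handle denser mesh models and improve the generalization ability of the network, it is necessary to design a mesh pooling operation, analogous to image pooling, that reduces the number of network parameters. In addition, the pooling should support further convolutions and allow the original mesh to be recovered through a corresponding de-pooling operation.

This paper proposes a VAE architecture with a newly defined pooling operation, as shown in Fig. 1. In the figure, $\varepsilon $ is a random variable that follows the Gaussian distribution $N\left({0, \boldsymbol{I}} \right)$ with identity covariance $\boldsymbol{I}$ (unit variance in every dimension), and $ \otimes $ and $ \oplus $ denote multiplication and addition, respectively. The method uses the edge contraction operation of mesh simplification to form a mesh hierarchy with different levels of detail, and achieves effective pooling by tracking the mapping between coarse and fine meshes. To avoid generating highly irregular triangles during simplification, a modified mesh simplification method based on the classical approach (Garland and Heckbert, 1997) is introduced. The newly defined pooling operation aggregates local neighborhood information effectively and improves the generalization ability of the network. The input to the network is a vertex-based deformation feature representation (Gao et al., 2021) which, unlike 3D coordinates, encodes deformations using deformation gradients defined on vertices. The framework is trained on a set of 3D shapes with the same connectivity; such meshes can be obtained easily through consistent remeshing. The network also adopts the graph convolution of Defferrard et al. (2016). Overall, the proposed network follows a VAE architecture in which pooling operations and graph convolutions are applied. As the experiments show, the network not only generalizes better but can also handle meshes of higher resolution, which benefits applications such as shape generation, interpolation and embedding.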

图 1 本文网络结构

Fig. 1 Our network architecture

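1 Related work

1.1 Deep learning methods for 3D models

Deep learning methods for 3D models have attracted increasing attention. Boscaini et al. (2016a, b) generalize convolutional neural networks from Euclidean to non-Euclidean domains, which benefits 3D shape analysis tasks such as establishing correspondences between 3D shapes. Bronstein et al. (2017) give an overview of convolutional neural networks on non-Euclidean domains, including graphs and meshes. Masci et al. (2015) propose the first mesh convolution operation by applying filters to local patches represented in geodesic polar coordinates. Sinha et al. (2016) convert 3D shapes into geometry images to obtain a Euclidean parameterization on which standard convolutional neural networks can be applied. Wang et al. (2017, 2018) propose octree-based convolutions for 3D shape analysis. Unlike local patches, geometry images or octree structures, this work performs convolutions on vertex features (Gao et al., 2021).

To analyze mesh models with the same connectivity but different geometry, MeshVAE (Tan et al., 2018b) was the first to introduce the variational auto-encoder structure to 3D mesh data, and demonstrated its effectiveness in various applications. Tan et al. (2018a) use a convolutional auto-encoder to extract localized deformation components from mesh datasets with large-scale deformations. Gao et al. (2018) propose a network that combines a convolutional mesh VAE with the cycle-consistent adversarial network CycleGAN (Zhu et al., 2017) for fully automatic deformation transfer between unpaired shapes. Tan et al. (2018a) and Gao et al. (2018) use spatial convolutions on meshes, whereas Defferrard et al. (2016) and Henaff et al. (2015) extend convolutional neural networks to irregular graphs through a construction in the spectral domain, which shows better performance than spatial convolution. Like Defferrard et al. (2016) and Yi et al. (2017), this work performs convolutions in the spectral domain.

Although pooling is widely used in deep networks for image processing, some existing mesh-based VAE methods do not support pooling (Tan et al., 2018b; Gao et al., 2018), and others use a simple sampling process (Ranjan et al., 2018) that cannot aggregate all the local neighborhood information. The sampling method of Ranjan et al. (2018) is in fact also based on a simplification algorithm, but it discards vertices directly and then recovers the lost vertices by interpolation with barycentric coordinates of triangles. In contrast, the pooling operation proposed here aggregates local information by recording the simplification process, and it can be inverted directly, which yields an effective de-pooling operation.

Hanocka et al. (2019) propose MeshCNN, a mesh network based on convolutional neural networks (CNNs) for mesh classification and segmentation. That framework contains a dynamic mesh pooling operation that simplifies the mesh according to the specific task. In contrast, this paper defines pooling on top of a static mesh simplification algorithm. Because the network aims to encode and generate high-quality mesh models, a static simplification algorithm guarantees a consistent hierarchy, which preserves geometric details better and is more robust.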

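1.2 Uniform sampling and pooling methods

Taking point clouds as input, PointNet++ (Qi et al., 2017b) proposes a uniform sampling method that can be used in neural networks for point clouds. Based on the same idea, TextureNet (Huang et al., 2019) also samples mesh vertices uniformly, but this sampling destroys the connectivity between vertices, turns the mesh data into a point cloud and does not support further graph convolutions. In contrast, simplification methods build a mesh hierarchy and can therefore support mesh pooling. However, most simplification methods, such as Garland and Heckbert (1997), preserve geometry but may leave the vertices of the simplified mesh highly non-uniform. Remeshing (Botsch and Kobbelt, 2004) can build uniform simplified meshes but loses the correspondence between meshes in the hierarchy. Building on the classical method (Garland and Heckbert, 1997), this paper proposes a modified mesh simplification method that simplifies meshes more uniformly for the newly defined pooling and de-pooling operations and records the correspondence between the coarse and dense meshes.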

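1.3 Representations and applications of deforming meshes

A straightforward way to represent a 3D mesh is to use the vertex coordinates of the shape. However, vertex coordinates are neither translation invariant nor rotation invariant, which makes it difficult to learn large-scale deformations. Instead, this paper uses a recent deformation representation for 3D shapes (Gao et al., 2021). Compared with another widely used representation (Gao et al., 2016), it records deformations at vertices, which makes graph convolution and pooling easier to implement.

Shape generation and interpolation are common applications for mesh data. Using a VAE structure, MeshVAE (Tan et al., 2018b) can generate more deformable mesh shapes, while the method of Ranjan et al. (2018) can generate 3D faces with vivid expressions from the latent space. These VAE-based methods (Tan et al., 2018b; Ranjan et al., 2018; Litany et al., 2018) can also be used for shape interpolation, which is a widely studied topic. Existing mesh interpolation methods fall mainly into geometry-based methods (Huber et al., 2017) and data-driven methods (Gao et al., 2017). The latter exploit the information hidden in shape datasets, which makes the interpolation results more plausible and reliable. Experiments show that MeshVAE (Tan et al., 2018b) achieves better results than existing data-driven methods (Gao et al., 2017). However, Tan et al. (2018b) cannot handle shapes with too many vertices (for example, the elephant model from Sumner and Popović (2004)), which limits the resolution of the generated meshes. The method of Ranjan et al. (2018) performs well on face shapes, but its reconstructions of human bodies (Vlasic et al., 2008) are unsatisfactory. This work is also based on the VAE architecture and can therefore be used naturally for shape generation and interpolation. The framework improves on MeshVAE and generalizes better.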

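2 Method

2.1 Mesh simplification

The edge contraction algorithm of mesh simplification is used to build a reliable pooling operation. For this purpose, mesh simplification not only creates a mesh hierarchy with different levels of detail but also guarantees the correspondence between coarser and finer meshes. The simplification process is based on the classical edge contraction method (Garland and Heckbert, 1997), which repeatedly contracts edges in an order given by a metric of shape change. However, the original method does not guarantee that the simplified mesh contains evenly distributed triangles. For pooling to be effective, every vertex in the coarser mesh should correspond to a region of similar size.

We observe that edge length is an important indicator throughout the process. To prevent contractions from creating long edges, edge length is used as one of the criteria for ordering the vertex pairs to be contracted. The original method defines the error at a vertex $\boldsymbol{v} = {\left[ {{v_x}, {v_y}, {v_z}, 1} \right]^{\rm{T}}}$ as the quadratic form ${\boldsymbol{v}^{\rm{T}}}\boldsymbol{Qv}$, where $\boldsymbol{Q}$ is the sum of the fundamental error quadrics (see Garland and Heckbert (1997) for the definition). For a given edge contraction $\left({{\boldsymbol{v}_1}, {\boldsymbol{v}_2}} \right) \to \bar{\boldsymbol{v}}$, the new matrix is simply chosen as $\bar{\boldsymbol{Q}} = {\boldsymbol{Q}_1} + {\boldsymbol{Q}_2}$, which approximates the error at $\bar{\boldsymbol{v}}$, so the error at $\bar{\boldsymbol{v}}$ is ${\bar{\boldsymbol{v}}^{\rm{T}}}\bar{\boldsymbol{Q}}\bar{\boldsymbol{v}}$. This paper adds the new edge lengths to the original simplification error metric. Specifically, for an edge $\left({{\boldsymbol{v}_i}, {\boldsymbol{v}_j}} \right)$ to be contracted into a new vertex ${\bar{\boldsymbol{v}}_k}$, the total error is defined as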

$ E = \bar{\boldsymbol{v}}_k^{\rm{T}}{\bar{\boldsymbol{Q}}_k}{\bar{\boldsymbol{v}}_k} + \lambda \max \left\{ {L_{km}}, {L_{kn}} \mid m \in {\boldsymbol{N}_i}, n \in {\boldsymbol{N}_j}, m \ne j, n \ne i \right\} $

(1)

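where ${L_{km}}$ (or ${L_{kn}}$) is the length of the new edge between vertex $k$ and vertex $m$ (or $n$), ${\boldsymbol{N}_i}$ (or ${\boldsymbol{N}_j}$) is the set of vertices adjacent to vertex $i$ (or vertex $j$), and $\lambda $ is a weight. Note that only the maximum edge length around the newly created vertex ${\bar{\boldsymbol{v}}_k}$ is penalized, which effectively avoids triangles with overly long edges. In all experiments, half of the vertices are contracted to support effective pooling. A representative simplification example is shown in Fig. 2, which clearly demonstrates the effect of the modified simplification algorithm. The advantages of the modified algorithm over the original one for pooling and shape reconstruction are discussed in Section 3.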
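As an illustration of Eq. (1), the following is a minimal sketch of the contraction cost for a single candidate edge. It is not the authors' implementation: the helper names `plane_quadric` and `contraction_cost`, the array layout, and the toy data are assumptions made for this example, and the position of the new vertex is simply passed in rather than optimized.

```python
import numpy as np


def plane_quadric(a, b, c, d):
    """Fundamental quadric p p^T of the plane ax + by + cz + d = 0.

    Garland and Heckbert (1997) assume a unit normal (a^2 + b^2 + c^2 = 1).
    """
    p = np.array([a, b, c, d], dtype=float)
    return np.outer(p, p)


def contraction_cost(v_bar, Q_bar, ring_positions, lam=0.001):
    """Cost of contracting an edge (v_i, v_j) into the new vertex v_bar.

    v_bar:          (3,) position of the new vertex (hypothetical input).
    Q_bar:          (4, 4) summed quadric Q_i + Q_j of the two endpoints.
    ring_positions: (K, 3) positions of the vertices in N_i and N_j,
                    excluding the two endpoints themselves.
    lam:            weight of the edge-length term (0.001 in the paper).
    """
    v_h = np.append(v_bar, 1.0)                  # homogeneous coordinates
    quadric_error = float(v_h @ Q_bar @ v_h)     # v^T Q v
    new_lengths = np.linalg.norm(ring_positions - v_bar, axis=1)
    return quadric_error + lam * float(new_lengths.max())


# Toy data: one supporting plane z = 0 and two remaining neighbours.
Q_bar = plane_quadric(0.0, 0.0, 1.0, 0.0)
v_bar = np.array([0.0, 0.0, 0.5])
ring = np.array([[3.0, 0.0, 0.5], [0.0, 4.0, 0.5]])
cost = contraction_cost(v_bar, Q_bar, ring)
```

In a full simplifier this cost would presumably be kept in a priority queue and updated after each contraction, as in the classical algorithm; that bookkeeping is omitted here.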

图 2 比较网格简化算法

Fig. 2 Comparison of mesh simplification algorithms

((a) original mesh; (b) Garland and Heckbert (1997); (c)ours)

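2.2 Pooling and de-pooling

Mesh simplification is carried out by repeated edge contractions, each of which merges two adjacent vertices into a new vertex. This process is used to define the mesh pooling operation. Average pooling is used in the network described below; other types of pooling can be defined similarly. After an edge contraction step, the feature of the new vertex is defined as the average of the features of the contracted vertices. As shown in Fig. 3, the red vertices are simplified to the green vertex by edge contraction, and the features of the red vertices are averaged to give the feature of the green vertex. This ensures that pooling acts on the corresponding simplified region. The process has two advantages: it preserves a valid topology that supports multiple levels of convolution or pooling, and it gives a well-defined receptive field. Because a variational auto-encoder has a decoder, the de-pooling operation must also be defined properly. Using the same simplification relationship, de-pooling is defined as the inverse operation: the feature of a vertex on the simplified mesh is assigned equally to the corresponding contracted vertices on the dense mesh.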
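The pooling and de-pooling described above can be sketched as two fixed matrices built from the recorded simplification. This is a minimal sketch under stated assumptions: the `clusters` list (which fine vertices were merged into each coarse vertex) and the function names are hypothetical, and a real implementation would likely use sparse matrices.

```python
import numpy as np


def build_pooling_matrix(clusters, num_fine):
    """Average pooling: row c averages the fine vertices merged into coarse vertex c."""
    P = np.zeros((len(clusters), num_fine))
    for c, members in enumerate(clusters):
        P[c, members] = 1.0 / len(members)
    return P


def build_unpooling_matrix(clusters, num_fine):
    """De-pooling: each merged fine vertex receives a copy of its coarse feature."""
    U = np.zeros((num_fine, len(clusters)))
    for c, members in enumerate(clusters):
        U[members, c] = 1.0
    return U


# Toy hierarchy: vertices 0 and 1 were contracted, vertices 2 and 3 were kept.
clusters = [[0, 1], [2], [3]]
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])  # (V, features)

P = build_pooling_matrix(clusters, num_fine=4)
U = build_unpooling_matrix(clusters, num_fine=4)
pooled = P @ X          # features on the coarse mesh
unpooled = U @ pooled   # features copied back to the dense mesh
```

Because the matrices depend only on the recorded contractions, they can be computed once per mesh hierarchy and reused for every shape with the same connectivity.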

图 3 使用简化算法来引入网格上的池化操作

Fig. 3 Using a simplification algorithm to introduce pooling operation on meshes

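2.3 Graph convolution

To form a complete neural network architecture, the graph convolution introduced by Defferrard et al. (2016) is adopted. Let the input be the matrix $\boldsymbol{x}$ and the output of the convolution be the matrix $\boldsymbol{y}$, where each row of $\boldsymbol{x}$ and $\boldsymbol{y}$ corresponds to a vertex and each column to a feature dimension. Let $\boldsymbol{L}$ denote the normalized graph Laplacian. The spectral graph convolution used in the network is defined as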

$ \mathit{\boldsymbol{y}} = {g_\theta }(\mathit{\boldsymbol{L}})\mathit{\boldsymbol{x}} = \sum\limits_{h = 0}^{H - 1} {{\theta _h}} {\mathit{\boldsymbol{T}}_h}(\mathit{\boldsymbol{\widetilde L}})\mathit{\boldsymbol{x}} $

(2)

where $\tilde{\boldsymbol{L}} = 2\boldsymbol{L}/{\lambda _{\max }} - \boldsymbol{I}$ with $\boldsymbol{I}$ the identity matrix, ${\lambda _{\max }}$ is the largest eigenvalue of $\boldsymbol{L}$, $\theta \in {{\bf{R}}^H}$ are the polynomial coefficients, and ${\boldsymbol{T}_h}(\tilde{\boldsymbol{L}}) \in {{\bf{R}}^{V \times V}}$ is the Chebyshev polynomial of order $h$ evaluated at $\tilde{\boldsymbol{L}}$. $H-1$ is the highest order of the Chebyshev polynomials.
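Eq. (2) can be evaluated without forming the matrices ${\boldsymbol{T}_h}$ explicitly, using the Chebyshev recurrence $T_h = 2\tilde{L}T_{h-1} - T_{h-2}$ with $T_0 = I$ and $T_1 = \tilde{L}$. The sketch below is a dense NumPy illustration with scalar coefficients, as written in Eq. (2); the function names and the toy graph are assumptions, and a trained layer would typically learn one coefficient matrix per order rather than a scalar.

```python
import numpy as np


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A.

    Assumes every vertex has at least one neighbour (no zero degrees).
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def chebyshev_conv(L, x, theta):
    """y = sum_h theta[h] * T_h(L_tilde) @ x for h = 0 .. H-1."""
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])
    t_prev, t_curr = x, L_tilde @ x                 # T_0 x and T_1 x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for h in range(2, len(theta)):
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev  # Chebyshev recurrence
        y = y + theta[h] * t_next
        t_prev, t_curr = t_curr, t_next
    return y


# Toy graph with 4 vertices and 2 feature channels, H = 3 as in Section 2.6.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
x = np.arange(8, dtype=float).reshape(4, 2)
y = chebyshev_conv(L, x, [0.5, -1.0, 2.0])
```

With $H = 3$ each output vertex depends only on vertices at most two edges away, which is why the filter is localized.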

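2.4 Network structure

As shown in Fig. 1, the network is a variational auto-encoder with average pooling and graph convolution. The input to the encoder is the preprocessed ACAP (as-consistent-as-possible) feature (Gao et al., 2021), in which every dimension is linearly scaled to [-0.95, 0.95] so that the tanh activation function can be used. Its size is $\boldsymbol{X} \in {{\bf{R}}^{V \times 9}}$, where $V$ is the number of vertices and 9 is the dimension of the deformation representation: 3 dimensions for rotation and 6 for scaling. This representation encodes local deformations effectively and handles large rotations well. Unlike the original MeshVAE (Tan et al., 2018b), which uses fully connected layers, the encoder consists of two graph convolution layers and one pooling layer, followed by another graph convolution layer. The output of the last convolution layer is mapped to a mean vector and a deviation vector by two different fully connected layers. The mean vector has no activation function, and the deviation vector uses the sigmoid activation function. A random variable $\varepsilon \sim N\left({0, \boldsymbol{I}} \right)$ with identity covariance $\boldsymbol{I}$ is introduced to perform the reparameterization of the VAE (Kingma and Welling, 2014), so that gradients can be back-propagated. The decoder mirrors the encoder but uses different convolution weights, and all of its layers use the tanh activation function. Corresponding to the pooling operation, the de-pooling operation described in Section 2.2 maps features from the coarser mesh to the finer mesh. The output of the whole network is $\hat{\boldsymbol{X}} \in {{\bf{R}}^{V \times 9}}$, which has the same dimensions as the input and can be rescaled back to the deformation representation to reconstruct the shape. To train the VAE, the mean squared error is used as the reconstruction loss. Combined with the KL (Kullback-Leibler) divergence (Kullback and Leibler, 1951), the total loss of the model is defined as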

$ L = \frac{1}{{2M}}\sum\limits_{i = 1}^M {\left\| {{\mathit{\boldsymbol{X}}^i} - {{\mathit{\boldsymbol{\hat X}}}^i}} \right\|_{\rm{F}}^2} + \alpha {D_{{\rm{KL}}}}(q(z\mid \mathit{\boldsymbol{X}})\parallel p(z)) $

(3)

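where ${\boldsymbol{X}^i}$ and ${\hat{\boldsymbol{X}}^i}$ denote the preprocessed feature of the $i$-th model and the corresponding network output, ${\left\| {\; \cdot \;} \right\|_{\rm{F}}}$ is the Frobenius norm of a matrix, $M$ is the number of shapes in the dataset, and $\alpha $ is a parameter that balances the reconstruction loss against the $\rm{KL}$ divergence. $z$ is the latent variable, $p\left(z \right)$ is the prior probability, $q(z\mid \boldsymbol{X})$ is the posterior probability, and ${D_{{\rm{KL}}}}$ is the $\rm{KL}$ divergence.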
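The reparameterization step and the loss in Eq. (3) can be sketched as follows. The closed form used for the KL term is the standard one for a diagonal Gaussian posterior and a standard normal prior; the paper does not spell it out, so treating the deviation vector as a per-dimension standard deviation and averaging the KL term over shapes are assumptions of this sketch, as are all function names.

```python
import numpy as np


def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


def gaussian_kl(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + sigma ** 2 - 1.0 - 2.0 * np.log(sigma), axis=-1)


def vae_loss(X, X_hat, mu, sigma, alpha=0.3):
    """Eq. (3): X and X_hat are (M, V, 9); mu and sigma are (M, D)."""
    M = X.shape[0]
    recon = np.sum((X - X_hat) ** 2) / (2.0 * M)   # sum of squared Frobenius norms
    kl = np.mean(gaussian_kl(mu, sigma))           # assumption: mean over shapes
    return recon + alpha * kl


# Toy batch: 2 shapes, 3 vertices, latent dimension 4.
rng = np.random.default_rng(0)
X = np.ones((2, 3, 9))
X_hat = np.zeros((2, 3, 9))
mu = np.zeros((2, 4))
sigma = np.ones((2, 4))
z = reparameterize(mu, sigma, rng)
loss = vae_loss(X, X_hat, mu, sigma)
```

Sampling through `mu + sigma * eps` keeps the randomness in `eps`, so gradients with respect to `mu` and `sigma` are well defined in an automatic differentiation framework.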

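2.5 Conditional variational auto-encoder

When a VAE is used for shape generation, it is often desirable to choose the type of shape to generate, especially for datasets that contain shapes from different categories (for example male and female, or thin and fat; see Pons-Moll et al. (2015) for more examples). To this end, following Sohn et al. (2015), labels are added to the input and to the latent variable to extend the framework. In this case, the loss function of the network becomes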

$ {L_c} = \frac{1}{{2M}}\sum\limits_{i = 1}^M {\left\| {\boldsymbol{X}_c^i - {\hat{\boldsymbol{X}}^i}} \right\|_{\rm{F}}^2} + \alpha {D_{{\rm{KL}}}}(q(\boldsymbol{z} \mid \boldsymbol{X}, \boldsymbol{c})\parallel p(\boldsymbol{z} \mid \boldsymbol{c})) $

(4)

where $\hat{\boldsymbol{X}}$ is the output of the conditional VAE, and $p(\boldsymbol{z} \mid \boldsymbol{c})$ and $q(\boldsymbol{z} \mid \boldsymbol{X}, \boldsymbol{c})$ are the conditional prior and posterior probability distributions.
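One common way to add a label to the input and to the latent variable is concatenation. The paper does not state how the labels are attached, so the one-hot encoding, the per-vertex tiling and the helper names below are all assumptions made purely for illustration.

```python
import numpy as np


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector c."""
    c = np.zeros(num_classes)
    c[label] = 1.0
    return c


def condition_input(X, c):
    """Append the same label vector to every vertex: (V, 9) -> (V, 9 + C)."""
    tiled = np.tile(c, (X.shape[0], 1))
    return np.concatenate([X, tiled], axis=1)


def condition_latent(z, c):
    """Append the label vector to the latent code: (D,) -> (D + C,)."""
    return np.concatenate([z, c])


# Toy example: 6 vertices, 3 hypothetical classes, latent dimension 128.
c = one_hot(1, num_classes=3)
X_cond = condition_input(np.zeros((6, 9)), c)
z_cond = condition_latent(np.zeros(128), c)
```

At generation time the same label vector would be concatenated to a sampled latent code before decoding, which is what allows the type of generated shape to be chosen.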

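2.6 Implementation details

In all experiments, $\lambda = 0.001$ is used in Eq. (1) when contracting half of the vertices, the hyperparameter of the graph convolution in Eq. (2) is set to $H = 3$, and $\alpha = 0.3$ in the total loss. Unless otherwise stated, the latent space dimension is 128, and L2 regularization is applied to the network weights to avoid overfitting. The Adam optimizer (Kingma and Ba, 2015) is used with ${\beta _1} = 0.9$, ${\beta _2} = 0.999$ and a learning rate of 0.001.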
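The linear scaling of the input features to [-0.95, 0.95] mentioned in Section 2.4 can be sketched as below. Whether the minimum and maximum are taken per feature dimension over the whole dataset, as done here, or per vertex and dimension is not specified in the paper; that choice, the function names and the random toy data are assumptions.

```python
import numpy as np


def fit_range(features):
    """Per-dimension minimum and maximum over all shapes and vertices.

    features: (M, V, 9). Assumes no dimension is constant (hi > lo).
    """
    return features.min(axis=(0, 1)), features.max(axis=(0, 1))


def scale(features, lo, hi, bound=0.95):
    """Map each dimension linearly from [lo, hi] to [-bound, bound]."""
    return (features - lo) / (hi - lo) * (2.0 * bound) - bound


def unscale(scaled, lo, hi, bound=0.95):
    """Inverse of scale(), used to turn network output back into features."""
    return (scaled + bound) / (2.0 * bound) * (hi - lo) + lo


rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4, 9))     # toy stand-in for ACAP features
lo, hi = fit_range(features)
scaled = scale(features, lo, hi)
restored = unscale(scaled, lo, hi)
```

Keeping the targets strictly inside (-1, 1) means the tanh output layer never has to saturate to reproduce them.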

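3 Experimental results and analysis

Several deforming shape datasets are used, including the SCAPE (shape completion and animation of people) dataset (Anguelov et al., 2005), the Swing dataset (Vlasic et al., 2008), the Face dataset (Neumann et al., 2013), the Horse and Camel datasets (Sumner and Popović, 2004), Fat (ID: 50002) from the Dyna dataset (Pons-Moll et al., 2015), and the Hand dataset. Unless otherwise stated, each dataset is randomly split into two equal halves for training and testing. The ability of the network to reconstruct shapes unseen during training is tested, and the average root mean square (RMS) error is reported.

3.1 Architecture evaluation

A series of ablation experiments compares different network structures and settings and analyzes the factors that strongly affect encoding and decoding performance.

1) Effect of pooling. Columns 8 and 3 of Table 1 compare the RMS errors of reconstructed shapes with and without pooling (the full network versus spectral convolution only). With pooling, the RMS error is 6.92% lower on average. This shows the benefit of the pooling and de-pooling operations, which improve the ability of the network to reconstruct unseen shapes.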

表 1 消融实验的RMS重建误差
Table 1 Ablation study of RMS reconstruction errors


Dataset	Spatial convolution only	Spectral convolution only	Pooling with original simplification	Uniform remeshing	Graph pooling	Mesh sampling	Ours
SCAPE	0.1086	0.0825	0.0898	0.0813	0.0824	0.0831	0.0763
Swing	0.0359	0.0282	0.0284	0.0281	0.0292	0.0298	0.0268
Fat	0.0362	0.0267	0.0285	0.0305	0.0253	0.0289	0.0249
Hand	0.0300	0.0284	0.0271	0.0280	0.0306	0.0278	0.0260
Note: the best value in each row is the one in the Ours column.

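2) Comparison with spatial convolution. Spectral graph convolution is compared with spatial graph convolution, both using the network structure shown in Fig. 1. The results are given in columns 2 and 3 of Table 1. Spectral graph convolution gives better results, because it takes more neighborhood information into account than spatial convolution.

3) Comparison with other pooling or sampling methods. To demonstrate the advantage of pooling based on the modified edge contraction algorithm, the proposed pooling is compared with pooling based on the original simplification algorithm (Garland and Heckbert, 1997), pooling based on uniform remeshing (Botsch and Kobbelt, 2004), an existing graph pooling method (Shen et al., 2018) and the mesh sampling operation (Ranjan et al., 2018). Remeshing distributes vertices uniformly but loses geometric details, whereas the proposed method aims for a simplification that is uniform while still preserving shape, and thus generalizes better. The results are shown in Table 1. Compared with pooling based on Garland and Heckbert (1997), the proposed pooling lowers the RMS error on data unseen during training by 9.34% on average; compared with uniform remeshing (Botsch and Kobbelt, 2004) by 9.07%; compared with graph pooling (Shen et al., 2018) by 8.06%; and compared with mesh sampling (Ranjan et al., 2018) by 9.64%. These results show that the modified simplification algorithm is more effective for pooling and that the proposed pooling performs better on multiple datasets, which demonstrates its generalization ability. In addition, Fig. 4 shows more comparisons between the modified and the original simplification algorithms on simplified meshes. The modified edge contraction algorithm produces simplified meshes with more evenly distributed vertices, which explains the numerical advantage of the proposed mesh pooling.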
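The average improvements quoted above can be recomputed from Table 1 as the mean, over the four datasets, of the per-dataset relative reduction in RMS error. The short script below does this; the dictionary keys and the function name are hypothetical labels chosen for this sketch, while the numbers are copied from Table 1.

```python
# RMS reconstruction errors from Table 1 (columns 3 to 8).
table1 = {
    "SCAPE": {"spectral_only": 0.0825, "original_simplification": 0.0898,
              "uniform_remeshing": 0.0813, "graph_pooling": 0.0824,
              "mesh_sampling": 0.0831, "ours": 0.0763},
    "Swing": {"spectral_only": 0.0282, "original_simplification": 0.0284,
              "uniform_remeshing": 0.0281, "graph_pooling": 0.0292,
              "mesh_sampling": 0.0298, "ours": 0.0268},
    "Fat":   {"spectral_only": 0.0267, "original_simplification": 0.0285,
              "uniform_remeshing": 0.0305, "graph_pooling": 0.0253,
              "mesh_sampling": 0.0289, "ours": 0.0249},
    "Hand":  {"spectral_only": 0.0284, "original_simplification": 0.0271,
              "uniform_remeshing": 0.0280, "graph_pooling": 0.0306,
              "mesh_sampling": 0.0278, "ours": 0.0260},
}


def mean_relative_reduction(baseline):
    """Average over datasets of (baseline - ours) / baseline, in percent."""
    reductions = [(row[baseline] - row["ours"]) / row[baseline]
                  for row in table1.values()]
    return 100.0 * sum(reductions) / len(reductions)


for name in ["spectral_only", "original_simplification",
             "uniform_remeshing", "graph_pooling", "mesh_sampling"]:
    print(f"{name}: {mean_relative_reduction(name):.2f}%")
```

Averaging per-dataset ratios, rather than taking the ratio of averaged errors, gives each dataset equal weight regardless of its error scale.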

图 4 与经典网格简化算法的更多对比

Fig. 4 More comparisons with the classical mesh simplification algorithm

((a) original mesh; (b) Garland and Heckbert (1997); (c) ours)

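3.2 Comparison with state-of-the-art methods

Table 2 compares the proposed method with state-of-the-art mesh-based auto-encoder architectures (Gao et al., 2018; Ranjan et al., 2018; Tan et al., 2018b) in terms of the RMS error when reconstructing unseen shapes. Thanks to spectral graph convolution and the proposed pooling operation, the method reduces the reconstruction error on data unseen during training, which shows better generalization. For example, compared with the method of Gao et al. (2018), which uses the same per-vertex features, the proposed network lowers the average RMS reconstruction error by 29% and 32% on the SCAPE and Face datasets, respectively. In addition, Figs. 5 to 7 give visual comparisons of reconstructions with the methods of Gao et al. (2018) and Ranjan et al. (2018). In Fig. 5 and Fig. 6 the reconstruction error is visualized with color. These results show that the proposed method produces more accurate reconstructions than the methods of Gao et al. (2018) and Ranjan et al. (2018).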

表 2 与不同自编码器框架比较训练未见数据的RMS重建误差
Table 2 Comparison of RMS reconstruction errors for unseen data using different auto-encoder frameworks


Dataset	Number of vertices	Tan et al. (2018b)	Gao et al. (2018)	Ranjan et al. (2018)	Ours
SCAPE	12,500	-	0.1086	0.1095	0.0763
Swing	9,971	-	0.0359	0.0557	0.0268
Fat	6,890	0.0308	0.0362	0.0324	0.0249
Hand	3,573	0.0362	0.0300	0.0632	0.0260
Face	11,849	-	1.0619	1.1479	0.7257
Horse	8,431	-	0.0128	0.0510	0.0119
Camel	11,063	-	0.0134	0.0265	0.0115
Note: the best value in each row is the one in the Ours column. "-" means that the corresponding method ran out of GPU memory and produced no result.

图 5 与Gao等人(2018)方法的重建结果可视化比较

Fig. 5 Qualitative comparison of reconstruction results with Gao et al.(2018)

图 6 在训练未见数据上重建结果可视化比较

Fig. 6 Qualitative comparison of reconstruction results for unseen data

图 7 与Ranjan等人(2018)方法的重建结果可视化比较

Fig. 7 Qualitative comparison of reconstruction results with Ranjan et al.(2018)

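Table 3 also gives a comparison with MeshCNN (Hanocka et al., 2019). MeshCNN takes edge features as input, namely the dihedral angle, two inner angles and two edge-length ratios of the adjacent faces, and these cannot be used to reconstruct a model. The mean absolute error (MAE) of the edge features is therefore used as the comparison metric. These features are computed for the input shapes and the reconstructed shapes of the proposed network, and the segmentation network of MeshCNN is modified for the encoding and decoding task. The proposed network achieves better results on all five edge features, which indicates that the triangles of the meshes it reconstructs are of better quality.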

表 3 与MeshCNN(Hanocka等，2019)比较MAE重建误差
Table 3 Comparison of MAE reconstruction errors with MeshCNN (Hanocka et al., 2019)


Edge feature	SCAPE (MeshCNN)	SCAPE (Ours)	Swing (MeshCNN)	Swing (Ours)	Fat (MeshCNN)	Fat (Ours)	Hand (MeshCNN)	Hand (Ours)
dihedral angle	0.0690	0.0006	0.0506	0.0003	0.3017	0.0186	0.0921	0.0002
inner angle 1	0.3245	0.0614	0.3713	0.0421	0.4309	0.0300	0.2857	0.0268
inner angle 2	0.3100	0.0529	0.2964	0.0402	0.4150	0.0349	0.2009	0.0231
edge-length ratio 1	0.3806	0.0661	0.3645	0.0537	0.3894	0.0354	0.3723	0.0782
edge-length ratio 2	0.3668	0.0649	0.3523	0.0475	0.3568	0.0276	0.3531	0.0766
Note: for every row and dataset, the best value is the one in the Ours column.

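Table 4 gives a comparison with DEMEA (deep mesh autoencoder) (Tretschk et al., 2020). The same datasets and the same training and test splits as DEMEA are used, and the network uses the same latent dimension of 32. The metric is also the same as in DEMEA, namely the average per-vertex error. The results show that the proposed network achieves better reconstructions on all four datasets.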

表 4 与DEMEA(Tretschk等，2020)比较重建误差
Table 4 Comparison of reconstruction errors with DEMEA (Tretschk et al., 2020)


Dataset	Number of vertices	DEMEA	Ours
CoMA	5,023	1.05	0.81
SynHand5M	1,193	4.67	3.05
Cloth	961	8.30	4.87
DFaust	6,890	2.90	2.08
Note: the best value in each row is the one in the Ours column.

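Table 5 shows that the proposed network needs fewer parameters than the original MeshVAE. On the Fat and Hand datasets, it requires roughly 16 times fewer network weights than the original MeshVAE, which helps the network generalize.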

表 5 与原始MeshVAE(Tan等，2018b) 在网络权重数量上的比较
Table 5 Comparison of the number of network weights with the original MeshVAE (Tan et al., 2018b)


Dataset	Number of vertices	Tan et al. (2018b)	Ours
Fat	6,890	129,745,920	7,941,042
Hand	3,573	68,610,048	4,118,706
Note: the best (smallest) value in each row is the one in the Ours column.

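3.3 Generation of new shapes

Once the network is trained, the latent space and the decoder can be used to generate new shapes. A latent code sampled from the standard normal distribution $z \sim N\left({0, 1} \right)$ is fed into the trained decoder. Fig. 8 shows that the network generates plausible new shapes. To verify that the generated shapes do not exist in the dataset, the closest shape in the training set, measured by the average per-vertex Euclidean distance, is retrieved for visual comparison. The comparison shows that the generated shapes are indeed new and differ from every shape in the training set. To demonstrate conditional random generation, the network is trained on the Dyna dataset of Pons-Moll et al. (2015), using BMI (body mass index) plus gender, and motion, as conditions. As shown in Fig. 9, the method can randomly generate new shapes conditioned on subject "50007", a male model with a BMI of 39.0, and conditioned on motions labeled "one leg jump" (which include raising a leg).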
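The nearest-neighbour check used to confirm that a generated shape is new can be sketched as follows. The sketch assumes that all shapes share the same connectivity and are already aligned, so vertices can be compared index by index; the function name and the toy shapes are hypothetical, and the trained decoder that would map the sampled code to a shape is not shown.

```python
import numpy as np


def nearest_training_shape(generated, training):
    """Return the index of the closest training shape and its distance.

    generated: (V, 3) vertex positions of the generated shape.
    training:  (N, V, 3) vertex positions of the training shapes.
    Distance is the average per-vertex Euclidean distance.
    """
    per_vertex = np.linalg.norm(training - generated[None, :, :], axis=2)  # (N, V)
    mean_dist = per_vertex.mean(axis=1)                                     # (N,)
    idx = int(np.argmin(mean_dist))
    return idx, float(mean_dist[idx])


# A latent code drawn from the prior; a trained decoder would turn it into a shape.
rng = np.random.default_rng(0)
z = rng.standard_normal(128)

# Toy stand-ins for decoded and training shapes (4 vertices each).
training = np.stack([np.zeros((4, 3)), np.ones((4, 3)), 2.0 * np.ones((4, 3))])
generated = np.full((4, 3), 0.9)
idx, dist = nearest_training_shape(generated, training)
```

A clearly non-zero distance to the nearest training shape, together with visual inspection, is what supports the claim that a generated shape is not a copy of training data.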

图 8 本文框架随机生成的新形状以及它们在原始数据集中的最近邻

Fig. 8 Randomly generated new shapes by our framework, along with their nearest neighbors in the original datasets

((a) Hand; (b) SCAPE; (c) Horse; (d) Face; (e) Swing)

图 9 本文框架条件随机生成新形状

Fig. 9 Conditional random generation of new shapes using our framework

((a) conditioned on motion sequence—one leg jump; (b) conditioned on bodyshape—male model with BMI 39.0)

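3.4 Shape interpolation

The method can also be used for shape interpolation, which is another way of generating new shapes. The latent codes of two shapes are linearly interpolated, and the probabilistic decoder then outputs a sequence of 3D deformations. On the SCAPE dataset (Anguelov et al., 2005), the method is compared with the original MeshVAE and with a state-of-the-art data-driven deformation method (Gao et al., 2017); the results are shown in Fig. 10 and Fig. 11, respectively, where the models in the leftmost and rightmost columns are the inputs to the interpolation. The interpolation results of MeshVAE contain obvious artifacts. The results of Gao et al. (2017) tend to follow the motion sequences of the original dataset, which have similar start and end states, and this leads to redundant motions such as swinging the right arm. In contrast, the proposed interpolation gives a more plausible motion sequence. Fig. 12 shows a comparison with the method of Litany et al. (2018), which produces artifacts, especially in the synthesized hands. Fig. 13 shows more interpolation results, including interpolation between newly generated human models and between other types of models.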
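The latent-space part of the interpolation is a plain linear blend of two codes. The sketch below shows only that step; the function name is hypothetical, and decoding each interpolated code into a mesh with the trained decoder is omitted.

```python
import numpy as np


def interpolate_latents(z_a, z_b, num_steps):
    """Linearly interpolate between two latent codes.

    Returns an array of shape (num_steps, D) whose first row is z_a
    and whose last row is z_b.
    """
    t = np.linspace(0.0, 1.0, num_steps)[:, None]   # (num_steps, 1)
    return (1.0 - t) * z_a[None, :] + t * z_b[None, :]


z_a = np.array([0.0, 0.0])
z_b = np.array([2.0, 4.0])
path = interpolate_latents(z_a, z_b, num_steps=5)
```

Each row of `path` would then be passed through the decoder to obtain one frame of the deformation sequence.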

图 10 与Tan等人(2018b)比较网格插值结果

Fig. 10 Comparison of mesh interpolation results with Tan et al.(2018b)

((a) Tan et al.(2018b); (b) ours)

图 11 与Gao等人(2017)比较网格插值结果

Fig. 11 Comparison of mesh interpolation results with Gao et al.(2017)

((a) Gao et al.(2017); (b) ours)

图 12 与Litany等人(2018)比较网格插值结果

Fig. 12 Comparison of mesh interpolation results with Litany et al.(2018)

((a) Litany et al.(2018); (b) ours)

图 13 更多插值结果

Fig. 13 More interpolation results

((a) Horse; (b) Hand; (c) new generated shapes)

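To show that the network can handle denser meshes, it is compared with MeshVAE. MeshVAE can only process meshes of low resolution because of the high GPU memory demand of its fully connected network. With convolution and pooling, the proposed method can process denser meshes and can therefore recover finer geometric details in interpolation, reconstruction and random generation. Fig. 14 shows an interpolation comparison. The original elephant model (Sumner and Popović, 2004) has 42,321 vertices. MeshVAE cannot process this many vertices because of the memory limit, so a simplified mesh with 5,394 vertices is used for it. The proposed method operates on the original mesh and produces finer results.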

图 14 与MeshVAE(Tan等，2018b)的插值比较

Fig. 14 Interpolation comparison with MeshVAE

((a) Tan et al.(2018b); (b) ours)

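3.5 Shape embedding

The method can compress 3D shapes into low-dimensional vectors for visualization. To visualize the embedding, the two latent dimensions with the largest variance are used as the horizontal and vertical coordinates of each model in a 2D embedding plot, which embeds the shapes in a low-dimensional space. The method separates the models according to their shapes while keeping models with similar poses close together. A representative motion sequence is used, namely the horse sequence from Sumner and Popović (2004). The Horse dataset contains the motion sequence of a galloping horse, which forms a cyclic sequence. The embedding shown in the left part of Fig. 15, a circle that matches the original sequence, indicates that the network has good embedding ability. The top right and bottom right of Fig. 15 show the results of PCA (principal component analysis) and t-SNE (t-distributed stochastic neighbor embedding), respectively. By comparison, the result of the proposed network appears as two circles, whereas PCA and t-SNE fail to reveal the intrinsic structure of the data.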
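Selecting the two latent dimensions with the largest variance can be sketched as below. The function name and the toy latent codes are assumptions for illustration; in practice the rows would be the encoder outputs for every shape in the sequence.

```python
import numpy as np


def embed_2d(latents):
    """Pick the two latent dimensions with the largest variance.

    latents: (N, D) latent codes, one row per shape.
    Returns the (N, 2) coordinates and the indices of the chosen dimensions,
    ordered from largest to second-largest variance.
    """
    variances = latents.var(axis=0)
    top2 = np.argsort(variances)[::-1][:2]
    return latents[:, top2], top2


# Toy codes: dimension 0 is constant, dimension 2 varies the most.
latents = np.array([[1.0, 0.0, 0.0],
                    [1.0, 1.0, 10.0],
                    [1.0, 2.0, 20.0],
                    [1.0, 3.0, 30.0]])
coords, dims = embed_2d(latents)
```

Unlike PCA, this keeps two original latent axes rather than linear combinations of them, so the plot directly reflects what the network has learned to encode in those dimensions.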

图 15 Horse数据集(Sumner和Popović，2004)的2D嵌入

Fig. 15 2D embedding of Horse dataset (Sumner and Popović, 2004)

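4 Conclusion

This paper proposes a new mesh pooling operation that changes the mesh connectivity and supports further convolution or pooling. The pooling operation is based on the edge contraction mesh simplification algorithm, which is modified so that the receptive fields of the pooling are more uniform. Using the proposed mesh pooling together with a graph convolution operator, a mesh variational auto-encoder is built that takes a per-vertex feature representation as input. Reconstruction experiments on data unseen during training show that the network generalizes better than existing mesh encoder-decoder networks. Compared with the original MeshVAE, the method can generate high-quality deformable models with rich details. The model can also be applied to shape generation, shape interpolation and shape embedding, and the experiments show that it outperforms existing methods in these applications. One limitation is that it can only handle meshes with the same connectivity, because the input of the neural network requires a consistent representation. In addition, because the pooling operation is based on mesh simplification, it cannot produce reasonable results when simplification fails, for example on non-watertight meshes or highly irregular input meshes. Future work will explore how to take meshes with different topologies as input and will further study the mesh simplification algorithm to address the cases in which failed simplification makes mesh pooling unusable.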

References

Anguelov D, Srinivasan P, Koller D, Thrun S, Rodgers J, Davis J. 2005. SCAPE: shape completion and animation of people. ACM Transactions on Graphics, 24(3): 408-416 [DOI:10.1145/1073204.1073207]

Boscaini D, Masci J, Rodolà E and Bronstein M. 2016a. Learning shape correspondence with anisotropic convolutional neural networks//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 3197-3205

Boscaini D, Masci J, Rodolà E, Bronstein M M, Cremers D. 2016b. Anisotropic diffusion descriptors. Computer Graphics Forum, 35(2): 431-441 [DOI:10.1111/cgf.12844]

Botsch M and Kobbelt L. 2004. A remeshing approach to multiresolution modeling//2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing. Nice, France: ACM: 185-192[DOI: 10.1145/1057432.1057457]

Bronstein M M, Bruna J, LeCun Y, Szlam A, Vandergheynst P. 2017. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4): 18-42 [DOI:10.1109/msp.2017.2693418]

Defferrard M, Bresson X and Vandergheynst P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 3844-3852

Gao L, Chen S Y, Lai Y K, Xia S H. 2017. Data-driven shape interpolation and morphing editing. Computer Graphics Forum, 36(8): 19-31 [DOI:10.1111/cgf.12991]

Gao L, Lai Y K, Liang D, Chen S Y, Xia S H. 2016. Efficient and flexible deformation representation for data-driven surface modeling. ACM Transactions on Graphics, 35(5): #158 [DOI:10.1145/2908736]

Gao L, Lai Y K, Yang J, Zhang L X, Xia S H, Kobbelt L. 2021. Sparse data driven mesh deformation. IEEE Transactions on Visualization and Computer Graphics, 27(3): 2085-2100 [DOI:10.1109/tvcg.2019.2941200]

Gao L, Yang J, Qiao Y L, Lai Y K, Rosin P L, Xu W W, Xia S H. 2018. Automatic unpaired shape deformation transfer. ACM Transactions on Graphics, 37(6): #237 [DOI:10.1145/3272127.3275028]

Garland M and Heckbert P S. 1997. Surface simplification using quadric error metrics//Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. Los Angeles, USA: ACM: 209-216[DOI: 10.1145/258734.258849]

Hanocka R, Hertz A, Fish N, Giryes R, Fleishman S, Cohen-Or D. 2019. MeshCNN: a network with an edge. ACM Transactions on Graphics, 38(4): #90 [DOI:10.1145/3306346.3322959]

Henaff M, Bruna J and LeCun Y. 2015. Deep convolutional networks on graph-structured data[EB/OL]. [2021/05/30]. https://arxiv.org/pdf/1506.05163.pdf

Huang J W, Zhang H T, Yi L, Funkhouser T, Nießner M and Guibas L J. 2019. TextureNet: consistent local parametrizations for learning from high-resolution signals on meshes//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4440-4449[DOI: 10.1109/cvpr.2019.00457]

Huber P, Perl R, Rumpf M. 2017. Smooth interpolation of key frames in a Riemannian shell space. Computer Aided Geometric Design, 52-53: 313-328 [DOI:10.1016/j.cagd.2017.02.008]

Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR (poster)

Kingma D P and Welling M. 2014. Auto-encoding variational Bayes//Proceedings of the 2nd International Conference on Learning Representations. Banff, Canada: ICLR

Kullback S, Leibler R A. 1951. On information and sufficiency. The Annals of Mathematical Statistics, 22(1): 79-86 [DOI:10.1214/aoms/1177729694]

Litany O, Bronstein A, Bronstein M and Makadia A. 2018. Deformable shape completion with graph convolutional autoencoders//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1886-1895[DOI: 10.1109/cvpr.2018.00202]

Masci J, Boscaini D, Bronstein M M and Vandergheynst P. 2015. Geodesic convolutional neural networks on riemannian manifolds//Proceedings of 2015 IEEE International Conference on Computer Vision Workshop. Santiago, Chile: IEEE: 832-840[DOI: 10.1109/iccvw.2015.112]

Maturana D and Scherer S. 2015. VoxNet: a 3D convolutional neural network for real-time object recognition//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE: 922-928[DOI: 10.1109/iros.2015.7353481]

Neumann T, Varanasi K, Wenger S, Wacker M, Magnor M, Theobalt C. 2013. Sparse localized deformation components. ACM Transactions on Graphics, 32(6): #179 [DOI:10.1145/2508363.2508417]

Pons-Moll G, Romero J, Mahmood N, Black M J. 2015. Dyna: a model of dynamic human shape in motion. ACM Transactions on Graphics, 34(4): #120 [DOI:10.1145/2766993]

Qi C R, Su H, Mo K and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 77-85[DOI: 10.1109/cvpr.2017.16]

Qi C R, Yi L, Su H and Guibas L. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 5105-5114

Ranjan A, Bolkart T, Sanyal S and Black M J. 2018. Generating 3D faces using convolutional mesh autoencoders//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 704-720[DOI: 10.1007/978-3-030-01219-9_43]

Shen Y R, Feng C, Yang Y Q and Tian D. 2018. Mining point cloud local structures by kernel correlation and graph pooling//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4548-4557[DOI: 10.1109/cvpr.2018.00478]

Sinha A, Bai J and Ramani K. 2016. Deep learning 3D shape surfaces using geometry images//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 223-240[DOI: 10.1007/978-3-319-46466-4_14]

Sohn K, Yan X C and Lee H. 2015. Learning structured output representation using deep conditional generative models//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM: 3483-3491

Sumner R W, Popović J. 2004. Deformation transfer for triangle meshes. ACM Transactions on Graphics, 23(3): 399-405 [DOI:10.1145/1015706.1015736]

Tan Q Y, Gao L, Lai Y K, Yang J and Xia S H. 2018a. Mesh-based autoencoders for localized deformation component analysis//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI: 2452-2459

Tan Q Y, Gao L, Lai Y K and Xia S H. 2018b. Variational autoencoders for deforming 3D mesh models//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5841-5850[DOI: 10.1109/cvpr.2018.00612]

Tretschk E, Tewari A, Zollhöfer M, Golyanik V and Theobalt C. 2020. DEMEA: deep mesh autoencoders for non-rigidly deforming objects//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 601-617[DOI: 10.1007/978-3-030-58548-8_35]

Vlasic D, Baran I, Matusik W, Popović J. 2008. Articulated mesh animation from multi-view silhouettes. ACM Transactions on Graphics, 27(3): 1-9 [DOI:10.1145/1399504.1360696]

Wang P S, Liu Y, Guo Y X, Sun C Y, Tong X. 2017. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics, 36(4): #72 [DOI:10.1145/3072959.3073608]

Wang P S, Sun C Y, Liu Y, Tong X. 2018. Adaptive O-CNN: a patch-based deep representation of 3D shapes. ACM Transactions on Graphics, 37(6): #217 [DOI:10.1145/3272127.3275050]

Wu J J, Zhang C K, Xue T F, Freeman W T and Tenenbaum J B. 2016. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 82-90

Yi L, Su H, Guo X W and Guibas L J. 2017. SyncSpecCNN: synchronized spectral CNN for 3D shape segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6584-6592[DOI: 10.1109/cvpr.2017.697]

Zhai Z L, Liang Z M, Zhou W, Sun X. 2019. Research overview of variational auto-encoders models. Computer Engineering and Applications, 55(3): 1-9 (翟正利, 梁振明, 周炜, 孙霞. 2019. 变分自编码器模型综述. 计算机工程与应用, 55(3): 1-9)

Zhang X F, Cheng L C, Bai S L, Zhang F, Sun N L, Wang Z Y. 2020. Face image inpainting via variational autoencoder. Journal of Computer-Aided Design and Computer Graphics, 32(3): 401-409 (张雪菲, 程乐超, 白升利, 张繁, 孙农亮, 王章野. 2020. 基于变分自编码器的人脸图像修复. 计算机辅助设计与图形学学报, 32(3): 401-409) [DOI:10.3724/SP.J.1089.2020.17938]

Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2242-2251[DOI: 10.1109/iccv.2017.244]