Mesh variational auto-encoders with edge contraction pooling

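Yuan Yujie1,2, Lai Yukun3, Yang Jie1,2, Duan Qi4, Fu Hongbo5, Gao Lin1,2 (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Cardiff University, Cardiff CF24 4AG, UK; 4. SenseTime, Shanghai 200233, China; 5. City University of Hong Kong, Hong Kong 999077, China)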

Abstract
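Objective 3D shape analysis is an important research topic in computer vision and graphics. Existing methods generalize image-based deep learning to 3D meshes using graph-based convolutions, but the lack of an effective pooling operation limits the learning capability of their networks. For datasets of mesh models that share the same connectivity but differ in geometry, this paper builds a mesh hierarchy using the edge contraction operation of mesh simplification and proposes a new mesh pooling operation. Method The traditional mesh simplification method is improved to avoid generating highly irregular triangles, and the improved method is used to define a new mesh pooling operation. The levels of the mesh hierarchy built by edge contraction correspond to one another, which makes mesh pooling straightforward to define. The new pooling operation effectively encodes the correspondence between coarser and denser meshes in the hierarchy. Finally, a variational auto-encoder (VAE) architecture with edge contraction pooling and graph convolution is proposed to explore the latent space of 3D shapes and to generate new 3D shapes. Result Owing to the new pooling operation and graph convolution, the proposed network requires fewer parameters than the original MeshVAE and can therefore handle denser mesh models. Conclusion Experiments show that the proposed method generalizes better and is more reliable in various applications, including shape generation, shape interpolation and shape embedding.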
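Pooling and de-pooling as described here depend only on which dense-mesh vertices were contracted into each coarse-mesh vertex. The following is a minimal NumPy sketch, not the authors' implementation. It assumes this correspondence was recorded during edge contraction as an integer array, here called `cluster` (a hypothetical name), that maps each dense vertex to its coarse vertex.

```python
import numpy as np


def average_pool(features, cluster):
    """Average-pool per-vertex features from a dense mesh onto a coarse mesh.

    features: (n_dense, d) per-vertex features on the dense mesh.
    cluster:  (n_dense,) ints; cluster[v] is the coarse vertex that dense
              vertex v was contracted into (assumed to be recorded during
              edge contraction, with every coarse index receiving >= 1 vertex).
    Returns (n_coarse, d): the mean feature of each contracted group.
    """
    features = np.asarray(features, dtype=float)
    cluster = np.asarray(cluster, dtype=int)
    n_coarse = int(cluster.max()) + 1
    sums = np.zeros((n_coarse, features.shape[1]))
    np.add.at(sums, cluster, features)               # sum features per coarse vertex
    counts = np.bincount(cluster, minlength=n_coarse)
    return sums / counts[:, None]                    # divide by group size -> mean


def depool(coarse_features, cluster):
    """De-pooling: copy each coarse vertex's feature to every dense vertex
    that was contracted into it."""
    return np.asarray(coarse_features)[np.asarray(cluster, dtype=int)]


# Usage: dense vertices 0,1 were contracted into coarse vertex 0; 2,3 into coarse vertex 1.
feats = np.array([[1.0], [3.0], [5.0], [7.0]])
cluster = np.array([0, 0, 1, 1])
pooled = average_pool(feats, cluster)
restored = depool(pooled, cluster)
```

In this example `pooled` holds 2.0 and 6.0, and `restored` holds 2.0, 2.0, 6.0, 6.0. De-pooling only copies values, so it is not an exact inverse of average pooling. It restores the dense resolution so that later decoder layers can refine the features.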
Mesh variational auto-encoders with edge contraction pooling

Yuan Yujie1,2, Lai Yukun3, Yang Jie1,2, Duan Qi4, Fu Hongbo5, Gao Lin1,2 (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. School of Computer Science and Informatics, Cardiff University, Cardiff CF24 4AG, UK; 4. SenseTime Research, Shanghai 200233, China; 5. City University of Hong Kong, Hong Kong 999077, China)

Abstract
Objective 3D shape datasets have become increasingly available, and data-driven 3D shape analysis has become an active research topic in computer vision and graphics. Recent data-driven works have attempted to generalize deep neural networks from images to 3D shapes, including triangular meshes, point clouds and voxel data. This work focuses on deep neural networks for triangular meshes, which have complex and irregular connectivity. Most existing works keep the mesh connectivity unchanged in every layer and therefore lose the enlarged receptive fields that pooling operations provide. The variational auto-encoder (VAE) has been widely used in generative tasks on triangular meshes, including shape generation, interpolation and exploration. The original MeshVAE is built on fully connected layers, so it requires a huge number of parameters and often generalizes poorly. Although fully connected layers allow the mesh connectivity to change across layers, these changes are irregular, so such approaches cannot be directly generalized to convolutional layers. Some works adopt convolutional layers in the VAE structure, but their convolution operations cannot change the mesh connectivity. Mesh sampling has also been used in convolutional neural networks (CNNs) on meshes, but it does not aggregate information from the whole local neighborhood when reducing the number of vertices. It is therefore necessary to design a pooling operation for meshes, analogous to pooling for images, that reduces the number of network parameters so that the network can handle denser models and generalize better. Such pooling should also support further convolutions and allow the original resolution to be recovered through a corresponding de-pooling operation. Method A novel mesh pooling operation based on edge contraction is proposed.
A VAE architecture incorporating this pooling operation is also constructed. Mesh simplification is used to build a mesh hierarchy with different levels of detail, and effective pooling is achieved by tracking the mapping between coarser and finer meshes. To avoid generating highly irregular triangles, a modified mesh simplification approach based on the classical edge contraction algorithm is proposed. Because edge length is an important indicator for edge contraction, it is incorporated as one of the criteria for ordering vertex pairs: the edge length is added directly to the original quadric error formulation. For average pooling, the feature of a new vertex is defined as the average of the features of the vertices contracted into it; other pooling operations can be defined similarly. For the inverse operation, de-pooling, used in the decoder, the feature of each vertex on the simplified mesh is assigned unchanged to every corresponding contracted vertex on the denser mesh. The input to the network is a per-vertex deformation feature representation which, unlike 3D coordinates, encodes deformations defined on vertices based on deformation gradients. The network is trained on a collection of 3D shapes with the same connectivity; such meshes can easily be obtained via consistent remeshing. The network follows a VAE architecture that applies pooling operations and graph convolutions. It generalizes well and handles much higher-resolution meshes in various applications, such as shape generation and interpolation. Result The framework is tested on four datasets: shape completion and animation of people (SCAPE), Swing, Fat and Hand. The network's ability to generate unseen shapes is evaluated using the average root mean square (RMS) error.
First, the network is compared with and without the proposed pooling. With pooling, the RMS error is on average 6.92% lower, which shows the benefit of the pooling and de-pooling operations. The proposed pooling is then compared with other pooling and sampling methods. On unseen data, the RMS error with the proposed pooling is on average 9.34% lower than pooling based on the original simplification algorithm, 9.07% lower than a uniform remeshing method, 8.06% lower than graph pooling, and 9.64% lower than mesh sampling. This shows that the modified simplification algorithm is more effective for pooling and that the proposed pooling performs better across multiple datasets, demonstrating its generalization capability. The framework is also compared with related mesh-based auto-encoder architectures. Thanks to spectral graph convolutions and the proposed pooling, the method consistently reduces reconstruction errors on unseen data, showing superior generalizability. For instance, compared with a method that uses the same per-vertex features, the proposed network achieves 29% and 32% lower average RMS reconstruction errors on the SCAPE and Face datasets, respectively. The proposed network also outperforms MeshCNN. The framework's capabilities are further demonstrated in shape generation, shape interpolation and shape embedding. Conclusion A new pooling operation based on a modified mesh simplification algorithm is integrated into a mesh variational auto-encoder architecture. The resulting generative model generalizes well and, compared with the original MeshVAE, can generate high-quality deformable models with richer details.
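The abstract states that edge length is added to the quadric error when ordering vertex pairs for contraction. The sketch below shows one way to compute such a cost, under stated assumptions. It uses Garland-Heckbert vertex quadrics, places each contracted vertex at the edge midpoint rather than at the quadric-optimal position, and adds the plain Euclidean edge length scaled by a hypothetical weight `lam`. The exact weighting and placement used in the paper are not specified here. The sketch only computes and orders the initial edge costs. A full simplifier would also contract edges iteratively and update the affected costs.

```python
import heapq
import numpy as np


def vertex_quadrics(verts, faces):
    """Per-vertex Garland-Heckbert quadrics: sum of p p^T over incident face planes p."""
    Q = np.zeros((len(verts), 4, 4))
    for f in faces:
        p0, p1, p2 = verts[f[0]], verts[f[1]], verts[f[2]]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm == 0.0:  # skip degenerate (zero-area) faces
            continue
        n = n / norm
        plane = np.append(n, -np.dot(n, p0))  # (a, b, c, d) with unit normal
        K = np.outer(plane, plane)             # fundamental error quadric of the face
        for vi in f:
            Q[vi] += K
    return Q


def edge_costs(verts, faces, lam=1.0):
    """Return a min-heap of (cost, i, j) for every edge (i, j) with i < j.

    cost = quadric error at the edge midpoint + lam * edge length.
    Midpoint placement and the weight `lam` are illustrative assumptions.
    """
    Q = vertex_quadrics(verts, faces)
    edges = set()
    for f in faces:
        for a, b in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            edges.add((min(int(a), int(b)), max(int(a), int(b))))
    heap = []
    for i, j in edges:
        mid = np.append((verts[i] + verts[j]) / 2.0, 1.0)  # homogeneous midpoint
        quadric_err = float(mid @ (Q[i] + Q[j]) @ mid)       # sum of squared plane distances
        length = float(np.linalg.norm(verts[i] - verts[j]))
        heapq.heappush(heap, (quadric_err + lam * length, i, j))
    return heap


# Usage: a unit square split into two coplanar triangles.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = [(0, 1, 2), (0, 2, 3)]
heap = edge_costs(verts, faces)
cheapest = heapq.heappop(heap)
```

In a flat region the quadric error is zero for every candidate edge, so quadric error alone cannot distinguish between them. The length term breaks the tie in favor of short edges, and in this example the diagonal is ranked last.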
