Mesh variational auto-encoders with edge contraction pooling
Vol. 27, Issue 2, Pages: 511-524 (2022)
Received: 14 July 2021
Revised: 10 November 2021
Accepted: 17 November 2021
Published: 16 February 2022
DOI: 10.11834/jig.210550
Objective
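3D shape analysis is an important research topic in computer vision and graphics. Existing methods generalize image-based deep learning to 3D meshes by using graph-based convolutions, but the lack of an effective pooling operation limits the learning capability of their networks. This paper targets datasets of mesh models that share the same connectivity but differ in geometry. It uses the edge contraction operation of mesh simplification to build a mesh hierarchy and proposes a new mesh pooling operation.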
Method
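This paper improves the traditional mesh simplification method so that it avoids generating highly irregular triangles. The improved method is then used to define a new mesh pooling operation. Edge contraction builds a mesh hierarchy whose levels are linked by correspondences, and these correspondences make mesh pooling straightforward to define. The new pooling operation effectively encodes the correspondence between coarser and denser meshes in the hierarchy. Finally, a variational auto-encoder (VAE) architecture with edge contraction pooling and graph convolution is proposed to explore the latent space of 3D shapes and to generate 3D shapes.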
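The hierarchy described above can be sketched as a greedy edge-contraction loop. This is a minimal sketch, not the paper's implementation. It scores each edge with the quadric error metric of Garland and Heckbert (1997) plus an edge-length penalty, then contracts the cheapest edge until a target vertex count is reached. Several details are assumptions made for illustration: the weight `lam`, the use of unsquared edge length, and placing the merged vertex at the edge midpoint instead of solving for the optimal position. All function names are hypothetical. The returned `mapping` gives, for each fine vertex, the index of the coarse vertex it was contracted into. That correspondence is what a pooling layer needs.

```python
import math


def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def face_quadric(p0, p1, p2):
    """4x4 quadric K = p p^T of the triangle's supporting plane p = (a, b, c, d)."""
    n = _cross(_sub(p1, p0), _sub(p2, p0))
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:  # degenerate triangle: contributes no error
        return [[0.0] * 4 for _ in range(4)]
    a, b, c = (x / norm for x in n)
    plane = [a, b, c, -(a * p0[0] + b * p0[1] + c * p0[2])]
    return [[plane[r] * plane[s] for s in range(4)] for r in range(4)]


def add_quadrics(A, B):
    return [[A[r][s] + B[r][s] for s in range(4)] for r in range(4)]


def quadric_error(Q, p):
    """v^T Q v for the homogeneous point v = (x, y, z, 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return sum(v[r] * Q[r][s] * v[s] for r in range(4) for s in range(4))


def simplify(verts, faces, target, lam=1.0):
    """Greedily contract edges until `target` vertices remain (hypothetical helper).

    Assumed cost of contracting (i, j): quadric error at the midpoint + lam * |p_i - p_j|.
    Returns (coarse_positions, mapping) with mapping[i] = coarse index of fine vertex i.
    """
    pos = [list(p) for p in verts]
    Q = [[[0.0] * 4 for _ in range(4)] for _ in verts]
    for f in faces:
        K = face_quadric(*(verts[i] for i in f))
        for i in f:
            Q[i] = add_quadrics(Q[i], K)
    parent = list(range(len(verts)))  # union-find over contracted vertices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = {tuple(sorted((f[a], f[b]))) for f in faces for a, b in ((0, 1), (1, 2), (2, 0))}
    alive = len(verts)
    while alive > target:
        best = None
        for a, b in edges:
            i, j = find(a), find(b)
            if i == j:  # edge already collapsed
                continue
            mid = [(pos[i][k] + pos[j][k]) / 2.0 for k in range(3)]
            Qij = add_quadrics(Q[i], Q[j])
            cost = quadric_error(Qij, mid) + lam * math.dist(pos[i], pos[j])
            if best is None or cost < best[0]:
                best = (cost, i, j, mid, Qij)
        if best is None:  # nothing left to contract
            break
        _, i, j, mid, Qij = best
        parent[j] = i  # contract j into i
        pos[i], Q[i] = mid, Qij
        alive -= 1
    roots = sorted({find(i) for i in range(len(verts))})
    index = {r: n for n, r in enumerate(roots)}
    return [pos[r] for r in roots], [index[find(i)] for i in range(len(verts))]


# Usage: a unit square split into two triangles, reduced from 4 to 3 vertices.
square_verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
coarse, mapping = simplify(square_verts, square_faces, target=3)
```

In the usage example all points lie in one plane, so the quadric term is zero and the edge-length penalty alone decides the order. A side of the square is therefore contracted before the longer diagonal. Recomputing every edge cost on each iteration keeps the sketch short, but it is quadratic. A practical version would keep costs in a priority queue and update only the edges around each contraction.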
Result
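Because of the newly defined pooling operation and the graph convolutions, the proposed network requires fewer parameters than the original MeshVAE and can therefore handle denser mesh models.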
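Simple arithmetic shows why convolution plus pooling needs fewer parameters. The sketch below compares two encoder designs. The first is a single fully connected layer that maps all per-vertex features to a latent code. The second is one Chebyshev spectral graph convolution (Defferrard et al. 2016), followed by pooling and a smaller fully connected layer. All sizes are hypothetical: `V`, `F`, `LATENT`, `K`, and the 4x vertex reduction are not the configurations used in the paper or in MeshVAE.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def cheb_conv_params(f_in, f_out, k):
    """Chebyshev spectral graph convolution: one f_in x f_out weight matrix per
    polynomial order (k in total) plus a bias; independent of the vertex count."""
    return k * f_in * f_out + f_out


# Hypothetical sizes, for illustration only.
V, F, LATENT, K = 10000, 9, 128, 6

# (a) One fully connected layer from all per-vertex features to the latent code.
fc_only = fc_params(V * F, LATENT)

# (b) Graph convolution, then pooling that keeps a quarter of the vertices
#     (pooling itself has no parameters), then a fully connected layer.
pooled_vertices = V // 4
conv_pool = cheb_conv_params(F, F, K) + fc_params(pooled_vertices * F, LATENT)

print(f"fully connected: {fc_only:,}  conv + pool: {conv_pool:,}")
```

The weight count of the convolution depends only on the feature widths and the polynomial order. The remaining vertex-dependent cost is the final fully connected layer, which shrinks in proportion to the number of vertices that survive pooling.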
Conclusion
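Experiments show that the proposed method has better generalization capability and is more reliable in various applications, including shape generation, shape interpolation, and shape embedding.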
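In a VAE, shape generation and shape interpolation come down to operations on latent codes, which are then passed through the trained decoder. The sketch below covers only the latent-space part and assumes a standard normal prior. The decoder is not shown, and the function names are hypothetical.

```python
import random


def lerp_latent(z_a, z_b, t):
    """Point at fraction t in [0, 1] on the straight line from z_a to z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, steps):
    """`steps` evenly spaced latent codes from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp_latent(z_a, z_b, s / (steps - 1)) for s in range(steps)]


def sample_prior(dim, rng):
    """Draw z ~ N(0, I), the standard VAE prior, for random shape generation."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


# Usage with toy 2-D codes. In practice z_a and z_b would typically be the encoder
# means of two input shapes, and every code would be passed to the trained decoder.
path = interpolation_path([0.0, 0.0], [1.0, 2.0], steps=3)
z_random = sample_prior(16, random.Random(0))
```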
Objective
Large 3D shape datasets have become increasingly available, and data-driven 3D shape analysis is now an active research topic in computer vision and graphics. Beyond conventional approaches, current data-driven works attempt to generalize deep neural networks from images to 3D shapes, including triangular meshes, point clouds, and voxel data. This work focuses on deep neural networks for triangular meshes. Because 3D meshes have complicated and irregular connectivity, most current works keep the mesh connectivity unchanged in every layer. They therefore lose the larger receptive fields that pooling operations would provide. The variational auto-encoder (VAE) has been widely used in various generation tasks, including the generation, interpolation, and exploration of triangular meshes. The original MeshVAE is based on a fully connected network, so it requires a very large number of parameters and its generalization capability is often weak. Fully connected layers do allow the mesh connectivity to change across layers. However, because these changes are irregular, such approaches cannot be directly generalized to convolutional layers. Some works adopt convolutional layers in the VAE structure, but such convolution operations cannot change the connectivity of the mesh. Sampling operations have also been used in convolutional neural networks (CNNs) on meshes. Mesh sampling, however, does not aggregate information from the whole local neighborhood when it reduces the number of vertices. It is therefore necessary to design a pooling operation for meshes, analogous to pooling for images. Such an operation would reduce the number of network parameters, letting the network handle denser models and generalize better. The pooling defined in this way also supports further convolutions, and features can be recovered at the finer resolution through a corresponding de-pooling operation.
Method
We propose a novel mesh pooling operation based on edge contraction and build a VAE architecture around it. Mesh simplification is used to construct a mesh hierarchy with different levels of detail. Effective pooling is achieved by keeping track of the mapping between coarser and finer meshes. To avoid generating highly irregular triangles during simplification, we modify the classical edge contraction algorithm. Edge length is an essential indicator for the edge contraction process, so it is incorporated as one of the criteria for ordering vertex pairs. Concretely, an edge length term is added directly to the original quadric error formulation. For average pooling, the feature of a new vertex is defined as the average of the features of the contracted vertices, and other pooling operations can be defined in the same way. In the decoding process, the inverse operation (de-pooling) assigns the feature of each vertex on the simplified mesh to every corresponding contracted vertex on the dense mesh. The input to the network is a vertex-based deformation feature representation. Unlike 3D coordinates, this representation encodes deformations defined on vertices in terms of deformation gradients. The framework is trained on a collection of 3D shapes with the same connectivity; such meshes can easily be obtained via consistent remeshing. The network follows a VAE architecture that applies pooling operations and graph convolutions. It has good generalization capability and handles much higher-resolution meshes in various applications, such as shape generation and interpolation.
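The pooling and de-pooling rules above can be written directly in terms of a fine-to-coarse vertex mapping, such as the one recorded during edge contraction. This is a minimal sketch. It assumes per-vertex features stored as lists of floats, with `mapping[i]` giving the coarse vertex that fine vertex `i` was contracted into. The function names are hypothetical, and a real network would perform the same gather/scatter on batched tensors.

```python
def average_pool(features, mapping, n_coarse):
    """Average pooling over edge contractions.

    features[i] is the feature vector of fine vertex i and mapping[i] the coarse
    vertex it was contracted into. Every coarse vertex is assumed to receive at
    least one fine vertex, which holds for a mapping produced by contraction.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_coarse)]
    counts = [0] * n_coarse
    for feat, c in zip(features, mapping):
        counts[c] += 1
        for k in range(dim):
            sums[c][k] += feat[k]
    return [[s / counts[c] for s in sums[c]] for c in range(n_coarse)]


def depool(coarse_features, mapping):
    """De-pooling: copy each coarse feature to every fine vertex contracted into it."""
    return [list(coarse_features[c]) for c in mapping]


# Usage: fine vertices 0 and 1 were contracted into coarse vertex 0,
# and fine vertex 2 became coarse vertex 1.
fine = [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0]]
mapping = [0, 0, 1]
pooled = average_pool(fine, mapping, n_coarse=2)
restored = depool(pooled, mapping)
```

Other pooling variants follow the same pattern. For example, max pooling would keep the element-wise maximum over each group instead of the mean, and the copy-back de-pooling could presumably be reused unchanged.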
Result
The framework is tested on four datasets: shape completion and animation of people (SCAPE), Swing, Fat, and Hand. We evaluate the ability of the network to generate unseen shapes by computing average root mean squared (RMS) errors. First, the network with the proposed pooling is compared with the same network without pooling. With pooling, the RMS error is 6.92% lower on average, which shows the benefit of the pooling and de-pooling operations. Next, the proposed pooling is compared with other pooling and sampling methods. On unseen data, the RMS error of the proposed pooling is on average:

- 9.34% lower than pooling based on the original simplification algorithm,
- 9.07% lower than a uniform remeshing method,
- 8.06% lower than graph pooling, and
- 9.64% lower than mesh sampling.

These results show that the modified simplification algorithm is more effective for pooling. The proposed pooling also performs better across multiple datasets, which demonstrates its generalization capability. The proposed framework is further compared with related mesh-based auto-encoder architectures. Thanks to spectral graph convolutions and the proposed pooling, the method consistently reduces the reconstruction errors on unseen data, showing superior generalizability. For instance, compared with a work that uses the same per-vertex features, the proposed network achieves 29% and 32% lower average RMS reconstruction errors on the SCAPE and Face datasets, respectively. The proposed network also achieves better results than MeshCNN. Finally, the capability of the framework is demonstrated in shape generation, shape interpolation, and shape embedding.
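The evaluation metric and the reported percentages can be made concrete as follows. This sketch rests on two assumptions. First, the RMS error is taken over all vertex coordinates of meshes with identical connectivity. Second, "X% lower" means a relative reduction with respect to the compared method. The paper's exact normalization (per vertex, per shape, or per dataset) is not stated here. The meshes in the usage lines are toy values, not data from the paper.

```python
import math


def rms_error(pred, target):
    """RMS error over all vertex coordinates of two meshes with the same
    connectivity and vertex order; each mesh is a list of (x, y, z)."""
    sq = [(p - t) ** 2 for pv, tv in zip(pred, target) for p, t in zip(pv, tv)]
    return math.sqrt(sum(sq) / len(sq))


def relative_reduction(baseline_err, new_err):
    """Percentage by which new_err is lower than baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Usage with toy meshes (not data from the paper).
ground_truth = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
reconstruction = [(0.0, 0.0, 0.0), (1.0, 1.0, 3.0)]
err = rms_error(reconstruction, ground_truth)
```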
Conclusion
A newly defined pooling operation, based on a modified mesh simplification algorithm, is integrated into a mesh variational auto-encoder architecture. The resulting generative model has good generalization capability. Compared with the original MeshVAE, our method can generate high-quality deformable models with richer details.
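The variational part of the architecture can be sketched with the usual VAE objective. It combines a reconstruction term with the Kullback-Leibler divergence between the approximate posterior and a standard normal prior, and samples through the reparameterization trick (Kingma and Welling 2014). The mean squared reconstruction error and the weight `beta` are assumptions for illustration, not the paper's stated loss. The function names are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(recon, target, mu, log_var, beta=1.0):
    """Mean squared reconstruction error plus beta times the KL term (assumed form)."""
    mse = sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)
    return mse + beta * kl_to_standard_normal(mu, log_var)


# Usage with toy values.
rng = random.Random(0)
mu, log_var = [0.5, -0.2], [-1.0, 0.3]
z = reparameterize(mu, log_var, rng)
loss = vae_loss([0.1, 0.2, 0.3], [0.0, 0.2, 0.4], mu, log_var)
```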
Anguelov D, Srinivasan P, Koller D, Thrun S, Rodgers J and Davis J. 2005. SCAPE: shape completion and animation of people. ACM Transactions on Graphics, 24(3): 408-416[DOI: 10.1145/1073204.1073207]
Boscaini D, Masci J, Rodolà E and Bronstein M. 2016a. Learning shape correspondence with anisotropic convolutional neural networks//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 3197-3205
Boscaini D, Masci J, Rodolà E, Bronstein M M and Cremers D. 2016b. Anisotropic diffusion descriptors. Computer Graphics Forum, 35(2): 431-441[DOI: 10.1111/cgf.12844]
Botsch M and Kobbelt L. 2004. A remeshing approach to multiresolution modeling//2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing. Nice, France: ACM: 185-192[DOI: 10.1145/1057432.1057457]
Bronstein M M, Bruna J, LeCun Y, Szlam A and Vandergheynst P. 2017. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4): 18-42[DOI: 10.1109/msp.2017.2693418]
Defferrard M, Bresson X and Vandergheynst P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 3844-3852
Gao L, Chen S Y, Lai Y K and Xia S H. 2017. Data-driven shape interpolation and morphing editing. Computer Graphics Forum, 36(8): 19-31[DOI: 10.1111/cgf.12991]
Gao L, Lai Y K, Liang D, Chen S Y and Xia S H. 2016. Efficient and flexible deformation representation for data-driven surface modeling. ACM Transactions on Graphics, 35(5): #158[DOI: 10.1145/2908736]
Gao L, Lai Y K, Yang J, Zhang L X, Xia S H and Kobbelt L. 2021. Sparse data driven mesh deformation. IEEE Transactions on Visualization and Computer Graphics, 27(3): 2085-2100[DOI: 10.1109/tvcg.2019.2941200]
Gao L, Yang J, Qiao Y L, Lai Y K, Rosin P L, Xu W W and Xia S H. 2018. Automatic unpaired shape deformation transfer. ACM Transactions on Graphics, 37(6): #237[DOI: 10.1145/3272127.3275028]
Garland M and Heckbert P S. 1997. Surface simplification using quadric error metrics//Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. Los Angeles, USA: ACM: 209-216[DOI: 10.1145/258734.258849]
Hanocka R, Hertz A, Fish N, Giryes R, Fleishman S and Cohen-Or D. 2019. MeshCNN: a network with an edge. ACM Transactions on Graphics, 38(4): #90[DOI: 10.1145/3306346.3322959]
Henaff M, Bruna J and LeCun Y. 2015. Deep convolutional networks on graph-structured data[EB/OL]. [2021/05/30]. https://arxiv.org/pdf/1506.05163.pdf
Huang J W, Zhang H T, Yi L, Funkhouser T, Nießner M and Guibas L J. 2019. TextureNet: consistent local parametrizations for learning from high-resolution signals on meshes//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4440-4449[DOI: 10.1109/cvpr.2019.00457]
Huber P, Perl R and Rumpf M. 2017. Smooth interpolation of key frames in a Riemannian shell space. Computer Aided Geometric Design, 52-53: 313-328[DOI: 10.1016/j.cagd.2017.02.008]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR (poster)
Kingma D P and Welling M. 2014. Auto-encoding variational Bayes//Proceedings of the 2nd International Conference on Learning Representations. Banff, Canada: ICLR
Kullback S and Leibler R A. 1951. On information and sufficiency. The Annals of Mathematical Statistics, 22(1): 79-86[DOI: 10.1214/aoms/1177729694]
Litany O, Bronstein A, Bronstein M and Makadia A. 2018. Deformable shape completion with graph convolutional autoencoders//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1886-1895[DOI: 10.1109/cvpr.2018.00202]
Masci J, Boscaini D, Bronstein M M and Vandergheynst P. 2015. Geodesic convolutional neural networks on Riemannian manifolds//Proceedings of 2015 IEEE International Conference on Computer Vision Workshop. Santiago, Chile: IEEE: 832-840[DOI: 10.1109/iccvw.2015.112]
Maturana D and Scherer S. 2015. VoxNet: a 3D convolutional neural network for real-time object recognition//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE: 922-928[DOI: 10.1109/iros.2015.7353481]
Neumann T, Varanasi K, Wenger S, Wacker M, Magnor M and Theobalt C. 2013. Sparse localized deformation components. ACM Transactions on Graphics, 32(6): #179[DOI: 10.1145/2508363.2508417]
Pons-Moll G, Romero J, Mahmood N and Black M J. 2015. Dyna: a model of dynamic human shape in motion. ACM Transactions on Graphics, 34(4): #120[DOI: 10.1145/2766993]
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 77-85[DOI: 10.1109/cvpr.2017.16]
Qi C R, Yi L, Su H and Guibas L. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 5105-5114
Ranjan A, Bolkart T, Sanyal S and Black M J. 2018. Generating 3D faces using convolutional mesh autoencoders//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 704-720[DOI: 10.1007/978-3-030-01219-9_43]
Shen Y R, Feng C, Yang Y Q and Tian D. 2018. Mining point cloud local structures by kernel correlation and graph pooling//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4548-4557[DOI: 10.1109/cvpr.2018.00478]
Sinha A, Bai J and Ramani K. 2016. Deep learning 3D shape surfaces using geometry images//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 223-240[DOI: 10.1007/978-3-319-46466-4_14]
Sohn K, Yan X C and Lee H. 2015. Learning structured output representation using deep conditional generative models//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM: 3483-3491
Sumner R W and Popović J. 2004. Deformation transfer for triangle meshes. ACM Transactions on Graphics, 23(3): 399-405[DOI: 10.1145/1015706.1015736]
Tan Q Y, Gao L, Lai Y K, Yang J and Xia S H. 2018a. Mesh-based autoencoders for localized deformation component analysis//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI: 2452-2459
Tan Q Y, Gao L, Lai Y K and Xia S H. 2018b. Variational autoencoders for deforming 3D mesh models//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5841-5850[DOI: 10.1109/cvpr.2018.00612]
Tretschk E, Tewari A, Zollhöfer M, Golyanik V and Theobalt C. 2020. DEMEA: deep mesh autoencoders for non-rigidly deforming objects//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 601-617[DOI: 10.1007/978-3-030-58548-8_35]
Vlasic D, Baran I, Matusik W and Popović J. 2008. Articulated mesh animation from multi-view silhouettes. ACM Transactions on Graphics, 27(3): 1-9[DOI: 10.1145/1399504.1360696]
Wang P S, Liu Y, Guo Y X, Sun C Y and Tong X. 2017. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics, 36(4): #72[DOI: 10.1145/3072959.3073608]
Wang P S, Sun C Y, Liu Y and Tong X. 2018. Adaptive O-CNN: a patch-based deep representation of 3D shapes. ACM Transactions on Graphics, 37(6): #217[DOI: 10.1145/3272127.3275050]
Wu J J, Zhang C K, Xue T F, Freeman W T and Tenenbaum J B. 2016. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 82-90
Yi L, Su H, Guo X W and Guibas L J. 2017. SyncSpecCNN: synchronized spectral CNN for 3D shape segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6584-6592[DOI: 10.1109/cvpr.2017.697]
Zhai Z L, Liang Z M, Zhou W and Sun X. 2019. Research overview of variational auto-encoders models. Computer Engineering and Applications, 55(3): 1-9
Zhang X F, Cheng L C, Bai S L, Zhang F, Sun N L and Wang Z Y. 2020. Face image inpainting via variational autoencoder. Journal of Computer-Aided Design and Computer Graphics, 32(3): 401-409[DOI: 10.3724/SP.J.1089.2020.17938]
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2242-2251[DOI: 10.1109/iccv.2017.244]