A vectorized spherical convolutional network for recognizing 3D mesh models with unknown rotation
Vol. 28, Issue 4, Pages 1091-1103 (2023)
Published: 16 April 2023
DOI: 10.11834/jig.211205
Zhang Qiang, Zhao Jieyu, Chen Hao. 2023. A vectorized spherical convolutional network for recognizing 3D mesh models with unknown rotation. Journal of Image and Graphics, 28(4): 1091-1103
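Objective
3D object classification is a fundamental problem in computer vision, and rotation of 3D objects makes classification considerably harder. In addition, irregular 3D mesh models are difficult to process with conventional 2D convolutional networks for feature extraction. To address both problems, we propose a classification method based on a vectorized spherical convolutional network for recognizing 3D mesh models with unknown rotation.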
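Method
Vectorized neurons serve as the basic neurons of the network, and a new convolution scheme between vector layers is proposed. First, the 3D model is normalized and mapped onto the unit sphere to obtain a spherical signal representation. Next, a vectorized classification network and a reconstruction network are used to learn equivariant features of the 3D model. Finally, the classification network performs 3D model classification.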
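Result
Based on ablation comparisons, the final model uses the proposed spherical convolution modules and vector convolution layer, with a reconstruction module added during training. The originally non-rotated (no rotation, NR) dataset is rotated arbitrarily (arbitrary rotation, AR), and classification is evaluated under three train/test strategies: NR/AR, AR/AR, and NR/NR, where the NR/AR task measures the model's ability to recognize unknown rotations. On the rigid dataset ModelNet40, the method improves on the classification method based on the spherical convolutional neural network (SCNN) by 7.7%, 1.8%, and 3.1% on the three tasks, respectively. To verify the advantage of the method on non-rigid 3D mesh objects, it is also evaluated on the non-rigid dataset SHREC15 (shape retrieval contest 2015), where it improves on SCNN by 8.8%, 4.5%, and 5.0% on the three tasks, respectively.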
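Conclusion
This paper proposes an approach for applying vectorized networks to 3D object classification. Ray casting is used to obtain features distributed on the sphere, so that they can be processed with a uniform spherical convolution operator; a spherical residual module is designed to avoid vanishing gradients; and vectorized neurons, together with a convolution scheme between vector layers, preserve the equivariance of the network, which makes recognition of arbitrarily rotated 3D models more accurate.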
Objective
A 3D mesh represents a shape by triangles that carry spatial information about its surface, and it describes the surface better than other representations such as voxels or point clouds. Two problems in 3D shape analysis remain open for the mesh representation: 1) the irregular data structure of a mesh makes feature extraction with traditional 2D convolutional networks difficult, and 2) 3D rotation makes object recognition difficult. Convolutional neural networks (CNNs) have advanced rapidly in 2D vision tasks such as classification, segmentation, and detection, as well as in applications involving 3D objects. Current CNN-based 3D mesh classification follows two directions: 1) converting the 3D object to 2D images and then applying 2D CNN methods, and 2) designing convolution methods that operate directly on 3D mesh data. However, recognizing rotated objects remains difficult because the pooling operation in traditional CNNs lacks equivariance. Two kinds of networks can mitigate this lack of rotation equivariance: vectorized networks and equivariant networks. The vectorized network, known as the capsule network, has shown potential for learning spatial transformations on 2D images, but, constrained by its convolution method, it needs further work before it can be applied to 3D meshes. To apply the vectorized neural network to 3D mesh data while preserving rotation equivariance, we develop a 3D mesh classification method based on a vectorized spherical neural network.
Method
Our method consists of three steps. First, the 3D mesh model is preprocessed into signals on the sphere. We normalize the 3D mesh into a unit sphere and obtain spherical signals on the unit sphere using a ray casting scheme. The obtained spherical signals are a nearly equivalent representation of the 3D shape and can be further processed by spherical convolution methods. Converting the 3D mesh to spherical signals serves two aims: 1) to use the equivariant spherical convolution operators defined on spherical signals, and 2) to design vectorized neurons in a consistent manner. Second, an autoencoder-structured model is used to learn features of the spherical signal. The model is composed of two sub-networks: i) a vectorized spherical convolutional neural network (VSCNN) that encodes the equivariant feature and classifies the 3D object, and ii) a multilayer perceptron decoder that decodes the extracted feature back to the spherical signal. The VSCNN is built from two kinds of spherical residual convolution block and the vector convolution layer. To train deeper networks and resist overfitting, we develop two spherical convolution modules, the $S^2$ convolution block and the $SO(3)$ convolution block, after which the primary vectorized neurons are obtained. The vector convolution layer learns high-level vectorized features from the lower layer, transforming the primary vectorized neurons into high-level ones; stacking vector convolution layers yields a deep vectorized network. To ensure that rotation-equivariant spherical vector neurons are learned during convolution, we use the $SO(3)$ convolution operator to predict the high-level neurons. The VSCNN and the reconstruction network are trained simultaneously. Third, 3D objects are classified with the VSCNN. For validation, only the VSCNN is used to predict the category of the 3D model.
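The preprocessing step (normalizing the mesh into the unit sphere and ray casting to obtain a spherical signal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the equiangular grid, the choice to cast each ray from the sphere surface toward the origin, the use of hit distance as the signal, and all function names (`normalize_to_unit_sphere`, `sphere_grid`, `ray_triangle_distances`, `spherical_signal`) are hypothetical. The ray-triangle test is the standard Möller-Trumbore algorithm.

```python
import numpy as np

def normalize_to_unit_sphere(vertices):
    """Center vertices at their centroid and scale so the farthest one has norm 1."""
    centered = vertices - vertices.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1).max()

def sphere_grid(n_theta, n_phi):
    """Unit directions on an equiangular grid (assumed sampling); shape (n_theta, n_phi, 3)."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta  # polar angle, poles excluded
    phi = np.arange(n_phi) * 2.0 * np.pi / n_phi          # azimuth
    t, p = np.meshgrid(theta, phi, indexing="ij")
    return np.stack([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)], axis=-1)

def ray_triangle_distances(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test of one ray against all triangles; inf where the ray misses."""
    e1, e2 = v1 - v0, v2 - v0
    pvec = np.cross(direction, e2)
    det = np.einsum("ij,ij->i", e1, pvec)
    ok = np.abs(det) > eps                      # skip triangles parallel to the ray
    inv_det = 1.0 / np.where(ok, det, 1.0)
    tvec = origin - v0
    u = np.einsum("ij,ij->i", tvec, pvec) * inv_det
    qvec = np.cross(tvec, e1)
    v = (qvec @ direction) * inv_det
    t = np.einsum("ij,ij->i", e2, qvec) * inv_det
    # Small tolerance so rays passing exactly through shared edges are not dropped.
    hit = ok & (u >= -eps) & (v >= -eps) & (u + v <= 1.0 + eps) & (t > eps)
    return np.where(hit, t, np.inf)

def spherical_signal(vertices, faces, n_theta=8, n_phi=16):
    """Distance from each grid point on the unit sphere to the first surface hit."""
    verts = normalize_to_unit_sphere(np.asarray(vertices, dtype=float))
    faces = np.asarray(faces)
    v0, v1, v2 = (verts[faces[:, k]] for k in range(3))
    dirs = sphere_grid(n_theta, n_phi)
    signal = np.empty(dirs.shape[:2])
    for i in range(n_theta):
        for j in range(n_phi):
            d = dirs[i, j]
            t = ray_triangle_distances(d, -d, v0, v1, v2).min()
            signal[i, j] = t if np.isfinite(t) else 1.0  # assumed fallback on a miss
    return signal

# Toy mesh: a regular octahedron with vertices on the coordinate axes.
octa_vertices = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                          [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
octa_faces = np.array([[a, b, c] for a in (0, 1) for b in (2, 3) for c in (4, 5)])
signal = spherical_signal(octa_vertices, octa_faces)
```

For the octahedron, the surface is the set of points whose absolute coordinates sum to 1, so the hit distance along a unit direction d has the closed form 1 - 1/(|d_x| + |d_y| + |d_z|), which gives a simple way to check the sketch.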
Result
The effectiveness of the proposed method is verified on two 3D datasets, ModelNet40 and SHREC15. Our model is trained on the non-rotated (NR) and arbitrarily rotated (AR) training sets and tested on the non-rotated and rotated test sets, which demonstrates the robustness of the model to rotation. On the rigid dataset ModelNet40, the accuracy on targets with unknown rotation reaches 85.2%, surpassing the baseline method by 7.7%. The comparative analysis shows that our method surpasses most multi-view and point cloud methods, which rely on other 3D data representations. The NR/NR result also improves on the benchmarks. In addition, to evaluate recognition of non-rigid 3D mesh targets, we carry out a rotated classification experiment on the non-rigid dataset SHREC15, where the accuracy reaches 90.4%, surpassing the baseline method by 8.8%.
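The NR and AR settings can be reproduced in spirit with a small helper that either leaves each mesh untouched or applies an independent random rotation to it. The sketch below is an assumption about how such splits might be built, not the authors' data pipeline; `random_rotation`, `make_split`, and `STRATEGIES` are hypothetical names. Rotations are drawn uniformly from SO(3) by normalizing a Gaussian 4-vector into a unit quaternion.

```python
import numpy as np

def random_rotation(rng):
    """Draw a rotation matrix uniformly from SO(3) via a normalized Gaussian quaternion."""
    q = rng.normal(size=4)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])

def make_split(meshes, rotate, rng):
    """Return vertex arrays for one split: copies for NR, independently rotated for AR."""
    if not rotate:
        return [v.copy() for v in meshes]
    return [v @ random_rotation(rng).T for v in meshes]

# (rotate training set?, rotate test set?) for each train/test strategy.
STRATEGIES = {"NR/AR": (False, True), "AR/AR": (True, True), "NR/NR": (False, False)}

rng = np.random.default_rng(0)
toy_meshes = [rng.normal(size=(5, 3)) for _ in range(3)]  # stand-in vertex arrays
rotate_train, rotate_test = STRATEGIES["NR/AR"]
train_set = make_split(toy_meshes, rotate_train, rng)
test_set = make_split(toy_meshes, rotate_test, rng)
```

Only vertex positions are rotated; face indices are unaffected by a rigid rotation, so they can be reused as-is.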
Conclusion
We develop a 3D object classification method for rotated meshes. Vectorized neurons and the equivariant vector convolution layer improve robustness to 3D rotation. The method recognizes rotated 3D models without rotation augmentation, and it demonstrates the ability of vectorized networks to learn transformations.
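The equivariance property that the method relies on can be illustrated in a much simpler setting than the $SO(3)$ convolution used in the paper. In the toy sketch below, which is an assumption for illustration only, a spherical signal is sampled on an equiangular grid, so a rotation about the z-axis by a multiple of the grid step is a cyclic shift along the azimuth. A circular convolution along the azimuth then commutes with that rotation (equivariance), while pooling over the azimuth yields a descriptor that no longer changes with the rotation (invariance). The function names are hypothetical, and rotations about other axes are not covered.

```python
import numpy as np

def rotate_about_z(signal, steps):
    """Rotate a (n_theta, n_phi) spherical signal about z by steps * 2*pi / n_phi."""
    return np.roll(signal, steps, axis=1)

def azimuthal_conv(signal, kernel):
    """Circular convolution along phi, applied to each latitude row independently."""
    n_phi = signal.shape[1]
    spectrum = np.fft.fft(signal, axis=1) * np.fft.fft(kernel, n=n_phi)
    return np.fft.ifft(spectrum, axis=1).real

rng = np.random.default_rng(0)
signal = rng.normal(size=(8, 16))   # stand-in spherical signal
kernel = rng.normal(size=5)         # stand-in filter along the azimuth

# Equivariance: filtering a rotated signal equals rotating the filtered signal.
rotated_then_filtered = azimuthal_conv(rotate_about_z(signal, 3), kernel)
filtered_then_rotated = rotate_about_z(azimuthal_conv(signal, kernel), 3)
equivariant = np.allclose(rotated_then_filtered, filtered_then_rotated)

# Invariance: pooling over the azimuth discards the rotation altogether.
pooled_before = azimuthal_conv(signal, kernel).max(axis=1)
pooled_after = rotated_then_filtered.max(axis=1)
invariant = np.allclose(pooled_before, pooled_after)
```

An equivariant feature keeps the pose information available to later layers, whereas the pooled descriptor has already thrown it away.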
Keywords
3D object classification; vectorized network; capsule network; spherical convolutional neural network (SCNN); rotation equivariant network