利用隐式解码器的三维模型簇协同分割
Co-segmentation of 3D shape clusters based on implicit decoder
2022, Vol. 27, No. 2, pp. 550-561
Print publication date: 2022-02-16
Accepted: 2021-03-23
DOI: 10.11834/jig.200677
Jun Yang, Minmin Zhang. Co-segmentation of 3D shape clusters based on implicit decoder[J]. Journal of Image and Graphics, 2022,27(2):550-561.
Objective
To establish correspondences between the semantic parts of 3D shapes and to achieve automatic segmentation, an unsupervised 3D shape cluster co-segmentation network based on an implicit decoder (IM-decoder) is proposed.
Method
First, the 3D point cloud is voxelized, and a CNN encoder (convolutional neural network encoder) extracts features from the voxelized point cloud, mapping the shape information into a feature space. An attention module then aggregates the features of adjacent points of the 3D shape; the aggregated features and the 3D point coordinates are fed into the IM-decoder to strengthen the network's spatial perception of the shape, and the decoder outputs the inside/outside state of each sampled point relative to the shape's parts. Finally, max pooling aggregates the implicit fields produced by the decoder to obtain the co-segmentation result.
Result
Experimental results show that the proposed algorithm achieves an mIoU (mean intersection-over-union) of 62.1% on the ShapeNet Part dataset, exceeding the two currently known types of unsupervised 3D point cloud segmentation methods by 22.5% and 18.9%, a substantial improvement in segmentation performance. Compared with two supervised methods, its mIoU is 19.3% and 20.2% lower, but it achieves better segmentation on shapes with fewer parts. Using the mean squared error function rather than the cross-entropy function as the reconstruction loss yields higher segmentation accuracy, improving mIoU by 26.3%.
Conclusion
Compared with current mainstream unsupervised segmentation algorithms, the proposed unsupervised method for co-segmenting 3D shape clusters with an implicit decoder achieves higher segmentation accuracy.
Objective
3D shape segmentation is an important task without which many 3D data processing applications cannot accomplish their work. It has become a hot research topic in areas such as digital geometric processing and modeling, and it plays a crucial role in tasks such as 3D printing, 3D shape retrieval, and medical organ segmentation. Recent years have witnessed the continuous development of 3D data acquisition equipment such as laser scanners, RGB-D cameras, and stereo cameras, which has resulted in 3D point cloud data being widely used in 3D shape segmentation tasks. Based on how the 3D point cloud represents shape, deep learning methods for 3D point cloud segmentation are commonly divided into three categories: 1) volumetric-based methods, 2) view-based methods, and 3) point-based methods. Volumetric-based methods first use voxels in 3D space as the definition domain to perform 3D convolution, extending the convolutional neural network (CNN) to 3D space for feature learning; point cloud segmentation is then realized by aggregating the learned features. View-based methods use spatial projection to convert the input 3D shape into multiple 2D image views, feed the stack of images into a 2D CNN to extract features, and then refine the segmentation results by further processing the features through a view pooling layer and the CNN. To accommodate input clouds whose points are disordered and irregularly dispersed, point-based methods design a dedicated network input layer that feeds the 3D point cloud directly into the network for training, improving the segmentation performance on 3D point cloud shapes. However, because typical point cloud data lack topology and surface information, a network cannot achieve efficient co-segmentation of shape clusters through component reconstruction alone, and labeling large datasets is difficult. Considering that humans recognize objects in terms of their parts, as well as other factors such as the segmentation instability caused by occlusion, illumination, and projection angle in view-based methods, this paper voxelizes the point cloud data. Moreover, most existing deep learning methods for 3D shape segmentation adopt a supervisory mechanism, and automatic 3D shape segmentation is difficult to implement without effectively exploiting the latent connections between shapes. Thus, this paper uses an unsupervised 3D shape cluster co-segmentation network based on the implicit decoder (IM-decoder) to establish correspondences between semantically related components and to segment 3D shapes automatically.
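As context for the voxelization step adopted above, the sketch below shows one common way to turn a raw point cloud into a regular occupancy grid that a 3D CNN can consume. The function name, the normalization assumption (points in the unit cube), and the grid resolution are illustrative choices, not details taken from the paper.

```python
def voxelize(points, resolution=16):
    """Map (x, y, z) points in the unit cube [0, 1)^3 to a binary occupancy grid.

    Each point is binned to the voxel that contains it; a voxel is marked 1
    if at least one point falls inside it.
    """
    grid = [[[0] * resolution for _ in range(resolution)] for _ in range(resolution)]
    for x, y, z in points:
        # Clamp to the last cell so a coordinate of exactly 1.0 stays in range.
        i = min(int(x * resolution), resolution - 1)
        j = min(int(y * resolution), resolution - 1)
        k = min(int(z * resolution), resolution - 1)
        grid[i][j][k] = 1
    return grid

grid = voxelize([(0.1, 0.2, 0.3), (0.95, 0.5, 0.5)], resolution=4)
# (0.1, 0.2, 0.3) lands in cell (0, 0, 1); (0.95, 0.5, 0.5) in cell (3, 2, 2)
```

Real pipelines typically first normalize the cloud into the unit cube and may store densities rather than binary occupancy; the binary form shown here matches the simplest case.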
Method
The unsupervised 3D shape cluster co-segmentation method based on the implicit decoder consists of three main operations: encoding, feature aggregation, and decoding. The encoding stage first extracts features from the input 3D shape. Because the encoder network designed in this paper is based on traditional CNNs, it can only process regular 3D data, so voxelization is first carried out on all the points representing the shape in 3D point cloud form. The Hierarchical Surface Prediction method is then used to improve the quality of the reconstructed 3D shape. Finally, the features of the voxelized points are extracted by the CNN encoder, and the shape information is mapped into the feature space. The feature aggregation operation further improves the quality of the extracted features through an attention module that aggregates the features of adjacent points in the 3D shape. During the decoding stage, the aggregated features and the 3D coordinates of the points are input to the IM-decoder to enhance the network's spatial perception of the shape, and the decoder outputs the inside/outside states of the sampled points relative to the shape's components. The final co-segmentation is obtained by a max pooling operation that aggregates the implicit fields generated by the decoder.
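The final labeling step described above can be sketched in a few lines: each decoder branch k yields an implicit field value (an "inside" score) for every sampled point, and max pooling over the branches both recovers a whole-shape field value and assigns each point to the part whose branch responds most strongly. The field values below are made up for illustration; in the actual network they come from the IM-decoder.

```python
def co_segment(branch_fields):
    """branch_fields: one entry per sampled point, each a list of K per-part
    implicit field scores. Returns (labels, pooled), where labels[i] is the
    argmax branch for point i and pooled[i] is the max-pooled field value."""
    labels, pooled = [], []
    for scores in branch_fields:
        best = max(range(len(scores)), key=lambda k: scores[k])
        labels.append(best)
        pooled.append(scores[best])
    return labels, pooled

fields = [[0.9, 0.1, 0.0],   # point 0: strongest response from part branch 0
          [0.2, 0.7, 0.1],   # point 1: part branch 1
          [0.0, 0.3, 0.8]]   # point 2: part branch 2
labels, pooled = co_segment(fields)
# labels == [0, 1, 2]; pooled == [0.9, 0.7, 0.8]
```

Because the same branch tends to fire on semantically corresponding regions across shapes in the cluster, the argmax labels are consistent across the whole collection, which is what makes this a co-segmentation rather than a per-shape segmentation.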
Result
In this paper, ablation and comparative experiments are conducted on the ShapeNet Part dataset, using intersection over union (IoU) and mean intersection over union (mIoU) as evaluation criteria. Experimental results show that our algorithm reaches an mIoU of 62.1% on the ShapeNet Part dataset. Compared with the two currently known types of unsupervised 3D point cloud segmentation methods, its mIoU is higher by 22.5% and 18.9%, a substantial improvement in segmentation performance. Compared with two supervised methods, its mIoU is lower by 19.3% and 20.2%, but our method achieves better segmentation on shapes with fewer parts. Moreover, using the mean squared error function rather than the cross-entropy function as the reconstruction loss yields higher segmentation accuracy, improving mIoU by 26.3%. The ablation experiment shows that the attention module designed in this paper improves the segmentation accuracy of the network by automatically selecting important features from each shape type.
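For reference, the IoU/mIoU metrics used above are computed per part as intersection over union between the predicted and ground-truth label sets, then averaged over parts. A minimal sketch with hypothetical label arrays (not data from the paper):

```python
def miou(pred, gt, num_parts):
    """Mean IoU over parts; parts absent from both prediction and ground truth
    are skipped so they do not distort the average."""
    ious = []
    for part in range(num_parts):
        inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
        union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1]
gt   = [0, 1, 1, 1]
# part 0: inter 1, union 2 -> 0.5; part 1: inter 2, union 3 -> 2/3
score = miou(pred, gt, 2)  # (0.5 + 2/3) / 2
```

Benchmark protocols differ in details (e.g. instance-averaged vs. category-averaged mIoU on ShapeNet Part), so this sketch shows only the core computation.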
Conclusion
The experimental results show that the proposed 3D shape cluster co-segmentation method based on the implicit decoder achieves high segmentation accuracy. On the one hand, the method uses a CNN encoder to extract the features of the 3D shape and designs an attention module that automatically selects important features, further improving feature quality. On the other hand, the implicit decoder constructed by our method performs collaborative analysis on the joint feature vector composed of the selected features and the 3D coordinates of the points. Moreover, the implicit field obtained by fine-tuning the reconstruction loss function effectively improves segmentation accuracy.
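The attention-based feature selection highlighted above can be illustrated with the standard softmax-weighted aggregation pattern: each neighbor feature receives a relevance score, the scores are normalized by softmax, and the aggregated feature is the weighted sum. In the actual network the scores are produced by learned layers; here they are supplied directly, so this is only a sketch of the mechanism.

```python
import math

def attention_aggregate(neighbor_feats, scores):
    """Softmax-weighted sum of neighbor feature vectors.

    neighbor_feats: list of equal-length feature vectors (one per neighbor).
    scores: one relevance score per neighbor (learned in the real network).
    """
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over neighbors
    dim = len(neighbor_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, neighbor_feats))
            for d in range(dim)]

feats = [[1.0, 0.0], [0.0, 1.0]]
agg = attention_aggregate(feats, [0.0, 0.0])     # equal scores -> plain average
# agg == [0.5, 0.5]; raising one score shifts the output toward that neighbor
```

Higher scores concentrate the weight on the corresponding neighbors, which is how an attention module "selects" important features while remaining differentiable.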
Keywords: co-segmentation; shape clusters; implicit decoder; attention module; unsupervised
Charles R Q, Su H, Kaichun M and Guibas L J. 2017. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Chen X B, Golovinskiy A and Funkhouser T. 2009. A benchmark for 3D mesh segmentation. ACM Transactions on Graphics, 28(3): #73 [DOI: 10.1145/1531326.1531379]
Choy C B, Xu D F, Gwak J Y, Chen K and Savarese S. 2016. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 628-644 [DOI: 10.1007/978-3-319-46484-8_38]
Girdhar R, Fouhey D F, Rodriguez M and Gupta A. 2016. Learning a predictable and generative vector representation for objects//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 484-499 [DOI: 10.1007/978-3-319-46466-4_29]
Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L and Bennamoun M. 2019. Deep learning for 3D point clouds: a survey[EB/OL]. [2020-10-28]. https://arxiv.org/pdf/1912.12033.pdf
Häne C, Tulsiani S and Malik J. 2017. Hierarchical surface prediction for 3D object reconstruction//Proceedings of 2017 International Conference on 3D Vision (3DV). Qingdao, China: IEEE: 412-420 [DOI: 10.1109/3DV.2017.00054]
Huang L, Yang J and Tian Z H. 2019. Unsupervised co-segmentation of 3D shapes via functional maps. Journal of Physics: Conference Series, 1168(3): #032046 [DOI: 10.1088/1742-6596/1168/3/032046]
Huang Q X, Koltun V and Guibas L. 2011. Joint shape segmentation with linear programming//Proceedings of 2011 SIGGRAPH Asia Conference. Hong Kong, China: ACM: #125 [DOI: 10.1145/2024156.2024159]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976 [DOI: 10.1109/CVPR.2017.632]
Kalogerakis E, Hertzmann A and Singh K. 2010. Learning 3D mesh segmentation and labeling. ACM Transactions on Graphics, 29(4): #102 [DOI: 10.1145/1778765.1778839]
Klokov R and Lempitsky V. 2017. Escape from cells: deep Kd-networks for the recognition of 3D point cloud models//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 863-872 [DOI: 10.1109/ICCV.2017.99]
Li H Y, Sun Z X, Li Q and Shi J L. 2018. 3D shape co-segmentation by combining sparse representation with extreme learning machine//Proceedings of the 19th Pacific-Rim Conference on Multimedia. Hefei, China: Springer: 570-581 [DOI: 10.1007/978-3-030-00767-6_53]
Maturana D and Scherer S. 2015. VoxNet: a 3D convolutional neural network for real-time object recognition//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE: 922-928 [DOI: 10.1109/IROS.2015.7353481]
Muralikrishnan S, Kim V G and Chaudhuri S. 2018. Tags2Parts: discovering semantic regions from shape tags//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2926-2935 [DOI: 10.1109/CVPR.2018.00309]
Niu C G, Liu Y J, Li Z M and Li H. 2019. 3D object recognition and model segmentation based on point cloud data. Journal of Graphics, 40(2): 274-281 [DOI: 10.11996/JG.j.2095-302X.2019020274]
Park J J, Florence P, Straub J, Newcombe R and Lovegrove S. 2019. DeepSDF: learning continuous signed distance functions for shape representation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 165-174 [DOI: 10.1109/CVPR.2019.00025]
Shu Z Y, Qi C W, Xin S Q, Hu C, Wang L, Zhang Y and Liu L G. 2016. Unsupervised 3D shape segmentation and co-segmentation via deep learning. Computer Aided Geometric Design, 43: 39-52 [DOI: 10.1016/j.cagd.2016.02.015]
Tulsiani S, Su H, Guibas L J, Efros A A and Malik J. 2017. Learning shape abstractions by assembling volumetric primitives//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1466-1474 [DOI: 10.1109/CVPR.2017.160]
van Kaick O, Tagliasacchi A, Sidi O, Zhang H, Cohen-Or D, Wolf L and Hamarneh G. 2011. Prior knowledge for part correspondence. Computer Graphics Forum, 30(2): 553-562 [DOI: 10.1111/j.1467-8659.2011.01893.x]
Wang X H, Wu L S, Chen H W, Hu Y and Shi Y Y. 2018. Feature line extraction from a point cloud based on region clustering segmentation. Acta Optica Sinica, 38(11): #1110001 [DOI: 10.3788/AOS201838.1110001]
Wang Y, Asafi S, van Kaick O, Zhang H, Cohen-Or D and Chen B Q. 2012. Active co-analysis of a set of shapes. ACM Transactions on Graphics, 31(6): 1-10 [DOI: 10.1145/2366145.2366184]
Xu X and Lee G H. 2020. Weakly supervised semantic point cloud segmentation: towards 10× fewer labels//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13703-13712 [DOI: 10.1109/CVPR42600.2020.01372]
Yang J, Wang S and Zhou P. 2019. Recognition and classification for three-dimensional model based on deep voxel convolution neural network. Acta Optica Sinica, 39(4): #0415007 [DOI: 10.3788/AOS201939.0415007]
Yang J and Zhang P. 2018. Three-dimensional shape segmentation by combining topological persistence and heat diffusion theory. Journal of Image and Graphics, 23(6): 887-895 [DOI: 10.11834/jig.170558]
Yi L, Kim V G, Ceylan D, Shen I C, Yan M Y, Su H, Lu C W, Huang Q X, Sheffer A and Guibas L. 2016. A scalable active framework for region annotation in 3D shape collections. ACM Transactions on Graphics, 35(6): #210 [DOI: 10.1145/2980179.2980238]
Zhu C Y, Xu K, Chaudhuri S, Yi L, Guibas L J and Zhang H. 2020. AdaCoSeg: adaptive shape co-segmentation with group consistency loss//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8540-8549 [DOI: 10.1109/CVPR42600.2020.00857]