利用隐式解码器的三维模型簇协同分割
Co-segmentation of 3D shape clusters based on implicit decoder
2022, Vol. 27, No. 2, pp. 550-561
Print publication date: 2022-02-16
Accepted: 2021-03-23
DOI: 10.11834/jig.200677
Jun Yang, Minmin Zhang. Co-segmentation of 3D shape clusters based on implicit decoder[J]. Journal of Image and Graphics, 2022,27(2):550-561.
Objective
To establish correspondences between the semantic parts of 3D shapes and to achieve automatic segmentation, an unsupervised 3D shape cluster co-segmentation network based on an implicit decoder (IM-decoder) is proposed.
Method
First, the 3D point cloud is voxelized, and a CNN encoder (convolutional neural network encoder) extracts features from the voxelized point cloud, mapping the shape information into a feature space. An attention module then aggregates the features of adjacent points of the 3D shape; the aggregated features and the 3D point coordinates are fed into the IM-decoder to strengthen the network's spatial perception of the shape, and the decoder outputs the inside/outside state of each sampled point relative to the shape's parts. Finally, max pooling aggregates the implicit fields produced by the decoder to obtain the co-segmentation result.
Result
Experimental results show that the proposed algorithm achieves an mIoU (mean intersection-over-union) of 62.1% on the ShapeNet Part dataset, exceeding the two currently known types of unsupervised 3D point cloud segmentation methods by 22.5% and 18.9%, a substantial improvement in segmentation performance. Compared with two supervised methods, its mIoU is 19.3% and 20.2% lower, but it achieves better segmentation on shapes with fewer parts. Using the mean squared error function rather than the cross-entropy function as the reconstruction loss yields higher segmentation accuracy, improving mIoU by 26.3%.
Conclusion
Compared with current mainstream unsupervised segmentation algorithms, the proposed unsupervised method for co-segmenting 3D shape clusters with an implicit decoder achieves higher segmentation accuracy.
Objective
3D shape segmentation is an important task without which many 3D data processing applications cannot accomplish their work. It has become a hot research topic in areas such as digital geometric processing and modeling, and it plays a crucial role in tasks such as 3D printing, 3D shape retrieval, and medical organ segmentation. Recent years have witnessed the continuous development of 3D data acquisition equipment such as laser scanners, RGB-D cameras, and stereo cameras, which has resulted in 3D point cloud data being widely used in 3D shape segmentation tasks. Based on how the 3D point cloud represents shape, deep learning methods for 3D point cloud segmentation are commonly divided into three categories: 1) volumetric-based methods, 2) view-based methods, and 3) point-based methods. Volumetric-based methods first use voxels in 3D space as the definition domain to perform 3D convolution, extending the convolutional neural network (CNN) to 3D space for feature learning; point cloud segmentation is then realized by aggregating the learned features. View-based methods use spatial projection to convert the input 3D shape into multiple 2D image views, feed the stack of images into a 2D CNN to extract features, and then refine the segmentation results by further processing the features through a view pooling layer and the CNN. To accommodate input clouds whose points are disordered and irregularly dispersed, point-based methods design a dedicated network input layer that feeds the 3D point cloud directly into the network for training, improving the segmentation performance on 3D point cloud shapes. However, because typical point cloud data lack topology and surface information, a network cannot achieve efficient co-segmentation of shape clusters through component reconstruction alone, and labeling large datasets is difficult. Considering that humans recognize objects in terms of their parts, as well as other factors such as the segmentation instability caused by occlusion, illumination, and projection angle in view-based methods, this paper voxelizes the point cloud data. Moreover, most existing deep learning methods for 3D shape segmentation adopt a supervisory mechanism, and automatic 3D shape segmentation is difficult to implement without effectively exploiting the latent connections between shapes. Thus, this paper uses an unsupervised 3D shape cluster co-segmentation network based on the implicit decoder (IM-decoder) to establish correspondences between semantically related components and to segment 3D shapes automatically.
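As context for the voxelization step adopted above, the sketch below shows one common way to turn a raw point cloud into a regular occupancy grid that a 3D CNN can consume. The function name, the normalization assumption (points in the unit cube), and the grid resolution are illustrative choices, not details taken from the paper.

```python
def voxelize(points, resolution=16):
    """Map (x, y, z) points in the unit cube [0, 1)^3 to a binary occupancy grid.

    Each point is binned to the voxel that contains it; a voxel is marked 1
    if at least one point falls inside it.
    """
    grid = [[[0] * resolution for _ in range(resolution)] for _ in range(resolution)]
    for x, y, z in points:
        # Clamp to the last cell so a coordinate of exactly 1.0 stays in range.
        i = min(int(x * resolution), resolution - 1)
        j = min(int(y * resolution), resolution - 1)
        k = min(int(z * resolution), resolution - 1)
        grid[i][j][k] = 1
    return grid

grid = voxelize([(0.1, 0.2, 0.3), (0.95, 0.5, 0.5)], resolution=4)
# (0.1, 0.2, 0.3) lands in cell (0, 0, 1); (0.95, 0.5, 0.5) in cell (3, 2, 2)
```

Real pipelines typically first normalize the cloud into the unit cube and may store densities rather than binary occupancy; the binary form shown here matches the simplest case.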
Method
The unsupervised 3D shape cluster co-segmentation method based on the implicit decoder consists of three main operations: encoding, feature aggregation, and decoding. The encoding stage first extracts features from the input 3D shape. Because the encoder network designed in this paper is based on traditional CNNs, it can only process regular 3D data, so voxelization is first carried out on all the points representing the shape in 3D point cloud form. The Hierarchical Surface Prediction method is then used to improve the quality of the reconstructed 3D shape. Finally, the features of the voxelized points are extracted by the CNN encoder, and the shape information is mapped into the feature space. The feature aggregation operation further improves the quality of the extracted features through an attention module that aggregates the features of adjacent points in the 3D shape. During the decoding stage, the aggregated features and the 3D coordinates of the points are input to the IM-decoder to enhance the network's spatial perception of the shape, and the decoder outputs the inside/outside states of the sampled points relative to the shape's components. The final co-segmentation is obtained by a max pooling operation that aggregates the implicit fields generated by the decoder.
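The final labeling step described above can be sketched in a few lines: each decoder branch k yields an implicit field value (an "inside" score) for every sampled point, and max pooling over the branches both recovers a whole-shape field value and assigns each point to the part whose branch responds most strongly. The field values below are made up for illustration; in the actual network they come from the IM-decoder.

```python
def co_segment(branch_fields):
    """branch_fields: one entry per sampled point, each a list of K per-part
    implicit field scores. Returns (labels, pooled), where labels[i] is the
    argmax branch for point i and pooled[i] is the max-pooled field value."""
    labels, pooled = [], []
    for scores in branch_fields:
        best = max(range(len(scores)), key=lambda k: scores[k])
        labels.append(best)
        pooled.append(scores[best])
    return labels, pooled

fields = [[0.9, 0.1, 0.0],   # point 0: strongest response from part branch 0
          [0.2, 0.7, 0.1],   # point 1: part branch 1
          [0.0, 0.3, 0.8]]   # point 2: part branch 2
labels, pooled = co_segment(fields)
# labels == [0, 1, 2]; pooled == [0.9, 0.7, 0.8]
```

Because the same branch tends to fire on semantically corresponding regions across shapes in the cluster, the argmax labels are consistent across the whole collection, which is what makes this a co-segmentation rather than a per-shape segmentation.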
Result
In this paper, ablation and comparative experiments are conducted on the ShapeNet Part dataset, using intersection over union (IoU) and mean intersection over union (mIoU) as evaluation criteria. Experimental results show that our algorithm reaches an mIoU of 62.1% on the ShapeNet Part dataset. Compared with the two currently known types of unsupervised 3D point cloud segmentation methods, its mIoU is higher by 22.5% and 18.9%, a substantial improvement in segmentation performance. Compared with two supervised methods, its mIoU is lower by 19.3% and 20.2%, but our method achieves better segmentation on shapes with fewer parts. Moreover, using the mean squared error function rather than the cross-entropy function as the reconstruction loss yields higher segmentation accuracy, improving mIoU by 26.3%. The ablation experiment shows that the attention module designed in this paper improves the segmentation accuracy of the network by automatically selecting important features from each shape type.
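For reference, the IoU/mIoU metrics used above are computed per part as intersection over union between the predicted and ground-truth label sets, then averaged over parts. A minimal sketch with hypothetical label arrays (not data from the paper):

```python
def miou(pred, gt, num_parts):
    """Mean IoU over parts; parts absent from both prediction and ground truth
    are skipped so they do not distort the average."""
    ious = []
    for part in range(num_parts):
        inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
        union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1]
gt   = [0, 1, 1, 1]
# part 0: inter 1, union 2 -> 0.5; part 1: inter 2, union 3 -> 2/3
score = miou(pred, gt, 2)  # (0.5 + 2/3) / 2
```

Benchmark protocols differ in details (e.g. instance-averaged vs. category-averaged mIoU on ShapeNet Part), so this sketch shows only the core computation.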
Conclusion
The experimental results show that the proposed 3D shape cluster co-segmentation method based on the implicit decoder achieves high segmentation accuracy. On the one hand, the method uses a CNN encoder to extract the features of the 3D shape and designs an attention module that automatically selects important features, further improving feature quality. On the other hand, the implicit decoder constructed by our method performs collaborative analysis on the joint feature vector composed of the selected features and the 3D coordinates of the points. Moreover, the implicit field obtained by fine-tuning the reconstruction loss function effectively improves segmentation accuracy.
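The attention-based feature selection highlighted above can be illustrated with the standard softmax-weighted aggregation pattern: each neighbor feature receives a relevance score, the scores are normalized by softmax, and the aggregated feature is the weighted sum. In the actual network the scores are produced by learned layers; here they are supplied directly, so this is only a sketch of the mechanism.

```python
import math

def attention_aggregate(neighbor_feats, scores):
    """Softmax-weighted sum of neighbor feature vectors.

    neighbor_feats: list of equal-length feature vectors (one per neighbor).
    scores: one relevance score per neighbor (learned in the real network).
    """
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over neighbors
    dim = len(neighbor_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, neighbor_feats))
            for d in range(dim)]

feats = [[1.0, 0.0], [0.0, 1.0]]
agg = attention_aggregate(feats, [0.0, 0.0])     # equal scores -> plain average
# agg == [0.5, 0.5]; raising one score shifts the output toward that neighbor
```

Higher scores concentrate the weight on the corresponding neighbors, which is how an attention module "selects" important features while remaining differentiable.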
Keywords: co-segmentation; shape clusters; implicit decoder; attention module; unsupervised
Charles R Q, Su H, Kaichun M and Guibas L J. 2017. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Chen X B, Golovinskiy A and Funkhouser T. 2009. A benchmark for 3D mesh segmentation. ACM Transactions on Graphics, 28(3): #73 [DOI: 10.1145/1531326.1531379]
Choy C B, Xu D F, Gwak J Y, Chen K and Savarese S. 2016. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 628-644 [DOI: 10.1007/978-3-319-46484-8_38]
Girdhar R, Fouhey D F, Rodriguez M and Gupta A. 2016. Learning a predictable and generative vector representation for objects//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 484-499 [DOI: 10.1007/978-3-319-46466-4_29]
Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L and Bennamoun M. 2019. Deep learning for 3D point clouds: a survey[EB/OL]. [2020-10-28]. https://arxiv.org/pdf/1912.12033.pdf
Häne C, Tulsiani S and Malik J. 2017. Hierarchical surface prediction for 3D object reconstruction//Proceedings of 2017 International Conference on 3D Vision (3DV). Qingdao, China: IEEE: 412-420 [DOI: 10.1109/3DV.2017.00054]
Huang L, Yang J and Tian Z H. 2019. Unsupervised co-segmentation of 3D shapes via functional maps. Journal of Physics: Conference Series, 1168(3): #032046 [DOI: 10.1088/1742-6596/1168/3/032046]
Huang Q X, Koltun V and Guibas L. 2011. Joint shape segmentation with linear programming//Proceedings of 2011 SIGGRAPH Asia Conference. Hong Kong, China: ACM: #125 [DOI: 10.1145/2024156.2024159]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976 [DOI: 10.1109/CVPR.2017.632]
Kalogerakis E, Hertzmann A and Singh K. 2010. Learning 3D mesh segmentation and labeling. ACM Transactions on Graphics, 29(4): #102 [DOI: 10.1145/1778765.1778839]
Klokov R and Lempitsky V. 2017. Escape from cells: deep Kd-networks for the recognition of 3D point cloud models//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 863-872 [DOI: 10.1109/ICCV.2017.99]
Li H Y, Sun Z X, Li Q and Shi J L. 2018. 3D shape co-segmentation by combining sparse representation with extreme learning machine//Proceedings of the 19th Pacific-Rim Conference on Multimedia. Hefei, China: Springer: 570-581 [DOI: 10.1007/978-3-030-00767-6_53]
Maturana D and Scherer S. 2015. VoxNet: a 3D convolutional neural network for real-time object recognition//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE: 922-928 [DOI: 10.1109/IROS.2015.7353481]
Muralikrishnan S, Kim V G and Chaudhuri S. 2018. Tags2Parts: discovering semantic regions from shape tags//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2926-2935 [DOI: 10.1109/CVPR.2018.00309]
Niu C G, Liu Y J, Li Z M and Li H. 2019. 3D object recognition and model segmentation based on point cloud data. Journal of Graphics, 40(2): 274-281 [DOI: 10.11996/JG.j.2095-302X.2019020274]
Park J J, Florence P, Straub J, Newcombe R and Lovegrove S. 2019. DeepSDF: learning continuous signed distance functions for shape representation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 165-174 [DOI: 10.1109/CVPR.2019.00025]
Shu Z Y, Qi C W, Xin S Q, Hu C, Wang L, Zhang Y and Liu L G. 2016. Unsupervised 3D shape segmentation and co-segmentation via deep learning. Computer Aided Geometric Design, 43: 39-52 [DOI: 10.1016/j.cagd.2016.02.015]
Tulsiani S, Su H, Guibas L J, Efros A A and Malik J. 2017. Learning shape abstractions by assembling volumetric primitives//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1466-1474 [DOI: 10.1109/CVPR.2017.160]
van Kaick O, Tagliasacchi A, Sidi O, Zhang H, Cohen-Or D, Wolf L and Hamarneh G. 2011. Prior knowledge for part correspondence. Computer Graphics Forum, 30(2): 553-562 [DOI: 10.1111/j.1467-8659.2011.01893.x]
Wang X H, Wu L S, Chen H W, Hu Y and Shi Y Y. 2018. Feature line extraction from a point cloud based on region clustering segmentation. Acta Optica Sinica, 38(11): #1110001 [DOI: 10.3788/AOS201838.1110001]
Wang Y, Asafi S, van Kaick O, Zhang H, Cohen-Or D and Chen B Q. 2012. Active co-analysis of a set of shapes. ACM Transactions on Graphics, 31(6): 1-10 [DOI: 10.1145/2366145.2366184]
Xu X and Lee G H. 2020. Weakly supervised semantic point cloud segmentation: towards 10× fewer labels//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13703-13712 [DOI: 10.1109/CVPR42600.2020.01372]
Yang J, Wang S and Zhou P. 2019. Recognition and classification for three-dimensional model based on deep voxel convolution neural network. Acta Optica Sinica, 39(4): #0415007 [DOI: 10.3788/AOS201939.0415007]
Yang J and Zhang P. 2018. Three-dimensional shape segmentation by combining topological persistence and heat diffusion theory. Journal of Image and Graphics, 23(6): 887-895 [DOI: 10.11834/jig.170558]
Yi L, Kim V G, Ceylan D, Shen I C, Yan M Y, Su H, Lu C W, Huang Q X, Sheffer A and Guibas L. 2016. A scalable active framework for region annotation in 3D shape collections. ACM Transactions on Graphics, 35(6): #210 [DOI: 10.1145/2980179.2980238]
Zhu C Y, Xu K, Chaudhuri S, Yi L, Guibas L J and Zhang H. 2020. AdaCoSeg: adaptive shape co-segmentation with group consistency loss//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8540-8549 [DOI: 10.1109/CVPR42600.2020.00857]