Metal part recognition based on deep learning and support vector machine
2019, Vol. 24, Issue 12, pp. 2233-2242
Received: 2019-04-12; Revised: 2019-06-03; Accepted: 2019-06-09; Print published: 2019-12-16
DOI: 10.11834/jig.190127
Objective
In research on vision-guided automatic picking by industrial robots, one of the key technical difficulties is identifying the region where the robot should grasp the target. Metal parts are especially challenging: unstructured factors such as surface reflections and mutual occlusion under random placement greatly complicate recognition of the grasping region. This paper therefore proposes a grasping-region recognition method that combines deep learning with a support vector machine.
Method
The histogram of oriented gradients (HOG) and local binary pattern (LBP) features of the grasping region are extracted separately, the fused features are reduced in dimension with principal component analysis (PCA), and the result is used to train a support vector machine (SVM) classifier. A Mask R-CNN (mask region-based convolutional neural network) is trained to produce an initial segmentation of the grasping region. The SVM then reclassifies the regions identified by Mask R-CNN, removing interference regions. Finally, mask computation completes the instance segmentation, yielding accurate recognition of the grasping region.
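The dimension-reduction step can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the toy matrix stands in for the fused HOG+LBP feature matrix, and the 0.94 threshold mirrors the cumulative contribution rate chosen in the paper.

```python
import numpy as np

def pca_reduce(X, cum_ratio=0.94):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches cum_ratio."""
    Xc = X - X.mean(axis=0)
    # Thin SVD of the centered data; squared singular values are
    # proportional to the per-component variances.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(ratio, cum_ratio)) + 1  # components needed
    return Xc @ Vt[:k].T, k

# Toy stand-in for the fused HOG+LBP matrix (the paper's is 7000 x 2692).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) @ rng.normal(size=(50, 300))  # rank <= 50
Z, k = pca_reduce(X, cum_ratio=0.94)
```

The reduced matrix `Z` (rather than the raw fused features) would then be used to train the SVM classifier, which keeps training tractable when the raw feature dimension runs into the thousands.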
Result
On randomly placed copper parts, the proposed algorithm was compared with Mask R-CNN alone and with a multi-feature-fusion SVM on three metrics: recognition accuracy, false detection rate, and missed detection rate. The results show that the proposed algorithm improves recognition accuracy by 7% and 25% over Mask R-CNN and SVM, respectively, while also effectively reducing the false and missed detection rates.
Conclusion
By combining Mask R-CNN and SVM, the proposed algorithm is robust to reflection and occlusion to a certain extent and effectively improves target recognition accuracy.
Objective
Under the background of "machine substitution", robotic visual intelligence is crucial to the industrial upgrading of the manufacturing industry, and vision-guided industrial robots are receiving increasing attention in industrial production. One of the most critical difficulties in automatic picking by industrial robots is identification of the target area. This problem is particularly prominent when picking metal parts: unstructured factors, such as reflective surfaces and mutual occlusion under random placement, pose great challenges to identification of the picking area. To solve these problems, this study proposes a picking-region recognition method based on deep learning and a support vector machine (SVM). The two models are combined to exploit their individual advantages and further improve recognition accuracy.
Method
The proposed approach constructs a new model that combines Mask R-CNN (mask region-based convolutional neural network) and SVM. The method comprises feature extraction, multi-feature fusion, SVM classifier training, neural network training, and the combination of the SVM with the deep neural network. First, the local binary pattern (LBP) and histogram of oriented gradients (HOG) features of the picking area are extracted. The presence of interference areas poses a major challenge to identification of the picking area; interference areas resemble the target areas, are easily misidentified, and are catalogued through long-term practice on the assembly line. The feature matrix produced by directly merging the two features is too large, so principal component analysis is used to reduce its dimensions, and the SVM classifier is trained on the reduced matrix. The matrix after direct fusion of the two features is 7 000×2 692; we therefore select a cumulative contribution rate of 94%, at which the recognition accuracy reaches 97.25%, and the feature matrix shrinks to 7 000×231 after dimension reduction. Next, initial segmentation of the picking area, which may still contain interference areas, is completed by training the Mask R-CNN. Mask R-CNN is roughly composed of the following parts: feature extraction, a region proposal network (RPN), ROIAlign, and the final output heads. The feature extraction part is the backbone of the entire network; its function is to extract the important features of different targets from numerous training photos. We use a pretrained residual network (ResNet101) as the feature extraction network. The RPN uses the feature map to obtain candidate frames for objects in the original image, currently implemented with anchor technology. In this study, nine candidate regions are generated for each anchor position on the feature map, combining three scales (128, 256, and 512 pixels) with three aspect ratios (1:1, 0.5:1, and 1:0.5). The ROIAlign layer then pools the corresponding area of the feature map to a fixed size according to the position coordinates of each candidate frame. The final classification and regression results are produced by fully connected layers, and the object mask is generated by a deconvolution operation. Then, secondary classification of the initial segmentation results by the SVM essentially eliminates the interference areas, and the final instance segmentation is completed by mask calculation on the picking area.
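The 3 × 3 anchor scheme described above can be made concrete with a short sketch. This assumes the common area-preserving parameterization (each box has area scale², with the aspect ratio r = width/height); the paper's exact implementation may differ in detail.

```python
import math

def anchor_sizes(scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """(width, height) of the 9 anchors generated per feature-map location:
    3 scales x 3 aspect ratios (1:1, 0.5:1, 1:0.5), each box keeping an
    area of scale**2."""
    sizes = []
    for s in scales:
        for r in ratios:  # r is width / height
            sizes.append((s * math.sqrt(r), s / math.sqrt(r)))
    return sizes

anchors = anchor_sizes()
```

At every feature-map position the RPN scores and regresses these nine candidate boxes; ROIAlign then crops and resizes the surviving proposals for the classification, regression, and mask heads.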
Result
The multi-feature fusion SVM, Mask R-CNN, and the proposed algorithm are used to detect the picking areas of 500 metal parts. Experimental results show that the proposed algorithm adapts well to picking-region recognition: its correct identification rate is 89.40%, its missed detection rate is 7.80%, and its false detection rate is 2.80%. The correct identification rate is 7.00% and 25.00% higher than those of Mask R-CNN and SVM, respectively. The false detection rate is 7.80% and 18.40% lower than those of Mask R-CNN and SVM, respectively, and the missed detection rate is 6.60% lower than that of SVM.
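The three reported rates sum to 100%, which suggests each of the 500 parts is counted exactly once as correct, missed, or falsely detected. Under that (inferred) accounting, the percentages correspond to the following counts; this is an illustrative reconstruction, not data from the paper.

```python
def detection_rates(correct, missed, false_pos, total):
    """Convert raw per-part counts into the three reported percentages,
    assuming every part falls into exactly one category."""
    assert correct + missed + false_pos == total
    to_pct = lambda n: round(100.0 * n / total, 2)
    return to_pct(correct), to_pct(missed), to_pct(false_pos)

# Counts consistent with the reported rates on 500 parts:
# 447 correctly identified, 39 missed, 14 falsely detected.
acc, miss_rate, false_rate = detection_rates(447, 39, 14, 500)
```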
Conclusion
The multi-feature-fusion SVM classifier is used to reclassify the recognition results of Mask R-CNN, thereby rejecting interference regions; accurate recognition of the picking region is then completed by mask calculation. In constructing the image training set, the effects of illumination and of occlusion between parts are fully considered, and the illumination and occlusion conditions are systematically varied; hence, the approach exhibits a certain robustness in practical applications. Compared with the sliding-window method used in traditional target recognition, this work accurately identifies the shape of the target area through mask calculation and achieves high recognition accuracy. Moreover, the multi-feature-fusion SVM classifier compensates for the limitations of a single-network framework and effectively reduces the false detection rate.