Texture-less object detection method for industrial components picking system
Vol. 27, Issue 8, Pages: 2418-2429 (2022)
Published: 16 August 2022
Accepted: 19 May 2021
DOI: 10.11834/jig.210088
Ming Yan, Dapeng Tao, Yuanyuan Pu. Texture-less object detection method for industrial components picking system[J]. Journal of Image and Graphics, 27(8): 2418-2429 (2022)
Objective
With the rise of intelligent sorting in industry, object detection has drawn increasing attention. To allow rapid deployment and application on the factory floor, however, an algorithm must tune its parameters from only a small number of target samples; in addition, industrial control computers offer limited computing resources, and industrial components have smooth surfaces that lack distinctive texture, all of which hinder deep-learning-based object detection. Line2D is widely considered effective for fast matching of texture-less objects from few samples, but it cannot correctly distinguish two components that share the same shape but differ in color. We therefore propose CL2D (color Line2D), a more robust framework for fast texture-less object matching.
Method
First, gradient orientation features are used as a description of object shape for fast matching on the input image, yielding coarse matches. Fine matching is then completed by non-maximum suppression and color histogram comparison. Finally, in line with the characteristics of industrial sorting, the grasp point of the target is localized by a coordinate transformation.
Result
To test the algorithm, we built the YNU-BBD 2020 (YNU-building blocks dataset 2020) dataset to reflect real industrial sorting environments. Results on YNU-BBD 2020 show that CL2D processes high-resolution images at an average of 2.15 s per image on a CPU platform, and raises mAP (mean average precision) by 10% over the classical algorithm and by 7% over a deep learning method.
Conclusion
Targeting the characteristics of industrial component picking systems, this paper proposes a fast texture-less object detection method that completes detection efficiently on a CPU platform and offers clear advantages over existing methods.
Objective
Texture-less object detection is crucial for industrial component picking systems, in which multiple components are placed randomly in a feeder and a vision-guided robot arm must grasp each of them into a packing box. For rapid deployment on industrial sites, the algorithm is required to adjust its parameters from only a few samples and to run with limited computing resources. Traditional detection methods can match quickly via key-points; however, industrial components lack the texture needed to extract patch descriptors and establish sufficient key-point correspondences. The appearance of industrial components is dominated by their shape, which motivates template-matching methods based on object contours. One classical work is Line2D, which needs only a few samples to build templates and runs efficiently on a CPU platform. However, it produces false-positive results when two components have similar silhouettes.
Method
We propose a new method called color Line2D (CL2D). CL2D extracts template information from object images and then runs a sliding-window template-matching process on the input image to detect objects. The templates cover both object shape and color. We use gradient direction features as shape descriptors, extracted at discrete points on the object contour. Specifically, we compute the gradient orientation at these points in both the object image and the sliding window, take the cosine of the angle between each pair of orientations, and sum these values into a similarity score. As a complement to the shape features, we represent object color with an HSV color histogram and compare the histograms of the object image and the sliding window by cosine similarity; a sketch of both comparisons follows this paragraph.
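The two similarity measures can be made concrete with a short sketch. The Python code below is a minimal illustration rather than the paper's implementation: the helper names, the Sobel-based orientation estimate, the use of the absolute cosine (as in Line2D), and the 16-bin hue-saturation histogram are all our own assumptions.

```python
import cv2
import numpy as np

def gradient_orientations(gray, points):
    """Gradient orientation (radians) at the given (x, y) contour points."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    ori = np.arctan2(gy, gx)
    return np.array([ori[y, x] for x, y in points])

def shape_similarity(template_ori, window_ori):
    """Sum of cosines between paired gradient directions; the absolute
    value (Line2D-style) makes the score robust to contrast reversal."""
    return float(np.abs(np.cos(template_ori - window_ori)).sum())

def color_similarity(template_bgr, window_bgr, bins=16):
    """Cosine similarity between HSV hue-saturation histograms."""
    hists = []
    for img in (template_bgr, window_bgr):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [bins, bins],
                         [0, 180, 0, 256]).flatten()
        hists.append(h)
    a, b = hists
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
```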
The overall framework of CL2D comprises an offline part and an online part. In the offline part, we build a template database that stores all the data needed during online matching, so as to speed up the online process. The database is constructed in two steps. First, we annotate the object image with the information needed to extract template data: the object contour points, the foreground area, and the grasp point. Second, we rotate the object image, compute a color histogram over the foreground area, and compute the gradient orientations of the contour points, yielding templates for multiple rotation poses of the object; a sketch of this step follows this paragraph.
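A minimal sketch of the offline step, assuming the annotations (contour points as an N×2 array, a uint8 foreground mask, and a grasp point) are already available; the 10° rotation step and the record layout are illustrative assumptions, and gradient_orientations is the helper from the sketch above.

```python
import cv2
import numpy as np

def build_templates(obj_bgr, contour_pts, fg_mask, grasp_pt, angle_step=10):
    """One template per rotation pose: rotated contour points with their
    gradient orientations, a foreground color histogram, and the grasp point."""
    h, w = obj_bgr.shape[:2]
    center = (w / 2.0, h / 2.0)
    templates = []
    for angle in range(0, 360, angle_step):
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rot = cv2.warpAffine(obj_bgr, M, (w, h))
        mask = cv2.warpAffine(fg_mask, M, (w, h))
        # Rotate the annotated contour points and clamp them to the frame.
        pts = cv2.transform(contour_pts.reshape(-1, 1, 2).astype(np.float32), M)
        pts = np.clip(pts.reshape(-1, 2).round().astype(int),
                      [0, 0], [w - 1, h - 1])
        gray = cv2.cvtColor(rot, cv2.COLOR_BGR2GRAY)
        ori = gradient_orientations(gray, pts)
        hsv = cv2.cvtColor(rot, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], mask, [16, 16],
                            [0, 180, 0, 256]).flatten()
        gp = cv2.transform(np.float32([[grasp_pt]]), M).ravel()
        templates.append({"angle": angle, "points": pts, "orientations": ori,
                          "hist": hist, "grasp": gp})
    return templates
```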
The online part consists of three steps: coarse matching, fine matching, and grasp point localization. First, we match the gradient direction templates of the different rotation poses against the input image to obtain coarse detection results. The matching is accelerated by gradient direction quantization and precomputed response maps, sketched below.
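This acceleration follows the scheme of Hinterstoisser et al. (2012). The sketch below is a simplified illustration under our own assumptions (8 orientation bins, no orientation spreading or memory linearization): once per input image, a response map is precomputed for every quantized direction, so scoring a window reduces to table lookups.

```python
import numpy as np

N_BINS = 8  # gradient directions quantized into 8 bins over [0, pi)

def quantize(ori):
    """Map orientations in radians to direction bins, ignoring polarity."""
    return np.floor((ori % np.pi) / (np.pi / N_BINS)).astype(int) % N_BINS

def response_maps(quantized_image):
    """maps[b][y, x] = cos(angle between template bin b and the image
    direction at (y, x)); computed once per input image."""
    bin_angle = np.pi / N_BINS
    maps = []
    for t_bin in range(N_BINS):
        d = np.abs(quantized_image - t_bin)
        d = np.minimum(d, N_BINS - d)        # circular distance, at most pi/2
        maps.append(np.cos(d * bin_angle))
    return maps

def coarse_score(maps, tmpl_bins, tmpl_points, x0, y0):
    """Score a window anchored at (x0, y0) by pure lookups."""
    return sum(maps[b][y0 + py, x0 + px]
               for b, (px, py) in zip(tmpl_bins, tmpl_points))
```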
Second, we apply non-maximum suppression to filter redundant matches and then compare color histograms to determine the final detections. Finally, we localize the object grasp point in the input image by a coordinate transformation, sketched below. To evaluate the performance of texture-less object detection methods, we introduce the YNU-building blocks dataset 2020 (YNU-BBD 2020), which simulates a real industrial scenario.
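The abstract does not spell out the coordinate transformation, so the following is only a sketch of one common choice: a planar work surface and a 3×3 homography H from image pixels to the robot work plane, obtained beforehand by hand-eye calibration. All names here are hypothetical.

```python
import cv2
import numpy as np

def grasp_point_in_robot_frame(template, det_x, det_y, H):
    """Offset the template's grasp point by the detected window location,
    then map the pixel into robot work-plane coordinates."""
    gx, gy = template["grasp"]
    pixel = np.float32([[[det_x + gx, det_y + gy]]])  # shape (1, 1, 2)
    world = cv2.perspectiveTransform(pixel, H)
    return world.ravel()  # (X, Y) on the robot work plane
```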
Result
Our experiments show that the algorithm processes 1 920×1 200 resolution images at an average speed of 2.15 s per frame on a CPU platform. Using only one or two samples per object, CL2D achieves 67.7% mAP on the YNU-BBD 2020 dataset, about a 10% relative improvement over Line2D and a 7% improvement over deep learning methods trained on synthetic data. Qualitative comparisons with classic texture-less object detection methods show that CL2D has clear advantages in multi-instance object detection.
Conclusion
We propose a texture-less object detection method that integrates color and shape representations. The method runs on a CPU platform with only a few samples and has significant advantages over both deep learning methods and classical texture-less object detection methods. It has the potential to be used in industrial component picking systems.
template matching; texture-less object detection; color histogram; smart manufacturing; random bin-picking
Blank A, Hiller M, Zhang S Y, Leser A, Metzner M, Lieret M, Thielecke J and Franke J. 2019. 6DoF pose-estimation pipeline for texture-less industrial components in bin picking applications//Proceedings of 2019 European Conference on Mobile Robots. Prague, Czech Republic: IEEE: 1-7 [DOI: 10.1109/ECMR.2019.8870920]
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2020-04-23]. https://arxiv.org/pdf/2004.10934.pdf
Borgefors G. 1988. Hierarchical chamfer matching: a parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6): 849-865 [DOI: 10.1109/34.9107]
Chan J, Lee J A and Kemao Q. 2017. BIND: binary integrated net descriptors for texture-less object recognition//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3020-3028 [DOI: 10.1109/CVPR.2017.322]
Collet A, Martinez M and Srinivasa S S. 2011. The MOPED framework: object recognition and pose estimation for manipulation. The International Journal of Robotics Research, 30(10): 1284-1306 [DOI: 10.1177/0278364911401765]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177]
Felzenszwalb P F, Girshick R B, McAllester D and Ramanan D. 2010. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9): 1627-1645 [DOI: 10.1109/TPAMI.2009.167]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322]
Hinterstoisser S, Cagniart C, Ilic S, Sturm P, Navab N, Fua P and Lepetit V. 2012. Gradient response maps for real-time detection of textureless objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5): 876-888 [DOI: 10.1109/TPAMI.2011.206]
Hodan T, Haluza P, Obdržálek Š, Matas J, Lourakis M and Zabulis X. 2017. T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects//Proceedings of 2017 IEEE Winter Conference on Applications of Computer Vision. Santa Rosa, USA: IEEE: 880-888 [DOI: 10.1109/WACV.2017.103]
Li J G and Zhang Y M. 2013. Learning SURF cascade for fast and accurate object detection//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE: 3468-3475 [DOI: 10.1109/CVPR.2013.445]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Lowe D G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2): 91-110 [DOI: 10.1023/B:VISI.0000029664.99615.94]
Piccinini P, Prati A and Cucchiara R. 2012. Real-time object detection and localization with SIFT-based clustering. Image and Vision Computing, 30(8): 573-587 [DOI: 10.1016/j.imavis.2012.06.004]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Rublee E, Rabaud V, Konolige K and Bradski G. 2011. ORB: an efficient alternative to SIFT or SURF//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE: 2564-2571 [DOI: 10.1109/ICCV.2011.6126544]
Rucklidge W J. 1997. Efficiently locating objects using the Hausdorff distance. International Journal of Computer Vision, 24(3): 251-270 [DOI: 10.1023/A:1007975324482]
Steger C. 2002. Occlusion, clutter, and illumination invariant object recognition. International Archives of Photogrammetry and Remote Sensing, XXXIV(3A): 345-350
Tombari F, Franchi A and Di Stefano L. 2013. BOLD features to detect texture-less objects//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1265-1272 [DOI: 10.1109/ICCV.2013.160]
Zhao W L and Ngo C W. 2013. Flip-invariant SIFT for copy and object detection. IEEE Transactions on Image Processing, 22(3): 980-991 [DOI: 10.1109/TIP.2012.2226043]
Zhao Y Q, Rao Y, Dong S P and Zhang J Y. 2020. Survey on deep learning object detection. Journal of Image and Graphics, 25(4): 629-654 [DOI: 10.11834/jig.190307]
Zhao Z Q, Zheng P, Xu S T and Wu X D. 2019. Object detection with deep learning: a review. IEEE Transactions on Neural Networks and Learning Systems, 30(11): 3212-3232 [DOI: 10.1109/TNNLS.2018.2876865]