Object detection using sub-step super-pixel merging and multi-modal fusion in RGB-D images
2018, Vol. 23, No. 8, pp. 1231-1241
Received: 2017-12-20; Revised: 2018-03-04; Published in print: 2018-08-16
DOI: 10.11834/jig.170641
Objective
Indoor multi-object detection suffers from low accuracy and poor real-time performance under varying illumination, shooting angles, object counts, and object sizes. To address these problems, this paper proposes an object detection method based on paired color and depth images that combines sub-step super-pixel merging with multi-modal information fusion.
Method
In the object proposal stage, following the observation that the human eye attends to the color of a salient object before judging its spatial depth, the image is first over-segmented into super-pixels; the resulting pixel blocks are then merged step by step under an adaptive multi-threshold scale, using color information first and depth information second, yielding object-like regions with consistent color and spatial structure. In the recognition stage, to represent the different properties of an object fully, multiple kernel learning fuses the extracted color, texture, contour, and depth features, and the fused feature kernel is fed into a multi-class support vector machine for training and detection.
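The two-step color-then-depth merging described above can be sketched on a toy adjacency graph. This is an illustration only, not the paper's implementation: each "super-pixel" is reduced to a single mean color and mean depth, the thresholds are made up, and the paper's adaptive multi-threshold scheme is more involved.

```python
# Toy sketch of two-step super-pixel merging (color first, then depth).
# Each super-pixel is summarized by a mean color value, a mean depth,
# and a list of adjacent super-pixel indices.

def merge_by_similarity(regions, key, threshold):
    """Greedily merge adjacent regions whose `key` values differ by
    less than `threshold`; returns a root label per region (union-find)."""
    parent = list(range(len(regions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, r in enumerate(regions):
        for j in r["neighbors"]:
            if abs(regions[i][key] - regions[j][key]) < threshold:
                parent[find(i)] = find(j)  # union the two groups
    return [find(i) for i in range(len(regions))]

# Four super-pixels on a line: a red-ish pair, then a blue-ish pair
# with a depth discontinuity inside the blue pair.
regions = [
    {"color": 0.10, "depth": 1.0, "neighbors": [1]},
    {"color": 0.12, "depth": 1.0, "neighbors": [0, 2]},
    {"color": 0.80, "depth": 1.1, "neighbors": [1, 3]},
    {"color": 0.82, "depth": 3.0, "neighbors": [2]},
]

# Step 1 merges on color; step 2 merges on depth. A pair of
# super-pixels ends up in one proposal only if both passes agree.
color_labels = merge_by_similarity(regions, "color", 0.1)
depth_labels = merge_by_similarity(regions, "depth", 0.5)
proposals = {(c, d) for c, d in zip(color_labels, depth_labels)}
print(len(proposals))  # 3 groups: red pair, near blue, far blue
```

The second pass keeps the far blue super-pixel separate even though its color matches its neighbor, which is the point of adding depth: color-similar regions at different depths are usually different objects.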
Result
On the standard University of Washington RGB-D dataset and on a real-scene dataset, the proposed method improves overall detection accuracy by 4.7% over current mainstream algorithms and runs considerably faster. The sub-step super-pixel merging outperforms mainstream object proposal methods in object localization, requiring only about one quarter as many sampling windows at the same recall; in the recognition stage, the multi-modal fusion outperforms both single features and simple color-depth feature fusion.
Conclusion
The results show that, in multi-feature object detection, the proposed method makes effective use of color and depth information for object localization and recognition, substantially improving detection accuracy and efficiency.
Objective
With the development of artificial intelligence, a growing number of researchers have turned to object detection in computer vision and are no longer content to work on RGB images alone; methods that exploit depth images have attracted increasing attention. However, the accuracy and real-time performance of indoor multi-class object detection are susceptible to illumination changes, shooting angle, the number of objects, and object size. To improve detection accuracy, several studies have employed deep learning. Although deep learning can effectively extract the underlying characteristics of objects at different levels, the need for large training sets and long training times prevents the immediate, wide application of these methods. To improve detection efficiency, many researchers have sought to find all candidate regions that may contain objects from object edge information, thereby reducing the number of detection windows; others have used deep learning for this preselection step. To address these problems, this study proposes a two-stage method operating on RGB-D image pairs: object proposal by sub-step super-pixel merging, followed by object classification using multi-modal data fusion.
Method
In the object proposal stage, guided by the observation that the eye registers the color of an object before its depth, the method first segments the image into super-pixels and then merges them in steps under an adaptive multi-threshold scale based on color and depth information. Specifically, the image is segmented with simple linear iterative clustering (SLIC), and the super-pixels are merged in two steps, computing region similarity with respect to color first and depth second. Detection windows with consistent color and depth are thus extracted, and their number is reduced by filtering on area and applying non-maximum suppression to overlapping detections. At the end of this process, the number of candidate windows is far smaller than with a sliding-window scan, and each region is likely to contain an object or part of one. In the object recognition stage, the method fuses the multi-modal features extracted from the RGB-D images, including color, texture, contour, and depth, through multiple kernel learning. Because objects are so varied, a single feature is generally insufficient to identify them unambiguously; for example, distinguishing a real apple from one painted in a picture is difficult. Relative to a single feature or a simple two-feature fusion, multi-modal data fusion covers more object characteristics in RGB-D images. Finally, the fused feature kernel is input to an SVM classifier, completing the object detection procedure.
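The kernel-fusion step can be sketched as combining one Gram matrix per modality into a single precomputed kernel. Everything below is illustrative: the feature matrices are random stand-ins, and the weights `beta` are fixed by hand, whereas in multiple kernel learning (e.g., SimpleMKL) they would be learned jointly with the SVM.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gaussian Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
n = 6
# Hypothetical per-modality feature matrices for n samples.
modalities = {
    "color":   rng.normal(size=(n, 8)),
    "texture": rng.normal(size=(n, 16)),
    "contour": rng.normal(size=(n, 4)),
    "depth":   rng.normal(size=(n, 8)),
}
# Non-negative kernel weights summing to 1 (learned in real MKL).
beta = {"color": 0.4, "texture": 0.3, "contour": 0.1, "depth": 0.2}

# Fused kernel: convex combination of the per-modality Gram matrices.
K = sum(beta[m] * rbf_gram(X, gamma=0.5) for m, X in modalities.items())
print(K.shape)  # (6, 6)
```

A convex combination of valid kernels is itself a valid kernel (symmetric, positive semidefinite, unit diagonal here), so `K` could be handed directly to an SVM that accepts a precomputed kernel, such as scikit-learn's `SVC(kernel="precomputed")`.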
Result
With different threshold-segmentation interval parameters and Gaussian kernel parameters for multiple kernel learning, the proposed method was compared with current mainstream algorithms and showed a clear advantage. On the standard RGB-D dataset from the University of Washington and on real-scene data captured with a Kinect sensor, its detection rate is 4.7% higher than that of the state-of-the-art methods. The sub-step super-pixel merging is superior to present mainstream object proposal methods in object localization, producing roughly one quarter as many sampling windows at the same recall. Moreover, comparing recognition accuracy between individual and fused features shows that the multi-feature fusion is much more accurate than any individual feature, and in overall detection accuracy it also performs well on object categories with varying poses.
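The recall comparison above is conventionally computed by matching proposal windows to ground-truth boxes at a fixed intersection-over-union (IoU) threshold; a minimal sketch with made-up boxes (the paper's exact evaluation protocol is not restated here):

```python
# A ground-truth box counts as recalled if some proposal overlaps it
# with IoU >= 0.5. Boxes are (x1, y1, x2, y2) in pixels.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def recall(gt_boxes, proposals, thr=0.5):
    hit = sum(any(iou(g, p) >= thr for p in proposals) for g in gt_boxes)
    return hit / len(gt_boxes)

gt = [(10, 10, 50, 50), (60, 60, 100, 100)]
props = [(12, 8, 48, 52), (0, 0, 30, 30)]  # first GT matched, second missed
print(recall(gt, props))  # 0.5
```

Under this metric, a proposal method that reaches the same recall with a quarter of the windows shrinks the classifier's workload in the recognition stage by the same factor.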
Conclusion
Experimental results show that the proposed method makes full use of color and depth information for object localization and classification and is important for achieving high accuracy and enhanced real-time performance. The sub-step super-pixel merging can also be applied to object detection based on deep learning.