Deformation object tracking based on the fusion of invariant scalable key point matching and image saliency
2018, Vol. 23, No. 3, Pages 384-398
Received: 2017-07-05; Revised: 2017-11-13; Published in print: 2018-03-16
DOI: 10.11834/jig.170339
Objective
Severe deformation of a target during tracking, especially a drastic scale change, often leads to tracking failure. To address this problem, a target tracking algorithm that fuses invariant scalable keypoint matching with image saliency is proposed.
Method
First, the target template and its feature point set are determined from the initial frame of the video sequence: feature points are extracted with the improved BRISK (binary robust invariant scalable keypoints) detector. Feature points are then detected in the current frame and matched against the template feature point set with FLANN (fast library for approximate nearest neighbors) to obtain a subset of matched feature points.
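As a rough, non-authoritative sketch of this detection-and-matching step, the following uses OpenCV's stock BRISK detector (the paper's improved variant is not public) and a FLANN matcher with LSH indexing, which suits binary descriptors; the ratio-test threshold is an assumed value:

```python
import cv2
import numpy as np

# Stock BRISK detector standing in for the paper's improved variant.
brisk = cv2.BRISK_create()

# FLANN matcher with LSH indexing, appropriate for binary BRISK descriptors.
FLANN_INDEX_LSH = 6
flann = cv2.FlannBasedMatcher(
    dict(algorithm=FLANN_INDEX_LSH, table_number=6, key_size=12,
         multi_probe_level=1),
    dict(checks=50))

def match_to_template(template_gray, frame_gray, ratio=0.75):
    """Detect keypoints in template and frame; keep ratio-test matches."""
    kp_t, des_t = brisk.detectAndCompute(template_gray, None)
    kp_f, des_f = brisk.detectAndCompute(frame_gray, None)
    good = []
    for pair in flann.knnMatch(des_t, des_f, k=2):
        # LSH may return fewer than k neighbors, hence the length check.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    src = np.float32([kp_t[m.queryIdx].pt for m in good])
    dst = np.float32([kp_f[m.trainIdx].pt for m in good])
    return src, dst  # template points and their current-frame matches
```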
Second, the matched feature points are fused with optical-flow feature points to determine a set of reliable feature points.
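The abstract does not name the optical-flow method; assuming a pyramidal Lucas-Kanade tracker, a hypothetical fusion rule could keep only matched points that are confirmed by a nearby flow-tracked point:

```python
def fuse_reliable_points(prev_gray, frame_gray, prev_pts, matched_pts, tol=3.0):
    """Hypothetical rule: a match is reliable if a flow-tracked point lands nearby."""
    flow_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, frame_gray,
        prev_pts.reshape(-1, 1, 2).astype(np.float32), None)
    flow_pts = flow_pts.reshape(-1, 2)[status.ravel() == 1]
    if len(flow_pts) == 0 or len(matched_pts) == 0:
        return matched_pts
    # Distance from each matched point to its nearest flow-tracked point.
    d = np.linalg.norm(matched_pts[:, None, :] - flow_pts[None, :, :], axis=2)
    return matched_pts[d.min(axis=1) < tol]
```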
Third, a homography transformation matrix is computed from the reliable feature points and the target template feature points to coarsely locate the target tracking box.
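A minimal sketch of the coarse localization, estimating the homography with RANSAC (LMedS, also cited by the paper, would work as well) and bounding the warped template corners:

```python
def coarse_box(template_pts, reliable_pts, template_shape):
    """Warp the template corners by the estimated homography; return (x, y, w, h)."""
    H, _mask = cv2.findHomography(template_pts, reliable_pts, cv2.RANSAC, 5.0)
    h, w = template_shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H)
    return cv2.boundingRect(warped.reshape(-1, 2))
```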
The coarse box is then refined on the basis of image saliency, which is calculated with the LC (local contrast) method.
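The LC method of Zhai and Shah scores each pixel by the sum of its absolute gray-level differences to all other pixels, which the intensity histogram reduces to O(256²) work per image:

```python
def lc_saliency(gray):
    """Per-pixel LC saliency from the 256-bin gray-level histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    # dist[g] = sum over gray levels h of hist[h] * |g - h|
    dist = np.abs(levels[:, None] - levels[None, :]) @ hist
    sal = dist[gray]
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```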
Finally, the target tracking box is adaptively determined by fusing the image saliency and the reliable feature points. To cope with severe non-rigid deformation, the target template and its feature point set are updated when the target deforms drastically in three consecutive frames.
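The abstract gives no deformation measure or threshold; a hypothetical trigger based on the box-area ratio between recent frames and the template might read:

```python
def should_update_template(box_areas, template_area, thresh=1.5):
    """Assumed rule: refresh after three consecutive frames of large area change."""
    recent = box_areas[-3:]
    return len(recent) == 3 and all(
        max(a, template_area) / min(a, template_area) > thresh for a in recent)
```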
Result
Eight video sequences with the deformation attribute, totaling 2 214 frames, were selected from the OTB2013 dataset to verify the performance of the proposed algorithm. In the overlap experiment, the proposed algorithm achieves an average overlap of 0.567 1, which is better than that of current state-of-the-art tracking algorithms. In the overlap success-rate experiment, the proposed algorithm likewise tracks better than these algorithms. In addition, Vega Prime was used to simulate an aerial video sequence in which a UAV in rapid approaching flight films a drastically deforming target; the maximum deformation of the target in the sequence exceeds 14, and the maximum inter-frame deformation reaches 1.72. Experiments show that the proposed algorithm also achieves a better tracking effect on this sequence.
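On OTB benchmarks the overlap score is the intersection-over-union of predicted and ground-truth boxes, presumably the measure reported here:

```python
def overlap(a, b):
    """IoU of two boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)
```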
Conclusion
Experimental results show that the proposed algorithm can accurately track severely deforming targets, especially targets undergoing drastic scale changes. The algorithm also runs in real time, at an average frame rate of 48.6 frames/s.
Wu Y, Lim J, Yang M H. Online object tracking:A benchmark[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR:IEEE, 2013:2411-2418.[DOI:10.1109/CVPR.2013.312]
Mueller M, Smith N, Ghanem B. A benchmark and simulator for UAV tracking[C]//Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands:Springer, 2016:445-461.[DOI:10.1007/978-3-319-46448-0_27]
Ross D A, Lim J, Lin R S, et al. Incremental learning for robust visual tracking[J]. International Journal of Computer Vision, 2008, 77(1-3):125-141.[DOI:10.1007/s11263-007-0075-7]
Jia X, Lu H C, Yang M H. Visual tracking via adaptive structural local sparse appearance model[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI:IEEE, 2012:1822-1829.[DOI:10.1109/CVPR.2012.6247880]
Godec M, Roth P M, Bischof H. Hough-based tracking of non-rigid objects[J]. Computer Vision and Image Understanding, 2013, 117(10):1245-1256.[DOI:10.1016/j.cviu.2012.11.005]
Hare S, Golodetz S, Saffari A, et al. Struck:Structured output tracking with kernels[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10):2096-2109.[DOI:10.1109/TPAMI.2015.2509974]
Danelljan M, Häger G, Khan F S, et al. Discriminative scale space tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8):1561-1575.[DOI:10.1109/TPAMI.2016.2609928]
Leutenegger S, Chli M, Siegwart R Y. BRISK:binary robust invariant scalable keypoints[C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain:IEEE, 2011:2548-2555.[DOI:10.1109/ICCV.2011.6126542]
Mair E, Hager G D, Burschka D, et al. Adaptive and generic corner detection based on the accelerated segment test[C]//Proceedings of the 2010 European Conference on Computer Vision. Heraklion, Greece:Springer, 2010:183-196.[DOI:10.1007/978-3-642-15552-9_14]
Zhai Y, Shah M. Visual attention detection in video sequences using spatiotemporal cues[C]//Proceedings of the 14th ACM International Conference on Multimedia. Santa Barbara, CA:ACM, 2006:815-824.[DOI:10.1145/1180639.1180824]
Muja M, Lowe D G. Fast approximate nearest neighbors with automatic algorithm configuration[C]//Proceedings of the 4th International Conference on Computer Vision Theory and Applications. Lisboa, Portugal:DBLP, 2009:331-340.[DOI:10.5220/0001787803310340]
Tavakoli H R, Moin M S, Heikkilä J. Local similarity number and its application to object tracking[J]. International Journal of Advanced Robotic Systems, 2013, 10(3):184.[DOI:10.5772/55337]
Rousseeuw P J. Least median of squares regression[J]. Journal of the American Statistical Association, 1984, 79(388):871-880.[DOI:10.2307/2288718]
Fischler M A, Bolles R C. Random sample consensus:A paradigm for model fitting with applications to image analysis and automated cartography[J]. Readings in Computer Vision, 1987, 726-740.[DOI:10.1016/B978-0-08-051581-6.50070-2]
Zhong W, Lu H C, Yang M H. Robust object tracking via sparse collaborative appearance model[J]. IEEE Transactions on Image Processing, 2014, 23(5):2356-2368.[DOI:10.1109/TIP.2014.2313227]
Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3):583-596.[DOI:10.1109/TPAMI.2014.2345390]