Kernel correlation adaptive target tracking based on convolution feature
2017, Vol. 22, No. 9, pp. 1230-1239
Published online: 2017-08-25
Published in print: 2017
DOI: 10.11834/jig.170009
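To address fast motion, rotation, scale change, and occlusion of the tracked target in real-world scenes, a kernel correlation adaptive target-tracking method based on convolutional features is proposed. A convolutional neural network extracts high- and low-level convolutional features, which are combined with the kernel correlation filter algorithm proposed in this paper to compute high- and low-level feature response maps. The target position is estimated with a coarse-to-fine method, and, on the basis of the scale estimated by a learned 1D scale kernel correlation filter, the parameters of the high- and low-level kernel correlation filters are updated in real time to achieve adaptive tracking. Typical video sequences from public data sets were selected for the experiments, which tested tracking performance in complex scenes involving target scale change, occlusion, and rotation, and compared the method quantitatively with several strong tracking algorithms on metrics such as average center error and average overlap ratio. On the Singer1, Car4, Jogging, Girl, Football, and MotorRolling sequences, the center errors were 8.71, 6.83, 3.96, 3.91, 4.83, and 9.23, respectively, and the tracking overlap ratios were 0.969, 1.00, 0.967, 0.994, 0.967, and 0.512, respectively. The results show that, compared with the original kernel correlation filter algorithm, the proposed algorithm reduces the average center position error by 20% and increases the average overlap ratio by 12%. High- and low-level convolutional features are extracted with a convolutional neural network: the high-level features discriminate the target from the background, and the low-level features predict the target position, which the coarse-to-fine method then localizes precisely. This largely resolves the large tracking errors caused by target rotation and scale change, improves tracking performance, and allows learning to be updated in real time. The method remains robust and adaptive in complex scenes with target scale change, occlusion, changing illumination, and fast target motion.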
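The two-layer response maps and the coarse-to-fine localization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a standard Gaussian-kernel correlation filter trained in the Fourier domain, high- and low-level feature maps already resized to the same spatial size, and a simple refinement rule (take the peak of the low-level response inside a small window around the high-level peak). The function names, `sigma`, `lam`, and `radius` are hypothetical, and random arrays stand in for CNN features.

```python
import numpy as np

def gaussian_kernel_correlation(a, b, sigma):
    """Gaussian kernel between `a` and all cyclic shifts of `b`; inputs are (H, W, C)."""
    af = np.fft.fft2(a, axes=(0, 1))
    bf = np.fft.fft2(b, axes=(0, 1))
    # Cross-correlation for every cyclic shift, summed over feature channels.
    cross = np.real(np.fft.ifft2(np.sum(af * np.conj(bf), axis=2)))
    dist = (np.sum(a ** 2) + np.sum(b ** 2) - 2.0 * cross) / a.size
    return np.exp(-np.maximum(dist, 0.0) / sigma ** 2)

def gaussian_label(h, w, width=2.0):
    """Regression target with its peak at (0, 0), wrapping cyclically."""
    ys = np.minimum(np.arange(h), h - np.arange(h))
    xs = np.minimum(np.arange(w), w - np.arange(w))
    return np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * width ** 2))

def train_filter(x, y, sigma=0.5, lam=1e-4):
    """Kernel ridge regression solved in the Fourier domain; returns alpha (FFT)."""
    k = gaussian_kernel_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def response_map(alpha_f, x_model, z, sigma=0.5):
    """Filter response for a new feature patch `z` against the model patch."""
    k = gaussian_kernel_correlation(z, x_model, sigma)
    return np.real(np.fft.ifft2(alpha_f * np.fft.fft2(k)))

def coarse_to_fine_peak(resp_high, resp_low, radius=2):
    """Assumed rule: coarse peak from the high-level map, refined on the low-level map."""
    h, w = resp_low.shape
    cy, cx = np.unravel_index(np.argmax(resp_high), resp_high.shape)
    rows = np.arange(cy - radius, cy + radius + 1) % h  # cyclic window
    cols = np.arange(cx - radius, cx + radius + 1) % w
    window = resp_low[np.ix_(rows, cols)]
    wy, wx = np.unravel_index(np.argmax(window), window.shape)
    return int(rows[wy]), int(cols[wx])

# Toy demo: random arrays stand in for high- and low-level CNN features.
rng = np.random.default_rng(0)
feat_high = rng.standard_normal((32, 32, 4))
feat_low = rng.standard_normal((32, 32, 8))
label = gaussian_label(32, 32)
alpha_high = train_filter(feat_high, label)
alpha_low = train_filter(feat_low, label)

shift = (5, -3)  # simulated target displacement (rows, cols)
resp_high = response_map(alpha_high, feat_high, np.roll(feat_high, shift, axis=(0, 1)))
resp_low = response_map(alpha_low, feat_low, np.roll(feat_low, shift, axis=(0, 1)))
print(coarse_to_fine_peak(resp_high, resp_low))
```

In this toy setting the refined peak should coincide with the applied cyclic shift (a negative column shift wraps around to the far edge of the map). A real tracker would additionally apply a cosine window, interpolate the model over time, and run the separate 1D scale filter, none of which are shown here.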
Visual object tracking is a fundamental problem in computer vision with numerous applications, such as intelligent visual surveillance, human-machine interaction, and content-based video coding. In a generic tracking problem, the target can be any object and only its initial location is known. Most state-of-the-art approaches address the tracking problem by learning a discriminative appearance model of the target object. Among the discriminative tracking methods, correlation filter-based approaches have recently demonstrated excellent performance on benchmark tracking data sets. Despite the significant developments of recent decades, visual tracking remains a challenging problem, mainly because of the considerable appearance changes caused by occlusion, deformation, abrupt motion, illumination variation, and background clutter. Features based on convolutional neural networks (CNNs) have recently demonstrated state-of-the-art results on a wide range of visual recognition tasks. Therefore, understanding how best to use the rich feature hierarchies in CNNs is important for robust visual tracking. To address the problem of a fast-moving, scaling, and rotating target during tracking, a kernel correlation adaptive target-tracking approach based on convolutional features is proposed in this study.
A CNN was introduced to extract high- and low-level convolutional features. High- and low-level response maps were obtained using the kernel correlation filter algorithm proposed in this study, and the target position was estimated using the coarse-to-fine method. The target scale was estimated with a 1D scale correlation filter to realize adaptive target tracking, and the kernel correlation filters were updated in real time.
We tested the proposed algorithm on typical video sequences from public data sets. These sequences involve challenging factors, such as illumination change, partial occlusion, scale change, and complex background. We compared our method with strong tracking algorithms, such as high-speed tracking with kernelized correlation filters, adaptive color attributes for real-time visual tracking, and real-time compressive tracking. For a quantitative comparison, we used two evaluation metrics, namely, the average center error and the average overlap ratio. The results of the target-tracking experiment showed that the proposed algorithm performed better than the original kernel correlation filter algorithm: the average center position error was reduced by 20%, and the average overlap ratio was increased by 12%. The center errors in the video sequences Singer1, Car4, Jogging, Girl, Football, and MotorRolling were 8.71, 6.83, 3.96, 3.91, 4.83, and 9.23, respectively. The tracking overlap ratios for the same sequences were 0.969, 1.00, 0.967, 0.994, 0.967, and 0.512, respectively.
In this study, a CNN was introduced to extract high- and low-level convolutional features. The high-level features were used to discriminate the target from the background, whereas the low-level features were used to predict the target position. The coarse-to-fine method was applied to locate the target position accurately, which addressed the problem of large tracking errors caused by rotation and scale change of the target. The method improves tracking performance and can update learning in real time. The experimental results indicate that the proposed approach remains robust and adaptive even in complex scenes, such as those with target scale change, fast target motion, occlusion, and varying illumination conditions.
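The two evaluation metrics named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: boxes are assumed to be given per frame as (x, y, width, height), the center error is taken as the Euclidean distance between box centers, and the overlap ratio is taken as intersection over union. The abstract does not say whether its reported overlap figures are mean per-frame overlaps or the fraction of frames above a threshold, so only the per-frame values and their means are computed here. The function names are hypothetical.

```python
import numpy as np

def center_errors(pred, gt):
    """Per-frame Euclidean distance between predicted and ground-truth box centers."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred_centers = pred[:, :2] + pred[:, 2:] / 2.0
    gt_centers = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_centers - gt_centers, axis=1)

def overlap_ratios(pred, gt):
    """Per-frame intersection over union of (x, y, w, h) boxes."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    left = np.maximum(pred[:, 0], gt[:, 0])
    top = np.maximum(pred[:, 1], gt[:, 1])
    right = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    bottom = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    # Clip at zero so that disjoint boxes contribute no intersection.
    inter = np.clip(right - left, 0.0, None) * np.clip(bottom - top, 0.0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

# Made-up boxes for two frames, purely for illustration.
predicted = [[0, 0, 10, 10], [5, 5, 10, 10]]
ground_truth = [[0, 0, 10, 10], [0, 0, 10, 10]]
print("average center error:", center_errors(predicted, ground_truth).mean())
print("average overlap ratio:", overlap_ratios(predicted, ground_truth).mean())
```

Averaging the per-frame values over a sequence gives one number per metric for each video.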