Kernel correlation adaptive target tracking based on convolution feature

Wang Shouyi, Zhou Haiying, Yang Yang (School of Computer and Control Engineering, North University of China, Taiyuan 030051)

Abstract
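Objective To address fast motion, rotation, scale change, and occlusion of the tracked target in real-world scenes, a kernel correlation adaptive target tracking method based on convolutional features is proposed.

Method A convolutional neural network extracts high-level and low-level convolutional features, which are combined with the kernel correlation filter algorithm proposed in this paper to compute response maps for the high and low convolutional layers. The target position is estimated with a coarse-to-fine method. Building on the scale estimated by a learned 1D scale kernel correlation filter, the parameters of the high-level and low-level kernel correlation filters are updated in real time to achieve adaptive target tracking.

Result Typical video sequences from public datasets were selected for the tracking experiments. The algorithm was tested in complex scenes involving target scale change, occlusion, and rotation, and was compared quantitatively with several strong tracking algorithms on metrics such as average center error and average overlap ratio. On the Singer1, Car4, Jogging, Girl, Football, and MotorRolling sequences, the center errors were 8.71, 6.83, 3.96, 3.91, 4.83, and 9.23, and the tracking overlap ratios were 0.969, 1.00, 0.967, 0.994, 0.967, and 0.512, respectively. The results show that, compared with the original kernel correlation filter algorithm, the proposed algorithm reduces the average center position error by 20% and raises the average overlap ratio by 12%.

Conclusion A convolutional neural network extracts convolutional features from a high layer and a low layer. The high-level features are used to discriminate the target from the background, and the low-level features are used to predict the target position, which the coarse-to-fine method then localizes precisely. This largely resolves the large tracking errors caused by target rotation and scale change, improves tracking performance, and allows the model to be updated in real time. The method remains robust and adaptive in complex scenes involving target scale change, occlusion, changing illumination, and fast target motion.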
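The coarse-to-fine position estimate described above can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the following assumes a common scheme: take the peak of the high-level response map as a coarse estimate, then refine it by searching the low-level response map only in a small window around that peak. The function name `coarse_to_fine_peak` and the `radius` parameter are hypothetical, and both response maps are assumed to have already been resized to the same shape.

```python
import numpy as np


def coarse_to_fine_peak(coarse_resp, fine_resp, radius=2):
    """Return (row, col) of the refined peak from two equal-shape response maps.

    coarse_resp: response from high-level features (assumed robust, less precise).
    fine_resp:   response from low-level features (assumed spatially precise).
    radius:      hypothetical half-width of the refinement window, in cells.
    """
    if coarse_resp.shape != fine_resp.shape:
        raise ValueError("response maps must have the same shape")

    # Coarse step: location of the maximum high-level response.
    cy, cx = np.unravel_index(np.argmax(coarse_resp), coarse_resp.shape)

    # Fine step: search the low-level response only near the coarse peak,
    # clipping the window to the map borders.
    h, w = fine_resp.shape
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, h)
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, w)
    window = fine_resp[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(window), window.shape)

    return int(y0 + dy), int(x0 + dx)
```

Restricting the fine search to a window means a strong low-level response far from the coarse peak (for example, a background distractor) cannot pull the estimate away from the target.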
Keywords
Kernel correlation adaptive target tracking based on convolution feature

Wang Shouyi, Zhou Haiying, Yang Yang (School of Computer and Control Engineering, North University of China, Taiyuan 030051, China)

Abstract
Objective Visual object tracking is a fundamental problem in computer vision with numerous applications, such as intelligent visual surveillance, human-machine interaction, and content-based video coding. In a generic tracking problem, the target can be any object and only its initial location is known. Most state-of-the-art approaches address tracking by learning a discriminative appearance model of the target object. Among discriminative tracking methods, correlation filter-based approaches have recently demonstrated excellent performance on benchmark tracking datasets. Despite significant progress in recent decades, visual tracking remains challenging, mainly because of the considerable appearance changes caused by occlusion, deformation, abrupt motion, illumination variation, and background clutter. Features based on convolutional neural networks (CNNs) have recently achieved state-of-the-art results on a wide range of visual recognition tasks, so understanding how best to use the rich feature hierarchies in CNNs is important for robust visual tracking. To address fast motion, scale change, and rotation of the target during tracking, this study proposes a kernel correlation adaptive target-tracking approach based on convolutional features.

Method A CNN was used to extract high-level and low-level convolutional features. Response maps for the high and low convolutional layers were obtained with the kernel correlation filter algorithm proposed in this study, and the target position was estimated with the coarse-to-fine method. The target scale was estimated with a 1D scale correlation filter, and the kernel correlation filters were updated in real time to realize adaptive target tracking.

Result We tested the proposed algorithm on typical video sequences from public datasets. These sequences involve challenging factors such as illumination change, partial occlusion, scale change, and complex background. We compared our method with strong tracking algorithms, such as high-speed tracking with kernelized correlation filters, adaptive color attributes for real-time visual tracking, and real-time compressive tracking. For the quantitative comparison, we used two evaluation metrics: the average center error and the average overlap ratio. The center errors on the video sequences Singer1, Car4, Jogging, Girl, Football, and MotorRolling were 8.71, 6.83, 3.96, 3.91, 4.83, and 9.23, respectively, and the tracking overlap ratios on the same sequences were 0.969, 1.00, 0.967, 0.994, 0.967, and 0.512, respectively. The experiments showed that the proposed algorithm outperformed the original kernel correlation filter algorithm: the average center position error was reduced by 20%, and the average overlap ratio was increased by 12%.

Conclusion In this study, a CNN was used to extract high-level and low-level convolutional features. The high-level features were used to discriminate the target from the background, whereas the low-level features were used to predict the target position. The coarse-to-fine method was applied to locate the target position accurately, which addressed the large tracking errors caused by rotation and scale change of the target. The method improves tracking performance and can update its model in real time. The experimental results indicate that the proposed approach remains robust and adaptive even in complex scenes, such as those with target scale change, occlusion, changing illumination, and fast target motion.
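For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how they are commonly computed for a single frame. The abstract does not spell out the exact definitions used, so this assumes the usual ones: center error as the Euclidean distance between box centers, and overlap as intersection over union. Boxes are assumed to be `(x, y, width, height)` tuples, and the function names are illustrative.

```python
import math


def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def overlap_ratio(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Averaging `center_error` over all frames of a sequence gives an average center error; per-frame overlaps can be averaged or thresholded in the same way, depending on the evaluation protocol.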
Keywords
