Background-temporal-aware correlation filter for real-time visual tracking

Zhu Jianzhang1, Wang Dong2, Lu Huchuan2 (1. School of Mathematics and Information Sciences, Henan University of Economics and Law, Zhengzhou 450046, China; 2. School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, China)

Abstract
Objective Traditional correlation filter tracking algorithms obtain negative samples by circularly shifting the tracked target (the only accurate positive sample); the real background is never modeled during learning, so the tracker easily drifts when the target closely resembles the background. To improve performance, most trackers collect a large number of training samples over time, which increases computational complexity. Moreover, because online model update strategies do not consider temporal consistency, the learned filter may be biased toward the background and drift. To alleviate these problems, this paper adds temporal awareness to the background-aware correlation filter (BACF) tracker and constructs an equality-constrained correlation filter objective, termed background-temporal-aware correlation filter (BTCF) visual tracking. The algorithm not only uses real negative samples as the training set but also learns a highly discriminative correlation filter from the current frame alone, without an online model update strategy. Method The equality-constrained objective is first transformed into an unconstrained augmented Lagrangian formulation, which is then split by the alternating direction method of multipliers (ADMM) into two sub-problems with closed-form solutions that are solved iteratively for the optimum. Result Using the one-pass evaluation (OPE) protocol of the OTB2015 database, with the area under the success-plot curve (AUC) and the center location error as criteria, the method is compared with 10 state-of-the-art visual trackers on the public OTB2015 database. On the 100 video sequences and 11 video attributes, the success rate and the corresponding AUC and center location error are clearly better than those of other correlation filter-based trackers, showing that the proposed algorithm tracks well. With HOG hand-crafted features alone, the BTCF algorithm improves AUC over BACF by 1.3% on OTB2015; because color and edge features are complementary, fusing CN (color names) features improves AUC over BACF by 4.2% on OTB2015, reaching an AUC of 0.663 and a speed of 25.4 frames/s with purely hand-crafted features. Conclusion The proposed BTCF algorithm handles visual tracking under challenging conditions such as illumination variation, target rotation, and occlusion, with good robustness and near real-time speed.
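The abstract does not reproduce the objective function itself. As a hedged sketch only, an equality-constrained BACF-style objective with an added temporal-awareness term might take the following form, where h is the filter, g an auxiliary variable, B a binary cropping matrix that keeps only the target-sized region, x_k the k-th feature channel, y the desired correlation response, h_{t-1} the filter from the previous frame, and λ, μ regularization weights (this notation is ours, not necessarily the paper's):

```latex
\min_{\mathbf{h},\,\mathbf{g}}\;
\frac{1}{2}\Big\|\mathbf{y}-\sum_{k=1}^{K}\mathbf{x}_{k}\star\mathbf{g}_{k}\Big\|_{2}^{2}
+\frac{\lambda}{2}\|\mathbf{h}\|_{2}^{2}
+\frac{\mu}{2}\|\mathbf{h}-\mathbf{h}_{t-1}\|_{2}^{2}
\quad\text{s.t.}\quad
\mathbf{g}_{k}=\mathbf{B}^{\top}\mathbf{h}_{k},\;k=1,\dots,K
```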
Keywords
Learning background-temporal-aware correlation filter for real-time visual tracking

Zhu Jianzhang1, Wang Dong2, Lu Huchuan2 (1. School of Mathematics and Information Sciences, Henan University of Economics and Law, Zhengzhou 450046, China; 2. School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, China)

Abstract
Objective Visual tracking is a classical computer vision problem with many applications. In generic visual tracking, the task is to estimate the trajectory of a target in an image sequence, given only its initial location. Recently, traditional discriminative correlation filter-based approaches have been successfully applied to tracking problems. These methods learn a discriminative correlation filter from a set of training samples obtained by applying a circular shift operator to the tracked target object (the only accurate positive sample). The shifted patches are implicitly generated through the circulatory property of correlation in the frequency domain and are used as negative examples for training the filter. All shifted patches are plagued by circular boundary effects and are not truly representative of negative patches in real-world scenes. The actual background information is therefore never modeled during the learning process, and when the target object resembles the background, the tracker drifts. To improve performance, a large number of training samples is collected, which increases computational complexity. Moreover, because the online model update strategy ignores temporal consistency, the learned filter tends to prefer the background, which also causes drift. To resolve these problems, we construct a discriminative correlation filter-based target function with an equality constraint on top of the background-aware correlation filter (BACF) visual object tracking algorithm, termed the background-temporal-aware correlation filter (BTCF) visual object tracker. Our algorithm obtains actual negative samples of the same size as the target object for the training set by multiplying the filter with a binary mask that suppresses the background region. Moreover, it can learn a strong correlation filter-based discriminative classifier using only the current frame, without online model updating. Method The proposed BTCF model is convex and can be minimized to obtain the globally optimal solution. To further reduce the computational burden, we propose a new equality-constrained discriminative correlation filter-based objective function. This objective function satisfies the Eckstein-Bertsekas condition; therefore, it can be transformed into an unconstrained augmented Lagrange multiplier formula that converges to the global optimum. Two sub-problems with closed-form solutions are then obtained using the alternating direction method of multipliers (ADMM). Each sub-problem is smooth and convex and easy to solve; thus, every ADMM iteration yields the closed-form global optimum of each sub-problem. Because the convolution in the second sub-problem makes direct optimization difficult, we transform it into the Fourier domain according to Parseval's theorem to reduce computational complexity. The efficient ADMM-based approach learns our filter on multi-channel features with a computational cost of O(LKT log T), where T is the size of the vectorized frame, K is the number of feature channels, and L is the number of ADMM iterations. We compute model updates with the Sherman-Morrison lemma to cope with changes in target and background appearance in real time. Our algorithm empirically converges within a few iterations and, with hand-crafted features, runs in real time, achieving notable improvements in tracking accuracy over the BACF object tracking algorithm.
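To make the two-sub-problem ADMM concrete, the following is a minimal, single-channel Python sketch under our own assumptions (in the multi-channel case described above, the per-pixel K-by-K system is identity plus rank-one, which the Sherman-Morrison lemma inverts in closed form; with one channel the solve reduces to a scalar division). All function names, the mask ratio, and the penalty values are illustrative, not the paper's implementation:

```python
import numpy as np

def crop_mask(shape, ratio=1.0 / 3.0):
    """Binary mask B that keeps a centered, target-sized window
    (the window ratio is an illustrative assumption)."""
    m = np.zeros(shape)
    th, tw = int(shape[0] * ratio), int(shape[1] * ratio)
    cy, cx = shape[0] // 2, shape[1] // 2
    m[cy - th // 2:cy + th // 2, cx - tw // 2:cx + tw // 2] = 1.0
    return m

def admm_btcf_filter(x, y, h_prev, lam=0.01, mu=0.1, gamma=1.0, iters=2):
    """Hedged sketch of the two-sub-problem ADMM described in the abstract,
    for one feature channel. x: training patch features (2D array),
    y: desired Gaussian response, h_prev: filter from the previous frame."""
    X = np.fft.fft2(x)          # features in the Fourier domain
    Y = np.fft.fft2(y)
    H = np.fft.fft2(h_prev)     # warm start from the previous filter
    G = H.copy()                # auxiliary variable g (Fourier domain)
    L = np.zeros_like(H)        # scaled Lagrange multiplier
    T = x.size                  # size of the vectorized frame

    for _ in range(iters):
        # Sub-problem 1 (Fourier domain, via Parseval): per-pixel
        # closed-form solve of (|X|^2 + gamma) G = conj(X) Y + gamma (H - L).
        G = (np.conj(X) * Y + gamma * (H - L)) / (np.abs(X) ** 2 + gamma)
        # Sub-problem 2 (spatial domain): ridge term plus the temporal term
        # mu * ||h - h_prev||^2, then the binary mask B suppresses the
        # background region of the filter.
        g = np.real(np.fft.ifft2(G + L))
        h = (gamma * T * g + mu * h_prev) / (lam + mu + gamma * T)
        h = h * crop_mask(h.shape)
        H = np.fft.fft2(h)
        # Lagrange multiplier update.
        L = L + G - H
    return np.real(np.fft.ifft2(H))
```

Note how the temporal term keeps the new filter close to h_prev, which is the "temporal-aware" ingredient the abstract credits for avoiding an explicit online model update step.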
Result The one-pass evaluation (OPE) proposed by OTB2015 is used to compare different trackers based on two criteria, namely, center location error and bounding box overlap ratio. The center location error, one of the most widely used evaluation metrics in object tracking, computes the average Euclidean distance between the centers of the tracked targets and the manually labeled ground-truth positions over all frames; the percentage of frames whose estimated location lies within a given threshold distance of the ground truth is considered successful tracking. Another commonly used evaluation metric is the overlap score. We use the area under the curve (AUC) of each success plot, i.e., the average of the success rates over the sampled overlap thresholds, to rank the trackers. Our approach is compared with 10 state-of-the-art visual object tracking algorithms on the OTB2015 public database. Results show that our BTCF algorithm is remarkably better in center location error and AUC than other visual tracking algorithms based on discriminative correlation filters. OTB2015 categorizes its 100 sequences by annotating them with 11 attributes to evaluate and analyze the strengths and weaknesses of tracking approaches. Results show that our BTCF algorithm is remarkably better than BACF in center location error and AUC on all 11 attributes, indicating that our algorithm is both effective and efficient. Using only histogram of oriented gradients (HOG) hand-crafted features, BTCF improves AUC by 1.3% over BACF on the OTB2015 database. Because color and edge features are complementary, we further introduce color names (CN) features into our BTCF formulation, which improves AUC by 4.2% over BACF on the OTB2015 database; with only hand-crafted features (HOG and CN), the AUC reaches 0.663 and the speed attains 25.4 frames/s. Conclusion Compared with the BACF algorithm and other popular trackers, the proposed BTCF-based visual tracking algorithm copes with many challenging conditions. By introducing a temporal-aware term into the BACF model, a stronger discriminative classifier can be learned to separate the target from the background, especially in illumination variation, motion blur, out-of-plane rotation, and occlusion scenes. The proposed BTCF-based algorithm therefore demonstrates robustness and real-time performance.
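For reference, the two OPE criteria can be computed as in the following sketch (the (x, y, w, h) box format and the 21-point overlap threshold grid follow the common OTB convention; the function names are ours):

```python
import numpy as np

def center_location_error(pred, gt):
    """Per-frame Euclidean distance between predicted and ground-truth
    box centers. pred and gt are (N, 4) arrays in (x, y, w, h) format."""
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def success_auc(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Success plot: fraction of frames whose IoU exceeds each overlap
    threshold; AUC is the mean of those success rates (OTB convention)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    iou = inter / np.maximum(union, 1e-12)
    return np.mean([(iou > t).mean() for t in thresholds])
```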
Keywords
