Published: 2018-11-16
DOI: 10.11834/jig.170655
2018 | Volume 23 | Number 11

Image Understanding and Computer Vision

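Fast TLD visual tracking algorithm with kernel correlation filter

Wang Jiaoyao¹, Hou Zhiqiang¹,², Yu Wangsheng¹, Liao Xiufeng¹, Chen Chuanhua¹

1. Institute of Information and Navigation, Air Force Engineering University, Xi'an 710077, China;

2. Institute of Computer, Xi'an University of Posts and Telecommunications, Xi'an 710121, China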

Received: 2017-12-22; Revised: 2018-04-18

Supported by: National Natural Science Foundation of China (61473309, 61703423, 41601436)

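First author biography: Wang Jiaoyao, born in 1995, female, master's student; main research interest: computer vision. E-mail: 18629456938@163.com;
Yu Wangsheng, male, lecturer; main research interests: visual target tracking and image segmentation. E-mail: xing_fu_yu@sina.com;
Liao Xiufeng, male, master's student; main research interest: computer vision. E-mail: 18729476936@163.com;
Chen Chuanhua, male, master's student; main research interest: computer vision. E-mail: 15686248112@163.com.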

CLC number: TP391.4

Document code: A

Article ID: 1006-8961(2018)11-1686-11

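Abstract

Objective Fast and robust target tracking has long been an important research topic in computer vision, and the TLD (tracking-learning-detection) algorithm offers an effective solution to it. To further improve the tracking performance of TLD, this paper improves the algorithm in two respects. Method First, a scale-adaptive kernel correlation filter (KCF) is adopted as the tracker in the tracking module. Because the tracking and detection modules are independent of each other, the proposed algorithm uses the detection module to judge the accuracy of the tracking module's result and updates the KCF filter template selectively according to that judgment. Second, in the detection module, an optical flow method gives a preliminary prediction of the target position; the detection region is adjusted dynamically according to the prediction, and the classifier then locates the target precisely. Result Two groups of experiments were conducted to verify the proposed algorithm. Experiment 1 tested tracking performance on the OTB2013 and Temple Color128 benchmarks: the tracking precision and success rate were 0.761 and 0.559 on OTB2013 and 0.678 and 0.481 on Temple Color128, and the average tracking speed over all test videos reached 27.92 frame/s. Experiment 2 compared the proposed algorithm with three other improved algorithms on eight randomly selected videos; the proposed algorithm achieved the smallest center location error (14.01), the largest overlap rate (72.2%), and the fastest tracking speed (26.23 frame/s). Conclusion Using the KCF tracker improves the algorithm's adaptability to occlusion, illumination change, and motion blur, and using optical flow to shrink the detection region improves tracking speed. The experimental results show that the proposed algorithm outperforms the reference algorithms in most cases and tracks robustly over long sequences.

Key words

visual tracking; TLD (tracking-learning-detection); kernel correlation filter; optical flow method; detection area adjustment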

Fast TLD visual tracking algorithm with kernel correlation filter

Wang Jiaoyao¹, Hou Zhiqiang¹,², Yu Wangsheng¹, Liao Xiufeng¹, Chen Chuanhua¹

1. Institute of Information and Navigation, Air Force Engineering University, Xi'an 710077, China;

2. Institute of Computer, Xi'an University of Posts and Telecommunications, Xi'an 710121, China

Supported by: National Natural Science Foundation of China (61473309, 61703423, 41601436)

Abstract

Objective Visual tracking is now widely applied in fields such as video surveillance, human-computer interaction, and intelligent transportation. In recent years, researchers have proposed numerous tracking algorithms for this purpose. In practical use, these algorithms are required to track a target over long periods. However, continuously tracking a target is difficult for most algorithms given the complexity of the tracking scenario. Therefore, rapid and robust tracking of a target is a key issue that must be solved when applying visual target tracking technology in practice. The TLD algorithm provides an effective solution to this issue. This study improves two aspects of the TLD algorithm to improve its tracking performance. Method First, a scale-adaptive kernel correlation filter (KCF) is used as the tracker in the tracking module. The KCF algorithm cannot adapt to scale changes of the target because the size of the filter template is fixed. However, the detection module of the TLD algorithm has a certain scale adaptability. Therefore, the proposed algorithm uses the scale-adaptive capability of the detection module to set the scale of the region of interest of the KCF tracker, and this scale adjustment enables the KCF tracker to achieve improved tracking precision. Because the tracking and detection modules are independent of each other, the algorithm uses the detection module to assess the accuracy of the tracking module's results and selectively updates the KCF filter template in accordance with the assessment. Second, an optical flow method is used in the detection module to preliminarily predict the target position. The optical flow method estimates the target movement between two adjacent frames without any prior knowledge. 
The target detection area is set in accordance with the predicted position, and the size of the detection area is proportional to the target size. After the target detection area is adjusted dynamically, a three-layer cascade classifier is used to locate the target accurately. Because target motion information is introduced, the robustness of the algorithm to interference from similar objects in the scene is enhanced. Result Two sets of experiments are conducted to verify the proposed algorithm. The first set is conducted on the OTB2013 and Temple Color 128 benchmarks. OTB2013 has 50 video sequences, and Temple Color 128 has 128 video sequences. Results show that the tracking precision and success rate of the algorithm are 0.761 and 0.559 on OTB2013 and 0.678 and 0.481 on Temple Color 128, respectively. The proposed algorithm is compared with six state-of-the-art algorithms, namely, DSST, KCF, CNT, Struck, TLD, and DLT. Among all the algorithms, the proposed algorithm exhibits the best performance on the two benchmarks. In addition, the average tracking speed over all test videos reaches 27.92 frame/s, indicating favorable real-time performance. In the second set of experiments, the proposed algorithm and three other improved algorithms are tested and compared on eight randomly selected video sequences. The experimental results show that the proposed algorithm has the smallest center position error of 14.01, the largest overlap rate of 72.2%, and the fastest tracking speed of 26.23 frame/s, indicating that the proposed algorithm achieves the best tracking performance among the improved algorithms. 
Conclusion The proposed algorithm uses the KCF tracker to improve its adaptability to challenging conditions such as occlusion, illumination change, and motion blur. Furthermore, the proposed algorithm uses the optical flow method to narrow the detection area, which improves its tracking speed. The experimental results show that the proposed algorithm exhibits better tracking performance than the reference algorithms in most cases and achieves favorable robustness in long-term tracking.

Key words

visual tracking; TLD(tracking-learning-detection); kernel correlation; optical flow method; detection area adjustment

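0 Introduction

Visual tracking is a highly challenging problem in computer vision^[1]. Researchers have proposed a large number of tracking algorithms, but because of illumination change, occlusion, scale change, and other factors, no visual tracking algorithm yet achieves robust long-term tracking. A tracking algorithm that can re-detect the target after losing it would therefore be considerably more robust. The TLD algorithm proposed by Kalal et al.^[2] in 2012 addresses this issue. In TLD, the tracking module and the detection module process each input frame independently; when the tracking module fails, the result of the detection module corrects it, while the learning module learns newly appearing features of the target, which makes the target model more robust.

Many researchers have studied how to improve the robustness and real-time performance of TLD, and existing work mainly improves the tracking and detection modules. For the tracking module, the method in [3] first detects feature points in the image and then tracks them with optical flow; the method in [4] arranges local trackers on a grid of cells and improves the criterion for judging whether a tracking result is accurate. For the detection module, the methods in [4-5] use a Kalman filter to predict the detection region and thus narrow the search range, and the method in [6] replaces the variance classifier with a detector based on frame differencing. These improved algorithms gain some robustness and speed, but three limitations remain. The performance of TLD depends heavily on the tracking module, and the tracking modules of the above algorithms are all refinements of the optical flow tracker, so they are still restricted to the scenarios in which optical flow applies. The irregular motion of visual targets limits the applicability of Kalman filtering. Frame differencing increases tracking speed but reduces tracking precision.

To address these problems, this paper makes the following improvements:

1) Because the KCF algorithm^[7] offers high tracking precision and speed, the proposed algorithm uses it as the tracker, adds target scale prediction, and improves the model update strategy of the original KCF algorithm;

2) In the detection module, optical flow is introduced to locate the target coarsely. This greatly reduces the number of detection windows while preserving reliability, and the target's motion information reduces interference from similar objects in the field of view.

Two groups of experiments were designed to verify the improved algorithm. Experiment 1 tested the proposed algorithm on the OTB2013^[8] and Temple Color128^[9] benchmarks and compared it with TLD and five other tracking algorithms: CNT^[10] and DLT^[11] represent deep-learning trackers, KCF and DSST^[12] represent correlation-filter trackers, and Struck^[13] is a classical algorithm with good tracking performance. Experiment 2 tested the proposed algorithm and three other improved TLD algorithms on eight video sequences. The three algorithms are TLD with only the KCF tracker (TLD_KCF), the improved TLD algorithm based on key feature points proposed in [3] (TLD_KFP), and the method proposed in [4], which uses the Cell FoT layout, a Σ predictor, and Kalman filtering (TLD+). Both groups of experiments show that the proposed algorithm achieves good robustness and real-time performance.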

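1 TLD algorithm

1.1 TLD algorithm

TLD consists of three parts: a tracking module, a detection module, and a learning module. The tracking module uses an optical flow tracker. To obtain stable tracked points, TLD adds a failure detection mechanism based on NCC^[14] (normalized cross-correlation) similarity and forward-backward tracking, and it outputs the median displacement and the median scale change of the stable points as the result of the tracking module. The detection module scans each frame globally at multiple scales and passes the resulting detection windows through a cascade classifier in turn; the windows that pass all stages form the output of the detection module. The two modules process every frame independently and in parallel, and their results are fused according to a fusion strategy to give the final tracking position. The learning module samples positive and negative examples in the current frame according to the tracking result and learns and updates the target model with the P-N learning strategy^[15].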
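The point-selection step of the tracking module described above can be illustrated with a minimal Python sketch. It assumes the points have already been tracked forward and backward by some optical flow routine and that an NCC score is available for each point; the function name `median_flow_displacement` and the median-based selection rule are illustrative assumptions, not details given in this paper.

```python
import numpy as np

def median_flow_displacement(pts_prev, pts_fwd, pts_back, ncc):
    """Median displacement of the points judged stable (illustrative sketch).

    pts_prev: (n, 2) points in frame t-1
    pts_fwd:  (n, 2) the same points tracked forward into frame t
    pts_back: (n, 2) pts_fwd tracked backward into frame t-1
    ncc:      (n,)   NCC similarity of the patches around pts_prev and pts_fwd
    """
    # Forward-backward error: distance between a point and its round-trip position.
    fb_error = np.linalg.norm(pts_back - pts_prev, axis=1)
    # Assumed rule: keep points that are no worse than the median on both criteria.
    stable = (fb_error <= np.median(fb_error)) & (ncc >= np.median(ncc))
    if not np.any(stable):
        return None  # no reliable point: report a tracking failure
    # The module output is the median displacement of the stable points.
    return np.median(pts_fwd[stable] - pts_prev[stable], axis=0)
```

The median makes the output insensitive to the few points that drift, which is why only the subset passing both checks contributes to the displacement.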

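1.2 Shortcomings

In the tracking module of TLD, the optical flow tracker tends to drift when the target moves fast, is occluded, or undergoes illumination change^[16-17], which leads to tracking failure. Moreover, the training samples of the detection module are collected online. If the tracking module cannot supply high-quality training samples early in a run, the accuracy of the detection module cannot improve, and the algorithm loses its re-detection capability. In terms of speed, to select stable points the tracking module adds forward-backward tracking and the computationally expensive NCC similarity^[14] on top of optical flow, which slows the algorithm down.

The detection module of TLD assumes that all frames are independent of one another and scans every input frame globally. For a 320×240 pixel image, the detection module generates 30 433 windows, a large share of which are background windows unrelated to the target. These background windows consume substantial computation and slow the detection module^[4].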

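2 Proposed algorithm

To address these shortcomings of TLD, a fast improved TLD algorithm based on the kernel correlation filter is proposed. In the improved algorithm, the tracking module uses a scale-adaptive KCF tracker, and the detection module first locates the target coarsely with optical flow and then locates it precisely with the cascade classifier, which improves detection efficiency.

2.1 KCF-based TLD tracking module

The proposed algorithm adopts KCF as the tracker and adds two steps. First, because KCF adapts poorly to changes in target scale, the scale of the KCF tracker's detection region in the next frame is adjusted dynamically using the final output of TLD. Second, the confidence of the KCF tracking result is computed to decide whether to update the filter template under occlusion: when the confidence is above a threshold, the target is considered free of large-area occlusion and the filter template is updated; when the confidence is below the threshold, the template is not updated.

After training the filter on the initial frame, KCF performs two steps in subsequent tracking, target detection and filter update:

1) Filter training. In the first frame of the video sequence, an image patch of size $M×N$ is selected to train the filter and is denoted as the base sample ${\mathit{\boldsymbol{x}}_{\rm{0}}}$. Cyclically shifting ${\mathit{\boldsymbol{x}}_{\rm{0}}}$ along its rows and columns, as shown in Fig. 1, yields $M×N$ training samples. Each training sample is denoted ${\mathit{\boldsymbol{x}}_i}$, $i$=0, 1, …, $M×N-1$, and their set is denoted $\mathit{\boldsymbol{X}}$. Each sample is labeled according to a 2D Gaussian function; the labels are denoted ${y_i}$, $i$=0, 1, …, $M×N-1$, and their set is denoted $\mathit{\boldsymbol{Y}}$.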

Fig. 1 Training samples generated by cyclic shifts

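KCF classifies samples according to the response $f\left(\mathit{\boldsymbol{z}} \right) = {\mathit{\boldsymbol{\omega }}^{\rm{T}}}\cdot\mathit{\boldsymbol{z}}$, so training the classifier amounts to finding the optimal parameter $\mathit{\boldsymbol{\omega }}$ that minimizes the cost function

$ \mathop {{\rm{min}}}\limits_\omega \sum\limits_i^n {{{\left| {f({\mathit{\boldsymbol{x}}_i})-{y_i}} \right|}^2}} + \lambda {\left\| \mathit{\boldsymbol{\omega }} \right\|^2} $

(1)

To handle classification in a nonlinear space, KCF uses a mapping function $\phi $(·) to map the samples into a high-dimensional linear space and trains the classifier by ridge regression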

$ \begin{array}{l} f\left( \mathit{\boldsymbol{z}} \right) = {\mathit{\boldsymbol{\omega }}^{\rm{T}}}\phi \left( \mathit{\boldsymbol{z}} \right)\\ \mathit{\boldsymbol{\omega }} = \sum\limits_i {{\partial _i}\phi ({\mathit{\boldsymbol{x}}_i})} \end{array} $

(2)

The classifier can therefore be written as

$ f\left( \mathit{\boldsymbol{z}} \right) = \sum\limits_{i = 0}^{M \times N-1} {{\partial _i}\mathit{\boldsymbol{\kappa }}(\mathit{\boldsymbol{z}}, {\mathit{\boldsymbol{x}}_i})} $

(3)

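where the kernel function $\mathit{\boldsymbol{\kappa }}(\mathit{\boldsymbol{z}}, {\mathit{\boldsymbol{x}}_i}) = \phi {\left(\mathit{\boldsymbol{z}} \right)^{\rm{T}}}\phi ({\mathit{\boldsymbol{x}}_i})$ is the inner product of the two samples in the high-dimensional space, and ${{\partial _i}}$ is the coefficient of each sample, with the set of coefficients denoted $\mathit{\boldsymbol{\partial }}$. According to [18], the closed-form solution for $\mathit{\boldsymbol{\partial }}$ is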

$ \mathit{\boldsymbol{\partial }} = {\left( {\mathit{\boldsymbol{K}} + \lambda \mathit{\boldsymbol{I}}} \right)^{ - 1}}\mathit{\boldsymbol{Y}} $

(4)

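where $\mathit{\boldsymbol{K}} = \{ \mathit{\boldsymbol{\kappa }}({\mathit{\boldsymbol{x}}_i}, {\mathit{\boldsymbol{x}}_j}), i, j = 0, 1, \ldots, M \times N{\rm{-}}1\} $ is the kernel matrix.

To avoid the high cost of matrix inversion, KCF diagonalizes the circulant matrix with the Fourier transform, which turns the matrix inversion into element-wise operations on vectors and greatly speeds up the algorithm. Reference [18] gives a fast solution for $\mathit{\boldsymbol{\partial }}$ in the Fourier domain:

$ \hat {\mathit{\boldsymbol{\partial }}} = \frac{{\mathit{\boldsymbol{\hat Y}}}}{{{{\mathit{\boldsymbol{\hat \kappa }}}_{{x_0}, {x_0}}} + \lambda }} $

(5)

where the symbol "∧" above a variable denotes its Fourier transform.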
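The equivalence between the matrix solution of Eq. (4) and the Fourier-domain solution of Eq. (5) can be checked numerically. The following sketch makes simplifying assumptions that are not in the paper: a 1D signal instead of an $M×N$ patch, and a linear kernel, so that the kernel matrix is the Gram matrix of the cyclic shifts. The function names are hypothetical.

```python
import numpy as np

def solve_direct(x, y, lam):
    """Eq. (4): solve (K + lam * I) a = y with the explicit kernel matrix."""
    n = x.size
    # Row i is the base sample cyclically shifted by i positions.
    samples = np.stack([np.roll(x, i) for i in range(n)])
    K = samples @ samples.T  # linear-kernel matrix; it is circulant
    return np.linalg.solve(K + lam * np.eye(n), y)

def solve_fourier(x, y, lam):
    """Eq. (5): element-wise division in the Fourier domain."""
    x_hat = np.fft.fft(x)
    # For a linear kernel, the transform of the kernel autocorrelation is |x_hat|^2.
    k_hat = (x_hat * np.conj(x_hat)).real
    return np.fft.ifft(np.fft.fft(y) / (k_hat + lam)).real

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
idx = np.arange(16)
y = np.exp(-0.5 * np.minimum(idx, 16 - idx) ** 2 / 4.0)  # Gaussian labels, peak at shift 0
print(np.allclose(solve_direct(x, y, 0.1), solve_fourier(x, y, 0.1)))
```

The direct solve costs on the order of n³ operations, whereas the Fourier route needs only a few FFTs, which is the source of the speed advantage described above.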

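2) Target detection. Because the detection module of TLD works at multiple scales, the final output of TLD is closer to the true state of the target. The proposed algorithm therefore uses the TLD tracking result to add a scale prediction step to the tracking module, which improves the scale adaptability of the KCF tracker. The tracking process is shown in the target detection part of Fig. 2. When the next frame (frame $k+1$) arrives, a region of interest for detection is selected in it according to the position and size of the tracking result in the previous frame, and the region is resized to $M×N$ by bilinear interpolation. The resized patch is denoted $z$; cyclically shifting $z$ gives the sample set, and the filter response map is computed as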

$ \hat f\left( \mathit{\boldsymbol{z}} \right) = {\left( {{{\mathit{\boldsymbol{\hat \kappa }}}_{{x_0}, z}}} \right)^*} \otimes \hat {\mathit{\boldsymbol{\partial }}} $

(6)

Fig. 2 Schematic diagram of the tracking module

The row and column coordinates of the maximum of the response map give the displacement of the target in the next frame relative to the previous frame.
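Training by Eq. (5) and detection by Eq. (6) can be sketched together in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses raw pixel values, a Gaussian kernel, and hypothetical parameter values (`sigma`, `lam`, `bandwidth`), and it omits the cosine window, feature extraction, and scale handling of a full tracker.

```python
import numpy as np

def gaussian_kernel_correlation(a, b, sigma):
    """Gaussian kernel between patch a and every cyclic shift of patch b."""
    cross = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    dist = (np.sum(a ** 2) + np.sum(b ** 2) - 2.0 * cross) / a.size
    return np.exp(-np.maximum(dist, 0.0) / sigma ** 2)

def gaussian_labels(shape, bandwidth):
    """2D Gaussian regression targets with the peak at zero shift."""
    rows, cols = np.arange(shape[0]), np.arange(shape[1])
    dr = np.minimum(rows, shape[0] - rows)[:, None]  # cyclic distance to row 0
    dc = np.minimum(cols, shape[1] - cols)[None, :]  # cyclic distance to column 0
    return np.exp(-0.5 * (dr ** 2 + dc ** 2) / bandwidth ** 2)

def train(x0, y, sigma, lam):
    """Eq. (5): coefficients in the Fourier domain."""
    k_hat = np.fft.fft2(gaussian_kernel_correlation(x0, x0, sigma))
    return np.fft.fft2(y) / (k_hat + lam)

def detect(alpha_hat, x0, z, sigma):
    """Eq. (6): response map and the displacement at its maximum."""
    k_hat = np.fft.fft2(gaussian_kernel_correlation(x0, z, sigma))
    response = np.fft.ifft2(np.conj(k_hat) * alpha_hat).real
    peak = np.unravel_index(np.argmax(response), response.shape)
    # Indices past the midpoint correspond to negative (wrapped) shifts.
    shift = tuple(int(p) if p <= n // 2 else int(p) - n
                  for p, n in zip(peak, response.shape))
    return shift, response

rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32))        # stand-in for the base sample
z = np.roll(x0, (3, -5), axis=(0, 1))     # same content moved by (3, -5)
alpha_hat = train(x0, gaussian_labels(x0.shape, 2.0), sigma=0.5, lam=1e-4)
shift, _ = detect(alpha_hat, x0, z, sigma=0.5)
print(shift)
```

In this synthetic case the test patch is an exact cyclic shift of the base sample, so the maximum of the response map recovers the (row, column) displacement.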

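3) Filter update. Eq. (6) shows that computing the response map requires only the sample template $\mathit{\boldsymbol{x}}$ and the coefficient template ${\mathit{\boldsymbol{\partial }}}$, so the filter template $\left({\mathit{\boldsymbol{x}}, \mathit{\boldsymbol{\partial}}} \right)$ must be updated to follow changes in the target. When the target is occluded or tracking fails, the frame-by-frame update used by KCF makes the filter template drift away from the true target. Because the tracking and detection modules are independent of each other, this paper uses the detection module to evaluate the confidence of the tracking module's output. As shown in the filter update part of Fig. 2, the output of the tracking module in frame $k$ is fed to the nearest-neighbor classifier to obtain its confidence $Conf$. When $Conf$ exceeds the threshold $thr$, the filter template is updated (learning rate $β=0.02$); otherwise it is not updated (learning rate $β=0$). Extensive experiments show that $thr$ = 0.7 distinguishes accurate from inaccurate tracking results well.

The update procedure resembles filter training. The image patch tracked in the current frame is cyclically shifted to obtain training samples, which are Fourier-transformed to give the template learned from the current frame $\left({{\mathit{\boldsymbol{x}}_k}, {\partial _k}} \right)$. The filter template is then updated with learning rate $β$ to give the updated template $\left({\mathit{\boldsymbol{x}}, \partial } \right)$. The sample template and the coefficient template are updated as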

$ \mathit{\boldsymbol{x}} = \left( {1-\beta } \right){\mathit{\boldsymbol{x}}_{{\rm{pre}}}} + \beta {\mathit{\boldsymbol{x}}_k} $

(7)

$ \partial = {\rm{ }}\left( {1-\beta } \right){\partial _{{\rm{pre}}}} + {\rm{ }}\beta {\partial _k} $

(8)

where $\left({{\mathit{\boldsymbol{x}}_{{\rm{pre}}}}, {\partial _{{\rm{pre}}}}} \right)$ is the filter template before the update.
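The confidence-gated update of Eqs. (7) and (8) reduces to a linear interpolation whose rate is switched by the classifier confidence. The sketch below is illustrative: the function name `update_template` is hypothetical, the confidence is assumed to come from the nearest-neighbor classifier, and the comparison uses ≥ as in the step list of the tracking procedure.

```python
import numpy as np

def update_template(x_pre, alpha_pre, x_k, alpha_k, conf, thr=0.7, beta=0.02):
    """Blend the old template with the one learned from the current frame.

    conf is the confidence of the current tracking result; when it falls
    below thr, the learning rate is set to 0 and the template is kept as is.
    """
    rate = beta if conf >= thr else 0.0
    x_new = (1.0 - rate) * x_pre + rate * x_k               # Eq. (7)
    alpha_new = (1.0 - rate) * alpha_pre + rate * alpha_k   # Eq. (8)
    return x_new, alpha_new
```

With `beta = 0.02`, a confident frame contributes 2% of the new template, so a short occlusion that lowers the confidence leaves the template untouched.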

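4) Tracking module procedure. The procedure of the tracking module in the proposed algorithm is summarized as follows:

for $i$=2, 3, …, $n$

Input: frame ${\boldsymbol{I}_i}$, the target position ${L_{i - 1}}$ and target scale ${S_{i - 1}}$ in the previous frame, and the filter template $\left({{\mathit{\boldsymbol{x}}_{i{\rm{-1}}}}, {\partial _{i{\rm{-1}}}}} \right)$.

Output: the tracking result ${P_i}$ of the current frame and the filter template $\left({{\mathit{\boldsymbol{x}}_i}, {\partial _i}} \right)$.

1) Target detection:

(1) Extract the target detection region from the current frame according to ${L_{i - 1}}$ and ${S_{i - 1}}$, and resize it to the initial scale ${S_0}$; denote the resized patch as $\mathit{\boldsymbol{x}}$;

(2) Compute the corresponding response map $\mathit{\boldsymbol{Y}}$ from $\mathit{\boldsymbol{x}}$ and $\left({{\mathit{\boldsymbol{x}}_{i{\rm{-1}}}}, {\partial _{i{\rm{-1}}}}} \right)$;

(3) The position of the maximum of $\mathit{\boldsymbol{Y}}$ is the target position $T{r_i}$ output by the tracking module; the target scale is kept the same as in the previous frame, i.e., ${S_i} = {S_{i - 1}}$;

(4) Extract the tracking result ${P_i}$ from ${\boldsymbol{I}_i}$ according to $T{r_i}$ and ${S_i}$.

2) Template update:

(1) Feed ${P_i}$ to the nearest-neighbor classifier to obtain its confidence $Conf$;

(2) If $Conf ≥ thr$, set $β=0.02$; otherwise, set $β=0$;

(3) Compute $\left({{\mathit{\boldsymbol{x}}_i}, {\partial _i}} \right)$ by Eqs. (7) and (8).

end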

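2.2 TLD detection module with dynamically adjusted detection region

Optical flow is a commonly used foreground detection method that can detect and track a moving target without any prior knowledge of the background. It can therefore be used to locate the target coarsely between two adjacent frames; after the detection region is adjusted, the cascade classifier locates the target precisely. The detection steps of the proposed algorithm are as follows:

for $i$=2, 3, …, $n$

Input: frame $ {{\boldsymbol{I}}_i}$, the target position ${L_{i - 1}}$ and scale ${S_{i - 1}}$ in the previous frame.

Output: the detection result ${P_i}$ of the current frame.

1) Select an image patch in frame $i-1$ according to ${L_{i - 1}}$ and ${S_{i - 1}}$, divide it evenly into a 10×10 grid, and take the cell centers as feature points;

2) Use optical flow to compute the positions of all feature points in the current frame, and average their row and column coordinates separately to obtain the center of the predicted region, $\mathop {{L_{i - 1}}}\limits^ \wedge $;

3) Centered at $\mathop {{L_{i - 1}}}\limits^ \wedge $, take an image patch 3 times the size of ${S_{i - 1}}$ in the current frame as the predicted region, and generate detection sub-windows in it with a sliding window as before;

4) Feed the detection sub-windows to the three-stage cascade classifier; the windows that pass all stages are the output of the detection module;

5) If step 4) outputs no window, search the whole input image to generate detection sub-windows and go to step 4).

end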
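Steps 1) to 3) above can be sketched as follows. The optical flow routine is passed in as a callable (`track_points`, a hypothetical name) so that the sketch stays independent of any particular library; in practice it could wrap a pyramidal Lucas-Kanade tracker. Two further assumptions are labeled in the code: boxes are given as (x, y, w, h), and "3 times the size" is read as tripling both width and height.

```python
import numpy as np

def grid_points(box, n=10):
    """Centers of an n x n grid laid over box = (x, y, w, h)."""
    x, y, w, h = box
    xs = x + (np.arange(n) + 0.5) * w / n
    ys = y + (np.arange(n) + 0.5) * h / n
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # (n*n, 2) as (x, y)

def predict_region(prev_box, track_points, image_size, factor=3.0):
    """Predicted detection region (x, y, w, h), clipped to the image.

    track_points: callable mapping (k, 2) points in the previous frame to
                  their positions in the current frame (assumed interface).
    image_size:   (width, height) of the frame.
    """
    new_pts = track_points(grid_points(prev_box))
    cx, cy = new_pts.mean(axis=0)  # predicted center: mean of the tracked points
    # Assumption: both sides of the previous box are scaled by `factor`.
    w, h = factor * prev_box[2], factor * prev_box[3]
    width, height = image_size
    x0, y0 = max(0.0, cx - w / 2), max(0.0, cy - h / 2)
    x1, y1 = min(float(width), cx + w / 2), min(float(height), cy + h / 2)
    return (x0, y0, x1 - x0, y1 - y0)

# Usage with a synthetic flow that moves every point by (5, -3) pixels.
region = predict_region((100, 80, 20, 40), lambda p: p + np.array([5.0, -3.0]), (320, 240))
print(region)
```

Sliding windows are then generated only inside the returned region, and the global scan of step 5) serves as the fallback when the cascade rejects every window.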

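The detection procedure is shown in Fig. 3. Because the tracking module uses the KCF tracker, the tracking accuracy of the improved algorithm increases considerably. Predicting the target position with optical flow from the previous frame's tracking result therefore effectively alleviates drift and makes the predicted region more accurate. The experimental results show that introducing the target's motion information effectively avoids interference from similar objects in the field of view.

Fig. 3 Schematic diagram of the detection module

As shown in Fig. 4, in the sequences David3 and Freeman4 the target moves fast and similar objects are present in the field of view. The tracking results show that, after the moving target is occluded, the improved algorithm with optical flow effectively avoids interference from similar objects and tracks more accurately.

Fig. 4 Comparison of tracking results on David3 and Freeman4

((a) David3; (b) Freeman4)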

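3 Experimental results and analysis

Two groups of experiments were conducted to verify the improved algorithm (labeled Ours). The software environment was Matlab2013a and Visual Studio 2013, and the test hardware was an Intel(R) Core i7 at 2.8 GHz.

Experiment 1 tested the proposed algorithm on the 50 video sequences of OTB2013 and the 128 video sequences of Temple Color128 and compared it with six popular algorithms: CNT, DSST, DLT, Struck, KCF, and TLD. All compared algorithms ran with their default parameters. The proposed algorithm is analyzed qualitatively and quantitatively below.

1) Qualitative analysis. Eight representative videos were selected for illustration; Fig. 5 shows screenshots of the results on each sequence.

Fig. 5 Qualitative comparison of the tracking algorithms on different attributes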

((a)Jogging-1;(b)Couple; (c)Skating1;(d)Lemming; (e)Liquor; (f)Girl; (g)Freeman4;(h)Freeman1)

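(1) Occlusion. The target is occluded in the sequences Jogging-1 and Girl. Taking Jogging-1 as an example, as shown in Fig. 5(a), the target reappears after being fully occluded; only the proposed algorithm and TLD follow the reappearing target, and frame 93 shows that the proposed algorithm is more precise than TLD.

(2) Fast motion. The target moves fast in the sequences Couple and Skating1. As shown in Fig. 5(b) and Fig. 5(c), only the proposed algorithm keeps tracking the target stably.

(3) Scale change. The target changes scale in the sequences Couple, Skating1, and Liquor. Taking Liquor as an example, as shown in Fig. 5(e), the target grows gradually from frame 397 to frame 429, and only the proposed algorithm keeps the whole target inside the tracking box.

(4) Out of view. In the sequence Lemming the target disappears from view for a long period. As shown in Fig. 5(d), only the proposed algorithm and TLD recapture the target promptly.

(5) Low resolution. In the sequences Freeman1 and Freeman4 the target has low resolution and moves fast. Fig. 5(g)(h) shows that only the proposed algorithm tracks the target stably throughout.

2) Quantitative analysis. Two quantitative evaluation tools are used: the precision plot and the success plot. Both rely on two measures, the center location error and the overlap rate. The center location error (CLE) is the average Euclidean distance between the tracked target position and the true target position, and the overlap rate (OR) is the ratio of the intersection to the union of the tracked target region and the true target region. The precision plot gives the fraction of frames whose center location error is below a given threshold, and the success plot gives the fraction of frames whose overlap rate is above a given threshold. In this paper the thresholds for the precision plot and the success plot are set to 20 pixels and 0.5, respectively.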
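The two measures and the two thresholded scores defined above can be computed as in the following sketch. It assumes that boxes are stored per frame as (x, y, w, h) rows and that both boxes have positive area; the function names are illustrative and do not come from the benchmark toolkits.

```python
import numpy as np

def center_error(pred, gt):
    """Per-frame Euclidean distance between box centers; boxes are (x, y, w, h)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    c_pred = pred[:, :2] + pred[:, 2:] / 2.0
    c_gt = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(c_pred - c_gt, axis=1)

def overlap_rate(pred, gt):
    """Per-frame intersection over union of the predicted and true boxes."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    x0 = np.maximum(pred[:, 0], gt[:, 0])
    y0 = np.maximum(pred[:, 1], gt[:, 1])
    x1 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y1 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x1 - x0, 0.0, None) * np.clip(y1 - y0, 0.0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision(pred, gt, threshold=20.0):
    """Fraction of frames whose center location error is below the threshold."""
    return float(np.mean(center_error(pred, gt) < threshold))

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap rate is above the threshold."""
    return float(np.mean(overlap_rate(pred, gt) > threshold))
```

Sweeping `threshold` over a range of values and plotting the resulting fractions yields the precision plot and the success plot; the values at 20 pixels and 0.5 are the single-number scores reported in this section.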

Fig. 6 compares the precision and success plots on the OTB2013 benchmark. The precision and success rate of the proposed algorithm rank first among all algorithms, reaching 0.761 and 0.559, improvements of 15.7% and 13.1% over TLD, respectively.

Fig. 6 Precision and success plots on the OTB2013 benchmark ((a) precision plot; (b) success plot)

Fig. 7 compares the precision and success plots on the Temple Color128 data. The precision and success rate of the proposed algorithm rank first among all algorithms, reaching 0.678 and 0.481, improvements of 18.6% and 12.3% over TLD, respectively.

Fig. 7 Precision and success plots on the Temple Color128 benchmark ((a) precision plot; (b) success plot)

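Tables 3 and 4 list the precision and success rate of all algorithms on the 11 attributes of the OTB2013 test data. Table 3 shows that the proposed algorithm ranks first in tracking precision on six attributes, namely scale variation (SV), low resolution (LR), occlusion (OCC), fast motion (FM), motion blur (MB), and out of view (OV), and ranks second only to KCF on background clutter (BC), by a margin of 0.002. Table 4 shows that the proposed algorithm ranks first on seven attributes, namely occlusion (OCC), out-of-plane rotation (OPR), fast motion (FM), out of view (OV), motion blur (MB), background clutter (BC), and low resolution (LR), and ranks second to CNT on scale variation (SV) and to DSST on in-plane rotation (IPR), by margins of 0.005 and 0.011, respectively.

Table 3 Precision comparison of the algorithms on different attributes

Algorithm	IV	SV	OCC	DEF	OPR	FM	IPR	MB	OV	BC	LR
Ours	0.686	0.734	0.769	0.671	0.728	0.723	0.701	0.651	0.753	0.744	0.685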
DSST	0.747	0.730	0.748	0.675	0.749	0.567	0.782	0.603	0.474	0.684	0.534
KCF	0.733	0.672	0.769	0.766	0.744	0.600	0.724	0.650	0.603	0.746	0.379
CNT	0.564	0.654	0.637	0.658	0.654	0.479	0.650	0.507	0.428	0.634	0.557
Struck	0.573	0.630	0.577	0.515	0.607	0.607	0.618	0.551	0.467	0.569	0.545
TLD	0.543	0.607	0.556	0.487	0.592	0.556	0.587	0.518	0.572	0.420	0.349
DLT	0.543	0.578	0.582	0.564	0.567	0.487	0.542	0.453	0.347	0.474	0.396
Note: the best and second-best results are marked in bold and with underlining, respectively.

Table 4 Success rate comparison of the algorithms on different attributes

Algorithm	IV	SV	OCC	DEF	OPR	FM	MB	IPR	OV	BC	LR
Ours	0.484	0.494	0.541	0.479	0.508	0.556	0.500	0.518	0.626	0.587	0.502
DSST	0.510	0.442	0.489	0.480	0.497	0.427	0.458	0.529	0.447	0.481	0.352
KCF	0.496	0.417	0.521	0.546	0.501	0.451	0.499	0.491	0.517	0.524	0.310
CNT	0.451	0.499	0.424	0.498	0.484	0.380	0.417	0.482	0.376	0.475	0.437
Struck	0.437	0.414	0.419	0.385	0.436	0.459	0.433	0.442	0.406	0.445	0.372
TLD	0.399	0.413	0.384	0.347	0.407	0.403	0.404	0.408	0.420	0.330	0.309
DLT	0.410	0.443	0.424	0.386	0.412	0.343	0.363	0.402	0.281	0.316	0.346
Note: the best and second-best results are marked in bold and with underlining, respectively.

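In terms of tracking speed, the proposed algorithm reaches an average frame rate of 27.92 frame/s on the two benchmarks (178 videos in total), achieving good real-time tracking.

Experiment 2 compared the proposed algorithm with three existing improved algorithms on eight test videos, using the average center location error (CLE) and the average overlap rate (OR) as evaluation measures. The results are listed in Table 5. In terms of tracking performance, the proposed algorithm falls below TLD+ only on Girl and below TLD_KCF only on Fish, and it tracks well on the other six videos. In terms of tracking speed, the proposed algorithm is slightly slower than TLD+ only on Dog1 and Liquor and is the fastest on the other six videos, showing good real-time performance.

Table 5 Tracking performance comparison of the four algorithms on eight videos

Video	CLE/pixel (OR/%)				Speed/(frame/s)
Video	TLD_KFP	TLD+	TLD_KCF	Ours	TLD_KFP	TLD+	TLD_KCF	Ours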
Boy	3.65(69.7)	3.41(74.3)	2.65(84.2)	2.54(85.1)	14.8	16.6	18.1	20.3
Dog1	4.01(63.4)	4.22(68.8)	3.51(75.4)	3.02(74.3)	18.6	23.3	19.6	21.8
Lemming	—	—	70.8(42.7)	66.5(44.5)	—	—	15.1	18.6
Dudek	—	14.6(65.1)	12.3(77.2)	11.8(78.1)	—	17.8	16.4	20.8
Girl	9.6(62.9)	8.9(70.1)	9.1(64.6)	9.4(68.8)	18.9	25.7	27.5	34.1
Fish	—	—	4.0(88.9)	4.6(85.4)	—	—	32.5	44.6
Liquor	—	6.2(78.1)	5.3(86.0)	4.2(88.2)	—	20.8	15.9	18.8
Car4	11.6(44.7)	11.2(45.2)	9.4(50.1)	10.8(53.2)	18.6	23.2	22.5	30.9
Average	—	—	14.63(71.14)	14.01(72.2)	—	—	20.95	26.23
Note: the best results are marked in bold; "—" indicates that the algorithm failed to track and produced no output.

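4 Conclusion

To address tracking failures of TLD under illumination change, motion blur, occlusion, and other complex conditions, this paper introduces the KCF tracker into the TLD framework, adds target scale estimation with the help of the TLD detection module, and improves the filter template update strategy, which increases the accuracy of the KCF tracker. In addition, an efficient detection method with a dynamically adjusted detection region is built on optical flow; shrinking the detection region improves real-time performance without reducing tracking precision and also avoids interference from similar objects. Experiments on the OTB2013 and Temple Color128 benchmarks show that the precision and success rate of the improved algorithm reach 0.761 and 0.559 on OTB2013 and 0.678 and 0.481 on Temple Color128, with an average frame rate of 27.92 frame/s over all test videos. The qualitative analysis on the individual attributes of the datasets shows that, under scale change, fast motion, out-of-view, low resolution, and other complex conditions, the improved algorithm is more accurate and robust than the six popular algorithms compared. To further improve tracking performance, future work will focus on the features used by the detection module and on the similarity measure.

References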

[1] Smeulders A W M, Chu D M, Cucchiara R, et al. Visual tracking:an experimental survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1442–1468. [DOI:10.1109/TPAMI.2013.230]

[2] Kalal Z, Mikolajczyk K, Matas J. Tracking-learning-detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409–1422. [DOI:10.1109/TPAMI.2011.239]

[3] Qin F, Wang R G, Liang Q X, et al. Improved TLD target tracking algorithm based on key feature points[J]. Computer Engineering and Application, 2016, 52(4): 181–187. [秦飞, 汪荣贵, 梁启香, 等. 基于关键特征点的改进TLD目标跟踪算法研究[J]. 计算机工程与应用, 2016, 52(4): 181–187. ] [DOI:10.3778/j.issn.1002-8331.1402-0365]

[4] Zhou X, Qian Q M, Ye Y Q, et al. Improved TLD visual target tracking algorithm[J]. Journal of Image and Graphics, 2013, 18(9): 1115–1123. [周鑫, 钱秋朦, 叶永强, 等. 改进后的TLD视频目标跟踪方法[J]. 中国图象图形学报, 2013, 18(9): 1115–1123. ] [DOI:10.11834/jig.20130908]

[5] Sun C J, Zhu S H, Liu J W. Fusing Kalman filter with TLD algorithm for target tracking[C]//Proceedings of the 34th Chinese Control Conference. Hangzhou, China: IEEE, 2015: 3736-3741.[DOI: 10.1109/ChiCC.2015.7260218]

[6] Lü N P, Cai X Y, Dong L, et al. Context object tracking algorithm based on TLD framework[J]. Video Engineering, 2015, 39(9): 6–9, 43. [吕枘蓬, 蔡肖芋, 董亮, 等. 基于TLD框架的上下文目标跟踪算法[J]. 电视技术, 2015, 39(9): 6–9, 43. ] [DOI:10.16280/j.videoe.2015.09.002]

[7] Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with Kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583–596. [DOI:10.1109/TPAMI.2014.2345390]

[8] Wu Y, Lim J, Yang M H. Online object tracking: a benchmark[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE, 2013, 9(4): 2411-2418.[DOI: 10.1109/CVPR.2013.312]

[9] Liang P P, Blasch E, Ling H B. Encoding color information for visual tracking:algorithms and benchmark[J]. IEEE Transactions on Image Processing, 2015, 24(12): 5630–5644. [DOI:10.1109/TIP.2015.2482905]

[10] Zhang K H, Liu Q S, Wu Y, et al. Robust visual tracking via convolutional networks without training[J]. IEEE Transactions on Image Processing, 2016, 25(4): 1779–1792. [DOI:10.1109/TIP.2016.2531283]

[11] Wang N Y, Yeung D Y. Learning a deep compact image representation for visual tracking[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2013: 809-817.

[12] Zhang B H, Lu H C, Xiao Z Y, et al. Visual tracking via discriminative sparse similarity map[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1872–1881. [DOI:10.1109/TIP.2014.2308414]

[13] Hare S, Golodetz S, Saffari A, et al. Struck:structured output tracking with kernels[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2096–2109. [DOI:10.1109/TPAMI.2015.2509974]

[14] Yang T Y, Peng G H. Fast algorithm for image matching based on NCC[J]. Modern Electronics Technique, 2012, 33(22): 107–109. [杨通钰, 彭国华. 基于NCC的图像匹配快速算法[J]. 现代电子技术, 2012, 33(22): 107–109. ] [DOI:10.3969/j.issn.1004-373X.2010.22.033]

[15] Kalal Z, Matas J, Mikolajczyk K. P-N Learning: bootstrapping binary classifiers by structural constraints[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE, 2010: 49-56.[ DOI:10.1109/CVPR.2010.5540231]

[16] Tu D W, Jiang J L. Improved algorithm for motion image analysis based on optical flow and its application[J]. Optics and Precision Engineering, 2011, 19(5): 1159–1164. [屠大维, 江济良. 改进的光流运动图像分析方法及其应用[J]. 光学精密工程, 2011, 19(5): 1159–1164. ] [DOI:10.3788/OPE.20111905.1159]

[17] Yang M H, Tao J H, Ye J T, et al. Robust outlier rejection from optical flow tracking points[J]. Journal of Computer-Aided Design & Computer Graphics, 2012, 24(1): 76–82. [杨明浩, 陶建华, 叶军涛, 等. 排除光流错误跟踪点的鲁棒方法[J]. 计算机辅助设计与图形学学报, 2012, 24(1): 76–82. ] [DOI:10.3969/j.issn.1003-9775.2012.01.013]

[18] Zhang B, Makram-Ebeid S, Prevost R, et al. Fast solver for some computational imaging problems:a regularized weighted least-squares approach[J]. Digital Signal Processing, 2014, 27: 107–118. [DOI:10.1016/j.dsp.2014.01.007]