Published: 2017-11-16
DOI: 10.11834/jig.170004
2017 | Volume 22 | Number 11

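Image Understanding and Computer Vision

Bidirectional optimization tracking method under foreground partition

Liu Wanjun¹, Liu Daqian², Fei Bowen³

1. School of Software, Liaoning Technical University, Huludao 125105;

2. School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105;

3. School of Business and Management, Liaoning Technical University, Huludao 125105

Received: 2017-01-18; Revised: 2017-08-17

Supported by: National Natural Science Foundation of China (61172144); Science and Technology Research Program of Liaoning Province (2012216026)

First author: Liu Wanjun (1959-), male, professor and doctoral supervisor. He received his M.E. degree in electric drive and automation from Fuxin Mining Institute in 1991. His main research interests include image and visual information computing and moving target detection and tracking. E-mail: liuwanjun@lntu.edu.cn.

CLC number: TP301.6

Document code: A

Article ID: 1006-8961(2017)11-1553-12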

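Abstract

Objective Methods based on target model matching are widely used to detect and track moving objects. Traditional model-matching trackers are easily affected by partial occlusion and complex backgrounds. To address this problem, we propose a bidirectional optimization tracking method under foreground partition (BOTFP). Method First, the target region is delineated manually in the first frame, and its color and texture features are extracted to build a discriminant appearance model. Then, bidirectional optimal similarity matching is used for target detection: the similarity between the local feature blocks of the test image and the appearance model is computed to complete model matching. To avoid interference from complex backgrounds and similar objects, a foreground partition method constrains the matching process and yields more accurate matches. Finally, an online model update algorithm introduces a distance decision that detects false matches, suppresses interference from similar objects in the foreground region, and keeps the model an accurate description of the target. Result Compared with several state-of-the-art trackers, the proposed algorithm achieves comparable or higher tracking accuracy. The average center errors on the Girl, Deer, Football, Lemming, Woman, Bolt, David1, David2, Singer1, and Basketball sequences are 7.43, 14.72, 8.17, 13.61, 24.35, 7.89, 11.27, 13.44, 12.18, and 7.79, respectively, and the tracking overlap ratios are 0.69, 0.58, 0.71, 0.85, 0.58, 0.78, 0.75, 0.60, 0.74, and 0.69, respectively. Compared with the similar methods L1APG (L1 tracker using accelerated proximal gradient approach), TLD (tracking-learning-detection), and LOT (locally orderless tracking), the average tracking overlap ratio is higher by about 20 percentage points. Conclusion Experimental results show that bidirectional optimal similarity matching with the color and texture features of the target inside the foreground region makes the algorithm accurate and adaptable under partial occlusion, target deformation, complex background, and target rotation.

Key words

discriminant appearance model; foreground partition; bidirectional optimization; distance decision; target tracking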

Bidirectional optimization tracking method under foreground partition

Liu Wanjun¹, Liu Daqian², Fei Bowen³

1. School of Software, Liaoning Technical University, Huludao 125105, China;

2. School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China;

3. School of Business and Management, Liaoning Technical University, Huludao 125105, China

Supported by: National Natural Science Foundation of China(61172144)

Abstract

Objective Target tracking plays an important role in computer vision and is widely used in intelligent traffic, robot vision, and motion capture. In real scenes, tracking accuracy is often low because of illumination change, target deformation, partial occlusion, and complex background, so suppressing these factors and improving tracking accuracy are major issues in target tracking. Target model matching methods are widely used to detect and track moving targets, and many effective model-matching trackers have been proposed in recent years. Babenko et al. proposed a tracker based on online multiple instance learning, which selects an appropriate number of positive and negative templates around the target, builds a discriminative model for tracking, and updates the appearance model of the target in real time. Wang et al. proposed a superpixel tracking method, which separates the target from the background to form a discriminative target model and estimates the likely target position in the next frame by maximum a posteriori estimation with a superpixel confidence map. Mei et al. proposed a tracker based on sparse representation classification, which casts tracking as a sparse approximation problem and determines the final result from the reconstruction residuals. Bao et al. proposed a real-time sparse representation tracker that combines multiple target templates with sparse representation classification, improving tracking speed while maintaining high accuracy. Kalal et al. proposed a tracker that combines tracking, learning, and detection and is robust to partial occlusion and target deformation. Oron et al. proposed locally orderless tracking, which divides the target into superpixel blocks, tracks the target in subsequent frames by matching these blocks, and constrains model matching with a particle filter to ensure robust tracking. Traditional model-matching trackers, however, are easily affected by partial occlusion by other objects and by complex backgrounds. To solve these problems, a bidirectional optimization tracking method under foreground partition (BOTFP) is proposed.
Method In the first frame, the target region is delineated manually, and its color and texture features are extracted to establish a discriminant appearance model. Then, the similarity between the local feature blocks of the test image and the appearance model is calculated with a bidirectional optimization similarity matching method to complete model matching. To avoid interference from complex backgrounds and similar targets, a foreground partition method is proposed that constrains matching and yields more accurate results. Finally, an online model updating algorithm with a distance decision is proposed; it determines whether a false match has occurred, suppresses interference from similar targets in the foreground region, and keeps the model an accurate description of the target.
Result Compared with other state-of-the-art trackers, the proposed algorithm achieves comparable or higher tracking accuracy. The average center errors on the Girl, Deer, Football, Lemming, Woman, Bolt, David1, David2, Singer1, and Basketball sequences are 7.43, 14.72, 8.17, 13.61, 24.35, 7.89, 11.27, 13.44, 12.18, and 7.79, respectively; the tracking overlap ratios are 0.69, 0.58, 0.71, 0.85, 0.58, 0.78, 0.75, 0.60, 0.74, and 0.69, respectively; and the average running speeds (frame/s) are 8.14, 7.32, 7.78, 6.69, 6.31, 7.57, 6.73, 7.17, 5.97, and 6.38, respectively. Compared with similar methods (L1APG, TLD, and LOT), the average tracking overlap ratio of the proposed method is higher by about 20 percentage points. Conclusion Experimental results indicate that bidirectional optimization similarity matching with the color and texture features of the target inside the foreground region makes the algorithm accurate and adaptable under partial occlusion, deformation, complex background, and target rotation. The characteristics of BOTFP are as follows: 1) Matching with both the color and texture features of the target yields a more complete appearance model. 2) The foreground information of each frame is estimated and used to restrict the matching process, which avoids interference from background information. 3) Bidirectional optimization similarity matching, which verifies the appearance-model features with the matched results, makes matching more robust. 4) An online model updating algorithm with a distance decision keeps the appearance model accurate.

Key words

discriminant appearance model; foreground partition; bidirectional optimization; distance decision; target tracking

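0 Introduction

Target tracking is an important part of computer vision and is widely used in intelligent transportation^[1], robot vision^[2], and human-computer interaction^[3]. In real scenes, however, illumination change, target deformation, occlusion, and complex backgrounds lower tracking accuracy^[4-5]. How to suppress these factors and improve tracking accuracy is therefore a central problem in target tracking.

Methods based on target model matching are widely used in tracking because they represent the appearance of the target well, and many effective model-matching trackers have been proposed in recent years. Babenko et al.^[6] proposed OMIL (online multiple instance learning), which selects a suitable number of positive and negative templates around the tracked target, builds a discriminative model for tracking, and updates the target appearance model online. Wang et al.^[7] proposed SPT (super-pixel tracking), which separates the target from the background to form a discriminative target model, estimates the most likely target position in the next frame with maximum a posteriori estimation and a superpixel confidence map, and continuously updates the appearance model. Mei et al.^[8] proposed a tracker based on sparse representation classification that casts tracking as a sparse approximation problem and selects the final result by the size of the reconstruction residuals; however, it produces errors when similar objects interfere, and its real-time performance is poor. Bao et al.^[9] improved it with the real-time sparse representation tracker L1APG (L1 tracker using accelerated proximal gradient), which combines multiple target templates with sparse representation classification and improves speed while maintaining high accuracy. Kalal et al.^[10] proposed TLD (tracking-learning-detection), which combines tracking, learning, and detection and is robust to partial occlusion and target deformation during motion. Oron et al.^[11] proposed LOT (locally orderless tracking), which divides the target into superpixel blocks, tracks the target in subsequent frames by matching these blocks, and constrains model matching with a particle filter, giving high robustness. During model matching, however, these methods struggle when the target is heavily occluded or the background is complex, especially when non-target moving objects in the background cross and occlude the target. Modeling then becomes harder and the target is easily submerged in the background, so the tracker tends to drift or even lose the target.

To address these problems, we propose a bidirectional optimization tracking method under foreground partition (BOTFP). First, the target is delineated manually in the first frame, and the color and texture features of the target region are extracted to build a discriminant appearance model. Then, bidirectional optimal similarity matching is used for target detection: the similarity between the local feature blocks of the test image and the appearance model is computed to complete model matching. To avoid interference from complex backgrounds and similar objects, a foreground partition method constrains the optimal similarity matching and yields more accurate matches. Finally, an online model update algorithm introduces a distance decision that detects false matches, suppresses interference from similar objects in the foreground region, and keeps the model an accurate description of the target. Because the algorithm combines the matching results of color and texture features, it effectively alleviates feature loss and feature redundancy when the target is occluded.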

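The main features of BOTFP are as follows:

1) Matching uses both the color and texture features of the target, so the appearance model describes the target more completely.

2) The foreground information of each frame is estimated and used to restrict similarity matching to the foreground region, which avoids interference from background information and makes the matched target more accurate.

3) Bidirectional optimal similarity matching is used: each matched result is treated as a target feature and used to verify the appearance-model features, which makes feature matching more robust.

4) An online model update algorithm with a distance decision keeps the appearance model accurate and its description of the target complete.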

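1 Algorithm overview

BOTFP consists of three stages: building the target appearance model, bidirectional optimal similarity matching, and model update. The algorithm flow is shown in Fig. 1.

Fig. 1 Flow diagram of the BOTFP algorithm

Note that in the optimal similarity matching stage, blocks with the same color and texture descriptors are matched, and the similarity between the local feature blocks of the test image and the appearance model is computed. To avoid the effects of partial occlusion and complex backgrounds, bidirectional optimal similarity matching is used: the matched results are treated as target features and used to verify the model features, which completes target detection. The model update has two parts. First, a distance decision determines whether a false match has occurred. If not, the appearance model is updated every 5 frames by weighted fusion of the current result with the prior target model. If a false match has occurred, the positions of the affected matching blocks are estimated from the average distance over each 5-frame interval and combined with the blocks that satisfy the condition to determine the optimal target region, which is then fused with the prior target model by weighting.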

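2 Bidirectional optimization tracking method under foreground partition

2.1 Foreground region partition

Complex backgrounds, especially objects that resemble the target, easily cause a tracker to lose the target. Exploiting the continuity of target motion, we therefore propose a foreground partition method that constrains optimal similarity matching and yields more accurate matches. To build a target model that can be distinguished from the background, after the target region is selected in the first frame it is expanded about its center so that the diagonal of the expanded region is 1.5 times that of the target region; the expanded region is the foreground region used for model matching.

During model matching, a binary variable ${A_p}$ indicates whether a pixel of the test image belongs to the foreground region: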

$ A_p = \begin{cases} 1, & p \in \text{foreground region} \\ 0, & p \in \text{background region} \end{cases} $

(1)

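where $p$ denotes an image pixel. The effect of foreground partition is shown in Fig. 2. Introducing the foreground region has the following advantages:

Fig. 2 Effect of dividing the foreground region ((a) test image; (b) division of the foreground region; (c) extraction of the foreground region)

1) When matching with the target model, background blocks may also match the model (for example, when an object similar to the target appears in the background). Splitting each frame into foreground and background and matching only in the foreground excludes background interference, achieves optimal local feature matching, and prevents drift during tracking.

2) Because target motion is continuous, separating foreground from background is well justified. Performing optimal similarity matching only in the foreground region shrinks the search range, which reduces the time complexity of the tracker and improves its speed.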
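The partition in Eq. (1) can be illustrated with a short sketch. It assumes boxes given as (x, y, w, h) in pixels with (x, y) the top-left corner, and assumes that the 1.5× diagonal expansion keeps the aspect ratio, so width and height are each scaled by 1.5. The function name `foreground_mask` is hypothetical; in later frames the box passed in would be the previous frame's result.

```python
import math

import numpy as np


def foreground_mask(box, img_shape, scale=1.5):
    """Return the binary map A_p of Eq. (1) and the clipped foreground box.

    box       -- target box (x, y, w, h); (x, y) is the top-left corner (assumed format)
    img_shape -- (rows, cols, ...) of the test image
    scale     -- diagonal expansion factor (1.5 in the text), aspect ratio kept
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0              # center of the target box
    half_w, half_h = scale * w / 2.0, scale * h / 2.0
    rows, cols = img_shape[:2]
    # Expand about the center, then clip to the image borders.
    x0 = max(0, math.floor(cx - half_w))
    x1 = min(cols, math.ceil(cx + half_w))
    y0 = max(0, math.floor(cy - half_h))
    y1 = min(rows, math.ceil(cy + half_h))
    mask = np.zeros((rows, cols), dtype=np.uint8)  # A_p = 0: background
    mask[y0:y1, x0:x1] = 1                         # A_p = 1: foreground
    return mask, (x0, y0, x1 - x0, y1 - y0)


mask, region = foreground_mask((40, 30, 20, 10), (100, 100))
print(region)  # (35, 27, 30, 16)
```

Matching is then restricted to pixels with $A_p = 1$, which is what shrinks the search range described in advantage 2).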

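2.2 Optimal similarity matching

To cope with non-rigid deformation, complex backgrounds, and other external factors, we propose an optimal similarity matching (OSM) model that computes the similarity between the local feature blocks of the test image and the established appearance model.

First, the target appearance model is divided into $N$ non-overlapping local sub-blocks of size $l \times l$. To enable local block matching, the foreground region of the test image is likewise divided into $M$ non-overlapping $l \times l$ sub-blocks. Let the feature pixel set of the model be $\mathit{\boldsymbol{P}} = \left\{ p_i \right\}$, $i = 1, \cdots, n$, where $p_i$ is the center pixel of a local feature block, and let the feature pixel set of the test image be $\mathit{\boldsymbol{Q}} = \left\{ q_j \right\}$, $j = 1, \cdots, m$, where $q_j$ is the center pixel of a local feature block to be tested (so $n = N$ and $m = M$). The core of the OSM model is to find the optimal matching pairs (OMP) between the two feature pixel sets. If $p_i \in \mathit{\boldsymbol{P}}$ and $q_j \in \mathit{\boldsymbol{Q}}$ form an optimal matching pair, then $q_j$ is the nearest neighbor of $p_i$ in $\mathit{\boldsymbol{Q}}$ and $p_i$ is the nearest neighbor of $q_j$ in $\mathit{\boldsymbol{P}}$. The optimal matching pair is computed as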

$ OMP\left( p_i, q_j, \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right) = \begin{cases} 1, & NN\left( p_i, \mathit{\boldsymbol{Q}} \right) = q_j \ \text{and} \ NN\left( q_j, \mathit{\boldsymbol{P}} \right) = p_i \\ 0, & \text{otherwise} \end{cases} $

(2)

where $NN\left( p_i, \mathit{\boldsymbol{Q}} \right)$ denotes the nearest neighbor of $p_i$ in $\mathit{\boldsymbol{Q}}$. This paper follows a bidirectional matching principle: it computes not only the nearest neighbor $NN\left( p_i, \mathit{\boldsymbol{Q}} \right)$ of pixel $p_i$ in the set $\mathit{\boldsymbol{Q}}$, but also the nearest neighbor $NN\left( q_j, \mathit{\boldsymbol{P}} \right)$ of pixel $q_j$ in the set $\mathit{\boldsymbol{P}}$. Concretely, a pixel $p_i \in \mathit{\boldsymbol{P}}$ is first matched to its nearest neighbor $q_j = NN\left( p_i, \mathit{\boldsymbol{Q}} \right)$, and the nearest neighbor of $q_j$ in $\mathit{\boldsymbol{P}}$ is then computed. If it is again $p_i$, $OMP\left( p_i, q_j, \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right) = 1$; otherwise $OMP\left( p_i, q_j, \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right) = 0$. Here $NN\left( p_i, \mathit{\boldsymbol{Q}} \right) = \mathop{\arg\min}\limits_{q \in \mathit{\boldsymbol{Q}}} d\left( p_i, q \right)$, where $d\left( p_i, q \right)$ is the matching distance between the feature pixel $p_i$ and a feature pixel $q \in \mathit{\boldsymbol{Q}}$, defined as

$ d\left( {p, q} \right) = \left\| {{p^A}-{q^A}} \right\|_2^2 + \left\| {{p^L}-{q^L}} \right\|_2^2 $

(3)

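where the superscript $A$ denotes the color feature (RGB) and the superscript $L$ denotes the texture feature (LBP). The optimal similarity matching (OSM) between the model feature set $\mathit{\boldsymbol{P}}$ and the test feature set $\mathit{\boldsymbol{Q}}$ is then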

$ {\rm{OSM}} = \frac{1}{\min\left\{ n, m \right\}} \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{m} OMP\left( p_i, q_j, \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right) $

(4)
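Eqs. (2)-(4) map directly onto array operations. The sketch below assumes the color and texture descriptors of the $n$ model blocks and $m$ test blocks are already stacked into arrays; the names `PA`, `PL`, `QA`, `QL` and the helper functions are hypothetical. It illustrates mutual-nearest-neighbour counting and is not the authors' code.

```python
import numpy as np


def match_distance(PA, PL, QA, QL):
    """Eq. (3): d(p, q) = ||p^A - q^A||^2 + ||p^L - q^L||^2 for every pair (i, j)."""
    dA = ((PA[:, None, :] - QA[None, :, :]) ** 2).sum(axis=-1)  # color term
    dL = ((PL[:, None, :] - QL[None, :, :]) ** 2).sum(axis=-1)  # texture term
    return dA + dL                                               # shape (n, m)


def optimal_matching_pairs(D):
    """Eq. (2): OMP[i, j] is True iff q_j = NN(p_i, Q) and p_i = NN(q_j, P)."""
    n, m = D.shape
    nn_of_p = D.argmin(axis=1)           # index of NN(p_i, Q) for each i
    nn_of_q = D.argmin(axis=0)           # index of NN(q_j, P) for each j
    rows = np.arange(n)
    mutual = nn_of_q[nn_of_p] == rows    # bidirectional check
    omp = np.zeros((n, m), dtype=bool)
    omp[rows[mutual], nn_of_p[mutual]] = True
    return omp


def osm(D):
    """Eq. (4): number of optimal matching pairs normalised by min(n, m)."""
    n, m = D.shape
    return optimal_matching_pairs(D).sum() / min(n, m)


# Toy example: two model blocks, three test blocks, 1-D color feature, zero texture.
PA, PL = np.array([[0.0], [10.0]]), np.zeros((2, 1))
QA, QL = np.array([[0.5], [0.6], [20.0]]), np.zeros((3, 1))
D = match_distance(PA, PL, QA, QL)
print(osm(D))  # 0.5: only (p_1, q_1) passes the bidirectional check
```

In the toy case, $p_2$ picks $q_2$ as its nearest neighbour, but $q_2$ is closer to $p_1$, so the pair fails the reverse check. This is how the bidirectional rule discards one-sided matches.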

The expectation of the OSM between the feature pixel sets $\mathit{\boldsymbol{P}}$ and $\mathit{\boldsymbol{Q}}$ is

$ E_{\rm{OSM}} = \frac{1}{\min\left\{ n, m \right\}} \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{m} E_{\rm{OMP}} $

(5)

By Eq. (5), computing the expectation of OSM reduces to computing the expectation of the optimal matching pair indicator, which is

$ {{E}_{\rm{OMP}}}=\iint\limits_{P, Q}{OM{{P}_{i, j}}\left( \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right)p\left( \mathit{\boldsymbol{P}} \right)p\left( \mathit{\boldsymbol{Q}} \right)\rm{d}\mathit{\boldsymbol{P}}\rm{d}\mathit{\boldsymbol{Q}}} $

(6)

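This is a multiple integral over the feature sets $\mathit{\boldsymbol{P}}$ and $\mathit{\boldsymbol{Q}}$, where $p\left( \cdot \right)$ denotes a probability density. Assuming the pixels are mutually independent, Eq. (6) simplifies to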

$ \begin{aligned} E_{\rm{OMP}} = & \int_{p_1} \cdots \int_{p_n} \int_{q_1} \cdots \int_{q_m} OMP_{i,j}\left( \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right) \prod_{x = 1}^{n} f_P\left( p_x \right) \times \\ & \prod_{y = 1}^{m} f_Q\left( q_y \right) {\rm{d}}p_1 \cdots {\rm{d}}p_n \, {\rm{d}}q_1 \cdots {\rm{d}}q_m \end{aligned} $

(7)

where $OMP_{i,j}\left( \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right)$ is a binary function equal to 1 when $p_i$ and $q_j$ are mutual nearest neighbors and 0 otherwise. It can therefore also be written with distance indicator functions:

$ \begin{aligned} OMP_{i,j}\left( \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right) = & \prod_{x = 1, x \ne i}^{n} D\left[ d\left( p_x, q_j \right) > d\left( p_i, q_j \right) \right] \times \\ & \prod_{y = 1, y \ne j}^{m} D\left[ d\left( p_i, q_y \right) > d\left( p_i, q_j \right) \right] \end{aligned} $

(8)

where $D\left[ \cdot \right]$ is a binary indicator function of the distance comparison. Given $p_i$, $q_j$, and the distribution of $p_x$, $OMP_{i,j}\left( \mathit{\boldsymbol{P}}, \mathit{\boldsymbol{Q}} \right)$ can be simplified further. Define

$ {C_{{p_x}}} = \int\limits_{- \infty }^{ + \infty } {D\left[{d\left( {{p_x}, {q_j}} \right) > d\left( {{p_i}, {q_j}} \right)} \right]{f_P}\left( {{p_x}} \right){\rm{d}}{p_x}} $

(9)

where $d\left( p_i, q_j \right)$ is given by Eq. (3). Developing this expression further gives

$ C_{p_x} = \int_{-\infty}^{+\infty} D\left[ \left( p_x < q_j^- \right) \vee \left( p_x > q_j^+ \right) \right] f_P\left( p_x \right) {\rm{d}}p_x $

(10)

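where $q^- = q - d\left( p, q \right)$ and $q^+ = q + d\left( p, q \right)$. Since $d\left( p, q \right) \ge 0$, we have $q^- \le q^+$, so $C_{p_x}$ can be expressed with the cumulative distribution function (CDF) $F_P\left( x \right)$ of the feature pixel set $\mathit{\boldsymbol{P}}$: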

$ C_{p_x} = F_P\left( q_j^- \right) + 1 - F_P\left( q_j^+ \right) $

(11)

Similarly, using the CDF $F_Q\left( y \right)$ of the test feature pixel set $\mathit{\boldsymbol{Q}}$,

$ {C_{{q_y}}} = {F_Q}\left( {p_i^-} \right) + 1-{F_Q}\left( {p_i^ + } \right) $

(12)

Substituting these terms into Eq. (6) gives

$ \begin{aligned} E_{\rm{OMP}} = \iint_{p, q} & \left( F_Q\left( p^- \right) + 1 - F_Q\left( p^+ \right) \right)^{m-1} \times \\ & \left( F_P\left( q^- \right) + 1 - F_P\left( q^+ \right) \right)^{n-1} f_P\left( p \right) f_Q\left( q \right) {\rm{d}}p \, {\rm{d}}q \end{aligned} $

(13)

where $F_P\left( x \right)$ and $F_Q\left( y \right)$ are the CDFs of the pixel sets $\mathit{\boldsymbol{P}}$ and $\mathit{\boldsymbol{Q}}$, and $p^-$, $p^+$, $q^-$, $q^+$ are

$ \begin{array}{l} {p^-} = p-d\left( {p, q} \right), {p^ + } = p + d\left( {p, q} \right)\\ {q^-} = q - d\left( {q, p} \right), {q^ + } = q + d\left( {p, q} \right) \end{array} $

(14)
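Eq. (13) can be sanity-checked numerically. The sketch below assumes scalar features with $\mathit{\boldsymbol{P}} \sim N(0, 1)$ and $\mathit{\boldsymbol{Q}} \sim N(\mu_q, 1)$, and takes $d(p, q) = |p - q|$, the distance under which the intervals $q \pm d$ of Eq. (10) are exact. It evaluates Eq. (13) by a Riemann sum and compares it with a Monte Carlo estimate of the probability that $p_1$ and $q_1$ are mutual nearest neighbours. All names are hypothetical; only NumPy and the standard library are used.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def gauss_cdf(x, mu=0.0):
    return 0.5 * (1.0 + _erf((x - mu) / math.sqrt(2.0)))


def gauss_pdf(x, mu=0.0):
    return np.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def e_omp_closed_form(n, m, mu_q=0.0, steps=501):
    """Riemann-sum evaluation of Eq. (13) for P ~ N(0, 1), Q ~ N(mu_q, 1)."""
    lo, hi = min(0.0, mu_q) - 7.0, max(0.0, mu_q) + 7.0
    x = np.linspace(lo, hi, steps)
    h = x[1] - x[0]
    p, q = np.meshgrid(x, x, indexing="ij")
    d = np.abs(p - q)
    # Probability that another q_y (resp. p_x) lies farther away than d.
    c_q = gauss_cdf(p - d, mu_q) + 1.0 - gauss_cdf(p + d, mu_q)
    c_p = gauss_cdf(q - d) + 1.0 - gauss_cdf(q + d)
    integrand = c_q ** (m - 1) * c_p ** (n - 1) * gauss_pdf(p) * gauss_pdf(q, mu_q)
    return float(integrand.sum() * h * h)


def e_omp_monte_carlo(n, m, mu_q=0.0, trials=200_000, seed=0):
    """Empirical probability that p_1 and q_1 are mutual nearest neighbours."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((trials, n))
    Q = rng.standard_normal((trials, m)) + mu_q
    p1_nn = np.abs(P[:, :1] - Q).argmin(axis=1)   # NN of p_1 among the q's
    q1_nn = np.abs(Q[:, :1] - P).argmin(axis=1)   # NN of q_1 among the p's
    return float(np.mean((p1_nn == 0) & (q1_nn == 0)))


for mu in (0.0, 2.0):
    print(mu, e_omp_closed_form(3, 3, mu), e_omp_monte_carlo(3, 3, mu))
```

With $n = m = 3$ the two estimates should agree to within about 0.01, and $E_{\rm{OMP}}$ is larger when the two distributions coincide ($\mu_q = 0$) than when they are shifted apart ($\mu_q = 2$), which is the property that makes OSM usable as a similarity score.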

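From these expressions, the OSM between the model feature pixel set $\mathit{\boldsymbol{P}}$ and the test feature pixel set $\mathit{\boldsymbol{Q}}$ can be computed. The detection results of OSM on the Bolt sequence are shown in Fig. 3.

Fig. 3 Detection results of optimal matching ((a) test image; (b) bidirectional check matching; (c) optimal matching configuration; (d) detection result)

As Fig. 3(c) and (d) show, bidirectional optimal similarity matching effectively suppresses the influence of partial occlusion, complex background, and similar external factors, so the matching and detection process is completed more accurately. Using OSM for model matching has two main properties:

1) OSM matching depends only on the optimal matching pairs (OMP) between the feature pixel sets; unmatched local regions are treated as background, which avoids the effects of partial occlusion, complex background, and similar factors.

2) OSM uses bidirectional optimal similarity matching (Eq. (2)) and needs no prior shape or parametric data model to constrain matching. This avoids the effects of target deformation and rotation and also speeds up matching.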

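2.3 Target model update

The target model update in this algorithm has two parts: the distance decision and the appearance model update.

2.3.1 Distance decision

When similar objects interfere in the foreground region (for example, under partial occlusion), the best matched region often does not accurately represent the tracked target (the misidentification problem). The algorithm therefore takes two steps to suppress interference from similar objects.

To detect false matches, and assuming the target moves continuously, the distance between the centers of each best matching block in adjacent frames is defined as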

$ {d_i} = \sqrt {{{\left( {{x_i}-{{x'}_i}} \right)}^2} + {{\left( {{y_i}-{{y'}_i}} \right)}^2}} $

(15)

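where $d_i$ is the distance between the best matching regions in adjacent frames, $x_i$ and $y_i$ are the horizontal and vertical coordinates of the region center in frame $t$, and $x'_i$ and $y'_i$ are those in frame $t-1$. These distances are averaged as $\bar d = \frac{1}{h}\sum\limits_{i = 1}^{h} d_i$ with $h = 5$. The metric threshold interval is set to $\left( \bar d - \Delta d, \bar d + \Delta d \right)$ with $\Delta d = \frac{1}{5}\bar d$, and the best matching blocks of adjacent frames are checked against it: a block whose distance $d_i$ falls outside the interval is discarded, and only blocks inside the interval are kept. When delineating the target, $\bar d$ is used to estimate the positions of the discarded blocks, which are then combined with the retained blocks to determine the optimal target region.

Fig. 4 illustrates the position information of the best matching blocks. The left panel shows the optimal foreground region delineated in frame $t-1$, assumed to consist of 6 best matching blocks; the right panel shows the block positions in frame $t$. Best matching block 1 lies outside the threshold interval $\left( \bar d \pm \Delta d \right)$, so it is discarded and its position is re-estimated with $\bar d$, which yields the optimal target region in frame $t$.

Fig. 4 Diagram of matching block location information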
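One possible reading of the distance decision is sketched below. Several details are assumptions not fixed by the text: $\bar d$ is taken as the mean per-frame block displacement over the last $h = 5$ frames, $\Delta d = \bar d / 5$, and a discarded block is moved from its previous position by a step of length $\bar d$ in the mean direction of the retained blocks. The function name `distance_decision` is hypothetical.

```python
import numpy as np


def distance_decision(prev_centers, curr_centers, history, h=5, ratio=0.2):
    """Check matched blocks against the interval (d_bar - delta_d, d_bar + delta_d).

    prev_centers, curr_centers -- (k, 2) block centers in frames t-1 and t
    history -- past per-frame mean displacements, most recent last (assumed bookkeeping)
    Returns the corrected centers and a boolean mask of retained blocks.
    """
    prev = np.asarray(prev_centers, dtype=float)
    curr = np.asarray(curr_centers, dtype=float)
    d = np.linalg.norm(curr - prev, axis=1)        # Eq. (15) for every block
    recent = list(history)[-h:]
    if not recent:                                  # no history yet: keep everything
        return curr.copy(), np.ones(len(d), dtype=bool)
    d_bar = float(np.mean(recent))
    delta = ratio * d_bar                           # assumed delta_d = d_bar / 5
    keep = np.abs(d - d_bar) <= delta
    step = np.zeros(2)
    if keep.any():
        motion = (curr[keep] - prev[keep]).mean(axis=0)
        norm = np.linalg.norm(motion)
        if norm > 0:
            step = motion / norm * d_bar            # length d_bar, retained-block direction
    fixed = curr.copy()
    fixed[~keep] = prev[~keep] + step               # re-estimate discarded blocks
    return fixed, keep


prev = [[0, 0], [10, 0], [20, 0]]
curr = [[3, 4], [13, 4], [20, 30]]   # the third block jumps: likely a false match
fixed, keep = distance_decision(prev, curr, history=[5, 5, 5, 5, 5])
print(keep)      # [ True  True False]
print(fixed[2])  # [23.  4.]
```

A caller would append the mean displacement of the retained blocks to `history` after each frame, so the interval adapts to the current speed of the target.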

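2.3.2 Appearance model update

In dynamic scenes, the feature distribution of the target changes continuously as the target moves and deforms. The distance decision determines whether a false match has occurred; if not, the appearance model is updated every 5 frames by fusing the current result with the prior target model using Eq. (16).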

$ {\mathit{\boldsymbol{P}}_{{\rm{update}}}} = a\mathit{\boldsymbol{P}} + \left( {1-a} \right){\mathit{\boldsymbol{Q}}_t} $

(16)

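where $\mathit{\boldsymbol{P}}_{\rm{update}}$ is the updated feature distribution of the target region, $\mathit{\boldsymbol{P}}$ is the feature distribution of the appearance model, $\mathit{\boldsymbol{Q}}_t$ is the feature distribution of the target region in frame $t$, and $a$ is the update weight, set to $a = 0.8$ in the experiments. If a false match has occurred, the positions of the affected matching blocks are estimated from the average distance over the previous 5 frames and combined with the blocks that satisfy the condition to determine the optimal target region, which is then fused with the prior target model using Eq. (16).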
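Eq. (16) with its 5-frame schedule fits in a small helper. This is a sketch assuming the feature distributions are stored as equal-shaped arrays; the helper name is hypothetical, and in the false-match case the caller would pass the region assembled after the distance decision as `Q_t`.

```python
import numpy as np


def update_appearance_model(P, Q_t, frame_idx, a=0.8, period=5):
    """Eq. (16): P_update = a * P + (1 - a) * Q_t, applied every `period` frames.

    P, Q_t -- feature distributions of the model and of frame t's target region,
              assumed here to be arrays of the same shape (e.g. stacked block features).
    Between update frames the model is returned unchanged.
    """
    if frame_idx % period != 0:
        return P
    return a * np.asarray(P, dtype=float) + (1.0 - a) * np.asarray(Q_t, dtype=float)


model = np.ones(4)
for t in range(1, 11):
    observed = np.zeros(4)  # stand-in for Q_t (hypothetical data)
    model = update_appearance_model(model, observed, t)
print(model)  # updated at t = 5 and t = 10, so every entry is 0.8 ** 2
```

With $a = 0.8$ the model keeps most of its history at each update, which limits how fast a single bad frame can corrupt it.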

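3 Experiments and result analysis

Hardware: Intel Core i7 CPU, 16 GB RAM. Platform: Matlab 2012b. To evaluate the proposed algorithm, 10 representative image sequences were selected from the tracking benchmark of Wu et al.^[12] and used to compare it with 8 popular trackers. The 10 sequences are Girl, Deer, Football, Lemming, Woman, Bolt, David1, David2, Singer1, and Basketball, which together cover illumination change, partial occlusion, target deformation, complex background, and other challenging factors. The 8 trackers are L1APG (L1 tracker using accelerated proximal gradient approach)^[9], TLD (tracking-learning-detection)^[10], LOT (locally orderless tracking)^[11], VTD (visual tracking decomposition)^[13], SCM (sparsity-based collaborative model)^[14], FRAG (fragments-based tracking)^[15], KCF (kernelized correlation filters)^[16], and DLT (deep learning tracking)^[17]. BOTFP is evaluated with three criteria: center error^[18-20], tracking overlap ratio^[21-22], and tracking efficiency. Information on the image sequences is given in Table 1.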

Table 1 Information of the test image sequences

Sequence	Partial occlusion	Deformation	Complex background	Rotation
Girl	√			√
Deer			√	√
Football	√		√	√
Lemming	√		√
Woman	√	√
Bolt	√	√	√
David1		√		√
David2				√
Singer1	√		√
Basketball	√	√	√	√
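The two accuracy criteria can be computed per frame as below. This sketch assumes the usual benchmark definitions with boxes as (x, y, w, h): the center error is the Euclidean distance in pixels between the centers of the tracked and ground-truth boxes, and the overlap ratio is the area of their intersection divided by the area of their union. The per-sequence figures in Tables 2 and 3 would then be averages of these values over all frames. Function names are hypothetical.

```python
import math


def center_error(box, gt):
    """Euclidean distance (pixels) between the centers of two (x, y, w, h) boxes."""
    (x, y, w, h), (gx, gy, gw, gh) = box, gt
    return math.hypot((x + w / 2) - (gx + gw / 2), (y + h / 2) - (gy + gh / 2))


def overlap_ratio(box, gt):
    """Intersection over union of two (x, y, w, h) boxes."""
    (x, y, w, h), (gx, gy, gw, gh) = box, gt
    iw = max(0.0, min(x + w, gx + gw) - max(x, gx))
    ih = max(0.0, min(y + h, gy + gh) - max(y, gy))
    inter = iw * ih
    union = w * h + gw * gh - inter
    return inter / union if union > 0 else 0.0


def sequence_scores(track, truth):
    """Average center error and overlap ratio over a whole sequence."""
    ce = [center_error(b, g) for b, g in zip(track, truth)]
    ov = [overlap_ratio(b, g) for b, g in zip(track, truth)]
    return sum(ce) / len(ce), sum(ov) / len(ov)


print(center_error((0, 0, 10, 10), (3, 4, 10, 10)))   # 5.0
print(overlap_ratio((0, 0, 10, 10), (5, 0, 10, 10)))  # 1/3
```

Lower center error and higher overlap ratio are better.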

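3.1 Parameter selection

Given the continuity of target motion, a foreground region is expanded in each test frame, centered on the center of the target region delineated in the previous frame and with a diagonal 1.5 times that of the target region, and bidirectional optimal matching is performed inside this local region. Because real-time performance is an important criterion for the whole algorithm, the expansion factor of 1.5 was chosen to reduce the time spent on bidirectional matching while maintaining the tracking success rate.

Training and test images are divided into non-overlapping $l \times l$ feature blocks. Blocks that are too large yield too few features, whereas blocks that are too small compromise tracking speed. Following the experience of Oron et al.^[11, 23] and considering both detection accuracy and speed, $l = 6$ is used in the experiments.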
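The block partition with $l = 6$ can be sketched as follows. The text does not specify how the color (RGB) and texture (LBP) features of a block are summarized; this sketch assumes a mean RGB vector and a normalized 256-bin histogram of basic 8-neighbour LBP codes per block, and represents each block by its center pixel as in Section 2.2. Function names are hypothetical.

```python
import numpy as np

# Neighbour offsets (dy, dx) of the basic 8-neighbour LBP, clockwise from top-left.
LBP_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(gray):
    """Basic LBP code of every pixel; borders are handled by edge padding."""
    g = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    rows, cols = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(LBP_OFFSETS):
        neighbour = g[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def block_features(rgb, l=6):
    """Split an H x W x 3 image into non-overlapping l x l blocks.

    Returns per-block color features (mean RGB in [0, 1]), texture features
    (normalized LBP histograms) and block-center coordinates (row, col).
    Pixels that do not fill a whole block at the right/bottom edge are dropped.
    """
    rgb = np.asarray(rgb, dtype=float)
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    codes = lbp_image(gray)
    rows, cols = gray.shape
    color, texture, centers = [], [], []
    for r in range(0, rows - l + 1, l):
        for c in range(0, cols - l + 1, l):
            color.append(rgb[r:r + l, c:c + l].reshape(-1, 3).mean(axis=0) / 255.0)
            hist = np.bincount(codes[r:r + l, c:c + l].ravel(), minlength=256)
            texture.append(hist / hist.sum())
            centers.append((r + l // 2, c + l // 2))
    return np.array(color), np.array(texture), np.array(centers)


img = np.random.default_rng(0).integers(0, 256, size=(12, 18, 3))
A, L, centers = block_features(img)
print(A.shape, L.shape, centers[:2].tolist())  # (6, 3) (6, 256) [[3, 3], [3, 9]]
```

The resulting per-block vectors play the roles of $p^A$, $p^L$ (model) and $q^A$, $q^L$ (test) in Eq. (3). A smaller $l$ gives more blocks and hence more work per frame, which is the trade-off described above.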

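3.2 Experimental results

Table 2 lists the average center error of the 9 compared trackers on the 10 sequences, and Table 3 lists their tracking overlap ratios. Tables 2 and 3 show that BOTFP outperforms the similar methods (L1APG, TLD, and LOT) on most sequences. It also compares favorably with the other 5 trackers, and over all sequences it achieves the lowest average center error (12.10) and the highest average overlap ratio (0.70). These results indicate that BOTFP is reasonable and effective, matching or exceeding the tracking performance of current mainstream algorithms.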

Table 2 Average center error of different tracking methods (pixels)

Sequence	FRAG	SCM	VTD	L1APG	TLD	LOT	KCF	DLT	BOTFP
Girl	24.27	3.47	8.64	25.51	7.66	20.28	7.67	15.73	7.43
Deer	87.64	37.62	7.93	38.76	25.34	29.44	14.86	13.58	14.72
Football	15.38	9.64	13.57	11.31	13.82	7.15	17.23	12.32	8.17
Lemming	112.63	75.43	60.73	142.28	32.68	14.48	89.57	65.87	13.61
Woman	118.28	22.24	107.69	115.57	137.47	118.41	97.73	21.76	24.35
Bolt	73.62	9.37	49.17	132.48	141.24	17.62	13.26	75.28	7.89
David1	84.41	4.38	48.95	5.76	8.92	37.84	15.78	8.71	11.27
David2	67.51	6.72	3.51	3.23	6.73	3.97	7.36	65.75	13.44
Singer1	57.83	17.85	12.53	49.84	22.17	16.64	63.43	6.12	12.18
Basketball	18.42	116.24	9.48	84.47	95.46	127.83	124.43	20.36	7.79
Average	66.00	30.30	32.22	60.92	49.15	39.37	45.13	30.55	12.10

Table 3 Tracking overlap ratio of different tracking methods

Sequence	FRAG	SCM	VTD	L1APG	TLD	LOT	KCF	DLT	BOTFP
Girl	0.41	0.74	0.69	0.39	0.56	0.43	0.72	0.58	0.69
Deer	0.11	0.47	0.63	0.45	0.41	0.50	0.54	0.54	0.58
Football	0.63	0.71	0.65	0.68	0.65	0.73	0.61	0.66	0.71
Lemming	0.43	0.53	0.57	0.39	0.77	0.83	0.47	0.56	0.85
Woman	0.14	0.59	0.15	0.15	0.12	0.14	0.21	0.62	0.58
Bolt	0.47	0.76	0.55	0.23	0.16	0.72	0.72	0.48	0.78
David1	0.23	0.82	0.53	0.80	0.79	0.55	0.69	0.79	0.75
David2	0.21	0.69	0.73	0.74	0.69	0.73	0.69	0.24	0.60
Singer1	0.55	0.73	0.74	0.57	0.69	0.72	0.32	0.79	0.74
Basketball	0.63	0.17	0.67	0.25	0.21	0.14	0.15	0.62	0.69
Average	0.38	0.62	0.59	0.47	0.51	0.55	0.51	0.59	0.70
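The improvement over L1APG, TLD, and LOT stated in the abstract can be checked against Table 3. The short computation below (values copied from the table) assumes the comparison is between BOTFP's average overlap ratio and the mean of the three baselines' averages. Read that way, the gain is about 0.19 in absolute terms, roughly 20 percentage points, while the relative gain is close to 38%.

```python
import numpy as np

# Overlap ratios copied from Table 3, in the table's row order (Girl ... Basketball).
OVERLAP = {
    "L1APG": [0.39, 0.45, 0.68, 0.39, 0.15, 0.23, 0.80, 0.74, 0.57, 0.25],
    "TLD": [0.56, 0.41, 0.65, 0.77, 0.12, 0.16, 0.79, 0.69, 0.69, 0.21],
    "LOT": [0.43, 0.50, 0.73, 0.83, 0.14, 0.72, 0.55, 0.73, 0.72, 0.14],
    "BOTFP": [0.69, 0.58, 0.71, 0.85, 0.58, 0.78, 0.75, 0.60, 0.74, 0.69],
}

means = {name: float(np.mean(v)) for name, v in OVERLAP.items()}
baseline = float(np.mean([means["L1APG"], means["TLD"], means["LOT"]]))
gain = means["BOTFP"] - baseline

for name, m in means.items():
    print(f"{name:6s} mean overlap {m:.3f}")
print(f"absolute gain {gain:.3f}, relative gain {gain / baseline:.1%}")
```

Stating the gain in percentage points avoids confusing it with the larger relative improvement.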

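3.3 Accuracy analysis

1) Partial occlusion. Tables 2 and 3 show that BOTFP has a clear advantage under target occlusion, overall matching or exceeding the 8 compared trackers (FRAG, SCM, VTD, L1APG, TLD, LOT, KCF, and DLT). On the 7 sequences with partial occlusion, BOTFP achieves an average center error of 11.63 and an average overlap ratio of 0.72. The tracking results of the 9 methods are shown in Fig. 5. BOTFP first establishes the foreground region, then performs optimal similarity matching between feature blocks with the same color and texture features, computing the similarity between the local feature blocks of the test image and the target appearance model. Bidirectional optimal similarity matching verifies the model features in both directions, which avoids misidentification during detection and matches the target more accurately.

Fig. 5 Tracking results of the nine methods on the ten image sequences ((a) Girl; (b) Deer; (c) Football; (d) Lemming; (e) Woman; (f) Bolt; (g) David1; (h) David2; (i) Singer1; (j) Basketball)

2) Target deformation. In the deformation experiments, 4 sequences were used to test the 9 trackers. SCM, DLT, and BOTFP perform well, with BOTFP performing best; its advantage is especially clear over the matching-based trackers (L1APG, TLD, and LOT). On these 4 sequences, BOTFP achieves an average center error of 12.83 and an average overlap ratio of 0.70. The tracking results of the 9 methods are shown in Fig. 5. BOTFP performs bidirectional optimal similarity matching on feature blocks with the same color and texture features, computing the similarity between the local feature blocks ($l \times l$ blocks) of the test image and those of the target model ($l \times l$ blocks), so that model-based tracking describes the target more accurately when it deforms. Note that reducing $l$ to the range [3, 5] during optimal similarity matching (all experiments use $l = 6$) somewhat improves the overlap ratio; to keep the comparison consistent and fair, these results are not reported.

3) Complex background. For complex backgrounds, 6 sequences were used to test the 9 trackers. KCF, DLT, TLD, LOT, and BOTFP perform well, with BOTFP performing best: its average center error on these 6 sequences is 10.73 and its average overlap ratio is 0.73. The results of the 9 trackers are shown in Fig. 5. Compared with similar trackers, BOTFP builds a target appearance model composed of several feature blocks and uses the previous frame's result to establish a foreground region that constrains matching. It then performs optimal similarity matching on feature blocks with the same color and texture features, computing the similarity between the local feature blocks of the test image and the target model, and verifies the model features with bidirectional optimal similarity matching, so it still matches the target accurately when the background is cluttered.

4) Target rotation. Six sequences were used to test the 9 trackers under target rotation. Tables 2 and 3 show that all trackers drift to some degree, while the advantage of BOTFP is clear: its average center error on the 6 sequences is 10.47 and its average overlap ratio is 0.67. The tracking results of the 9 methods are shown in Fig. 5. BOTFP builds a discriminant appearance model from several feature blocks and matches feature blocks with the same color and texture features with bidirectional verification. When the target rotates and some features are lost, computing the similarity between the local feature blocks ($l \times l$ blocks) of the test image and those of the target model ($l \times l$ blocks) still lets the algorithm describe the target accurately.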
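The per-attribute averages quoted in the four items above can be recomputed from Tables 1-3. In the sketch below, the values are copied from those tables and only BOTFP is shown; small differences in the last printed digit can arise from rounding.

```python
# BOTFP center error (Table 2), overlap ratio (Table 3) and Table 1 attribute flags:
# (partial occlusion, deformation, complex background, rotation)
BOTFP = {
    "Girl": (7.43, 0.69, (1, 0, 0, 1)),
    "Deer": (14.72, 0.58, (0, 0, 1, 1)),
    "Football": (8.17, 0.71, (1, 0, 1, 1)),
    "Lemming": (13.61, 0.85, (1, 0, 1, 0)),
    "Woman": (24.35, 0.58, (1, 1, 0, 0)),
    "Bolt": (7.89, 0.78, (1, 1, 1, 0)),
    "David1": (11.27, 0.75, (0, 1, 0, 1)),
    "David2": (13.44, 0.60, (0, 0, 0, 1)),
    "Singer1": (12.18, 0.74, (1, 0, 1, 0)),
    "Basketball": (7.79, 0.69, (1, 1, 1, 1)),
}
ATTRIBUTES = ["partial occlusion", "deformation", "complex background", "rotation"]


def attribute_average(k):
    """Number of sequences with attribute k and BOTFP's mean center error / overlap on them."""
    rows = [(ce, ov) for ce, ov, flags in BOTFP.values() if flags[k]]
    n = len(rows)
    return n, sum(ce for ce, _ in rows) / n, sum(ov for _, ov in rows) / n


for k, name in enumerate(ATTRIBUTES):
    n, ce, ov = attribute_average(k)
    print(f"{name:20s} {n} sequences  center error {ce:.2f}  overlap {ov:.2f}")
```

The sequence counts (7, 4, 6, 6) match the numbers of sequences cited for each condition.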

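3.4 Tracking efficiency analysis

To assess real-time performance, the average processing time per frame of the 9 trackers on the 10 sequences was measured; the resulting average running speeds are listed in Table 4.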

Table 4 Average running speeds of different tracking methods (frame/s)

Sequence	FRAG	SCM	VTD	L1APG	TLD	LOT	KCF	DLT	BOTFP
Girl	6.32	0.65	2.74	1.76	26.84	0.79	173.66	16.13	8.14
Deer	4.78	0.97	2.67	1.64	27.17	0.83	171.49	15.68	7.32
Football	5.68	0.61	3.14	1.61	26.63	0.93	173.92	15.76	7.78
Lemming	6.27	0.69	2.77	1.75	26.70	0.71	172.34	14.97	6.69
Woman	6.41	0.57	2.16	1.53	26.32	0.66	171.71	15.32	6.31
Bolt	3.97	0.46	2.21	1.63	24.74	0.61	171.23	14.51	7.57
David1	4.48	0.53	3.47	2.03	26.47	0.67	173.74	15.43	6.73
David2	5.25	0.48	2.68	1.44	25.89	0.73	174.25	15.86	7.17
Singer1	4.96	0.52	2.91	1.73	26.31	0.71	171.57	14.27	5.97
Basketball	6.23	0.89	2.34	2.04	24.53	0.62	171.84	14.54	6.38

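Comparing the average running speeds with those of recent trackers shows that TLD, KCF, and DLT run faster but have lower tracking success rates. BOTFP runs at about 7 frame/s on average while achieving a higher overlap ratio. Compared with the similar methods L1APG and LOT, which run at under 2 frame/s in Table 4, BOTFP is faster and also achieves a smaller average center error and a higher overlap ratio.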
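The speed comparison can be made concrete by averaging Table 4 over the ten sequences. The sketch below copies the values for the matching-based trackers and also reports the per-frame time implied by each mean rate; the helper name is hypothetical.

```python
# Per-sequence speeds (frame/s) copied from Table 4 for the matching-based trackers.
FPS = {
    "L1APG": [1.76, 1.64, 1.61, 1.75, 1.53, 1.63, 2.03, 1.44, 1.73, 2.04],
    "TLD": [26.84, 27.17, 26.63, 26.70, 26.32, 24.74, 26.47, 25.89, 26.31, 24.53],
    "LOT": [0.79, 0.83, 0.93, 0.71, 0.66, 0.61, 0.67, 0.73, 0.71, 0.62],
    "BOTFP": [8.14, 7.32, 7.78, 6.69, 6.31, 7.57, 6.73, 7.17, 5.97, 6.38],
}


def mean_speed(name):
    """Mean frame rate over the ten sequences and the per-frame time (ms) it implies."""
    fps = sum(FPS[name]) / len(FPS[name])
    return fps, 1000.0 / fps


for name in FPS:
    fps, ms = mean_speed(name)
    print(f"{name:6s} {fps:6.2f} frame/s  {ms:8.1f} ms/frame")
```

BOTFP averages just over 7 frame/s (about 143 ms per frame), several times faster than L1APG and LOT but slower than TLD.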

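4 Conclusion

We have proposed a bidirectional optimization matching tracking method under foreground partition. First, the color and texture features of the target are used for matching, so the appearance model describes the target more completely. Then, the foreground information of each frame is estimated and used to constrain similarity matching and avoid background interference, and bidirectional optimal similarity matching, which uses the matched results as target features to verify the appearance-model features, makes feature matching more robust. Finally, an online model update algorithm with a distance decision keeps the target appearance model accurate.

Comparative experiments on 10 video sequences show that BOTFP achieves high tracking accuracy and good robustness under partial occlusion, target deformation, complex background, and similar conditions.

Although color and texture features are used for bidirectional optimal similarity matching, detection accuracy remains low when the target is fully occluded. Future work will address this problem.

Acknowledgments: We sincerely thank Dr. Tali Dekel, Dr. Shaul Oron, and the other experts and scholars who supported and helped this work.

References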

[1] Vatavu A, Danescu R, Nedevschi S. Stereovision-based multiple object tracking in traffic scenarios using free-form obstacle delimiters and particle filters[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(1): 498–511. [DOI:10.1109/TITS.2014.2366248]

[2] Lian F, Han C Z, Liu W F, et al. Tracking partly resolvable group targets using SMC-PHDF[J]. Acta Automatica Sinica, 2010, 36(5): 731–741. [DOI:10.3724/SP.J.1004.2010.00731]

[3] Khatoonabadi S H, Bajic I V. Video object tracking in the compressed domain using spatio-temporal Markov random fields[J]. IEEE Transactions on Image Processing, 2013, 22(1): 300–313. [DOI:10.1109/TIP.2012.2214049]

[4] Zhang H L, Hu S Q, Yang G S. Video object tracking based on appearance models learning[J]. Journal of Computer Research and Development, 2015, 52(1): 177–190. [DOI:10.7544/issn.1000-1239.2015.20130995]

[5] Yang W, Fu Y W, Long J Q, et al. The FISST-based target tracking techniques: a survey[J]. Acta Electronica Sinica, 2012, 40(7): 1440–1448. [DOI:10.3969/j.issn.0372-2112.2012.07.025]

[6] Babenko B, Yang M H, Belongie S. Visual tracking with online multiple instance learning[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Miami, Florida, USA: IEEE, 2009: 983–990. [DOI:10.1109/CVPR.2009.5206737] http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=5206737

[7] Wang S, Lu H C, Yang F, et al. Superpixel tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 1323–1330. [DOI:10.1109/ICCV.2011.6126385]

[8] Mei X, Ling H B. Robust visual tracking and vehicle classification via sparse representation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(11): 2259–2272. [DOI:10.1109/TPAMI.2011.66]

[9] Bao C L, Wu Y, Ling H B, Ji H. Real time robust L1 tracker using accelerated proximal gradient approach[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI: IEEE, 2012: 1830–1837. [DOI:10.1109/CVPR.2012.6247881]

[10] Kalal Z, Mikolajczyk K, Matas J. Tracking-learning-detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409–1422. [DOI:10.1109/TPAMI.2011.239]

[11] Oron S, Bar-Hillel A, Levi D, et al. Locally orderless tracking[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI: IEEE, 2012: 1940–1947. [DOI:10.1109/CVPR.2012.6247895]

[12] Wu Y, Lim J, Yang M H. Online object tracking: a benchmark[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2411–2418. [DOI:10.1109/CVPR.2013.312]

[13] Kwon J, Lee K M. Visual tracking decomposition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, CA: IEEE, 2010: 1269–1276. [DOI:10.1109/CVPR.2010.5539821]

[14] Zhong W, Lu H C, Yang M H. Robust object tracking via sparsity-based collaborative model[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI: IEEE, 2012: 1838–1845. [DOI:10.1109/CVPR.2012.6247882]

[15] Adam A, Rivlin E, Shimshoni I. Robust fragments-based tracking using the integral histogram[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2006: 798–805. [DOI:10.1109/CVPR.2006.256]

[16] Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583–596. [DOI:10.1109/TPAMI.2014.2345390]

[17] Wang N Y, Yeung D Y. Learning a deep compact image representation for visual tracking[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: Curran Associates Inc., 2013: 809–817. https://www.researchgate.net/publication/286963565_Learning_a_deep_compact_image_representation_for_visual_tracking?ev=auth_pub

[18] Wang M H, Liang Y, Liu F M, et al. Object tracking based on component-level appearance model[J]. Journal of Software, 2015, 26(10): 2733–2747. [DOI:10.13328/j.cnki.jos.004737]

[19] Liu W J, Liu D Q, Fei B W, et al. Geometric active contour tracking based on locally model matching[J]. Journal of Image and Graphics, 2015, 20(5): 652–663. [DOI:10.11834/jig.20150508]

[20] Yang B, Lin G Y, Zhang W G, et al. Robust object tracking incorporating residual unscented particle filter and discriminative sparse representation[J]. Journal of Image and Graphics, 2014, 19(5): 730–738. [DOI:10.11834/jig.20140511]

[21] Wang F, Fang S. Visual tracking based on the discriminative dictionary and weighted local features[J]. Journal of Image and Graphics, 2014, 19(9): 1316–1323. [DOI:10.11834/jig.20140908]

[22] Luo H L, Zhong B K, Kong F S. Object tracking algorithm by combining the predicted target position with compressive tracking[J]. Journal of Image and Graphics, 2014, 19(6): 875–885. [DOI:10.11834/jig.20140608]

[23] Dekel T, Oron S, Avidan S, et al. Best buddies similarity for robust template matching[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 2021–2029. [DOI:10.1109/CVPR.2015.7298813]