Published: 2017-06-16
DOI: 10.11834/jig.160624
2017 | Volume 22 | Number 6

Image analysis and recognition

Robust and fast visual tracking using discriminative sparse representation

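Liu Wenzhuo¹, Yuan Guanglin², Xue Mogen¹

1. Anhui Province Key Laboratory of Polarization Imaging Detection Technology, Army Officer Academy of PLA, Hefei 230031, China;

2. Eleventh Department, Army Officer Academy of PLA, Hefei 230031, China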

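Received: 2016-12-21; Revised: 2017-03-01

Supported by: National Natural Science Foundation of China (61175035, 61379105)

First author: Liu Wenzhuo (b. 1991), male, master's student in information and communication engineering at the Army Officer Academy of PLA. His research interests are image processing and computer vision. E-mail: 13945049233@163.com

CLC number: TP391.4

Document code: A

Article ID: 1006-8961(2017)06-0815-09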

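Abstract

Objective The L1 tracker is robust to partial occlusion, but it is prone to model drift and is computationally slow. To address these two problems, this paper proposes a visual tracking method based on discriminative sparse representation. Method Taking the interference of background and occlusion information into account, a discriminative sparse representation model is proposed. Based on the block coordinate optimization principle, a fast solver for the model is designed using the learned iterative shrinkage and thresholding algorithm and the soft-thresholding operation. Result On eight image sequences, the proposed method is compared with four existing classical trackers in terms of robustness and of the computation time of the sparse representation. In the qualitative and quantitative robustness experiments, the proposed method adapts well to the various interference factors that arise during tracking, and its precision curve is better than those of the other methods as the location error threshold varies from 0 to 50 pixels. With templates of size 16×16 and 32×32, the sparse representation of the proposed algorithm takes 0.152 s and 0.257 s, respectively, which is clearly faster than the other methods in the experiment. Conclusion Compared with classical trackers, the proposed method achieves fast target tracking while overcoming adverse factors such as occlusion, background interference, and appearance change. Because the method both computes the sparse representation quickly and overcomes many of the factors that impair tracking robustness, it can be applied in practical settings such as video surveillance and sports.

Key words

machine vision; target tracking; discriminative sparse representation; feed-forward neural network; particle filter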

Robust and fast visual tracking using discriminative sparse representation

Liu Wenzhuo¹, Yuan Guanglin², Xue Mogen¹

1. Anhui Province Key Laboratory of Polarization Imaging Detection Technology, Army Officer Academy of PLA, Hefei 230031, China;

2. Eleventh Department, Army Officer Academy of PLA, Hefei 230031, China

Supported by: National Natural Science Foundation of China(61175035, 61379105)

Abstract

Objective Visual tracking is an important field in computer vision and is applied in various domains. Although numerous visual tracking methods have been developed in the past several decades, many challenging issues (e.g., occlusions, illumination changes, and background clutter) still affect the tracking performance of these methods. Inspired by sparse representation applied in face recognition, the L1 tracker based on sparse representation was proposed by Mei et al. The L1 tracker is robust to partial occlusion but is prone to model drift and is time-consuming. To address these two problems, this study proposes a tracking method based on discriminative sparse representation. Method Considering the interference of background and occlusion information, a discriminative sparse representation model is proposed. The proposed model uses the sparseness of the coefficients associated with target and background templates so that the candidate targets can be represented accurately. The sparseness of the coefficients associated with trivial templates makes the proposed tracker robust to partial occlusion. By using the coefficients associated with trivial and target templates, the observation likelihood model adopted in this study eliminates the interference of the background information and leads to improved tracking results. A fast sparse representation algorithm is designed to increase the tracking speed and is used to calculate the coefficients of the discriminative sparse representation model. At the first stage, the proposed algorithm uses the learned iterative shrinkage and thresholding algorithm (LISTA) to calculate the coefficients associated with target templates. At the second stage, the proposed algorithm uses the soft shrinkage operator to calculate the coefficients associated with trivial templates. Based on block coordinate optimization theory, the two stages are applied iteratively to obtain the sparse representation coefficients. Under the particle filter framework, the tracking task is accomplished with the proposed model and the fast solution algorithm. Result The proposed tracker is tested on eight sequences, namely, FaceOcc1, FaceOcc2, David3, Dudek, Singer1, Car4, Jumping, and CarDark. The strength of the proposed tracker is analyzed by comparing it with the L1, L1APG (L1 tracker based on accelerated proximal gradient), SP, and L1L2 trackers. The issues in these sequences include occlusion, in-plane rotation, out-of-plane rotation, target appearance variations, illumination changes, camera motion, scale changes, motion blur, and background clutter. The selected trackers, which are used to demonstrate the effectiveness of the proposed tracker, are all based on sparse representation. The L1APG, SP, and L1L2 trackers are improvements of the L1 tracker. For robustness evaluation, qualitative and quantitative experiments are conducted. The qualitative comparison shows that the proposed tracker overcomes various challenging issues during tracking. For the quantitative comparison, a precision plot is used to analyze the performance of the proposed tracker. With the location threshold varying from 0 to 50 pixels, the precision plot of the proposed tracker is better than that of the others in the eight sequences. In terms of computing speed, the proposed algorithm significantly reduces the computational cost of sparse representation. The time for solving an image patch is 0.152 s and 0.257 s for patches with resolutions of 16×16 and 32×32, respectively. The proposed tracker consumes less time than the others in the experiment. Compared with other trackers that do not adopt background templates to construct the sparse representation model, the proposed tracker produces better tracking results. Conclusion The proposed tracker is more robust to occlusion and other challenges, such as background clutter and appearance changes, and is faster than the compared trackers. Thus, trackers based on the proposed method can be used for many engineering applications, such as video surveillance, medical diagnosis, and athletics. The method adopted to update the target templates has low time consumption but may sometimes introduce interference information into the tracker. Thus, a more effective method of updating target templates needs to be developed in the future.

Key words

machine vision; target tracking; discriminative sparse representation; feed-forward neural network; particle filter

0 Introduction

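Visual tracking is the process of continuously estimating the state of a target in a video image sequence, and it is widely used in vehicle navigation, medical diagnosis, and competitive sports. Although visual tracking has developed rapidly over the past several decades, many interference factors in real applications (e.g., illumination changes, fast target motion, and scale changes) continue to challenge the performance of tracking methods. Overcoming such adverse conditions and further extending the range of applications of visual tracking has therefore become a focus of research.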

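Inspired by the use of sparse representation to handle occlusion in face recognition^[1], Mei et al.^[2] first proposed a visual tracking method based on sparse representation, the L1 tracker. By exploiting the sparsity of the trivial-template coefficients, this method is robust to partial occlusion. However, the L1 tracker is prone to model drift and tracks slowly. To suppress model drift, Wang et al.^[3] proposed a tracking method based on online robust non-negative dictionary learning; with the proposed dictionary learning algorithm, the learned templates capture the real-time changes of the target and also adapt well to occlusion and background interference. Xing et al.^[4] proposed a multi-lifespan dictionary model and updated the templates with the online dictionary learning^[5] (ODL) algorithm, which effectively alleviated model drift and improved tracking robustness. Bozorgtabar et al.^[6] proposed a tracking method under a discriminative multi-task sparse learning framework; it represents the target with an adaptive dictionary and uses a conditional random field^[7] (CRF) to exclude background interference, thereby improving robustness. To increase tracking speed, Mei et al.^{[8, 9]} proposed tracking with bounded particle resampling (BPR); its two-stage probabilistic sampling strategy discards a large number of candidate samples with low observation likelihood, which both reduces the computational cost and improves robustness. In addition, Ref. [9] solves the L1 minimization problem with the accelerated proximal gradient (APG) algorithm, which further increases the tracking speed. Zhang et al.^[10] proposed tracking via multi-task sparse learning and computed the representation coefficients with the APG algorithm, which effectively improved both robustness and speed.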

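In summary, through the joint efforts of researchers, the performance of the L1 tracker has improved to some extent, but two shortcomings remain. The dictionary learning algorithms proposed so far alleviate model drift, but their update procedures are generally computationally expensive. In terms of tracking speed, the improved L1 trackers still fall short of the requirements of engineering practice. Therefore, to suppress model drift while also increasing tracking speed, this paper proposes a discriminative sparse representation model. The model introduces background templates into the template set to represent candidate targets, which alleviates the interference of background information; moreover, updating the background templates is simple and fast and adds no computational burden. On this basis, following the block coordinate optimization principle^[11], a fast solver for the representation model is designed by combining the learned iterative shrinkage and thresholding algorithm^[12] (LISTA) with the soft-thresholding operation^[13]. Under the particle filter framework, the proposed model and algorithm are combined to achieve robust and fast visual tracking. Comparison with classical trackers on several image sequences verifies the robustness and efficiency of the proposed method.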

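1 Discriminative sparse representation model

The L1 tracker uses the particle filter as its tracking framework. It obtains the representation coefficients of each candidate target from a sparse representation model and then builds the observation likelihood from the error with which the target templates reconstruct the candidate, thereby tracking the target. Its main advantage is that the sparsity (L1-norm) constraint on the coefficients of the sparse representation model makes it robust to partial occlusion. It also has two shortcomings. First, the L1 tracker adopts a generative appearance model, that is, it ignores the interference of background information in representing the candidate target. In practice the target region may contain some background pixels, and such interference inevitably degrades tracking accuracy and may even cause model drift. Second, the sparsity constraint on the coefficients makes the solution process computationally expensive, which severely limits the speed of the L1 tracker.

To address these shortcomings, this paper proposes the discriminative sparse representation model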

$\begin{array}{l} \mathop {{\rm{min}}}\limits_{{a_{{\rm{dis}}}}{\rm{,}}e} \frac{1}{2}\left\| {\mathit{\boldsymbol{y}} - {\mathit{\boldsymbol{T}}_{{\rm{dis}}}}{\mathit{\boldsymbol{a}}_{{\rm{dis}}}} - \mathit{\boldsymbol{Ie}}} \right\|_2^2 + \\ \quad \quad {\lambda _1}{\left\| {{\mathit{\boldsymbol{a}}_{{\rm{dis}}}}} \right\|_1} + {\lambda _2}{\left\| \mathit{\boldsymbol{e}} \right\|_1} \end{array}$

(1)

where $\mathit{\boldsymbol{y}} \in {{\bf{R}}^D}$ is the candidate target, ${\mathit{\boldsymbol{T}}_{{\rm{dis}}}} = \left[ {{\mathit{\boldsymbol{T}}_f},{\mathit{\boldsymbol{T}}_b}} \right]$ is the template set, ${\mathit{\boldsymbol{T}}_f} = \left[ {{\mathit{\boldsymbol{t}}_{f,1}},{\mathit{\boldsymbol{t}}_{f,2}}, \cdots ,{\mathit{\boldsymbol{t}}_{f,M}}} \right] \in {{\bf{R}}^{D \times M}}$ contains the target templates, ${\mathit{\boldsymbol{T}}_b} = \left[ {{\mathit{\boldsymbol{t}}_{b,1}},{\mathit{\boldsymbol{t}}_{b,2}}, \cdots ,{\mathit{\boldsymbol{t}}_{b,B}}} \right] \in {{\bf{R}}^{D \times B}}$ contains the background templates obtained by random sampling in an annular region around the target, and the identity matrix $\mathit{\boldsymbol{I}} \in {{\bf{R}}^{D \times D}}$ serves as the trivial templates. ${\mathit{\boldsymbol{a}}_{{\rm{dis}}}} = {\left[ {{\mathit{\boldsymbol{a}}_f},{\mathit{\boldsymbol{a}}_b}} \right]^{\rm{T}}}$ is the coefficient vector of the template set, ${\mathit{\boldsymbol{a}}_f} \in {{\bf{R}}^M}$ holds the target-template coefficients, ${\mathit{\boldsymbol{a}}_b} \in {{\bf{R}}^B}$ holds the background-template coefficients, $\mathit{\boldsymbol{e}} \in {{\bf{R}}^D}$ holds the trivial-template coefficients, ${\left\| \cdot \right\|_1}$ and ${\left\| \cdot \right\|_2}$ denote the L1 and L2 norms, and ${\lambda _1}$ and ${\lambda _2}$ are regularization parameters.
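To make the roles of the three terms in Eq. (1) concrete, the following is a minimal NumPy sketch that only evaluates the objective for given coefficients. The function name, the toy dimensions, and the random data are assumptions made for illustration; the default values of `lam1` and `lam2` are the settings reported in Section 4.

```python
import numpy as np

def discriminative_objective(y, T_f, T_b, a_dis, e, lam1=0.5, lam2=0.1):
    """Value of the Eq. (1) objective for given coefficients a_dis and e."""
    T_dis = np.hstack([T_f, T_b])        # D x (M + B) template set [T_f, T_b]
    residual = y - T_dis @ a_dis - e     # I e equals e because I is the identity
    return (0.5 * residual @ residual    # reconstruction error
            + lam1 * np.abs(a_dis).sum() # sparsity of template-set coefficients
            + lam2 * np.abs(e).sum())    # sparsity of trivial-template coefficients

# Toy data (random, for illustration only): the candidate equals the first
# target template, so a single unit coefficient reconstructs it exactly.
rng = np.random.default_rng(0)
D, M, B = 16, 4, 2
T_f = rng.standard_normal((D, M))
T_b = rng.standard_normal((D, B))
y = T_f[:, 0].copy()
a_exact = np.zeros(M + B)
a_exact[0] = 1.0
value = discriminative_objective(y, T_f, T_b, a_exact, np.zeros(D))
```

In this toy case the residual and the occlusion term vanish, so the objective reduces to the L1 penalty on the single active coefficient.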

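The model has two advantages: 1) the sparsity constraint on the trivial-template coefficients lets the proposed method inherit the L1 tracker's robustness to partial occlusion; 2) the background templates capture the background information inside the tracked target region, which makes the target templates' representation of the candidate target more accurate. Together, these two properties ensure the robustness of the tracker. On this basis, following the block coordinate optimization principle, a fast solver for the representation model is designed using LISTA and the soft-thresholding operation, which increases the tracking speed.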

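2 Fast sparse representation algorithm

To speed up the sparse representation, this paper combines LISTA and the soft-thresholding operation, following the block coordinate optimization principle, into a fast solver for the representation model of Eq. (1), which yields the coefficients ${\mathit{\boldsymbol{a}}_{{\rm{dis}}}}$ and $\mathit{\boldsymbol{e}}$.

First, with the trivial-template coefficients $\mathit{\boldsymbol{e}}$ assumed known, Eq. (1) reduces to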

$\underset{{{a}_{\rm{dis}}}}{\mathop{\rm{min}}}\,\frac{1}{2}\left\| \left( \mathit{\boldsymbol{y}}-\mathit{\boldsymbol{Ie}} \right)-{{\mathit{\boldsymbol{T}}}_{\rm{dis}}}{{\mathit{\boldsymbol{a}}}_{\rm{dis}}} \right\|_{2}^{2}+{{\lambda }_{1}}{{\left\| {{\mathit{\boldsymbol{a}}}_{\rm{dis}}} \right\|}_{1}}$

(2)

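Solving Eq. (2) is an L1 minimization problem and is computationally expensive. To address this, inspired by Ref. [12], this paper adopts LISTA, which is based on a feed-forward neural network^[14], to compute the approximate coefficients ${{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}}$ of the template set ${{\mathit{\boldsymbol{T}}}_{\rm{dis}}}$. The procedure is shown in Fig. 1.

Fig. 1 Flow chart of LISTA

With a finite number of iterations of the network in Fig. 1 (two iterations are used here), the sparse representation result of Eq. (2) can be approximated quickly and accurately. In Fig. 1, $\mathit{\boldsymbol{Y}}=\mathit{\boldsymbol{y}}-\mathit{\boldsymbol{Ie}}$, $\mathit{\boldsymbol{W}}=\frac{1}{L}\mathit{\boldsymbol{T}}_{\rm{dis}}^{\rm{T}}$, and $\mathit{\boldsymbol{S}}=\mathit{\boldsymbol{I}}-\frac{1}{L}\mathit{\boldsymbol{T}}_{\rm{dis}}^{\rm{T}}{{\mathit{\boldsymbol{T}}}_{\rm{dis}}}$ (in $\mathit{\boldsymbol{S}}$, $\mathit{\boldsymbol{I}}$ is the $(M+B) \times (M+B)$ identity), $L$ is the largest eigenvalue of $\mathit{\boldsymbol{T}}_{\rm{dis}}^{\rm{T}}{{\mathit{\boldsymbol{T}}}_{\rm{dis}}}$, and $\theta ={{\lambda }_{1}}/L$ is the shrinkage threshold used in the soft-thresholding operation ${S_\theta }\left( l \right) = {\mathop{\rm sgn}} \left( l \right) \cdot \max \left\{ {\left| l \right| - \theta ,0} \right\}$. Each iteration of the LISTA network can be written as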

${{\mathit{\boldsymbol{a}}}_{k+1}}={{S}_{\theta }}(\mathit{\boldsymbol{WY}}+\mathit{\boldsymbol{S}}{{\mathit{\boldsymbol{a}}}_{k}})$

(3)
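The iteration of Eq. (3) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the weights $\mathit{\boldsymbol{W}}$, $\mathit{\boldsymbol{S}}$, and $\theta$ are fixed at the values given in the text (no training step is shown, because the text does not describe one), the iteration starts from zero unless `a0` is given, and the names `soft_threshold`, `lista_forward`, `n_iter`, and `a0` are hypothetical.

```python
import numpy as np

def soft_threshold(v, theta):
    """Element-wise S_theta(l) = sgn(l) * max(|l| - theta, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lista_forward(Y, T_dis, lam1=0.5, n_iter=2, a0=None):
    """Run n_iter iterations of Eq. (3) with W, S, theta as defined in the text."""
    gram = T_dis.T @ T_dis
    L = np.linalg.eigvalsh(gram).max()      # largest eigenvalue of T_dis^T T_dis
    W = T_dis.T / L                         # W = (1/L) T_dis^T
    S = np.eye(T_dis.shape[1]) - gram / L   # S = I - (1/L) T_dis^T T_dis
    theta = lam1 / L                        # shrinkage threshold
    a = np.zeros(T_dis.shape[1]) if a0 is None else np.asarray(a0, dtype=float)
    WY = W @ Y                              # constant across iterations
    for _ in range(n_iter):
        a = soft_threshold(WY + S @ a, theta)
    return a

# With an identity template set, L = 1, W = I and S = 0, so the result is
# simply the soft-thresholded input.
a_demo = lista_forward(np.array([2.0, -0.3, 1.0]), np.eye(3), lam1=0.5)
```

With these fixed weights each iteration is one step of the classical iterative shrinkage and thresholding scheme, so more iterations trade time for a closer approximation of the minimizer of Eq. (2).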

Next, given the approximate coefficients ${{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}}$, Eq. (1) reduces to

$\underset{e}{\mathop{\rm{min}}}\,\frac{1}{2}\left\| \mathit{\boldsymbol{Ie}}-(\mathit{\boldsymbol{y}}-{{\mathit{\boldsymbol{T}}}_{\rm{dis}}}{{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}}) \right\|_{2}^{2}+{{\lambda }_{2}}{{\left\| \mathit{\boldsymbol{e}} \right\|}_{1}}$

(4)

Because Eq. (4) is a convex optimization problem, it follows from Ref. [13] that it can be solved by the soft-thresholding operation $\mathit{\boldsymbol{e}}={{S}_{{{\lambda }_{2}}}}\left( \mathit{\boldsymbol{y}}-{{\mathit{\boldsymbol{T}}}_{\rm{dis}}}{{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}} \right)$.
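A short derivation, added as a reading aid rather than taken from Ref. [13], indicates why soft thresholding solves Eq. (4) in closed form. Write $\mathit{\boldsymbol{r}}=\mathit{\boldsymbol{y}}-{{\mathit{\boldsymbol{T}}}_{\rm{dis}}}{{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}}$ for the residual (a symbol introduced only for this note). Because $\mathit{\boldsymbol{I}}$ is the identity, the objective of Eq. (4) separates over the $D$ coordinates:

$\frac{1}{2}\left\| \mathit{\boldsymbol{e}}-\mathit{\boldsymbol{r}} \right\|_{2}^{2}+{{\lambda }_{2}}{{\left\| \mathit{\boldsymbol{e}} \right\|}_{1}}=\sum\limits_{j=1}^{D}{\left( \frac{1}{2}{{\left( {{e}_{j}}-{{r}_{j}} \right)}^{2}}+{{\lambda }_{2}}\left| {{e}_{j}} \right| \right)}$

Each scalar term is convex, and its subgradient optimality condition $0\in {{e}_{j}}-{{r}_{j}}+{{\lambda }_{2}}\partial \left| {{e}_{j}} \right|$ gives

${{e}_{j}}={\mathop{\rm sgn}} \left( {{r}_{j}} \right)\cdot \max \left\{ \left| {{r}_{j}} \right|-{{\lambda }_{2}},0 \right\},\quad j=1,2,\cdots ,D$

which is $\mathit{\boldsymbol{e}}={{S}_{{{\lambda }_{2}}}}\left( \mathit{\boldsymbol{r}} \right)$. For example, with ${{\lambda }_{2}}=0.1$, a residual entry ${{r}_{j}}=0.35$ gives ${{e}_{j}}=0.25$, and every entry with $\left| {{r}_{j}} \right|\le 0.1$ is set to zero.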

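In summary, the fast sparse representation algorithm, designed from LISTA and the soft-thresholding operation following the block coordinate optimization principle, is given in Algorithm 1.

Algorithm 1 Fast sparse representation algorithm

1) Input: candidate target $\mathit{\boldsymbol{y}}$, template set ${{\mathit{\boldsymbol{T}}}_{\rm{dis}}}$, shrinkage threshold $\theta $, regularization parameter ${{\lambda }_{2}}$;

2) Initialize ${{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}}={\bf{0}},\mathit{\boldsymbol{e}}={\bf{0}}$;

3) Solve Eq. (2) with LISTA to obtain the template-set coefficients ${{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}}$;

4) Solve Eq. (4) with the soft-thresholding operation to obtain the trivial-template coefficients $\mathit{\boldsymbol{e}}$;

5) Repeat steps 3) and 4) until the stopping criterion is met;

6) Output: ${{{\mathit{\boldsymbol{\hat{a}}}}}_{\rm{dis}}},\mathit{\boldsymbol{e}}$.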
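The steps of Algorithm 1 can be sketched as follows. This is an illustrative sketch, not the authors' Matlab implementation. Three points are assumptions: the stopping criterion is a fixed number of outer iterations (`n_outer`), the LISTA step is warm-started from the previous coefficients, and the LISTA weights are fixed at the values given for Fig. 1 rather than learned. The function and parameter names are hypothetical.

```python
import numpy as np

def soft_threshold(v, theta):
    """Element-wise S_theta(l) = sgn(l) * max(|l| - theta, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def fast_sparse_representation(y, T_dis, lam1=0.5, lam2=0.1,
                               n_outer=5, n_lista=2):
    """Alternate the LISTA step (Eq. (2)) and the soft-threshold step (Eq. (4))."""
    n = T_dis.shape[1]
    gram = T_dis.T @ T_dis
    L = np.linalg.eigvalsh(gram).max()
    W, S, theta = T_dis.T / L, np.eye(n) - gram / L, lam1 / L
    a = np.zeros(n)                  # step 2: initialise coefficients
    e = np.zeros(y.shape[0])
    for _ in range(n_outer):         # step 5: fixed count (assumed criterion)
        Y = y - e                    # step 3: LISTA on Eq. (2) with e fixed
        WY = W @ Y
        for _ in range(n_lista):
            a = soft_threshold(WY + S @ a, theta)
        e = soft_threshold(y - T_dis @ a, lam2)   # step 4: closed form of Eq. (4)
    return a, e                      # step 6

# Toy usage: a candidate built from one template plus a single corrupted pixel.
rng = np.random.default_rng(1)
T_demo = rng.standard_normal((32, 6))
T_demo /= np.linalg.norm(T_demo, axis=0)
y_demo = 2.0 * T_demo[:, 0]
y_demo[5] += 3.0
a_hat, e_hat = fast_sparse_representation(y_demo, T_demo)
```

Under the warm-start assumption, every inner and outer step can only lower or keep the value of the Eq. (1) objective, which is why a small fixed iteration budget is a workable stopping rule in this sketch.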

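3 Target tracking

The particle filter is used as the basic tracking framework; it consists of a prediction step and an update step. Given all image observations from time 1 to $t-1$, ${{\mathit{\boldsymbol{y}}}_{1:t-1}}=\left\{ {{\mathit{\boldsymbol{y}}}_{1}},{{\mathit{\boldsymbol{y}}}_{2}},\cdots ,{{\mathit{\boldsymbol{y}}}_{t-1}} \right\}$, the prior probability of the target state can be written as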

$\begin{array}{l} \quad \quad \quad \quad p({\mathit{\boldsymbol{x}}_t}|\mathit{\boldsymbol{y}}{_{1:t - 1}}) = \\ \int {p\left( {{\rm{ }}{\mathit{\boldsymbol{x}}_t}|{\mathit{\boldsymbol{x}}_{t - 1}}} \right)p({\mathit{\boldsymbol{x}}_{t - 1}}\mathit{\boldsymbol{|}}{\mathit{\boldsymbol{y}}_{1:t - 1}}){\rm{d}}{\mathit{\boldsymbol{x}}_{t - 1}}} \end{array}$

(5)

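where ${{\mathit{\boldsymbol{x}}}_{t}}=({{x}_{t}},{{y}_{t}},{{w}_{t}},{{h}_{t}},{{\theta }_{t}})$ is the target state at time $t$, $({{x}_{t}},{{y}_{t}})$ are the horizontal and vertical coordinates of the target center, $({{w}_{t}},{{h}_{t}})$ are the width and height of the target, and ${{\theta }_{t}}$ is the tilt angle of the target. $p({{\mathit{\boldsymbol{x}}}_{t}}|{{\mathit{\boldsymbol{x}}}_{t-1}})$ is the state transition model, which takes the form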

$p\left( {{\mathit{\boldsymbol{x}}_t}|{\mathit{\boldsymbol{x}}_{t - 1}}} \right) = N({\mathit{\boldsymbol{x}}_t};{\mathit{\boldsymbol{x}}_{t - 1}},\mathit{\boldsymbol{ \boldsymbol{\varPsi} }})$

(6)

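where $\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}$ is a diagonal matrix whose diagonal entries are the variances of the corresponding state variables. At time $t$, once the image observation ${\mathit{\boldsymbol{y}}_t}$ is available, the posterior probability can be written as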

$p\left( {{\mathit{\boldsymbol{x}}_t}|{\mathit{\boldsymbol{y}}_{1:t}}} \right) \propto p\left( {{\mathit{\boldsymbol{y}}_t}|{\mathit{\boldsymbol{x}}_t}} \right)p({\mathit{\boldsymbol{x}}_t}|{\mathit{\boldsymbol{y}}_{1:t - 1}})$

(7)

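where $p\left( {{\mathit{\boldsymbol{y}}_t}|{\mathit{\boldsymbol{x}}_t}} \right)$ is the observation likelihood model. For any candidate sample $\mathit{\boldsymbol{x}}_t^i$, the sparse representation of its image observation $\mathit{\boldsymbol{y}}_t^i$ is computed from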

$\begin{align} & \underset{a_{\rm{dis,}t}^{i},e_{t}^{i}}{\mathop{\rm{min}}}\,\frac{1}{2}\left\| \mathit{\boldsymbol{y}}_{t}^{i}-{{\mathit{\boldsymbol{T}}}_{\rm{dis,}t}}\mathit{\boldsymbol{a}}_{\rm{dis,}t}^{i}-\mathit{\boldsymbol{Ie}}_{t}^{i} \right\|_{2}^{2}+ \\ & \quad \quad \quad {{\lambda }_{1}}{{\left\| \mathit{\boldsymbol{a}}_{\rm{dis,}t}^{i} \right\|}_{1}}+{{\lambda }_{2}}~{{\left\| \mathit{\boldsymbol{e}}_{t}^{i} \right\|}_{1}} \\ \end{align}$

(8)

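The observation likelihood is then computed from the target-template coefficients $\mathit{\boldsymbol{a}}_{f,t}^{i}$ and the trivial-template coefficients $\mathit{\boldsymbol{e}}_{t}^{i}$, that is,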

$\begin{array}{l} \quad \quad \quad \quad p\left( {\mathit{\boldsymbol{y}}_t^i|\mathit{\boldsymbol{x}}_t^i} \right) = \frac{1}{\Gamma } \times \\ {\rm{exp}}\left( { - \alpha \left( {\frac{1}{2}\left\| {\mathit{\boldsymbol{y}}_t^i - {\mathit{\boldsymbol{T}}_{f,t}}\mathit{\boldsymbol{a}}_{f,t}^i - \mathit{\boldsymbol{Ie}}_t^i} \right\|_2^2 + {\lambda _2}{{\left\| {\mathit{\boldsymbol{e}}_t^i} \right\|}_1}} \right)} \right) \end{array}$

(9)

where $\alpha $ is the scale parameter of the Gaussian kernel and Γ is a normalization factor.
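Eq. (9) can be sketched as below. Note that only the target-template block ${\mathit{\boldsymbol{T}}_{f,t}}\mathit{\boldsymbol{a}}_{f,t}^i$ enters the residual; the background-template coefficients are deliberately left out. Two points are assumptions: the value of `alpha` is not reported in the text, so the default of 1.0 is a placeholder, and Γ is realised here by normalising the values over all particles. The function names are hypothetical.

```python
import numpy as np

def observation_likelihood(y, T_f, a_f, e, alpha=1.0, lam2=0.1):
    """Unnormalised Eq. (9): exp(-alpha * (0.5*||y - T_f a_f - e||^2 + lam2*||e||_1))."""
    r = y - T_f @ a_f - e            # background coefficients are not used here
    return float(np.exp(-alpha * (0.5 * r @ r + lam2 * np.abs(e).sum())))

def normalised_likelihoods(values):
    """Apply the factor 1/Gamma by normalising over all particles (assumed form)."""
    values = np.asarray(values, dtype=float)
    return values / values.sum()

# Toy comparison: the same patch explained by templates alone, or partly
# explained by the trivial templates (i.e. treated as occluded).
T_f_demo = np.eye(3)[:, :2]
y_demo = np.array([1.0, 2.0, 0.0])
p_clean = observation_likelihood(y_demo, T_f_demo, np.array([1.0, 2.0]), np.zeros(3))
p_occluded = observation_likelihood(y_demo, T_f_demo, np.array([1.0, 0.0]),
                                    np.array([0.0, 2.0, 0.0]))
weights = normalised_likelihoods([p_clean, p_occluded])
```

The second candidate reconstructs the patch equally well, but the ${\lambda _2}{\left\| \mathit{\boldsymbol{e}} \right\|_1}$ term penalises the part explained as occlusion, so it receives the smaller weight.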

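Built on the particle filter, the proposed method is given in Algorithm 2.

Algorithm 2 Tracking method based on discriminative sparse representation

1) Input: the initial target state ${\mathit{\boldsymbol{x}}_1}$ and template set ${\mathit{\boldsymbol{T}}_{{\rm{dis,}}1}} = \left[ {{\mathit{\boldsymbol{T}}_{f,1}},{\mathit{\boldsymbol{T}}_{b,1}}} \right]$;

2) Sample the candidate targets $\mathit{\boldsymbol{x}}_t^i$ with the state transition model and obtain the corresponding image observations $\mathit{\boldsymbol{y}}_t^i,i = 1,2, \cdots ,\mathit{N}$;

3) Compute the representation coefficients of each sampled particle with Algorithm 1 and obtain the corresponding observation likelihood $p\left( {\mathit{\boldsymbol{y}}_t^i|\mathit{\boldsymbol{x}}_t^i} \right),i = 1,2, \cdots ,\mathit{N}$ via Eq. (9);

4) Estimate the optimal state ${{\mathit{\boldsymbol{\hat x}}}_t}$ at time $t$ by the maximum a posteriori criterion;

5) Update the target templates ${{\mathit{\boldsymbol{T}}_{f,t}}}$ with the method of Ref. [2] and use background samples as the background templates ${\mathit{\boldsymbol{T}}_{b,t}}$;

6) If the last frame has not been reached, go to step 2); otherwise, tracking ends;

7) Output: the target state ${{\mathit{\boldsymbol{\hat x}}}_t}$.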
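Steps 2) to 4) of Algorithm 2 for a single frame can be sketched as follows. This is a simplified sketch under stated assumptions: particles are drawn around the previous estimate with the Gaussian transition model of Eq. (6), and the particle with the largest likelihood is taken as the maximum a posteriori estimate. The callable `likelihood_fn` is hypothetical; in a full tracker it would crop the image patch for a state, run Algorithm 1, and evaluate Eq. (9). Template updating (step 5) is omitted.

```python
import numpy as np

def track_one_frame(x_prev, sigma, likelihood_fn, n_particles=600, rng=None):
    """Steps 2)-4) of Algorithm 2 for one frame.

    x_prev        : previous state estimate (x, y, w, h, theta), shape (5,)
    sigma         : per-variable standard deviations, i.e. Psi = diag(sigma**2)
    likelihood_fn : hypothetical callable mapping a state vector to p(y | x)
    """
    rng = np.random.default_rng() if rng is None else rng
    x_prev = np.asarray(x_prev, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Step 2: sample candidates from N(x_prev, Psi), the model of Eq. (6).
    noise = rng.standard_normal((n_particles, x_prev.size))
    particles = x_prev + noise * sigma
    # Step 3: weight every particle by its observation likelihood.
    weights = np.array([likelihood_fn(p) for p in particles])
    # Step 4: keep the particle with the largest weight as the state estimate.
    best = particles[np.argmax(weights)]
    return best, particles, weights

# Toy usage with a synthetic likelihood that peaks at a known state.
target = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
toy_likelihood = lambda p: float(np.exp(-np.sum((p - target) ** 2)))
best, particles, weights = track_one_frame(
    np.zeros(5), np.ones(5), toy_likelihood, rng=np.random.default_rng(0))
```

The default of 600 particles matches the setting reported in Section 4; passing a seeded generator makes a run reproducible.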

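4 Experimental results and analysis

The proposed method was implemented in Matlab R2013a on a desktop computer with an Intel(R) Core(TM) CPU at 1.70 GHz and 4 GB of memory. The FaceOcc1, FaceOcc2, David3, Dudek, Singer1, Car4, Jumping, and CarDark sequences^[15] were used to test the ability of the trackers to overcome adverse interference, and the method was compared with the L1 tracker^[2], L1APG tracker^[9], SP tracker^[16], and L1L2 tracker^[17]. For the proposed method, the number of sampled particles is 600, the number of neural-network iterations is 2, the regularization parameters are ${\lambda _1}$ = 0.5 and ${\lambda _2}$ = 0.1, the number of target templates is 16, the number of background templates is 8, each template is 32×32, and the background templates are updated every 5 frames.

4.1 Experimental results

In Fig. 2(a), the proposed method and the SP tracker are robust to occlusion. In Fig. 2(b), occlusion, in-plane rotation, and appearance change are present, and the results show that the proposed method tracks robustly. In Fig. 2(c), occlusion and out-of-plane rotation cause the L1 and L1APG trackers to fail, whereas the SP tracker, the L1L2 tracker, and the proposed method overcome both factors. In Fig. 2(d), the main challenges are out-of-plane rotation, appearance change of the target, and occlusion; the L1APG tracker and the proposed method give the best results. In Fig. 2(e), the main interference factors are illumination change and camera motion; all methods except the L1 tracker complete the tracking task well. In Fig. 2(f), frequent illumination changes cause the L1 and SP trackers to fail, whereas the L1APG tracker, the L1L2 tracker, and the proposed method cope with the changes in illumination and target scale. In Fig. 2(g), the SP tracker, the L1L2 tracker, and the proposed method adapt well to fast target motion, whereas the L1 and L1APG trackers fail. In Fig. 2(h), the combined effect of out-of-plane rotation and background interference causes the L1 tracker to fail, whereas the proposed method overcomes both factors.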

Fig. 2 Comparison of tracking results

((a)FaceOcc1;(b)FaceOcc2;(c)David3;(d)Dudek; (e)Singer1;(f)Car4;(g)Jumping; (h)CarDark)

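Precision^[15] is used for the quantitative evaluation of the trackers. Let ${r_t}$ and ${r_a}$ denote the window given by the tracking result and the window at the actual position of the target in the image sequence, respectively. The center location error is defined as the Euclidean distance between the centers of ${r_t}$ and ${r_a}$. Precision is then defined as the ratio of the number of frames whose center location error is less than a given location error threshold to the total number of frames in the sequence. The threshold is varied from 0 to 50 pixels in this paper. Fig. 3 shows that the proposed method outperforms the other four methods in precision.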
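The precision measure described above can be computed as in the following sketch. The box format `(x, y, w, h)` with `(x, y)` as the top-left corner is an assumption, as are the function names; the strict comparison follows the "less than" wording of the definition.

```python
import math

def center_error(box_a, box_b):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return math.hypot(ax - bx, ay - by)

def precision_curve(tracked, truth, thresholds=range(0, 51)):
    """Fraction of frames whose centre error is below each threshold (pixels)."""
    errors = [center_error(t, g) for t, g in zip(tracked, truth)]
    return [sum(err < th for err in errors) / len(errors) for th in thresholds]

# Three frames with centre errors of 0, 5 and 50 pixels.
tracked = [(0, 0, 10, 10), (3, 4, 10, 10), (30, 40, 10, 10)]
truth = [(0, 0, 10, 10)] * 3
curve = precision_curve(tracked, truth)
```

Plotting `curve` against the thresholds 0 to 50 gives one precision plot of the kind compared in Fig. 3.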

Fig. 3 Comparison of the precision

((a)FaceOcc1;(b)FaceOcc2;(c)David3;(d)Dudek; (e)Singer1;(f)Car4;(g)Jumping; (h)CarDark)

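4.2 Computational complexity analysis

Let $\mathit{\boldsymbol{T}} \in {{\bf{R}}^{D \times M}}$ be the target templates of the L1, L1APG, and L1L2 trackers, $\mathit{\boldsymbol{U}} \in {{\bf{R}}^{D \times M}}$ the eigenbasis of the SP tracker, and ${\mathit{\boldsymbol{T}}_{{\rm{dis}}}} = \left[ {{\mathit{\boldsymbol{T}}_f},{\mathit{\boldsymbol{T}}_b}} \right] \in {{\bf{R}}^{D \times \left( {M + B} \right)}}$ the template set of the proposed method. Let $N$ be the number of candidate particles; ${k_1}$, ${k_2}$, ${k_3}$, and ${k_4}$ the numbers of iterations of the L1APG tracker, the SP tracker, the L1L2 tracker, and Algorithm 1 of this paper, respectively; and ${k_5}$ the number of neural-network iterations. For a fair comparison, only the computational complexity of the sparse representation and the corresponding time are compared, under the same experimental environment and parameter settings: $D$=16×16 and $D$=32×32, $N$=600, $M$=16, $B$=8. Table 1 lists the computational complexity and time of each tracker. The sparse-representation times show that the proposed method is the fastest.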

Table 1 Comparison of computational complexity and computation time

Tracker	Computational complexity	Time/s (16×16)	Time/s (32×32)
L1	${\rm{O}}\left( {N\left( {{D^2} + DM} \right)} \right)$	28.419	333.650
L1APG	${\rm{O}}\left( {N{k_1}DM} \right)$	0.283	3.509
SP	${\rm{O}}\left( {N{k_2}DM} \right)$	0.277	0.580
L1L2	${\rm{O}}\left( {N{k_3}DM} \right)$	0.167	0.401
Proposed	${\rm{O}}\left( {N{k_4}{k_5}{{\left( {M + B} \right)}^2}} \right)$	0.152	0.257

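4.3 Discussion

The experiments above show that the proposed method is both more robust and faster. The main reasons are as follows.

1) Robustness. First, the proposed method inherits from the L1 tracker the sparsity of the trivial-template coefficients, which ensures robustness to partial occlusion. Second, to account for background interference, the appearance model uses background samples as templates to represent the background information in the candidate target region. Finally, the trivial-template coefficients and the target-template coefficients, from which background interference has been excluded, are used to build a more accurate observation likelihood, which further improves robustness. Fig. 4 shows, on the Dudek sequence, how introducing background templates affects tracking precision; the results show that the proposed method, which uses background templates, is more robust.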

Fig. 4 Influence of using background templates on the precision for the Dudek sequence

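2) Speed. L1 minimization is computationally expensive, which makes tracking slow. To address this, the fast sparse representation algorithm solves for the coefficients of the template set ${{\mathit{\boldsymbol{T}}_{{\rm{dis}}}}}$ and of the trivial templates $\mathit{\boldsymbol{I}}$ separately. First, with the trivial-template coefficients $\mathit{\boldsymbol{e}}$ assumed known, LISTA computes the approximate sparse coefficients of ${{\mathit{\boldsymbol{T}}_{{\rm{dis}}}}}$. Because this step needs few iterations and each iteration is cheap, it effectively speeds up the sparse representation. Then, given the approximate sparse coefficients ${{{\mathit{\boldsymbol{\hat a}}}_{{\rm{dis}}}}}$ of the template set, Eq. (1) reduces to Eq. (4), and the trivial-template coefficients can be obtained quickly by soft thresholding. Finally, following the block coordinate optimization principle, LISTA and soft thresholding are combined into the fast sparse representation algorithm.

Among the compared methods, the L1L2 tracker is similar to the proposed method in that both impose a sparsity constraint on the trivial-template coefficients, yet the experiments show that the proposed method performs better. There are three main reasons. First, the two methods suppress model drift differently. The L1L2 tracker imposes a density constraint on the target-template coefficients; this alleviates, to some extent, the interference of outlier templates in representing the candidate target, but it also brings target templates with low similarity to the candidate into the representation, which degrades the representation. The proposed method instead uses background templates to capture the background information in the target region and imposes a sparsity constraint on the coefficients of the template set (target and background templates), so that the target is represented by the most similar target templates and background interference is excluded. Second, the observation likelihood is built differently. Although both methods build the likelihood from the target-template and trivial-template coefficients, the proposed method uses target-template coefficients from which background interference has been excluded, which makes the likelihood model more accurate. Finally, regarding the solution of the coefficients, both methods follow the block coordinate optimization principle and solve for the target-template and trivial-template coefficients in alternating stages, but the proposed method uses LISTA to obtain an approximate solution for the target-template coefficients, which reduces the computational cost and increases speed.

For these reasons, the proposed method not only improves considerably on the L1L2 tracker but also outperforms the other methods in the experiment.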

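5 Conclusion

This paper proposes a discriminative sparse representation model to address the L1 tracker's susceptibility to model drift. The model uses the L1-norm constraint on the trivial-template coefficients to keep the tracker robust to partial occlusion. On this basis, background templates represent the background information in the target region, which alleviates background interference and improves robustness. Following the block coordinate optimization principle, a fast solver for the model is proposed to address the high computational cost of L1 minimization. The algorithm computes the coefficients of the template set ${{\mathit{\boldsymbol{T}}_{{\rm{dis}}}}}$ and of the trivial templates $\mathit{\boldsymbol{I}}$ with LISTA and soft thresholding, respectively. Because the solution process is simple, fast, and cheap, it effectively increases tracking speed. Within the particle filter framework, the model and its fast solver are combined to achieve robust and fast target tracking. Experiments on multiple image sequences show that, compared with classical trackers, the proposed method completes the tracking task well on sequences containing occlusion, target appearance change, and out-of-plane rotation, and also obtains good results in scenes with complex background interference, illumination change, in-plane rotation, viewpoint change caused by camera motion, target scale change, and fast motion. In addition, the proposed method spends less time computing the sparse representation and therefore tracks more efficiently. In the template update stage, this paper directly replaces target templates with tracking results. Although this is computationally cheap, unprocessed tracking results are likely to contain some interference information, and the accumulation of such information will inevitably impair robustness. Future work will therefore focus on a more effective template update method that yields high-quality target templates able to cope with adverse factors without adding excessive computational burden.

References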

[1] Wright J, Yang A Y, Ganesh A, et al. Robust face recognition via sparse representation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210–227. [DOI:10.1109/TPAMI.2008.79]

[2] Mei X, Ling H B. Robust visual tracking using l₁ minimization[C]//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009:1436-1443.[DOI:10.1109/ICCV.2009.5459292]

[3] Wang N Y, Wang J D, Yeung D Y. Online robust non-negative dictionary learning for visual tracking[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia:IEEE, 2013:657-664.[DOI:10.1109/ICCV.2013.87]

[4] Xing J L, Gao J, Li B, et al. Robust object tracking with online multi-lifespan dictionary learning[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia:IEEE, 2013:665-672.[DOI:10.1109/ICCV.2013.88]

[5] Mairal J, Bach F, Ponce J, et al. Online dictionary learning for sparse coding[C]//Proceedings of the International Conference on Machine Learning. Montreal, Canada:ACM, 2009:689-696.[DOI:10.1145/1553374.1553463]

[6] Bozorgtabar B, Goecke R. Discriminative multi-task sparse learning for robust visual tracking using conditional random field[C]//Proceedings of the International Conference on Digital Image Computing:Techniques and Applications. Wollongong, Australia:IEEE, 2014:1-8.[DOI:10.1109/DICTA.2014.7008102]

[7] Fulkerson B, Vedaldi A, Soatto S. Class segmentation and object localization with superpixel neighborhoods[C]//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009:670-677.[DOI:10.1109/ICCV.2009.5459175]

[8] Mei X, Ling H B, Wu Y, et al. Minimum error bounded efficient l₁ tracker with occlusion detection[C]//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado, America:IEEE, 2011:1257-1264.[DOI:10.1109/CVPR.2011.5995421]

[9] Bao C L, Wu Y, Ling H B, et al. Real time robust L1 tracker using accelerated proximal gradient approach[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Rhode Island, America:IEEE, 2012:1830-1837.[DOI:10.1109/CVPR.2012.6247881]

[10] Zhang T Z, Ghanem B, Liu S, et al. Robust visual tracking via structured multi-task sparse learning[J]. International Journal of Computer Vision, 2013, 101(2): 367–383. [DOI:10.1007/s11263-012-0582-z]

[11] Tseng P, Yun S. Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization[J]. Journal of Optimization Theory and Applications, 2009, 140(3): 513–535. [DOI:10.1007/s10957-008-9458-3]

[12] Wang Z W, Liu D, Yang J C, et al. Deep networks for image super-resolution with sparse prior[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile:ACM, 2015:370-378.[DOI:10.1109/ICCV.2015.50]

[13] Yang A Y, Zhou Z H, Balasubramanian A G, et al. A review of fast l₁-minimization algorithms for robust face recognition[J]. IEEE Transactions on Image Processing, 2013, 22(8): 3234–3246. [DOI:10.1109/TIP.2013.2262292]

[14] Gregor K, LeCun Y. Learning fast approximations of sparse coding[C]//Proceedings of 2010 International Conference on Machine Learning. Haifa, Israel:IEEE, 2010:399-406.

[15] Wu Y, Lim J, Yang M H. Online object tracking:a benchmark[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, America:IEEE, 2013:2411-2418.[DOI:10.1109/CVPR.2013.312]

[16] Wang D, Lu H C, Yang M H. Online object tracking with sparse prototypes[J]. IEEE Transactions on Image Processing, 2013, 22(1): 314–325. [DOI:10.1109/TIP.2012.2202677]

[17] Yuan G L, Xue M G. Visual tracking based on sparse dense structure representation and online robust dictionary learning[J]. Journal of Electronics & Information Technology, 2015, 37(3): 536–542. [袁广林, 薛模根. 基于稀疏稠密结构表示与在线鲁棒字典学习的视觉跟踪[J]. 电子与信息学报, 2015, 37(3): 536–542. ] [DOI:10.11999/JEIT140507]