Published: 2018-05-16    DOI: 10.11834/jig.170472    2018 | Volume 23 | Number 5    Image Processing and Coding

Received: 2017-08-28; Revised: 2017-12-08. Supported by: National Natural Science Foundation of China (61771177, 61375011). First author: Lu Guozhi (1991-), male, M.S. candidate in control engineering, School of Automation, Hangzhou Dianzi University; his research interests include embedded system development, digital image processing, and computer vision. E-mail: m18094784726@163.com. CLC number: TP391. Document code: A. Article number: 1006-8961(2018)05-0662-12


Robust correlation filtering-based tracking by multifeature hierarchical fusion
Lu Guozhi, Peng Dongliang, Gu Yu
Fundamental Science on Communication Information Transmission and Fusion Technology Laboratory, Hangzhou Dianzi University, Hangzhou 310018, China
Supported by: National Natural Science Foundation of China (61771177, 61375011)

Abstract

Objective To address the multifeature fusion problem in correlation filtering-based tracking, this study summarizes the main multifeature fusion strategies and proposes a robust correlation filtering-based visual tracking algorithm built on multifeature hierarchical fusion to improve tracking robustness. Method Three features, namely, histogram of oriented gradient (HOG), color name (CN), and color histogram, are extracted from the target area and its surroundings to describe the appearance of the target and background when the multichannel correlation filtering algorithm is used to track the target. The proposed hierarchical fusion scheme combines the response maps of the three features in two fusion layers. The HOG and CN features, which describe the gradient and color information of the target, respectively, have strong discriminative capability and are complementary. Given that the saliency of the HOG and CN features differs across tracking scenarios, their response maps are combined at the first fusion layer by an adaptive weighted fusion strategy that adjusts the fusion weights according to scene changes; the weights are computed from the smooth constraint and the peak-to-sidelobe ratio of the feature response maps. The color histogram is a global statistical feature that can handle deformation because position information is discarded during its computation. However, a tracker that uses only the color histogram has low accuracy because this feature is susceptible to interference from similarly colored backgrounds. Thus, the color histogram is used as an auxiliary feature in the proposed algorithm.
At the second fusion layer, a fixed-coefficient fusion strategy combines the output response map of the first fusion layer with the response map of the color histogram feature. Finally, the target position is estimated from the final response map, whose maximum corresponds to the target position. A scale estimation algorithm that uses a 1D scale correlation filter to estimate the target scale rapidly is adopted to obtain an accurate bounding box of the target. The model is updated at each frame with a fixed learning factor to adapt to appearance changes. Result The performance of the proposed tracking algorithm is verified on two public datasets for the evaluation of visual tracking algorithms, OTB-2013 and VOT-2014. The OTB-2013 dataset contains 51 test sequences, of which 35 are color video sequences. Distance precision and success rate curves are selected as performance metrics for OTB-2013 and are computed with the one-pass evaluation protocol. The VOT-2014 dataset contains 25 color test sequences, on which the accuracy and robustness metrics are used to analyze performance. The experiments comprise two parts: an analysis of how different parameters affect the proposed algorithm, and a comparison with five mainstream correlation filtering-based tracking algorithms. The parameters of the proposed multifeature hierarchical fusion scheme, including fusion methods, target features, and fusion parameters, are analyzed using the 35 color sequences of the OTB-2013 dataset. Experimental results indicate that the proposed adaptive weighted fusion strategy outperforms the multiplicative fusion strategy and that the HOG, CN, and color histogram features each improve the performance of the tracking algorithm.
Second, the performance of the proposed algorithm is compared with that of five mainstream tracking algorithms. The six tracking algorithms are first tested on all sequences and then on sequences of 10 individual attributes. Experimental results indicate that tracking performance is improved: the precision score of the proposed algorithm is higher than that of the Staple algorithm by 5.9 percentage points (0.840 vs. 0.781). Meanwhile, the robustness of the proposed algorithm is superior to that of the other algorithms in most scenarios because of the effective integration of the CN, HOG, and color histogram features, and the highest success rate is achieved on out-of-plane rotation, occlusion, and fast motion sequences. Conclusion The proposed multifeature hierarchical fusion tracking algorithm is more robust than other correlation filtering-based algorithms while maintaining tracking accuracy. The proposed hierarchical fusion strategy can be applied and extended when different types of features are adopted in correlation filtering-based tracking algorithms.

Key words

target tracking; correlation filter; multi-feature fusion; hierarchical fusion; feature response map

1 Principle of multichannel correlation filtering tracking

Let the $d$-channel target appearance template be $\boldsymbol{f}$, with its $l$-th channel feature denoted by $\boldsymbol{f}^l$, $l \in \{1, \cdots, d\}$. Let $\boldsymbol{h}$ denote the correlation filter, composed of $d$ single-channel filters $\boldsymbol{h}^l$. The multichannel correlation filtering tracking algorithm obtains $\boldsymbol{h}$ by minimizing the training loss function $\varepsilon$, i.e.,

 $\varepsilon = {\left\| {\boldsymbol{g} - \sum\limits_{l = 1}^d {{\boldsymbol{h}^l} * {\boldsymbol{f}^l}} } \right\|^2} + \lambda \sum\limits_{l = 1}^d {{{\left\| {{\boldsymbol{h}^l}} \right\|}^2}}$ (1)

 ${\boldsymbol{H}^l} = \frac{{\bar{\boldsymbol{G}}\,{\boldsymbol{F}^l}}}{{\sum\limits_{k = 1}^d {{\bar{\boldsymbol{F}}^k}{\boldsymbol{F}^k}} + \lambda }};\;\;\;l = 1, \cdots, d$ (2)

 $\begin{array}{l} \boldsymbol{A}_t^l = \left( {1 - \eta } \right)\boldsymbol{A}_{t - 1}^l + \eta \,\bar{\boldsymbol{G}}\boldsymbol{F}_t^l;\;\;\;l = 1, \cdots, d\\ {\boldsymbol{B}_t} = \left( {1 - \eta } \right){\boldsymbol{B}_{t - 1}} + \eta \sum\limits_{k = 1}^d {\bar{\boldsymbol{F}}_t^k \boldsymbol{F}_t^k} \end{array}$ (3)

 ${\boldsymbol{y}_t} = {\mathcal{F}^{ - 1}}\left\{ {\frac{{\sum\limits_{l = 1}^d {\bar{\boldsymbol{A}}_{t - 1}^l \boldsymbol{Z}_t^l} }}{{{\boldsymbol{B}_{t - 1}} + \lambda }}} \right\}$ (4)
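For concreteness, Eqs. (2)-(4) can be sketched in NumPy as follows. This is a minimal sketch rather than the paper's implementation: the feature layout ($H \times W \times d$ array), the function names, and the parameter values are our own assumptions, and cosine windowing and padding details are omitted.

```python
import numpy as np

def fft2c(x):
    # per-channel 2-D FFT of an H x W x d feature array
    return np.fft.fft2(x, axes=(0, 1))

def init_model(f, g):
    # terms of Eq.(2): numerator A^l = conj(G) F^l, denominator B = sum_k conj(F^k) F^k
    F = fft2c(f)
    G = np.fft.fft2(g)
    A = np.conj(G)[:, :, None] * F
    B = np.sum(np.conj(F) * F, axis=2).real
    return A, B

def update_model(A, B, f, g, eta=0.025):
    # Eq.(3): running average of numerator and denominator with learning rate eta
    A1, B1 = init_model(f, g)
    return (1 - eta) * A + eta * A1, (1 - eta) * B + eta * B1

def detect(A, B, z, lam=1e-2):
    # Eq.(4): response map y = F^-1{ sum_l conj(A^l) Z^l / (B + lam) }
    Z = fft2c(z)
    return np.fft.ifft2(np.sum(np.conj(A) * Z, axis=2) / (B + lam)).real
```

Training on a feature patch with a centered Gaussian label g and detecting on a circularly shifted copy of that patch moves the response peak by the same shift, which is the property the tracker exploits.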

2.1.2 Color histogram feature

 $\begin{array}{l} {\varepsilon _{{\rm{hist}}}} = \frac{1}{{\left| \boldsymbol{O} \right|}}\sum\limits_{u \in \boldsymbol{O}} {{{\left( {{\boldsymbol{\beta }^{\rm{T}}}\boldsymbol{\varphi }\left[ u \right] - 1} \right)}^2}} + \\ \;\;\;\;\;\;\;\;\frac{1}{{\left| \boldsymbol{B} \right|}}\sum\limits_{u \in \boldsymbol{B}} {{{\left( {{\boldsymbol{\beta }^{\rm{T}}}\boldsymbol{\varphi }\left[ u \right]} \right)}^2}} \end{array}$ (5)

 ${\boldsymbol{\beta }^j} = \frac{{{\boldsymbol{\rho }^j}\left( \boldsymbol{O} \right)}}{{{\boldsymbol{\rho }^j}\left( \boldsymbol{O} \right) + {\boldsymbol{\rho }^j}\left( \boldsymbol{B} \right) + \lambda }};\;\;\;j = 1, \cdots, M$ (6)
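The closed-form scores of Eq. (6) amount to a ratio of two normalized color histograms. A sketch, assuming pixels have already been quantized to histogram-bin indices; the function name, bin count, and $\lambda$ value here are our own choices:

```python
import numpy as np

def hist_scores(fg_bins, bg_bins, n_bins=32, lam=1e-3):
    # Eq.(6): beta^j = rho^j(O) / (rho^j(O) + rho^j(B) + lam), where rho^j is
    # the fraction of a region's pixels whose color falls in bin j
    rho_o = np.bincount(fg_bins, minlength=n_bins) / max(len(fg_bins), 1)
    rho_b = np.bincount(bg_bins, minlength=n_bins) / max(len(bg_bins), 1)
    return rho_o / (rho_o + rho_b + lam)
```

Bins that occur only in the foreground region O score near 1, bins that occur only in the background region B score near 0, which is what makes the per-pixel lookup discriminative.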

The PSR is computed as

 $P = \frac{{\max \left( {{\boldsymbol{y}_t}} \right) - {\mu _\mathit{\Phi} }\left( {{\boldsymbol{y}_t}} \right)}}{{{\sigma _\mathit{\Phi} }\left( {{\boldsymbol{y}_t}} \right)}}$ (8)
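Eq. (8) in code, assuming, as is usual for the PSR, that the sidelobe region $\mathit{\Phi}$ is everything outside a small window around the response peak; the window half-size used here is our own choice, not a value from the paper:

```python
import numpy as np

def psr(y, peak_excl=5):
    # Eq.(8): PSR = (max(y) - mean(sidelobe)) / std(sidelobe), with the
    # sidelobe taken as all cells outside a (2*peak_excl+1)^2 window at the peak
    y = np.asarray(y, dtype=float)
    pi, pj = np.unravel_index(y.argmax(), y.shape)
    mask = np.ones(y.shape, dtype=bool)
    i0, i1 = max(pi - peak_excl, 0), min(pi + peak_excl + 1, y.shape[0])
    j0, j1 = max(pj - peak_excl, 0), min(pj + peak_excl + 1, y.shape[1])
    mask[i0:i1, j0:j1] = False
    side = y[mask]
    return (y.max() - side.mean()) / (side.std() + 1e-12)
```

A sharp, isolated peak yields a much larger PSR than a flat or noisy response map, which is why the PSR serves as a confidence measure for each feature.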

The smaller the SCCM and the larger the PSR, the more reliable the tracking result of the corresponding feature, and the larger the weight that should be assigned to it in the template feature fusion. Based on these considerations, the adaptive feature fusion weight is computed as

 ${w_{{\rm{CN}}}} = \frac{{{P_{{\rm{CN}}}}/{S_{{\rm{CN}}}}}}{{{P_{{\rm{CN}}}}/{S_{{\rm{CN}}}} + {P_{{\rm{HOG}}}}/{S_{{\rm{HOG}}}}}}$ (9)

 ${w_{{\rm{CN}}, t}} = \left( {1-\tau } \right) \times {w_{{\rm{CN}}, t-1}} + \tau \times {w_{{\rm{CN}}}}$ (10)

 ${\mathit{\boldsymbol{y}}_{{\rm{tmpl}}}} = {w_{{\rm{CN}}}} \times {\mathit{\boldsymbol{y}}_{{\rm{CN}}}} + \left( {1-{w_{{\rm{CN}}}}} \right) \times {\mathit{\boldsymbol{y}}_{{\rm{HOG}}}}$ (11)

2.2.2 Fixed-coefficient feature fusion

 ${\mathit{\boldsymbol{y}}_{{\rm{trans}}}} = \alpha \times {\mathit{\boldsymbol{y}}_{{\rm{hist}}}} + \left( {1-\alpha } \right) \times {\mathit{\boldsymbol{y}}_{{\rm{tmpl}}}}$ (12)
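Both fusion layers (Eqs. (9)-(12)) reduce to a few lines once the response maps and the PSR/SCCM statistics are available. A sketch with our own function and argument names; the response maps are assumed to be equal-sized arrays:

```python
import numpy as np

def adaptive_weight(p_cn, s_cn, p_hog, s_hog):
    # Eq.(9): per-frame CN weight from each feature's PSR (P) and SCCM (S);
    # a larger PSR and a smaller SCCM both increase the feature's weight
    r_cn, r_hog = p_cn / s_cn, p_hog / s_hog
    return r_cn / (r_cn + r_hog)

def smooth_weight(w_prev, w_new, tau=0.2):
    # Eq.(10): temporal smoothing of the fusion weight with learning rate tau
    return (1 - tau) * w_prev + tau * w_new

def fuse_responses(y_cn, y_hog, y_hist, w_cn, alpha=0.3):
    # Eq.(11): first layer, adaptive weighting of the CN and HOG response maps
    y_tmpl = w_cn * y_cn + (1 - w_cn) * y_hog
    # Eq.(12): second layer, fixed-coefficient fusion with the histogram response
    return alpha * y_hist + (1 - alpha) * y_tmpl
```

The values tau=0.2 and alpha=0.3 match the settings selected in the parameter analysis of Section 3.3.3.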

2.4 Algorithm flow

1) At the estimated target position $p_t$ and scale $s_t$ in frame $t$, extract the HOG and CN features and update the filters $\boldsymbol{H}_{\rm{HOG}}$, $\boldsymbol{H}_{\rm{CN}}$ and the scale filter $\boldsymbol{H}_{\rm{scale}}$ via Eq. (3). Extract the features $\boldsymbol{\rho}(\boldsymbol{O})$ and $\boldsymbol{\rho}(\boldsymbol{B})$, and update $\boldsymbol{\rho}_t(\boldsymbol{O})$ and $\boldsymbol{\rho}_t(\boldsymbol{B})$ by linear interpolation.

2) At $p_t$ and scale $s_t$ in frame $t+1$, extract the HOG and CN translation candidate samples and obtain $\boldsymbol{y}_{\rm{HOG}}$ and $\boldsymbol{y}_{\rm{CN}}$ via Eq. (4). Update the fusion weight $w_{\rm{CN}}$ via Eq. (10). Compute $\boldsymbol{y}_{\rm{hist}}$ on the candidate sample $\boldsymbol{Z}$ via Eq. (6) and the integral-image technique.

3) Obtain $\boldsymbol{y}_{\rm{tmpl}}$ by adaptive feature fusion via Eq. (11). Obtain the translation response map $\boldsymbol{y}_{\rm{trans}}$ by fixed-coefficient feature fusion via Eq. (12); the peak position of $\boldsymbol{y}_{\rm{trans}}$ gives the estimated target position $p_{t+1}$ in frame $t+1$.

4) At the estimated target position $p_{t+1}$ in frame $t+1$, extract the HOG scale candidate samples and obtain the scale response map $\boldsymbol{y}_{\rm{scale}}$ via Eq. (4); the peak position of $\boldsymbol{y}_{\rm{scale}}$ gives the estimated target scale $s_{t+1}$ in frame $t+1$.

5) Output the target position $p_{t+1}$ and target scale $s_{t+1}$ for frame $t+1$. Return to step 1) to track the next frame.
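Purely as an illustration of the loop structure in steps 1)-5), the following toy tracker mirrors it on synthetic frames with a single grayscale "feature", no CN/histogram fusion, and no scale filter; all names and the Gaussian-blob sequence are our own, not from the paper:

```python
import numpy as np

def make_frame(pos, size=64):
    # synthetic frame: a Gaussian blob at pos, standing in for a real image
    yy, xx = np.mgrid[0:size, 0:size]
    return np.exp(-((yy - pos[0]) ** 2 + (xx - pos[1]) ** 2) / (2 * 3.0 ** 2))

def track(frames, init_pos, eta=0.1, lam=1e-4):
    size = frames[0].shape[0]
    yy, xx = np.mgrid[0:size, 0:size]
    pos, traj = init_pos, [init_pos]
    A = B = None
    for frame in frames:
        F = np.fft.fft2(frame)
        if A is not None:
            # steps 2)-3): response via Eq.(4); its peak is the new position
            y = np.fft.ifft2(np.conj(A) * F / (B + lam)).real
            pos = np.unravel_index(y.argmax(), y.shape)
            traj.append(pos)
        # step 1): train/update the filter at the current estimate (Eqs. 2-3),
        # using a Gaussian label centered on the estimated position
        g = np.exp(-((yy - pos[0]) ** 2 + (xx - pos[1]) ** 2) / (2 * 2.0 ** 2))
        A_new = np.conj(np.fft.fft2(g)) * F
        B_new = (np.conj(F) * F).real
        if A is None:
            A, B = A_new, B_new
        else:
            A = (1 - eta) * A + eta * A_new
            B = (1 - eta) * B + eta * B_new
    return traj
```

Running it on a blob that drifts two pixels per frame recovers the drift from the response peaks, while the real algorithm replaces the single feature with the fused HOG/CN/histogram responses and adds the scale filter.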

3.3.3 Parameter analysis of multifeature hierarchical fusion

With $\alpha$ fixed at 0.3, varying the learning coefficient $\tau$ has little effect on the precision of the algorithm, which further indicates the effectiveness and complementarity of the HOG and CN features. The precision is highest when $\tau = 0.2$. With $\tau$ fixed at 0.2, varying the fusion coefficient $\alpha$ strongly affects the precision. When $\alpha$ exceeds 0.45, the precision decreases as $\alpha$ increases. When $\alpha = 1$, the algorithm uses only the color histogram feature and its precision is lowest, revealing the limitation of the color histogram feature. The precision is highest when $\alpha = 0.3$; therefore, $\tau = 0.2$ and $\alpha = 0.3$ are used in the experiments.

3.4 Comparison with five mainstream correlation filtering-based tracking algorithms

Table 1 Summary and comparison of the six kinds of algorithms

Algorithm | Feature combination (Gray, HOG, CN, Color histogram) | Feature fusion method (Merged into multichannel features; Weighted fusion) | Scale adaptive
CN[8]: √ √ √ ×
KCF[7]: √ ×
SAMF[10]: √ √ √ √ √
DSST[9]: √ √ √ √
Staple[17]: √ √ √ √ √ √
Ours: √ √ √ √ √ √
Note: √ indicates yes; × indicates no.

3.4.1 Experimental results on OTB-2013

Table 2 Average speed of six tracking algorithms on OTB-2013

| Algorithm                 | Ours | Staple[17] | SAMF[10] | DSST[9] | KCF[7] | CN[8] |
| Average speed/(frames/s)  | 21.3 | 41.4       | 10.4     | 25.3    | 131.1  | 99.5  |

3.4.2 Experimental results on VOT-2014

The VOT-2014 evaluation includes two experiment types, baseline and region noise, where the region noise experiment tests tracking performance under noisy initialization. The proposed algorithm contains no randomness; 3 Monte Carlo runs were performed for the baseline experiment and 5 for the region noise experiment. Table 3 lists the tracking results of the proposed and comparison algorithms on VOT-2014. The robustness of the proposed algorithm is better than that of the other algorithms in both the baseline and region noise experiments, which further demonstrates the effectiveness of the proposed multifeature hierarchical fusion strategy.

Table 3 Summary of six kinds of algorithms tracking results on VOT-2014

| Algorithm  | Baseline accuracy | Baseline robustness | Region noise accuracy | Region noise robustness |
| CN[8]      | 0.52 | 1.68 | 0.48 | 1.64 |
| KCF[7]     | 0.62 | 1.32 | 0.57 | 1.51 |
| SAMF[10]   | 0.61 | 1.28 | 0.57 | 1.43 |
| DSST[9]    | 0.62 | 1.16 | 0.57 | 1.28 |
| Staple[17] | 0.64 | 0.96 | 0.58 | 1.04 |
| Ours       | 0.62 | 0.88 | 0.58 | 0.95 |
Note: bold font indicates the best result.

References

• [1] Huang K Q, Chen X T, Kang Y F, et al. Intelligent visual surveillance:a review[J]. Chinese Journal of Computers, 2015, 38(6): 1093–1118. [黄凯奇, 陈晓棠, 康运锋, 等. 智能视频监控技术综述[J]. 计算机学报, 2015, 38(6): 1093–1118. ] [DOI:10.11897/SP.J.1016.2015.01093]
• [2] Smeulders A W M, Chu D M, Cucchiara R, et al. Visual tracking:an experimental survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1442–1468. [DOI:10.1109/TPAMI.2013.230]
• [3] Kristan M, Matas J, Leonardis A, et al. The visual object tracking VOT2015 challenge results[C]//Proceedings of 2015 IEEE International Conference on Computer Vision Workshop. Santiago, Chile: IEEE, 2016: 564-586. [DOI: 10.1109/ICCVW.2015.79]
• [4] Kristan M, Pflugfelder R, Leonardis A, et al. The visual object tracking VOT2014 challenge results[C]//Proceedings of European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 191-217. [DOI: 10.1007/978-3-319-16181-5_14]
• [5] Bolme D S, Beveridge J R, Draper B A, et al. Visual object tracking using adaptive correlation filters[C]//Proceedings of 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 2544-2550. [DOI: 10.1109/CVPR.2010.5539960]
• [6] Henriques J F, Caseiro R, Martins P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer-Verlag, 2012: 702-715. [DOI: 10.1007/978-3-642-33765-9_50]
• [7] Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583–596. [DOI:10.1109/TPAMI.2014.2345390]
• [8] Danelljan M, Khan F S, Felsberg M, et al. Adaptive color attributes for real-time visual tracking[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 1090-1097. [DOI: 10.1109/CVPR.2014.143]
• [9] Danelljan M, Häger G, Khan F S, et al. Accurate scale estimation for robust visual tracking[C]//Proceedings of British Machine Vision Conference 2014. Nottingham, UK: BMVA Press, 2014: 1-11. [DOI: 10.5244/C.28.65]
• [10] Li Y, Zhu J K. A scale adaptive kernel correlation filter tracker with feature integration[C]//Proceedings of European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 254-265. [DOI: 10.1007/978-3-319-16181-5_18]
• [11] Felzenszwalb P F, Girshick R, McAllester D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627–1645. [DOI:10.1109/TPAMI.2009.167]
• [12] Van De Weijer J, Schmid C, Verbeek J, et al. Learning color names for real-world applications[J]. IEEE Transactions on Image Processing, 2009, 18(7): 1512–1523. [DOI:10.1109/TIP.2009.2019809]
• [13] Xu Y L, Wang J B, Li Y, et al. Scale adaptive correlation tracking combined with color features[J]. Application Research of Computers, 2017, 34(3): 945–948. [徐玉龙, 王家宝, 李阳, 等. 融合颜色特征的尺度自适应相关跟踪[J]. 计算机应用研究, 2017, 34(3): 945–948. ] [DOI:10.3969/j.issn.1001-3695.2017.03.071]
• [14] Shen Q, Yan X L, Liu L F, et al. Multi-scale correlation filtering tracker based on adaptive feature selection[J]. Acta Optica Sinica, 2017, 37(5): #515001. [沈秋, 严小乐, 刘霖枫, 等. 基于自适应特征选择的多尺度相关滤波跟踪[J]. 光学学报, 2017, 37(5): #515001. ] [DOI:10.3788/aos201737.0515001]
• [15] Wang W, Wang C P, Li J, et al. Correlation filter tracking based on feature fusing and model adaptive updating[J]. Optics and Precision Engineering, 2016, 24(8): 2059–2066. [王暐, 王春平, 李军, 等. 特征融合和模型自适应更新相结合的相关滤波目标跟踪[J]. 光学 精密工程, 2016, 24(8): 2059–2066. ] [DOI:10.3788/OPE.20162408.2059]
• [16] Ma C, Huang J B, Yang X K, et al. Hierarchical convolutional features for visual tracking[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 3074-3082. [DOI: 10.1109/ICCV.2015.352]
• [17] Bertinetto L, Valmadre J, Golodetz S, et al. Staple: complementary learners for real-time tracking[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1401-1409. [DOI: 10.1109/CVPR.2016.156]
• [18] Wu Y, Lim J, Yang M H. Online object tracking: a benchmark[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 2411-2418. [DOI: 10.1109/CVPR.2013.312]
• [19] Comaniciu D, Ramesh V, Meer P, et al. Kernel-based object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564–577. [DOI:10.1109/TPAMI.2003.1195991]
• [20] Isard M, Blake A. CONDENSATION-conditional density propagation for visual tracking[J]. International Journal of Computer Vision, 1998, 29(1): 5–28. [DOI:10.1023/A:1008078328650]
• [21] Liu T, Wang G, Yang Q X. Real-time part-based visual tracking via adaptive correlation filters[C]//Proceedings of 2015 IEEE Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 4902-4912. [DOI: 10.1109/CVPR.2015.7299124]