Correlation filter target tracking algorithm based on adaptive multifeature fusion
2020, Vol. 25, No. 6: 1160-1170
Received: 2019-06-21; revised: 2019-11-13; accepted: 2019-11-20; published in print: 2020-06-16
DOI: 10.11834/jig.190304
Objective
To address background clutter, illumination variation, fast motion, rotation, and other challenges faced when tracking targets in real-world scenes, a correlation filter tracking algorithm based on adaptive multi-feature fusion is proposed.
Method
The HOG (histogram of oriented gradients) feature of the target is extracted, and high- and low-level convolutional features are extracted with a convolutional neural network. An adaptive threshold segmentation method is used to evaluate the validity of each feature, yielding the weights for feature fusion. The response maps of the features are fused according to these weights to obtain the new estimate of the target position, and a scale correlation filter then computes the target scale to complete tracking.
Result
Experiments are conducted on the public OTB (object tracking benchmark)-2013 dataset. Building on an analysis of multi-feature fusion, the tracking performance of the proposed algorithm is tested under 11 different attributes and compared with seven popular algorithms. The results show that the proposed algorithm ranks first in both success rate and precision; compared with the baseline algorithm DSST (discriminative scale space tracking), tracking precision improves by 4% and success rate by 6%. The algorithm is more robust than other mainstream algorithms in complex scenes.
Conclusion
The proposed algorithm takes the DSST correlation filter tracker as its baseline, evaluates the validity of each feature with an adaptive threshold segmentation method, and adaptively fuses two layers of convolutional features with the HOG feature, so that more discriminative individual features receive larger fusion weights. This represents the target appearance model well and yields strong tracking accuracy under background clutter, target disappearance, illumination variation, fast motion, and rotation.
Objective
Target tracking is one of the basic problems in the field of computer vision. It is widely used in security monitoring, military operations, and automatic driving, among others. Tracking algorithms based on correlation filtering have developed rapidly in recent years because they are fast and efficient. However, designing a robust tracking algorithm remains challenging due to background clutter, illumination variation, fast motion, rotation, and other complex factors. Building an effective appearance model is a key factor in the success of a tracking algorithm. Current appearance models fall into two major types. In the first type, the appearance model is based on manual design. The most common hand-designed appearance model is the histogram of oriented gradients (HOG) feature, which efficiently describes the contour and shape of a target by computing directional gradients over local regions of the detected image. In the second type, the appearance model is based on deep learning. Low-level convolutional features contain rich texture information but adapt poorly to background changes. High-level convolutional features contain rich semantic information that distinguishes targets from backgrounds even in complex scenes. Because different features describe different aspects of an image, this study proposes a correlation filter tracking method to achieve adaptive multi-feature fusion.
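As background for the HOG feature discussed above, the descriptor can be sketched in a few lines of NumPy: per-pixel gradients are computed, and a histogram of gradient orientations weighted by gradient magnitude is accumulated per cell. This is a simplified illustration only; the full Dalal-Triggs descriptor additionally uses block normalization and bilinear interpolation, which are omitted here.

```python
import numpy as np

def simple_hog(img, cell=8, bins=9):
    """Minimal HOG sketch: per-cell histograms of gradient orientation.

    img: 2-D grayscale array. Omits the block normalization and
    interpolation of the full Dalal-Triggs descriptor.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # unsigned orientation in [0, 180) degrees
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(ch * cell):
        for j in range(cw * cell):
            hist[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    return hist

# a vertical step edge concentrates energy in one orientation bin
img = np.zeros((16, 16))
img[:, 8:] = 1.0
feat = simple_hog(img)
print(feat.shape)  # (2, 2, 9)
```

Each cell's 9-bin histogram summarizes local edge direction, which is why HOG captures contour and shape information compactly.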
Method
In this work, the DSST (discriminative scale space tracking) correlation filter is adopted as the benchmark algorithm, and the conv1 and conv5 layers of the convolutional neural network (CNN) imagenet-vgg-2048 (VGG: visual geometry group) are used. First, the HOG feature of the target is extracted, and the high- and low-level convolutional features are extracted using the CNN; a response map is obtained for each feature. The maximum peak and the shape of a response map reflect the reliability of the tracking result. Second, to evaluate the validity of each feature, the area ratio of the peak is proposed as a new index for judging the confidence of a correlation response map, and the validity of each feature is evaluated using an adaptive threshold segmentation method. If the peak of the response map is sharp and its surroundings are smooth, the tracking result is reliable, and the fusion coefficient of that feature is increased; in this way the weights for feature fusion are obtained. Lastly, the response maps of the features are fused in accordance with the fusion coefficients, the final response output is computed, and the new target position is determined from the maximum value in the fused response map. A scale correlation filter is then used to estimate the target scale, achieving adaptive target tracking.
Result
To effectively evaluate the performance of the proposed method, the algorithm is tested on the public OTB (object tracking benchmark)-2013 dataset. Its 50 videos cover 11 different challenges encountered in target tracking (including background clutter, deformation, out-of-view, and scale variation). This study compares the algorithm with seven mainstream algorithms, using precision and success rate as the evaluation indicators. These algorithms fall into two major categories. Representative, top-ranked traditional tracking algorithms: ASLA (adaptive structural local sparse appearance), SCM (sparsity-based collaborative model), and TLD (tracking-learning-detection). Correlation filtering algorithms: CFNet (correlation filter networks), KCF (kernel correlation filter), DSST, and SAMF (scale adaptive with multiple features). The experimental results show that the proposed algorithm achieves the highest success rate and precision among the compared algorithms. The precision of the proposed method on OTB-2013 is 77.8%; the other algorithms achieve CFNet 76.1%, DSST 74.6%, KCF 73.5%, SAMF 72.5%, and the traditional algorithm SCM 67.8%. The success rate of the proposed algorithm is 71.5%, against CFNet 71.4%, DSST 67.5%, SAMF 66.2%, and KCF 61.1%. Relative to the baseline, the proposed algorithm thus increases tracking precision by 4% and success rate by 6%, and the analysis of these experimental data shows that the method effectively improves tracking performance. The proposed method ranks first in precision compared with CFNet, DSST, SAMF, and KCF on seven attributes: background clutter, deformation, out-of-view, illumination variation, in-plane rotation, out-of-plane rotation, and fast motion. Compared with the other algorithms, it achieves the highest success rate on the scenes of nine attributes.
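The two OTB metrics used above can be reproduced with a short sketch: precision counts the frames whose predicted box center lies within 20 pixels of the ground-truth center, and success is based on bounding-box overlap (the benchmark reports the area under a curve over all overlap thresholds; a single IoU threshold of 0.5 is used here for brevity).

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are (x, y, w, h)."""
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def iou(pred, gt):
    """Intersection-over-union of axis-aligned (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

# two illustrative frames: one accurate prediction, one lost track
pred = np.array([[10., 10., 50., 50.], [100., 100., 40., 40.]])
gt   = np.array([[12., 10., 50., 50.], [160., 160., 40., 40.]])
precision = (center_error(pred, gt) <= 20).mean()   # OTB threshold: 20 px
success   = (iou(pred, gt) > 0.5).mean()            # single overlap threshold
print(precision, success)  # 0.5 0.5
```

In the benchmark itself these fractions are computed per sequence over all frames and then aggregated, which is what the percentage figures above summarize.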
Conclusion
Because different features describe different aspects of an image, this study proposes a correlation filter tracking method to achieve adaptive multi-feature fusion. A CNN is used to extract high- and low-layer convolutional features alongside the HOG feature. An adaptive threshold segmentation method is proposed to evaluate the validity of each feature, and the two layers of convolutional features are adaptively fused with the HOG feature: the response maps are combined in accordance with fusion coefficients derived from the validity analysis. Compared with most feature fusion methods, which concatenate features serially or in parallel, this algorithm increases the fusion weight of the single feature with the strongest discriminability, so the target appearance model can be represented accurately. As a result, the proposed algorithm exhibits strong robustness and tracking accuracy in scenarios with low resolution and scale variation. The presented target tracking method will be studied further in future work under occlusion, motion blur, and fast motion conditions.
Bolme D S, Beveridge J R, Draper B A and Lui Y M. 2010. Visual object tracking using adaptive correlation filters//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE: 2544-2550 [DOI: 10.1109/CVPR.2010.5539960]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177]
Danelljan M, Häger G, Khan F S and Felsberg M. 2014a. Accurate scale estimation for robust visual tracking//Proceedings of British Machine Vision Conference 2014. Nottingham, UK: BMVA Press: 1-11 [DOI: 10.5244/C.28.65]
Danelljan M, Khan F S, Felsberg M and van de Weijer J. 2014b. Adaptive color attributes for real-time visual tracking//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE: 1090-1097 [DOI: 10.1109/CVPR.2014.143]
Danelljan M, Häger G, Khan F S and Felsberg M. 2015a. Convolutional features for correlation filter based visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision Workshop. Santiago, Chile: IEEE: 621-629 [DOI: 10.1109/ICCVW.2015.84]
Danelljan M, Häger G, Khan F S and Felsberg M. 2015b. Learning spatially regularized correlation filters for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 4310-4318 [DOI: 10.1109/ICCV.2015.490]
Henriques J F, Caseiro R, Martins P and Batista J. 2015. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3): 583-596 [DOI: 10.1109/TPAMI.2014.2345390]
Jia X, Lu H C and Yang M H. 2012. Visual tracking via adaptive structural local sparse appearance model//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE: 1822-1829 [DOI: 10.1109/CVPR.2012.6247880]
Kalal Z, Matas J and Mikolajczyk K. 2010. P-N learning: bootstrapping binary classifiers by structural constraints//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE: 49-56 [DOI: 10.1109/CVPR.2010.5540231]
Li Y and Zhu J K. 2014. A scale adaptive kernel correlation filter tracker with feature integration//Proceedings of European Conference on Computer Vision. Zurich, Switzerland: Springer: 254-265 [DOI: 10.1007/978-3-319-16181-5_18]
Lu G Z, Peng D L and Gu Y. 2018. Robust correlation filtering-based tracking by multifeature hierarchical fusion. Journal of Image and Graphics, 23(5): 662-673 [DOI: 10.11834/jig.170472]
Ma C, Huang J B, Yang X K and Yang M H. 2015. Hierarchical convolutional features for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3074-3082 [DOI: 10.1109/ICCV.2015.352]
Mao N, Yang D D, Yang F C and Cai Y Z. 2016. Adaptive object tracking based on hierarchical convolution features. Laser and Optoelectronics Progress, 53(12): 195-207 [DOI: 10.3788/LOP53.121502]
Otsu N. 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1): 62-66 [DOI: 10.1109/TSMC.1979.4310076]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2020-03-22]. https://arxiv.org/pdf/1409.1556.pdf
Tuzel O, Porikli F and Meer P. 2006. Region covariance: a fast descriptor for detection and classification//Proceedings of the 9th European Conference on Computer Vision. Graz, Austria: Springer: 589-600 [DOI: 10.1007/11744047_45]
Valmadre J, Bertinetto L, Henriques J F, Vedaldi A and Torr P H S. 2017. End-to-end representation learning for correlation filter based tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 5000-5008 [DOI: 10.1109/CVPR.2017.531]
Viola P and Jones M J. 2004. Robust real-time face detection. International Journal of Computer Vision, 57(2): 137-154 [DOI: 10.1023/b:visi.0000013087.49260.fb]
Wang L J, Ouyang W L, Wang X G and Lu H C. 2015. Visual tracking with fully convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3119-3127 [DOI: 10.1109/ICCV.2015.357]
Wu Y, Lim J and Yang M H. 2013. Online object tracking: a benchmark//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE: 2411-2418 [DOI: 10.1109/CVPR.2013.312]
Xiong C Z, Che M Q and Wang R L. 2018. Adaptive convolutional feature selection for real-time visual tracking. Journal of Image and Graphics, 23(11): 1742-1750 [DOI: 10.11834/jig.180252]
Zhong W, Lu H C and Yang M H. 2012. Robust object tracking via sparsity-based collaborative model//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE: 1838-1845 [DOI: 10.1109/CVPR.2012.6247882]