Kernel correlation filter target tracking algorithm based on a dual-feature model
2019, Vol. 24, No. 12: 2183-2199
Received: 2019-02-25; Revised: 2019-06-01; Accepted: 2019-06-08; Published in print: 2019-12-16
DOI: 10.11834/jig.190033
Objective
Target tracking algorithms based on deep learning use deep convolutional layers as features and achieve high accuracy but cannot run in real time; trackers based on correlation filtering use HOG (histogram of oriented gradients), CN (color names), and color histograms as features and are fast but less accurate. To balance the real-time performance and the accuracy of target tracking, this paper proposes a kernel correlation filtering algorithm based on a dual-feature model.
Method
An adaptive dual-feature model selection mechanism is proposed. The main-feature model adopts the shallow texture feature HOG, and the auxiliary-feature model adopts CNN (convolutional neural network) features that contain deep semantic information; the two cooperate to produce a more stable correlation filter. To speed up the algorithm, principal component analysis (PCA) is applied to reduce the dimensionality of the high-dimensional CNN features, and the tracking accuracy is further improved through scale optimization and by optimizing the way the optimal solution is computed.
Result
On the public OTB-2013 dataset, the proposed algorithm is compared with state-of-the-art real-time trackers such as SiamFC (fully-convolutional Siamese networks), MEEM (multiple experts using entropy minimization), SAMF (scale adaptive multiple features), and DSST (discriminative scale space tracking). The one-pass evaluation (OPE) results show that the proposed algorithm ranks first overall in distance precision. Compared with the KCF (kernel correlation filter) algorithm, its distance precision is improved by 25.2% and its overlap success rate by 25.6%, with an average speed of 38 frame/s.
Conclusion
The proposed dual-model adaptive mechanism invokes the optimal model according to the confidence response of the main-feature model and updates the models in real time. Considering both tracking accuracy and real-time performance, the proposed target tracking algorithm outperforms current tracking algorithms.
Objective
The target tracking algorithm that is based on deep learning and uses deep convolution features is highly accurate, but it cannot track in real time and therefore cannot be applied to actual situations. The deep convolutional features of convolutional neural networks (CNNs) contain advanced semantic information. Even when the target appearance model suffers serious interference, such as illumination variation, deformation, and other interference factors, the deep convolution features still discriminate the target accurately. Although the tracking algorithm based on correlation filtering is fast (up to several hundred frame/s), it is inaccurate. The algorithm uses the histogram of oriented gradients (HOG), color names (CN), and the color histogram as statistical features to calculate the correlation of two image blocks; the position with the highest correlation is the predicted position. To balance the real-time tracking capability and accuracy of the target tracking algorithm, this study proposes a dual-model kernel correlation filtering algorithm that combines the accuracy of the deep convolution feature algorithm with the speed of the correlation filtering algorithm.
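The correlation step described above can be sketched as follows. This is a minimal illustration (not the paper's implementation): a template is correlated with a search patch in the Fourier domain, and the peak of the response map gives the predicted position. All array sizes and the synthetic data are assumptions for demonstration.

```python
import numpy as np

def localize(search_patch, template):
    """Correlate template with search patch via FFT; return the response peak."""
    # Element-wise product in the Fourier domain == circular cross-correlation
    response = np.real(np.fft.ifft2(np.fft.fft2(search_patch) *
                                    np.conj(np.fft.fft2(template))))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return dy, dx, response.max()

# Toy example: the "target" in the search patch is the template shifted by (5, 7)
rng = np.random.default_rng(0)
base = rng.standard_normal((64, 64))
patch = np.roll(base, (5, 7), axis=(0, 1))
dy, dx, _ = localize(patch, base)
print(dy, dx)  # 5 7
```

The peak of the correlation response recovers the displacement of the target, which is the basic localization principle shared by KCF-style trackers.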
Method
An adaptive dual-feature model selection mechanism is proposed. The dual model consists of main- and auxiliary-feature models. The main-feature model adopts a shallow texture feature, the HOG feature, whose dimension is relatively low; thus, it has a high calculation speed. The main-feature model is used for the real-time tracking of video sequences with clear texture contour features, and the kernel correlation function of its correlation filter uses the Gaussian kernel function. The auxiliary-feature model employs CNN features containing deep semantic information. When serious interference factors, such as illumination variation, occlusion, and deformation, occur in video sequences, they lead to low-confidence responses of the main-feature model; in this case, the auxiliary-feature model with deep CNN features is used to determine and correct the target position. The kernel correlation function of the auxiliary-feature model's correlation filter uses the linear kernel function. The main- and auxiliary-feature models are updated individually and cooperate synergistically to generate a stable correlation filter and improve the computational efficiency of the algorithm. Because the dimension of the deep CNN features adopted by the auxiliary-feature model is high, the calculation speed is low. To optimize the calculation speed and ensure the real-time performance of the algorithm, we use principal component analysis (PCA) to reduce the dimensionality of the high-dimensional deep convolution features. Under the premise of preserving as much effective information of the original features as possible, the dimension of the CNN features is reduced and the computing speed is improved. This work also improves the accuracy of the tracking algorithm by optimizing the scale estimation and the solution method.
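The PCA step can be sketched as below. This is a hedged illustration of channel-wise PCA on a CNN feature map, not the paper's code: a (H, W, C) feature map is reshaped to (H*W, C) and projected onto the top-k principal directions, shrinking the channel dimension. The feature map shape and the target dimension k are illustrative assumptions.

```python
import numpy as np

def pca_reduce(features, k):
    """features: (H, W, C) array -> (H, W, k) array via channel-wise PCA."""
    h, w, c = features.shape
    x = features.reshape(-1, c)
    x = x - x.mean(axis=0)               # center each channel
    # Right singular vectors of the centered data are the principal directions
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return (x @ vt[:k].T).reshape(h, w, k)

rng = np.random.default_rng(0)
feat = rng.standard_normal((28, 28, 512))  # hypothetical conv-layer output
reduced = pca_reduce(feat, 64)
print(reduced.shape)  # (28, 28, 64)
```

Projecting 512 channels onto 64 principal directions cuts the per-frame filter computation roughly in proportion to the channel count, which is what makes the CNN-based auxiliary model affordable at tracking time.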
Result
When the appearance model of the target changes severely, the confidence response value of the main-feature model becomes too low. In this case, the dual-feature model discriminating mechanism promptly calls the auxiliary-feature model to correct the target positioning in real time. Experiments show that the adaptive dual-feature model recognition mechanism is effective. We compare our algorithm with current advanced tracking algorithms with real-time speed, such as SiamFC (fully-convolutional Siamese networks), MEEM (multiple experts using entropy minimization), SAMF (scale adaptive multiple features), DSST (discriminative scale space tracking), KCF (kernel correlation filter), Struck, and TLD (tracking-learning-detection). The one-pass evaluation (OPE) result on the public dataset OTB-2013 shows that the proposed algorithm ranks first in terms of distance precision rate. Compared with the KCF algorithm, the distance precision and overlap success rates of the proposed algorithm are improved by 25.2% and 25.6%, respectively, and its average speed reaches 38 frame/s. To demonstrate the performance of the proposed tracking algorithm more concretely, we also compare it with the most advanced tracking algorithms based on deep convolution features, such as VITAL, SANet, and CCOT; however, these algorithms cannot meet the real-time performance requirement and cannot be applied in actual situations.
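The two metrics quoted above follow the standard OTB definitions, which can be sketched as follows (this is the benchmark's usual convention, not code from the paper): distance precision counts frames whose predicted center lies within a pixel threshold (typically 20 px) of the ground truth, and overlap success counts frames whose bounding-box IoU exceeds a threshold (typically 0.5). The toy boxes below are invented for illustration.

```python
import numpy as np

def center_error(pred, gt):
    """pred, gt: boxes as (x, y, w, h); returns center distance in pixels."""
    cp = (pred[0] + pred[2] / 2, pred[1] + pred[3] / 2)
    cg = (gt[0] + gt[2] / 2, gt[1] + gt[3] / 2)
    return np.hypot(cp[0] - cg[0], cp[1] - cg[1])

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

# Two toy frames: the first prediction is close, the second misses entirely
preds = [(10, 10, 40, 40), (100, 100, 40, 40)]
gts = [(12, 12, 40, 40), (10, 10, 40, 40)]
precision = np.mean([center_error(p, g) <= 20 for p, g in zip(preds, gts)])
success = np.mean([iou(p, g) > 0.5 for p, g in zip(preds, gts)])
print(precision, success)  # 0.5 0.5
```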
Conclusion
A new tracking model mechanism is proposed in this study. An auxiliary-feature model is added to the main-feature model. The auxiliary-feature model adjusts the optimal position of the target in real time according to the change in the confidence response of the main-feature model and prevents the main-feature model from drifting. PCA is introduced to reduce the dimensionality of the deep convolution features and optimize the speed of the algorithm. However, the proposed algorithm still has problems. First, a change in the environment leads to uncertainty in the auxiliary model's threshold; setting the threshold for mobilizing the auxiliary model to a fixed value reduces the adaptability of the tracking algorithm. Second, the stability of the main-feature model's correlation filter directly affects the accuracy of the algorithm. The correlation filtering algorithm expands the sample set by introducing a sample periodicity hypothesis, but this hypothesis also introduces the boundary effect, which considerably reduces the accuracy of the filter. To improve the accuracy of the algorithm, methods to eliminate the boundary effect should be introduced, such as adding a spatial regularization term to the ridge regression solution or applying a mask matrix to highlight the target position. The results on the OTB-2013 public dataset show that the proposed target tracking algorithm performs better than current tracking algorithms in terms of tracking accuracy and real-time tracking, and it adapts well under 10 different interference factors, such as motion blur, scale variation, and rotation.
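The dual-model switching logic discussed above can be summarized in a short sketch. The use of the maximum response as the confidence measure and the fixed threshold value are illustrative assumptions; as noted above, a fixed threshold limits adaptability, which is exactly the weakness the sketch makes visible.

```python
import numpy as np

CONF_THRESHOLD = 0.3  # hypothetical fixed threshold for calling the CNN model

def track_frame(frame, main_model, aux_model):
    """Return the predicted position, preferring the fast HOG-based main model."""
    response = main_model(frame)          # response map of the main model
    confidence = response.max()
    if confidence >= CONF_THRESHOLD:
        # Texture features are reliable: trust the fast main model
        return np.unravel_index(np.argmax(response), response.shape)
    # Low confidence (occlusion, deformation, ...): correct with the CNN model
    aux_response = aux_model(frame)
    return np.unravel_index(np.argmax(aux_response), aux_response.shape)

# Toy models: the main model is unsure, so the auxiliary model decides
main = lambda f: np.full((5, 5), 0.1)
aux = lambda f: np.eye(5) * 0.9
print(track_frame(None, main, aux))  # (0, 0)
```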
Bertinetto L, Valmadre J, Golodetz S, Miksik O and Torr P H S. 2016a. Staple: complementary learners for real-time tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 1401-1409 [DOI: 10.1109/CVPR.2016.156]
Bertinetto L, Valmadre J, Henriques J F, Vedaldi A and Torr P H S. 2016b. Fully-convolutional Siamese networks for object tracking//Proceedings of the European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 850-865 [DOI: 10.1007/978-3-319-48881-3_56]
Bouwmans T and Zahzah E H. 2014. Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance. Computer Vision and Image Understanding, 122: 22-34 [DOI: 10.1016/j.cviu.2013.11.009]
Choi J, Chang H J, Fischer T, Yun S, Lee K, Jeong J, Demiris Y and Choi J Y. 2018. Context-aware deep feature compression for high-speed visual tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 479-488 [DOI: 10.1109/CVPR.2018.00057]
Danelljan M, Bhat G, Khan F S and Felsberg M. 2017. ECO: efficient convolution operators for tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 6931-6939 [DOI: 10.1109/CVPR.2017.733]
Danelljan M, Häger G, Khan F S and Felsberg M. 2016a. Learning spatially regularized correlation filters for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 4310-4318 [DOI: 10.1109/ICCV.2015.490]
Danelljan M, Häger G, Khan F S and Felsberg M. 2014a. Accurate scale estimation for robust visual tracking//Proceedings of 2014 British Machine Vision Conference. Nottingham: BMVA Press, 65.1-65.11 [DOI: 10.5244/C.28.65]
Danelljan M, Robinson A, Khan F S and Felsberg M. 2016b. Beyond correlation filters: learning continuous convolution operators for visual tracking//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 472-488 [DOI: 10.1007/978-3-319-46454-1_29]
Danelljan M, Khan F S, Felsberg M and van de Weijer J. 2014b. Adaptive color attributes for real-time visual tracking//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 1090-1097 [DOI: 10.1109/CVPR.2014.143]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA: IEEE, 886-893 [DOI: 10.1109/CVPR.2005.177]
Fan H and Ling H B. 2017. SANet: structure-aware network for visual tracking//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, HI, USA: IEEE, 2217-2224 [DOI: 10.1109/CVPRW.2017.275]
Hare S, Saffari A and Torr P H S. 2011. Struck: structured output tracking with kernels//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 263-270 [DOI: 10.1109/ICCV.2011.6126251]
Henriques J F, Caseiro R, Martins P and Batista J. 2015. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3): 583-596 [DOI: 10.1109/TPAMI.2014.2345390]
Kalal Z, Mikolajczyk K and Matas J. 2012. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7): 1409-1422 [DOI: 10.1109/TPAMI.2011.239]
Li B, Yan J J, Wu W, Zheng Z and Hu X L. 2018a. High performance visual tracking with Siamese region proposal network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 8971-8980 [DOI: 10.1109/CVPR.2018.00935]
Li F, Tian C, Zuo W M, Zhang L and Yang M H. 2018b. Learning spatial-temporal regularized correlation filters for visual tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 4904-4913 [DOI: 10.1109/CVPR.2018.00515]
Li Y and Zhu J K. 2014. A scale adaptive kernel correlation filter tracker with feature integration//Proceedings of 2014 European Conference on Computer Vision. Zurich, Switzerland: Springer, 254-265 [DOI: 10.1007/978-3-319-16181-5_18]
Lukežic A, Vojír T, Zajc L C, Matas J and Kristan M. 2017. Discriminative correlation filter with channel and spatial reliability//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 4847-4856 [DOI: 10.1109/CVPR.2017.515]
Ma C, Huang J B, Yang X K and Yang M H. 2015. Hierarchical convolutional features for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 3074-3082 [DOI: 10.1109/ICCV.2015.352]
Song Y B, Ma C, Wu X H, Gong L J, Bao L C, Zuo W M, Shen C H, Lau R W H and Yang M H. 2018. VITAL: visual tracking via adversarial learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 8990-8999 [DOI: 10.1109/CVPR.2018.00937]
Sun C, Wang D, Lu H C and Yang M H. 2018. Correlation tracking via joint discrimination and reliability learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 489-497 [DOI: 10.1109/CVPR.2018.00058]
Valmadre J, Bertinetto L, Henriques J F, Vedaldi A and Torr P H S. 2017. End-to-end representation learning for correlation filter based tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 5000-5008 [DOI: 10.1109/CVPR.2017.531]
Wang N, Zhou W G, Tian Q, Hong R C, Wang M and Li H Q. 2018a. Multi-cue correlation filters for robust visual tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 4844-4853 [DOI: 10.1109/CVPR.2018.00509]
Wang X, Li C L, Luo B and Tang J. 2018b. SINT++: robust visual tracking via adversarial positive instance generation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 4864-4873 [DOI: 10.1109/CVPR.2018.00511]
Wu Y, Lim J and Yang M H. 2013. Online object tracking: a benchmark//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2411-2418 [DOI: 10.1109/CVPR.2013.312]
Zhang J M, Ma S G and Sclaroff S. 2014. MEEM: robust tracking via multiple experts using entropy minimization//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 188-203 [DOI: 10.1007/978-3-319-10599-4_13]
Zhang K H, Zhang L and Yang M H. 2012. Real-time compressive tracking//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer, 864-877 [DOI: 10.1007/978-3-642-33712-3_62]
Zhang T Z, Xu C S and Yang M H. 2017. Multi-task correlation particle filter for robust object tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 4819-4827 [DOI: 10.1109/CVPR.2017.512]