Adaptive-update object tracking incorporating temporal and velocity information
2021, Vol. 26, No. 4, Pages: 883-897
Received: 2020-04-03; Revised: 2020-08-08; Accepted: 2020-08-15; Published in print: 2021-04-16
DOI: 10.11834/jig.200100
Objective
To address occlusion, illumination variation, and scale variation encountered by object tracking algorithms in real-world scenes, we propose a multi-feature-fusion object tracking algorithm with adaptive model updating that incorporates temporal and velocity information.
Method
We extract hierarchical deep features of the target together with the handcrafted histogram of oriented gradients (HOG) feature, and construct two fused feature representations: one combining all deep features, and one combining deeper-layer deep features with the handcrafted feature, which improves tracking robustness in complex scenes. We compute a reliability score for each fused feature and track the current-frame target with the more reliable one. When tracking quality is unreliable, the target appearance model is updated by adding temporal context information and the current robust representation, and the best predicted target position is selected as the final result through multi-peak determination and motion-speed determination.
Result
Extensive tests on the OTB (object tracking benchmark) 2013 and OTB2015 datasets show that, compared with seven other algorithms, ours achieves the best overall performance and also tracks well in a variety of complex environments. On OTB2013 and OTB2015, its precision is 89.3% and 83.3%, and its success rate 87% and 78.3%, respectively.
Conclusion
The proposed algorithm fuses deep features with handcrafted features, applies multi-peak analysis and motion-speed determination to the tracking results, and adaptively updates features for re-tracking when results are poor. Experiments show that it effectively handles interference from complex factors such as illumination variation, background clutter, and occlusion, and effectively improves tracking quality.
Objective
Target tracking has become a popular research topic in computer vision. It aims to use the target labeled in the first frame of a video sequence as training data so that the target can be tracked in real time throughout the subsequent frames. Target tracking has broad application prospects in intelligent video surveillance, human-computer interaction, and intelligent transportation systems, and it has benefited from the rapid development of artificial intelligence. However, video sequences captured in the real world often contain complex situations, so the motion state of the tracked target becomes more complicated. Current mainstream approaches divide into deep-learning-based trackers and correlation-filter-based trackers, but neither family fully resolves the various challenges in target tracking. To address the occlusion, illumination variation, and scale variation that a tracker encounters in real scenes, we propose an adaptive model-update target tracking algorithm with multi-feature fusion that incorporates temporal information and velocity information, improving performance in these situations.
Method
In computer vision, a target is typically recognized by building a model from its extracted features. Commonly used features include deep features and handcrafted features: deep features carry more semantic information, whereas handcrafted features have higher spatial resolution. Within a convolutional neural network, features extracted from deeper layers carry more semantic information but have lower resolution, while features from shallower layers offer higher resolution but less semantic information. Semantic information helps improve the success rate of tracking, and high resolution helps improve its precision, so features must be selected comprehensively. Our method fuses different types of features to obtain the best trade-off between semantic information and resolution, and thereby the best tracking precision and success rate. VGG19 (Visual Geometry Group 19-layer network) is an excellent deep neural network in computer vision, originally used for image recognition; its depth gives it strong representational power. We therefore extract three deep features of the target from three different layers of a pre-trained VGG19 convolutional neural network and, at the same time, extract the handcrafted histogram of oriented gradients (HOG) feature. HOG is a feature descriptor used for object detection in computer vision and image processing; it is built by computing and accumulating histograms of gradient orientations over local regions of the image, and it is largely invariant to geometric and photometric deformations. The three deep features are then fused to obtain a full-depth fused feature.
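A minimal sketch of this feature-extraction step is given below, assuming PyTorch/torchvision (0.13 or later) and scikit-image. The abstract does not name the three VGG19 layers; conv3_4, conv4_4, and conv5_4 are an assumption, borrowed from the layers popularized by hierarchical convolutional-feature trackers, and the HOG cell sizes are likewise illustrative.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from skimage.feature import hog

# Pre-trained VGG19 feature extractor (torchvision >= 0.13 weights API).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()

# Indices of the ReLU outputs after conv3_4, conv4_4, conv5_4 in vgg19.features
# (assumed layer choice; the paper's exact layers are not given in the abstract).
LAYER_IDS = {"conv3_4": 17, "conv4_4": 26, "conv5_4": 35}

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_features(rgb_patch):
    """Return the three hierarchical deep feature maps of an RGB search patch."""
    x = preprocess(rgb_patch).unsqueeze(0)
    feats = {}
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            for name, idx in LAYER_IDS.items():
                if i == idx:
                    feats[name] = x.clone()
    return feats

def hog_features(gray_patch):
    """Handcrafted HOG descriptor of a grayscale patch (illustrative parameters)."""
    return hog(gray_patch, orientations=9, pixels_per_cell=(4, 4),
               cells_per_block=(2, 2), feature_vector=False)
```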
Subsequently, two of the deep features are fused with the handcrafted HOG feature to obtain a fused feature that combines deep and handcrafted features. We then track the target independently with these two fused features, compute the reliability of the two tracking results, and select the result of the more reliable fused feature as the final tracking result for the current frame.
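The abstract does not state the exact reliability criterion, so the sketch below uses the average peak-to-correlation energy (APCE), a standard confidence measure for correlation-filter response maps, as an assumed stand-in; `resp_deep` and `resp_mixed` are hypothetical names for the response maps of the two fused features.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a response map; higher means
    a sharper, more trustworthy peak (assumed reliability criterion)."""
    peak, valley = response.max(), response.min()
    return (peak - valley) ** 2 / np.mean((response - valley) ** 2)

def pick_result(resp_deep, resp_mixed):
    """Keep the predicted position from the more reliable response map."""
    best = resp_deep if apce(resp_deep) >= apce(resp_mixed) else resp_mixed
    return np.unravel_index(best.argmax(), best.shape)
```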
When the overall tracking result of the current frame cannot meet the requirements, we update the target model promptly, adding temporal context information and the current robust representation to build a new tracking model, and re-track the current-frame target with the updated model. The re-tracking response map then undergoes multi-peak determination and motion-speed determination to select the best predicted position as the final result for the current frame.
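A minimal sketch of the multi-peak and motion-speed determinations follows, assuming NumPy/SciPy; the peak threshold, neighborhood size, and speed limit are illustrative values, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def candidate_peaks(response, rel_thresh=0.5, size=5):
    """Multi-peak determination: local maxima of the re-tracking response
    map that rise above a fraction of the global peak."""
    local_max = (response == maximum_filter(response, size=size))
    strong = response >= rel_thresh * response.max()
    return np.argwhere(local_max & strong)

def speed_gate(peaks, prev_pos, prev_velocity, max_ratio=2.0):
    """Motion-speed determination: discard candidate peaks whose implied
    frame-to-frame displacement is implausibly large given the previous
    velocity, then keep the nearest remaining peak (illustrative rule)."""
    limit = max_ratio * max(np.linalg.norm(prev_velocity), 1.0)
    dists = [np.linalg.norm(p - prev_pos) for p in peaks]
    ok = [(d, p) for d, p in zip(dists, peaks) if d <= limit]
    return min(ok, key=lambda t: t[0])[1] if ok else peaks[int(np.argmin(dists))]
```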
Result
Extensive tests were performed on the object tracking benchmark (OTB) 2013 and OTB2015 datasets, both mainstream benchmarks in target tracking. OTB2013 contains 50 video sequences, whereas OTB2015 contains 100; the sequences cover 11 different challenges and include both color and grayscale videos. We selected seven mainstream tracking algorithms from recent years for comparison. The experimental results show that our algorithm achieves the best overall performance among the compared algorithms and obtains excellent tracking results in different complex environments. On OTB2013, its precision and success rate reach 89.3% and 87%, respectively; the precision is 4.5% higher than that of the second-ranked long-term correlation tracking (LCT), and the success rate is 8.1% higher than that of the second-ranked spatially regularized discriminative correlation filters (SRDCF). On OTB2015, its precision and success rate reach 83.3% and 78.3%, respectively, 4.5% and 5.3% higher than those of the second-ranked SRDCF. Similarly, across the 11 challenges of the OTB benchmarks, the precision and success rate of our algorithm are almost always optimal or suboptimal.
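For reference, the OTB precision and success scores quoted above are standard: precision is the fraction of frames whose predicted center lies within 20 pixels of ground truth, and success is based on bounding-box overlap (OTB reports the area under the success curve over all overlap thresholds; the sketch below evaluates a single threshold).

```python
import numpy as np

def precision(pred_centers, gt_centers, thresh=20):
    """OTB precision: share of frames with center error <= thresh pixels."""
    d = np.linalg.norm(np.asarray(pred_centers, float)
                       - np.asarray(gt_centers, float), axis=1)
    return float(np.mean(d <= thresh))

def success(pred_boxes, gt_boxes, thresh=0.5):
    """OTB success at one overlap threshold: share of frames whose predicted
    box has IoU >= thresh with ground truth. Boxes are (x, y, w, h)."""
    p, g = np.asarray(pred_boxes, float), np.asarray(gt_boxes, float)
    x1 = np.maximum(p[:, 0], g[:, 0]); y1 = np.maximum(p[:, 1], g[:, 1])
    x2 = np.minimum(p[:, 0] + p[:, 2], g[:, 0] + g[:, 2])
    y2 = np.minimum(p[:, 1] + p[:, 3], g[:, 1] + g[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    iou = inter / (p[:, 2] * p[:, 3] + g[:, 2] * g[:, 3] - inter)
    return float(np.mean(iou >= thresh))
```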
Conclusion
Our algorithm fuses deep features and handcrafted features, performs multi-peak analysis and motion-speed determination on the tracking results, and adaptively updates features for re-tracking when the tracking results are poor. The experimental results show that it can effectively handle the interference of complex factors such as illumination variation, background clutter, and occlusion, achieving good precision and success rates and effectively improving tracking quality.
References
Bertinetto L, Valmadre J, Golodetz S, Miksik O and Torr P H S. 2016. Staple: complementary learners for real-time tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1401-1409 [DOI: 10.1109/CVPR.2016.156]
Bolme D S, Beveridge J R, Draper B A and Lui Y M. 2010. Visual object tracking using adaptive correlation filters//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE: 2544-2550 [DOI: 10.1109/CVPR.2010.5539960]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177]
Danelljan M, Häger G, Khan F S and Felsberg M. 2014. Accurate scale estimation for robust visual tracking//Proceedings of 2014 British Machine Vision Conference. Nottingham, UK: BMVA Press: 65.1-65.11
Danelljan M, Häger G, Khan F S and Felsberg M. 2016a. Adaptive decontamination of the training set: a unified formulation for discriminative visual tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1430-1438 [DOI: 10.1109/CVPR.2016.159]
Danelljan M, Robinson A, Khan F S and Felsberg M. 2016b. Beyond correlation filters: learning continuous convolution operators for visual tracking//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 472-488 [DOI: 10.1007/978-3-319-46454-1_29]
Henriques J F, Caseiro R, Martins P and Batista J. 2012. Exploiting the circulant structure of tracking-by-detection with kernels//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer: 702-715 [DOI: 10.1007/978-3-642-33765-9_50]
Henriques J F, Caseiro R, Martins P and Batista J. 2015. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3): 583-596 [DOI: 10.1109/TPAMI.2014.2345390]
Li J L, Yin K, Chu C X and Wang H N. 2019. Overview of video target tracking technology. Journal of Yanshan University, 43(3): 251-262 [DOI: 10.3969/j.issn.1007-791X.2019.03.009]
Li Y and Zhu J K. 2014. A scale adaptive kernel correlation filter tracker with feature integration//Proceedings of 2014 European Conference on Computer Vision Workshops. Zurich, Switzerland: Springer: 254-265 [DOI: 10.1007/978-3-319-16181-5_18]
Lu H C, Li P X and Wang D. 2018. Visual object tracking: a survey. Pattern Recognition and Artificial Intelligence, 31(1): 61-76 [DOI: 10.16451/j.cnki.issn1003-6059.201801006]
Ma C, Huang J B, Yang X K and Yang M H. 2015a. Hierarchical convolutional features for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3074-3082 [DOI: 10.1109/ICCV.2015.352]
Ma C, Yang X K, Zhang C Y and Yang M H. 2015b. Long-term correlation tracking//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 5388-5396 [DOI: 10.1109/CVPR.2015.7299177]
Qi Y, Zhang S P, Qin L, Yao H X, Huang Q M, Lim J and Yang M H. 2016. Hedged deep tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4303-4311 [DOI: 10.1109/CVPR.2016.466]
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2020-03-30]. https://arxiv.org/pdf/1409.1556.pdf
Wang N Y and Yeung D Y. 2013. Learning a deep compact image representation for visual tracking//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: ACM: 809-817
Wu Y, Lim J and Yang M H. 2013. Online object tracking: a benchmark//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE: 2411-2418 [DOI: 10.1109/CVPR.2013.312]
Wu Y, Lim J and Yang M H. 2015. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1834-1848 [DOI: 10.1109/TPAMI.2014.2388226]
Yun S, Choi J, Yoo Y, Yun K and Choi J Y. 2017. Action-decision networks for visual tracking with deep reinforcement learning//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2711-2720 [DOI: 10.1109/CVPR.2017.148]