Adaptive update object tracking algorithm incorporating timing and speed information

Yin Kuan; Li Junli; Hu Kai; Li Li

Published: 2021-04-17
DOI: 10.11834/jig.200100
2021 | Volume 26 | Number 4

Yin Kuan, Li Junli, Hu Kai, Li Li (School of Computer Science, Sichuan Normal University, Chengdu 610101)

Abstract

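Objective Object tracking algorithms face occlusion, illumination variation, and scale variation in real scenes. To address these problems, we propose an object tracking algorithm based on multi-feature fusion with adaptive model updating that incorporates temporal and speed information. Method We extract hierarchical deep features and handcrafted histogram of oriented gradients (HOG) features of the target. From these we construct two fused feature extractors to improve tracking robustness in complex scenes: one combines all of the deep features, and the other combines deep-layer deep features with the handcrafted features. A reliability score is computed for each fused feature, and the most reliable fused feature is used to track the target in the current frame. When the tracking quality is unreliable, the target representation model is updated by adding temporal context information and the current robust representation. Multi-peak determination and motion speed determination then select the optimal predicted target position as the final result. Result Extensive tests were conducted on the OTB (object tracking benchmark) 2013 and OTB2015 datasets. Compared with seven other algorithms, the proposed algorithm achieved the best overall performance and also tracked well in a variety of complex environments. Its tracking accuracy was 89.3% on OTB2013 and 83.3% on OTB2015, and its success rate was 87% and 78.3%, respectively. Conclusion The proposed algorithm fuses deep features with handcrafted features and performs multi-peak analysis and motion speed determination on the tracking results. When the tracking results are poor, it adaptively updates the features and re-tracks the target. Experimental results show that it effectively handles interference from complex factors such as illumination variation, background clutter, and occlusion, and that it effectively improves tracking quality.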
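The abstract says a reliability score is computed for each fused feature and the more reliable one is used, but it does not name the measure. The sketch below is illustrative only. It assumes the average peak-to-correlation energy (APCE), a common confidence measure for tracking response maps, as a stand-in. The feature names `all_deep` and `deep_plus_hog`, the helper `is_unreliable`, and its ratio of 0.5 are hypothetical and not taken from the paper.

```python
import numpy as np


def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy (APCE) of a 2-D response map.

    A single sharp peak gives a high score; a flat or multi-modal map gives a low one.
    """
    f_max = float(response.max())
    f_min = float(response.min())
    energy = float(np.mean((response - f_min) ** 2))
    if energy == 0.0:  # a perfectly flat map carries no localization information
        return 0.0
    return (f_max - f_min) ** 2 / energy


def select_most_reliable(responses):
    """Pick the fused feature whose response map has the highest APCE.

    `responses` maps a (hypothetical) feature name to its response map.
    Returns (name, score, peak) with peak given as (row, col).
    """
    scores = {name: apce(r) for name, r in responses.items()}
    best = max(scores, key=scores.get)
    row, col = np.unravel_index(np.argmax(responses[best]), responses[best].shape)
    return best, scores[best], (int(row), int(col))


def is_unreliable(score, history, ratio=0.5):
    """Hypothetical model-update trigger: score falls below ratio * running mean."""
    return len(history) > 0 and score < ratio * float(np.mean(history))


# Usage: a sharp single peak versus a noisy, flat map.
rng = np.random.default_rng(0)
ys, xs = np.mgrid[0:50, 0:50]
sharp = np.exp(-((ys - 20) ** 2 + (xs - 30) ** 2) / (2 * 2.0 ** 2))
noisy = 0.5 * rng.random((50, 50))
name, score, peak = select_most_reliable({"all_deep": noisy, "deep_plus_hog": sharp})
print(name, peak)  # deep_plus_hog (20, 30)
```

In a full tracker, `is_unreliable` would compare each frame's best score against the scores of earlier frames. A sharp drop would then trigger the model update and re-tracking that the abstract describes.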

Keywords

object tracking; hierarchical deep feature; time context information; multi-peak determination; model updating

Adaptive update object tracking algorithm incorporating timing and speed information

Yin Kuan, Li Junli, Hu Kai, Li Li(School of Computer Science, Sichuan Normal University, Chengdu 610101, China)

Abstract

Objective Target tracking has become an active research topic in computer vision. Given the target labeled in the first frame of a video sequence as training data, a tracker aims to follow that target in real time throughout the subsequent frames. Target tracking has broad application prospects in intelligent video surveillance, human-computer interaction, and intelligent transportation systems, and it has benefited from the rapid development of artificial intelligence. However, real-world video sequences often involve complex conditions, which make the motion state of the tracked target more complicated. Current mainstream target tracking algorithms fall into two main categories: those based on deep learning and those based on correlation filters. Neither category handles the various challenges of target tracking well. Tracking algorithms in real scenes encounter occlusion, illumination variation, and scale variation. To improve tracking performance under these conditions, we propose an adaptive model update target tracking algorithm that fuses multiple features and incorporates temporal and speed information. Method In computer vision, a target is usually recognized by extracting its features and building a model of the target from them. Commonly used features include deep features and handcrafted features. Deep features carry more semantic information, whereas handcrafted features have higher resolution. Within a convolutional neural network, features extracted from deeper layers carry more semantic information but have lower resolution. Features extracted from shallower layers have higher resolution but less semantic information.
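The correlation-filter family mentioned above can be made concrete with a minimal single-channel filter in the style of MOSSE, trained in closed form in the Fourier domain. This is not the authors' formulation, since the abstract does not give one. The Gaussian width `sigma` and the regularizer `lam` are illustrative assumptions. The peak of the resulting response map is the predicted target position.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))


def train_filter(patch, label, lam=1e-2):
    """Closed-form MOSSE-style filter, stored in the Fourier domain as conj(H).

    conj(H) = G * conj(F) / (F * conj(F) + lam), where F and G are the 2-D DFTs
    of the training patch and the desired label; lam regularizes weak frequencies.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return G * np.conj(F) / (F * np.conj(F) + lam)


def response_map(h_conj, patch):
    """Correlate a search patch with the filter; the argmax is the predicted location."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * h_conj))


rng = np.random.default_rng(1)
patch = rng.standard_normal((64, 64))  # stand-in for one feature channel
h_conj = train_filter(patch, gaussian_label(patch.shape))
moved = np.roll(patch, shift=(5, -3), axis=(0, 1))  # content moves 5 rows down, 3 columns left
resp = response_map(h_conj, moved)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(resp), resp.shape))
print(peak)  # (37, 29): the centre (32, 32) shifted by (5, -3)
```

Because the correlation is circular, shifting the input patch shifts the response peak by the same offset, which is what lets such a tracker localize the target.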
Semantic information helps improve the tracking success rate, and high resolution helps improve tracking accuracy, so features must be selected with both properties in mind. Our method therefore fuses different types of features to strike the best balance between semantic information and resolution and thus obtain the best tracking accuracy and success rate. VGG19, a network from the Visual Geometry Group, is a well-established deep convolutional neural network in computer vision that was originally designed for image recognition. Because it has many layers, it represents objects well. We therefore extract three deep features of the target from three different layers of a pre-trained VGG19 network and, at the same time, extract the handcrafted histogram of oriented gradients (HOG) feature of the target. HOG is a feature descriptor used for object detection in computer vision and image processing. It is constructed by computing and accumulating histograms of gradient orientations over local regions of the image, and it maintains good invariance to geometric and photometric deformations of the image. The three deep features are first fused to obtain an all-deep fused feature. Two of the deep features are then fused with the handcrafted HOG feature to obtain a fused feature that combines deep and handcrafted features. We track the target independently with each of the two fused features and compute the reliability of the two tracking results. The result of the more reliable fused feature becomes the final tracking result for the current frame.
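The HOG construction described above, histograms of gradient orientations over local image cells, can be sketched in NumPy as follows. This is a simplified variant written for illustration. It assumes a 2-D grayscale image, 8 x 8 pixel cells, and 9 unsigned orientation bins. It omits the vote interpolation and overlapping block normalization of the full descriptor, and the abstract does not state the paper's own HOG settings.

```python
import numpy as np


def hog_cells(image, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation.

    Returns an array of shape (rows // cell, cols // cell, bins).
    """
    img = np.asarray(image, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]  # centred [-1, 0, 1] derivative filter
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    n_r, n_c = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_r, n_c, bins))
    for r in range(n_r):
        for c in range(n_c):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[r, c] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(), minlength=bins)
    # L2-normalize each cell (a simplification of HOG block normalization).
    norm = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / (norm + 1e-6)


edge = np.zeros((16, 16))
edge[:, 8:] = 1.0  # vertical edge: only horizontal gradients
h = hog_cells(edge)
print(h.shape)  # (2, 2, 9)
print(h[0, 0].argmax())  # 0: all votes fall in the 0-20 degree bin
```

In a fusion scheme like the one described, such a cell grid would be resized to the spatial size of the deep feature maps before the channels are combined. That resizing step is an assumption about how the fusion is done, not a detail given in the abstract.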
When the tracking result of the current frame does not meet the reliability requirement, we promptly update the target model. Temporal context information and the current robust representation are added to build a new tracking model, and the updated model re-tracks the target in the current frame. Multi-peak determination and motion speed determination are then applied to the re-tracking response map to select the best predicted position as the final result for the current frame. Result Extensive experiments were conducted on the object tracking benchmark (OTB) 2013 and OTB2015 datasets, both mainstream test datasets in target tracking. OTB2013 contains 50 video sequences and OTB2015 contains 100. The sequences cover 11 challenge attributes and include both color and grayscale videos. We compared our algorithm with seven mainstream tracking algorithms from recent years. The experimental results show that our algorithm achieved the best overall performance among the compared algorithms and tracked well in a range of complex environments. On OTB2013, the tracking accuracy and success rate of our algorithm reached 89.3% and 87%, respectively. The accuracy is 4.5% higher than that of the second-ranked long-term correlation tracking (LCT) algorithm, and the success rate is 8.1% higher than that of the second-ranked spatially regularized discriminative correlation filters (SRDCF). On OTB2015, the tracking accuracy and success rate reached 83.3% and 78.3%, respectively, which are 4.5% and 5.3% higher than those of the second-ranked SRDCF. Across the 11 challenge attributes of OTB, our algorithm likewise ranks first or second in accuracy and success rate in almost every case.
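The abstract does not spell out the multi-peak and motion speed rules, so the following is only a hypothetical sketch of one way to combine them. It takes three steps:

1. Collect the local maxima of the response map that reach a fraction `rel_thresh` of the global maximum.
2. Discard candidates whose displacement from the previous position exceeds `max_speed`.
3. Keep the remaining candidate closest to a constant-velocity prediction.

The function names and all thresholds are illustrative assumptions, and positions are in response-map coordinates.

```python
import numpy as np


def local_peaks(resp, rel_thresh=0.5):
    """(row, col) of 3 x 3 local maxima that reach rel_thresh * global maximum."""
    resp = np.asarray(resp, dtype=float)
    h, w = resp.shape
    padded = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    # Maximum over the 8 neighbours of every pixel.
    neighbour_max = np.max([padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                            if (dy, dx) != (0, 0)], axis=0)
    is_peak = (resp >= neighbour_max) & (resp >= rel_thresh * resp.max())
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]


def choose_position(resp, prev_pos, prev_velocity, rel_thresh=0.5, max_speed=20.0):
    """Hypothetical multi-peak + motion-speed rule; returns (position, velocity)."""
    peaks = local_peaks(resp, rel_thresh)
    # Speed check: drop candidates implying an implausibly large jump (keep all if none pass).
    plausible = [p for p in peaks
                 if np.hypot(p[0] - prev_pos[0], p[1] - prev_pos[1]) <= max_speed]
    candidates = plausible or peaks
    # Constant-velocity prediction: pick the candidate nearest prev_pos + prev_velocity.
    pred = (prev_pos[0] + prev_velocity[0], prev_pos[1] + prev_velocity[1])
    best = min(candidates, key=lambda p: np.hypot(p[0] - pred[0], p[1] - pred[1]))
    return best, (best[0] - prev_pos[0], best[1] - prev_pos[1])


def blob(ys, xs, r, c, a):
    """Gaussian bump of height a centred at (r, c), used to fake a response map."""
    return a * np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / 8.0)


# Usage: a strong distractor at (10, 10) and the true target at (30, 32).
ys, xs = np.mgrid[0:50, 0:50]
resp = blob(ys, xs, 10, 10, 1.0) + blob(ys, xs, 30, 32, 0.8)
print(choose_position(resp, prev_pos=(26, 28), prev_velocity=(4, 4)))
# ((30, 32), (4, 4))
```

Preferring the motion-consistent peak over the highest peak lets a rule of this kind ignore a strong distractor, such as a similar object appearing near the target.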
Conclusion Our algorithm fuses deep features with handcrafted features and performs multi-peak analysis and motion speed determination on the tracking results. When the tracking results are poor, it adaptively updates the features and re-tracks the target. The experimental results show that the algorithm effectively copes with interference from complex factors such as illumination variation, background clutter, and occlusion. It also achieves good tracking accuracy and success rates, thereby improving tracking quality.
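The tracking accuracy and success rate figures in this abstract come from the OTB evaluation. In OTB, accuracy (precision) is usually the fraction of frames whose predicted box center lies within 20 pixels of the ground truth. Success is based on the intersection over union (IoU) of the predicted and ground-truth boxes. It is reported either at an overlap threshold of 0.5 or as the area under the success plot. The abstract does not say which thresholds were used, so the sketch below implements these common conventions under that assumption, with boxes given as (x, y, w, h) rows.

```python
import numpy as np


def center_error(pred, gt):
    """Euclidean distance between box centres; boxes are (x, y, w, h) rows."""
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)


def iou(pred, gt):
    """Per-frame intersection over union of (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def precision_at(pred, gt, px=20.0):
    """Fraction of frames whose centre error is at most `px` pixels."""
    return float(np.mean(center_error(pred, gt) <= px))


def success_at(pred, gt, overlap=0.5):
    """Fraction of frames whose IoU exceeds `overlap`."""
    return float(np.mean(iou(pred, gt) > overlap))


def success_auc(pred, gt, n=21):
    """Area under the success plot: mean success over n thresholds in [0, 1]."""
    o = iou(pred, gt)
    return float(np.mean([np.mean(o > t) for t in np.linspace(0.0, 1.0, n)]))


gt = np.array([[0, 0, 10, 10]] * 3, dtype=float)
pred = np.array([[0, 0, 10, 10], [5, 0, 10, 10], [30, 30, 10, 10]], dtype=float)
# Two of three frames are within 20 px; only one has IoU above 0.5.
print(precision_at(pred, gt), success_at(pred, gt), success_auc(pred, gt))
```

On a benchmark, these per-sequence scores are averaged over all sequences, or over the sequences tagged with one challenge attribute for the per-attribute comparison.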

Keywords

object tracking; hierarchical deep feature; time context information; multi-peak determination; model updating
