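Ji Qingbo1,2, Chen Kuicheng1, Hou Changbo1, Li Ziqi1, Qi Yufei1 (1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001; 2. Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001)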
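Objective Most deep-learning-based infrared target tracking methods make little use of target detail information in infrared scenes with weak contrast and heavy noise. Moreover, when the tracking scene contains similar targets and a cluttered background, most trackers cannot effectively update the tracked target, which leads to poor robustness in long-term tracking. To solve these problems, an infrared target tracking algorithm based on attention and adaptive target model update is proposed.
Method Starting from an anchor-free algorithm, a fast attention enhancement module designed for infrared tracking scenes is first added to process infrared images in parallel. It increases the difference between the infrared target and the background and enhances the detail information of the target without losing the original information. The extracted features are then fused into the middle layer of the backbone network. Finally, a target model adaptive update network learns the feature change trend of the infrared target while dynamically updating the middle- and high-level features of the target.
Result The proposed method is compared with other state-of-the-art algorithms on four infrared target tracking evaluation benchmarks. On the LSOTB-TIR (large-scale thermal infrared object tracking benchmark) dataset, the precision is 79.0%, the normalized precision is 71.5%, and the success rate is 66.2%; the precision and success rate are 4.0% and 4.6% higher than those of the second-ranked algorithm. On the PTB-TIR (thermal infrared pedestrian tracking benchmark) dataset, the precision is 85.1% and the success rate is 66.9%, which are 1.3% and 3.6% higher than those of the second-ranked algorithm. On the VOT-TIR2015 (thermal infrared visual object tracking) and VOT-TIR2017 datasets, the expected average overlap and accuracy are 0.344 and 0.73, and 0.276 and 0.71, respectively. The proposed algorithm achieves the best results on the first three datasets. In addition, ablation experiments on the LSOTB-TIR dataset show that the proposed method brings an obvious gain over the baseline tracker.
Conclusion The proposed algorithm improves the ability to capture the features of infrared targets, addresses the susceptibility of infrared target tracking to interference, and improves the precision and success rate of long-term infrared target tracking.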
Infrared target tracking algorithm based on attention mechanism enhancement and target model update
Ji Qingbo1,2, Chen Kuicheng1, Hou Changbo1, Li Ziqi1, Qi Yufei1 (1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China; 2. Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, China)
Objective Most target tracking algorithms are designed for visible-light scenes. In some cases, however, infrared target tracking has advantages that visible-light tracking lacks. Infrared equipment images an object from its own radiation and needs no additional lighting source, so it can display the target in weak light or darkness and has a certain penetration ability. However, infrared images have defects such as unclear boundaries between targets and backgrounds, blur, and cluttered backgrounds. Moreover, the images in some infrared datasets are of low quality, which hampers the training of data-driven deep learning algorithms to some extent. Infrared tracking algorithms can be divided into traditional methods and deep learning methods. Traditional methods are generally built around correlation filtering. Deep learning methods fall mainly into two groups: those in which a neural network provides target features for correlation filters, and those that compute the similarity of image regions within a Siamese network framework. The feature extraction ability of traditional methods for infrared targets is far inferior to that of deep learning methods, and filters trained online cannot adapt to fast-moving or blurred targets, which results in poor tracking accuracy in scenes with complex backgrounds. At present, most deep-learning-based infrared target tracking methods still make little use of the detailed information of infrared targets in scenes with weak contrast and heavy noise. In addition, most trackers cannot effectively update the tracked target when the scene contains similar targets and a cluttered background, which leads to poor robustness in long-term tracking. Therefore, an infrared target tracking algorithm based on attention and adaptive template update is proposed to solve these problems.
Method A Siamese network tracker takes the target in the first frame as the template and performs a similarity calculation over the search area of subsequent frames to locate the target at the position of maximum response. This approach has a simple structure and high tracking efficiency. However, most current algorithms use an anchor-based mechanism, and the preset anchors require tedious manual tuning to adapt to changes in the scale and aspect ratio of the target. The anchor-free design of the Siamese box adaptive network (SiamBAN) avoids the hyperparameters related to candidate boxes, which makes it more flexible and general. Therefore, this study builds on the SiamBAN tracking framework. First, a fast attention enhancement module designed for infrared tracking scenes is added to process infrared images in parallel. This module consists of two parts: contrast limited adaptive histogram equalization and an efficient channel attention module. A three-layer convolutional network connects the two parts to form a residual structure. This structure increases the difference between the infrared target and the background and enhances the detailed information of the target without losing the original information. The extracted features are then fused proportionally into the middle layer of the backbone network so that they can be used quickly. Finally, the target adaptive update network learns the feature change trend of the infrared target while dynamically updating the middle- and high-level features of the target. This network uses the target information of the first frame as the initial template. It then superimposes the historical accumulated template and the template of the current frame to calculate the best template for the target in the next frame, so that the historical information of the target is used continuously.
Result We compare our infrared target tracking algorithm with 10 state-of-the-art trackers on four infrared target tracking evaluation benchmarks: the large-scale thermal infrared object tracking benchmark (LSOTB-TIR), the thermal infrared pedestrian tracking benchmark (PTB-TIR), thermal infrared visual object tracking (VOT-TIR2015), and VOT-TIR2017. On the LSOTB-TIR dataset, the precision, normalized precision, and success rate of our algorithm are 79.0%, 71.5%, and 66.2%, respectively; the precision and success rate are 4.0% and 4.6% higher than those of the second-ranked algorithm. On the PTB-TIR dataset, the precision and success rate are 85.1% and 66.9%, which are 1.3% and 3.6% higher than those of the second-ranked algorithm. On the VOT-TIR2015 dataset, the expected average overlap is 0.344 and the accuracy is 0.73; on the VOT-TIR2017 dataset, they are 0.276 and 0.71. Our algorithm ranks first on the first three datasets. An ablation study on the LSOTB-TIR dataset shows that the proposed components bring an obvious gain over the baseline tracker. Finally, qualitative analysis on the LSOTB-TIR dataset shows that the algorithm is robust under the attributes of background clutter, fast motion, intensity variation, scale variation, occlusion, out-of-view, deformation, low resolution, and motion blur. It also shows that the fast attention enhancement module and the target adaptive update network both improve the tracking success rate.
Conclusion Our algorithm improves the ability of the backbone to capture the features of the infrared target and adaptively adjusts the feature state of the target using the historical change information of the target. It thereby addresses the susceptibility of infrared target tracking to interference in complex environments and improves the precision and success rate of long-term infrared target tracking.
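
The efficient channel attention step named in the Method can be illustrated with a small sketch. The abstract does not give the module's layer sizes or weights, so the following is only an illustration of the generic mechanism (global average pooling, a 1-D convolution across neighbouring channels, a sigmoid gate), written in plain Python on nested lists. The function name `channel_attention`, the `FeatureMap` alias, and the fixed kernel are assumptions made for illustration; in a trained network the kernel weights would be learned.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def channel_attention(features: FeatureMap, kernel: Sequence[float]) -> FeatureMap:
    """Reweight each channel with an ECA-style gate (hypothetical helper)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd so the window is centred")

    # 1. Global average pooling: one scalar descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]

    # 2. 1-D convolution across neighbouring channels with zero padding
    #    (no kernel flip, as is usual in deep learning frameworks).
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    logits = [
        sum(w * padded[c + j] for j, w in enumerate(kernel))
        for c in range(len(pooled))
    ]

    # 3. A sigmoid turns each logit into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]

    # 4. Scale every value of a channel by that channel's gate.
    return [
        [[value * g for value in row] for row in ch]
        for ch, g in zip(features, gates)
    ]


# Two 1x2 channels; the second channel has the stronger mean response.
# The kernel [0, 1, 0] is an assumed placeholder that looks only at the
# channel itself; a learned kernel would also mix in its neighbours.
features = [[[0.0, 0.0]], [[2.0, 4.0]]]
out = channel_attention(features, kernel=[0.0, 1.0, 0.0])
```

Because the gate depends only on a short window of neighbouring channel descriptors, the step adds very few parameters, which is consistent with the module being described as fast.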
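
The template update described in the Method (start from the first-frame template, then combine the accumulated template with the current-frame template to obtain the template for the next frame) can be sketched as a simple recurrence. The paper uses a learned update network; the sketch below is not that network. It swaps in a fixed blending weight `alpha` purely to show the data flow, and `blend`, `track_templates`, and the flattened `Template` vectors are hypothetical names introduced here.

```python
from typing import Iterable, List, Sequence

Template = List[float]  # a flattened template feature vector (assumption)


def blend(accumulated: Sequence[float], current: Sequence[float],
          alpha: float) -> Template:
    """Weighted sum of the accumulated and the current-frame template."""
    if len(accumulated) != len(current):
        raise ValueError("templates must have the same length")
    return [alpha * a + (1.0 - alpha) * c for a, c in zip(accumulated, current)]


def track_templates(first: Sequence[float],
                    frames: Iterable[Sequence[float]],
                    alpha: float = 0.5) -> List[Template]:
    """Return, for every processed frame, the template for the next frame."""
    accumulated = list(first)  # the first-frame target is the initial template
    history: List[Template] = []
    for current in frames:
        # Fixed-weight stand-in for the learned update network.
        accumulated = blend(accumulated, current, alpha)
        history.append(accumulated)
    return history


first = [1.0, 0.0]
frames = [[0.0, 1.0], [0.0, 1.0]]  # templates extracted from later frames
templates = track_templates(first, frames, alpha=0.5)
```

With a fixed `alpha`, older appearance information decays geometrically. A learned network can instead adapt how much history to keep as the target changes, which is the behaviour the abstract attributes to the target adaptive update network.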