Infrared target tracking algorithm based on attention mechanism enhancement and target model update
2023, Vol. 28, No. 9, Pages 2856-2871
Print publication date: 2023-09-16
DOI: 10.11834/jig.220459
Ji Qingbo, Chen Kuicheng, Hou Changbo, Li Ziqi, Qi Yufei. 2023. Infrared target tracking algorithm based on attention mechanism enhancement and target model update. Journal of Image and Graphics, 28(9): 2856-2871
Objective
Most deep-learning-based infrared target tracking methods make little use of target detail information in low-contrast, noisy infrared scenes, and when the tracking scene contains similar targets against a cluttered background, most trackers cannot effectively update the tracked target model, leading to poor robustness in long-term tracking. To solve these problems, an infrared target tracking algorithm based on attention and adaptive target-model update is proposed.
Method
Building on an anchor-free algorithm, a fast attention enhancement module designed for infrared tracking scenes is first added to process infrared images in parallel, increasing the difference between the infrared target and the background and enhancing the target's detail information without losing the original information. The extracted features are then fused into the middle layers of the backbone network. Finally, a target-model adaptive update network learns the feature change trend of the infrared target while dynamically updating its middle- and high-level features.
Result
The proposed method is compared with other state-of-the-art algorithms on four infrared target tracking benchmarks. On the LSOTB-TIR (large-scale thermal infrared object tracking benchmark) dataset, it achieves 79.0% precision, 71.5% normalized precision, and a 66.2% success rate, exceeding the second-best tracker by 4.0% in precision and 4.6% in success rate. On the PTB-TIR (thermal infrared pedestrian tracking benchmark) dataset, it achieves 85.1% precision and a 66.9% success rate, 1.3% and 3.6% higher than the second best. On the VOT-TIR2015 (thermal infrared visual object tracking) and VOT-TIR2017 datasets, the expected average overlap and accuracy are 0.344 and 0.73, and 0.276 and 0.71, respectively. The proposed algorithm ranks first on the first three datasets. In addition, ablation experiments on the LSOTB-TIR dataset show that the method provides an obvious gain over the baseline tracker.
Conclusion
The proposed algorithm improves the ability to capture infrared target features, alleviates the susceptibility of infrared target tracking to interference, and improves the precision and success rate of long-term infrared target tracking.
Objective
Most target tracking algorithms are designed for visible-light scenes. In some cases, however, infrared target tracking has advantages that visible light does not. Infrared equipment images using the thermal radiation emitted by objects themselves and requires no additional light source; it can display targets in weak-light or dark scenes and has a certain penetration ability. Infrared images also have defects, such as unclear boundaries between targets and backgrounds, blur, and cluttered backgrounds. Moreover, some infrared dataset images are rough, which hinders the training of data-driven deep learning algorithms to a certain extent. Infrared tracking algorithms can be divided into traditional methods and deep learning methods. Traditional methods generally take correlation filtering as their core. Deep learning methods mainly either use a neural network to provide target features for correlation filters or compute the similarity of image regions within a Siamese network framework. The feature extraction ability of traditional methods for infrared targets is far inferior to that of deep learning methods, and filters trained online cannot adapt to fast-moving or blurred targets, resulting in poor tracking accuracy in scenes with complex backgrounds. At present, most deep-learning-based infrared target tracking methods still make little use of the detailed information of infrared targets in low-contrast, noisy infrared scenes. Furthermore, most trackers cannot effectively update the tracked target when the scene contains similar targets and a cluttered background, which results in poor robustness in long-term tracking. Therefore, an infrared target tracking algorithm based on attention and adaptive template update is proposed to solve these problems.
Method
The Siamese network tracking paradigm takes the target in the first frame as the template and performs a similarity calculation over the search region of subsequent frames to locate the position with the maximum response. The approach has a simple structure and high tracking efficiency. However, most such algorithms use an anchor-based mechanism, and the preset anchors require tedious manual tuning to adapt to changes in the scale and aspect ratio of the target. The anchor-free design of the Siamese box adaptive network (SiamBAN) avoids the hyperparameters related to candidate boxes, making the tracker flexible and general; therefore, this study builds on the SiamBAN tracking framework. A fast attention enhancement module designed for infrared tracking scenes is added to process infrared images in parallel. The module mainly includes two parts: the first is contrast-limited adaptive histogram equalization (CLAHE); the second is an efficient channel attention (ECA) module. A three-layer convolutional network connects the two parts to form a residual structure. This structure increases the difference between the infrared target and the background and enhances the detailed information of the target without losing the original information. The extracted features are proportionally fused into the middle layers of the backbone network to achieve rapid utilization. A target adaptive update network is then used to learn the feature change trend of the infrared target while dynamically updating its middle- and high-level features. The update network takes the target information of the first frame as the initial template, then superimposes the historically accumulated template and the template of the current frame to compute the best template for the next frame, realizing the continuous use of the target's historical information.
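The enhancement module described above can be illustrated with a minimal PyTorch sketch. The ECA block follows the published ECA-Net design (a 1-D convolution over pooled channel descriptors); the `FastAttentionEnhancement` wrapper, its three-layer convolution, and all layer sizes are illustrative assumptions rather than the paper's exact architecture, and the CLAHE preprocessing step (e.g., OpenCV's `cv2.createCLAHE`) is only indicated in a comment:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention (Wang et al., 2020): a 1-D convolution
    over globally pooled channel descriptors replaces the fully connected
    layers of an SE block."""
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.pool(x)                                   # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))       # conv across channels
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # (B, C, 1, 1)
        return x * y                                       # re-weight channels

class FastAttentionEnhancement(nn.Module):
    """Illustrative sketch: three conv layers plus ECA in a residual branch,
    so enhanced detail is added to, not substituted for, the original
    features. In a full pipeline, CLAHE (e.g., cv2.createCLAHE) would be
    applied to the raw infrared frame before feature extraction."""
    def __init__(self, channels=64):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.eca = ECA()

    def forward(self, x):
        # Residual connection: original information is preserved.
        return x + self.eca(self.convs(x))

feats = torch.randn(1, 64, 32, 32)      # toy mid-level backbone features
out = FastAttentionEnhancement(64)(feats)
```

The residual sum is what lets the module sharpen target detail "without losing the original information": the attention branch can only add a correction to the backbone features.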
Result
We compare our infrared target tracking algorithm with 10 state-of-the-art trackers on four infrared target tracking benchmarks, namely, the large-scale thermal infrared object tracking benchmark (LSOTB-TIR), the thermal infrared pedestrian tracking benchmark (PTB-TIR), the thermal infrared visual object tracking challenge VOT-TIR2015, and VOT-TIR2017. On the LSOTB-TIR dataset, our method achieves a precision of 79.0%, a normalized precision of 71.5%, and a success rate of 66.2%; the precision and success rate are 4.0% and 4.6% higher than those of the second-ranked tracker. On the PTB-TIR dataset, it achieves a precision of 85.1% and a success rate of 66.9%, which are 1.3% and 3.6% higher than those of the second-ranked tracker. The expected average overlap and accuracy are 0.344 and 0.73 on the VOT-TIR2015 dataset and 0.276 and 0.71 on the VOT-TIR2017 dataset. Our algorithm ranks first on the first three benchmarks. The ablation study on the LSOTB-TIR dataset shows that the algorithm has an obvious gain effect on the baseline tracker. Finally, a qualitative analysis on the LSOTB-TIR dataset shows that the proposed algorithm is robust under background clutter, fast motion, intensity variation, scale variation, occlusion, out-of-view, deformation, low resolution, and motion blur, and that the fast attention enhancement module and the target adaptive update network both contribute positively to the improvement in tracking success rate.
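For reference, the precision and success-rate figures quoted above are standard tracking measures: precision is the fraction of frames whose predicted center lies within a pixel threshold (commonly 20 px) of the ground-truth center, and the success rate is the area under the curve of overlap (IoU) success across thresholds. A small self-contained sketch (function names are ours, not from the benchmarks' toolkits):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def precision(pred, gt, thresh=20.0):
    """Fraction of frames whose predicted center is within `thresh` pixels
    of the ground-truth center."""
    dists = [np.hypot(p[0] + p[2] / 2 - g[0] - g[2] / 2,
                      p[1] + p[3] / 2 - g[1] - g[3] / 2)
             for p, g in zip(pred, gt)]
    return float(np.mean(np.array(dists) <= thresh))

def success_auc(pred, gt):
    """Area under the success curve: mean fraction of frames whose IoU
    exceeds each overlap threshold sampled in [0, 1]."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred, gt)])
    thresholds = np.linspace(0.0, 1.0, 21)
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))
```

A perfect single-frame track (identical boxes) gives precision 1.0 and a success AUC just below 1.0, since the IoU never strictly exceeds the final threshold of 1.0.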
Conclusion
Our algorithm improves the ability of the backbone to capture the features of the infrared target and adaptively adjusts the target's feature state using its historical change information. This alleviates the susceptibility of infrared target tracking to interference in complex environments and improves the precision and success rate of long-term infrared target tracking.
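The adaptive adjustment through historical change information summarized here can be sketched as a small network that fuses the initial, accumulated, and current templates, in the spirit of the learned model update of Zhang et al. (2019b). The class name, layer sizes, and the 1 × 1-convolution design below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TemplateUpdateNet(nn.Module):
    """Illustrative sketch of an adaptive template-update network: the
    initial (first-frame) template, the historically accumulated template,
    and the current-frame template are concatenated along the channel
    axis, and a small conv net predicts a residual correction to the
    initial template, yielding the template for the next frame."""
    def __init__(self, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, t_init, t_acc, t_cur):
        z = torch.cat([t_init, t_acc, t_cur], dim=1)
        # Skip connection anchors every update to the first-frame template,
        # so accumulated drift cannot fully overwrite the initial target model.
        return t_init + self.net(z)

t0 = torch.randn(1, 256, 7, 7)          # toy first-frame template features
t_new = TemplateUpdateNet(256)(t0, t0.clone(), t0.clone())
```

During tracking, `t_acc` would be the template produced at the previous update step, so the network continuously reuses the target's historical information rather than naively replacing the template each frame.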
infrared image; target tracking; Siamese network; anchor-free; efficient attention; adaptive update
Asha C S and Narasimhadhan A V. 2017. Robust infrared target tracking using discriminative and generative approaches. Infrared Physics and Technology, 85: 114-127 [DOI: 10.1016/j.infrared.2017.05.022]
Bertinetto L, Valmadre J, Henriques J F, Vedaldi A and Torr P H S. 2016. Fully-convolutional Siamese networks for object tracking//Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, the Netherlands: Springer: 850-865 [DOI: 10.1007/978-3-319-48881-3_56]
Chen Z D, Zhong B N, Li G R, Zhang S P and Ji R R. 2020. Siamese box adaptive network for visual tracking//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 6667-6676 [DOI: 10.1109/CVPR42600.2020.00670]
Danelljan M, Bhat G, Khan F S and Felsberg M. 2017. ECO: efficient convolution operators for tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 6931-6939 [DOI: 10.1109/CVPR.2017.733]
Danelljan M, Bhat G, Khan F S and Felsberg M. 2019. ATOM: accurate tracking by overlap maximization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 4655-4664 [DOI: 10.1109/CVPR.2019.00479]
Danelljan M, Häger G, Khan F S and Felsberg M. 2015. Learning spatially regularized correlation filters for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 4310-4318 [DOI: 10.1109/ICCV.2015.490]
Felsberg M, Berg A, Häger G, Ahlberg J, Kristan M, Matas J, Leonardis A, Čehovin L, Fernández G, Vojíř T, Nebehay G, Pflugfelder R, Lukežič A, Garcia-Martin A, Saffari A, Li A, Montero A S, Zhao B J, Schmid C, Chen D P, Du D W, Khan F S, Porikli F, Zhu G, Zhu G B, Lu H Q, Kieritz H, Li H D, Qi H G, Jeong J C, Cho J I, Lee J Y, Zhu J K, Li J T, Feng J Y, Wang J Q, Kim J W, Lang J C, Martinez J M, Xue K, Alahari K, Ma L, Ke L P, Wen L Y, Bertinetto L, Danelljan M, Arens M, Tang M, Chang M C, Miksik O, Torr P H S, Martin-Nieto R, Laganière R, Hare S, Lyu S W, Zhu S C, Becker S, Hicks S L, Golodetz S, Choi S, Wu T F, Hubner W, Zhao X, Hua Y, Li Y, Lu Y, Li Y Z, Yuan Z J and Hong Z B. 2015. The thermal infrared visual object tracking VOT-TIR2015 challenge results//Proceedings of 2015 IEEE International Conference on Computer Vision Workshop (ICCVW). Santiago, Chile: IEEE: 639-651 [DOI: 10.1109/ICCVW.2015.86]
Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Zajc L C, Vojíř T, Häger G, Lukežič A, Eldesokey A, Fernández G, García-Martín Á, Muhic A, Petrosino A, Memarmoghadam A, Vedaldi A, Manzanera A, Tran A, Alatan A, Mocanu B, Chen B Y, Huang C, Xu C S, Sun C, Du D L, Zhang D, Du D W, Mishra D, Gundogdu E, Velasco-Salido E, Khan F S, Battistone F, Subrahmanyam G R K S, Bhat G, Huang G, Bastos G, Seetharaman G, Zhang H L, Li H Q, Lu H C, Drummond I, Valmadre J, Jeong J C, Cho J I, Lee J Y, Noskova J, Zhu J K, Gao J, Liu J Y, Kim J W, Henriques J F, Martínez J M, Zhuang J F, Xing J L, Gao J Y, Chen K, Palaniappan K, Lebeda K, Gao K, Kitani K M, Zhang L, Wang L J, Yang L X, Wen L Y, Bertinetto L, Poostchi M, Danelljan M, Mueller M, Zhang M D, Yang M H, Xie N H, Wang N, Miksik O, Moallem P, Venugopal M P, Senna P, Torr P H S, Wang Q, Yu Q F, Huang Q M, Martín-Nieto R, Bowden R, Liu R S, Tapu R, Hadfield S, Lyu S, Golodetz S, Choi S, Zhang T Z, Zaharia T, Santopietro V, Zou W, Hu W M, Tao W B, Li W B, Zhou W G, Yu X G, Bian X, Li Y, Xing Y F, Fan Y R, Zhu Z, Zhang Z P and He Z Q. 2017. The visual object tracking VOT2017 challenge results//Proceedings of 2017 IEEE International Conference on Computer Vision Workshop (ICCVW). Venice, Italy: IEEE: 1949-1972 [DOI: 10.1109/ICCVW.2017.230]
Li B, Wu W, Wang Q, Zhang F Y, Xing J L and Yan J J. 2019a. SiamRPN++: evolution of Siamese visual tracking with very deep networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 4277-4286 [DOI: 10.1109/CVPR.2019.00441]
Li B, Yan J J, Wu W, Zhu Z and Hu X L. 2018a. High performance visual tracking with Siamese region proposal network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 8971-8980 [DOI: 10.1109/CVPR.2018.00935]
Li C, Yang D D, Song P and Guo C. 2021. Global-aware Siamese network for thermal infrared object tracking. Acta Optica Sinica, 41(6): #0615002 [DOI: 10.3788/AOS202141.0615002]
Li F, Tian C, Zuo W M, Zhang L and Yang M H. 2018b. Learning spatial-temporal regularized correlation filters for visual tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 4904-4913 [DOI: 10.1109/CVPR.2018.00515]
Li J H, Zhang P, Wang X W and Huang S Z. 2020. Infrared small-target detection algorithms: a survey. Journal of Image and Graphics, 25(9): 1739-1753 [DOI: 10.11834/jig.190574]
Li X, Liu Q, Fan N N, He Z Y and Wang H Z. 2019b. Hierarchical spatial-aware Siamese network for thermal infrared object tracking. Knowledge-Based Systems, 166: 71-81 [DOI: 10.1016/j.knosys.2018.12.011]
Li X, Ma C, Wu B Y, He Z Y and Yang M H. 2019c. Target-aware deep tracking//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 1369-1378 [DOI: 10.1109/CVPR.2019.00146]
Li X, Zha Y F, Zhang T Z, Cui Z, Zuo W M, Hou Z Q, Lu H C and Wang H Z. 2019. Survey of visual object tracking algorithms based on deep learning. Journal of Image and Graphics, 24(12): 2057-2080 [DOI: 10.11834/jig.190372]
Li X L and Askar H. 2021. Infrared dim-small target tracking algorithm based on local similarity. Laser and Infrared, 51(5): 668-674 [DOI: 10.3969/j.issn.1001-5078.2021.05.021]
Liu Q, He Z Y, Li X and Zheng Y. 2020a. PTB-TIR: a thermal infrared pedestrian tracking benchmark. IEEE Transactions on Multimedia, 22(3): 666-675 [DOI: 10.1109/TMM.2019.2932615]
Liu Q, Li X, He Z Y, Fan N N, Yuan D, Liu W and Liang Y S. 2020b. Multi-task driven feature models for thermal infrared tracking//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 11604-11611 [DOI: 10.1609/aaai.v34i07.6828]
Liu Q, Li X, He Z Y, Li C L, Li J, Zhou Z K, Yuan D, Li J, Yang K, Fan N N and Zheng F. 2020c. LSOTB-TIR: a large-scale high-diversity thermal infrared object tracking benchmark//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM: 3847-3856 [DOI: 10.1145/3394171.3413922]
Liu Q, Lu X H, He Z Y, Zhang C K and Chen W S. 2017. Deep convolutional neural networks for thermal infrared object tracking. Knowledge-Based Systems, 134: 189-198 [DOI: 10.1016/j.knosys.2017.07.032]
Meng X, Kong H, Tang D Q and Lu T. 2019. Multimodal image captioning through combining reinforced cross entropy loss and stochastic deprecation//Proceedings of 2019 IEEE International Conference on Multimedia and Expo (ICME). Shanghai, China: IEEE: 1318-1323 [DOI: 10.1109/ICME.2019.00229]
Nam H and Han B. 2016. Learning multi-domain convolutional neural networks for visual tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 4293-4302 [DOI: 10.1109/CVPR.2016.465]
Reza A M. 2004. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 38(1): 35-44 [DOI: 10.1023/b:vlsi.0000028532.53893.82]
Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I and Savarese S. 2019. Generalized intersection over union: a metric and a loss for bounding box regression//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 658-666 [DOI: 10.1109/CVPR.2019.00075]
Song Y B, Ma C, Wu X H, Gong L J, Bao L C, Zuo W M, Shen C H, Lau R W H and Yang M H. 2018. VITAL: visual tracking via adversarial learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 8990-8999 [DOI: 10.1109/CVPR.2018.00937]
Wang C Y, Zhang L J, Li X M, Li Y K and Zhang L Y. 2021. Infrared target tracking based on improved kernel correlation filter algorithm. Electronics Optics and Control, 28(7): 6-10 [DOI: 10.3969/j.issn.1671-637X.2021.07.002]
Wang H H, Liu Y F and Zhu X. 2021. Research on infrared image target tracking based on anti-occlusion ability. Laser Journal, 42(11): 102-106 [DOI: 10.14016/j.cnki.jgzz.2021.11.102]
Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M and Hu Q H. 2020. ECA-Net: efficient channel attention for deep convolutional neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11531-11539 [DOI: 10.1109/CVPR42600.2020.01155]
Wu Y, Lim J and Yang M H. 2013. Online object tracking: a benchmark//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, USA: IEEE: 2411-2418 [DOI: 10.1109/CVPR.2013.312]
Zhang L C, Gonzalez-Garcia A, van de Weijer J, Danelljan M and Khan F S. 2019a. Synthetic data generation for end-to-end thermal infrared tracking. IEEE Transactions on Image Processing, 28(4): 1837-1850 [DOI: 10.1109/tip.2018.2879249]
Zhang L C, Gonzalez-Garcia A, van de Weijer J, Danelljan M and Khan F S. 2019b. Learning the model update for Siamese trackers//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 4009-4018 [DOI: 10.1109/ICCV.2019.00411]