Real-time target tracking with a Siamese guided anchor RPN network
Target tracking system based on the Siamese guided anchor region proposal network
2021, Vol. 26, No. 2: 415-424
Received: 2019-12-16; Revised: 2020-04-11; Accepted: 2020-04-18
Published in print: 2021-02-16
DOI: 10.11834/jig.190658
Objective
Combining the region proposal network (RPN) with a Siamese network for video target tracking has demonstrated high accuracy. However, the Siamese region proposal network (SiamRPN) tracker relies on a dense anchor strategy, which produces a large number of redundant anchors and degrades both tracking precision and speed. To address this problem, this paper proposes a Siamese guided anchor RPN (Siamese GA-RPN).
Method
The core idea of Siamese GA-RPN is to use semantic features to guide anchor generation. The guided anchoring network consists of a location prediction module and a shape prediction module, which use the semantic features produced by the convolutional neural network (CNN) in the Siamese network to predict the positions and the width-height sizes of the anchors, respectively, reducing the number of redundant anchors. A feature adaptation module is then designed: using the shape information of each anchor, it refines the original feature map of the tracked target through a deformable convolution layer, reducing the inconsistency between target features and anchor information and improving tracking accuracy.
Result
Tracking experiments were conducted on three challenging benchmark datasets, VOT (visual object tracking) 2015, VOT2016, and VOT2017, testing the algorithm under complex conditions such as fast target motion, occlusion, and illumination changes, with quantitative comparisons against several strong trackers on two evaluation indexes, accuracy and robustness. On VOT2015, compared with SiamRPN, accuracy improves by 1.72% and robustness by 5.17%; on VOT2016, accuracy improves by 3.6% and robustness by 6.6%; in real-time experiments on VOT2017, the proposed algorithm shows good real-time tracking performance.
Conclusion
The Siamese guided anchor RPN improves the effectiveness of anchor generation, ensures the consistency between features and anchors, achieves accurate target localization, and largely removes the influence of anchor size on tracking precision. The tracker remains robust and adaptive in complex scenes involving target scale variation, occlusion, illumination change, and fast target motion.
Objective
After combining the region proposal network (RPN) with the Siamese network for video target tracking, improved trackers have been proposed in succession, all demonstrating relatively high accuracy. Through analysis and comparison, we found that the anchor strategy of the RPN module in a Siamese RPN (SiamRPN) generates a large number of anchor boxes through a sliding window; the intersection over union (IoU) between anchor boxes is then computed to generate candidate regions, after which the target position is determined by the classifier and refined by bounding-box regression. Although this method improves tracking accuracy, it does not consider the semantic features of the target image, resulting in inconsistencies between the anchors and the features. It also generates a large number of redundant anchors, which affects tracking accuracy and considerably increases the computational cost.
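The IoU overlap used to filter candidate regions in such dense-anchor pipelines can be sketched as follows; this is an illustrative snippet, with the function name and the `(x1, y1, x2, y2)` box convention chosen here for clarity rather than taken from the paper:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With one dense anchor per sliding-window position and several scales and aspect ratios, this pairwise overlap must be evaluated for every anchor, which is the redundancy the guided-anchoring scheme below is designed to avoid.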
Method
To solve this problem, this study proposes a Siamese guided anchor RPN (Siamese GA-RPN). The primary idea is to use semantic features to guide anchor generation; the template features are then correlated with the frame to be detected to obtain a response score map, and the tracking network is trained end to end. The guided anchoring network is designed with location and shape prediction branches. The two branches use the semantic features extracted by the convolutional neural network (CNN) in the Siamese network to predict the locations where the centers of objects of interest may exist, as well as the scales and aspect ratios at those locations, reducing the generation of redundant anchors. A feature adaptation module is then designed: it uses a deformable convolution layer to modify the original feature map of the tracked target on the basis of the anchor shape information at each position, reducing the inconsistency between the features and the anchors and improving tracking accuracy.
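As a rough illustration of how location-guided anchoring sparsifies proposals, the sketch below keeps an anchor only where the location branch predicts an object center, with its size taken from the shape branch. All names, the stride, and the threshold are hypothetical choices for this sketch, not the paper's implementation:

```python
def guided_anchors(loc_prob, shape_pred, stride=8, thresh=0.5):
    """Sparse anchor generation in the guided-anchoring style.

    loc_prob   : H x W nested list, objectness probability per feature-map cell.
    shape_pred : H x W nested list of (w, h) pairs predicted per cell.
    Returns boxes as (x1, y1, x2, y2) in image coordinates.
    """
    anchors = []
    for y, row in enumerate(loc_prob):
        for x, p in enumerate(row):
            if p <= thresh:
                continue  # location branch says no object center here
            # Map the feature-map cell center back to image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            w, h = shape_pred[y][x]
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Instead of enumerating every sliding-window position at fixed scales and aspect ratios, the dense grid collapses to the handful of cells the semantic features mark as likely object centers, which is the source of the anchor reduction described above.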
Result
Tracking experiments were performed on three challenging video tracking benchmark datasets: VOT (visual object tracking) 2015, VOT2016, and VOT2017. Tracking performance was tested on complex scenes, such as fast target movement, occlusion, and lighting changes, and a quantitative comparison was made on two evaluation indexes: accuracy and robustness. On the VOT2015 dataset, the accuracy of the algorithm improved by 1.72% and robustness by 5.17% compared with those of SiamRPN. On the VOT2016 dataset, accuracy improved by 3.6% and robustness by 6.6% over SiamRPN. Real-time experiments were performed on the VOT2017 dataset, where the proposed algorithm demonstrates a good real-time tracking effect. The algorithm was also compared with the fully convolutional Siamese network (SiamFC) and SiamRPN on four video sequences: rainy day, underwater, target occlusion, and poor light. It exhibits good tracking performance in all four scenarios.
Conclusion
The Siamese guided anchor RPN proposed in this study improves the effectiveness of anchor generation, ensures the consistency of features and anchors, achieves accurate target localization, and resolves the influence of anchor size on tracking accuracy. Experimental results on the three video tracking benchmark datasets show tracking results better than those of several top-ranking video tracking algorithms of comparable overall performance, together with good real-time performance. The tracker can still follow the target accurately in complex video scenes such as target scale change, occlusion, lighting change, and fast target movement, showing strong robustness and adaptability.