自适应尺度突变目标跟踪
Adaptive scale sudden change object tracking
2020, Vol. 25, No. 6: 1150-1159
Received: 2019-08-29; Revised: 2019-11-15; Accepted: 2019-11-22; Published in print: 2020-06-16
DOI: 10.11834/jig.190437
Objective
Scale mutation is a highly challenging problem in object tracking: a sudden change in object scale within a short time causes tracking elements to be lost, and the resulting accumulation of tracking errors leads to tracking drift. To better address this problem, a detect-then-track adaptive scale-mutation tracking algorithm (kernelized correlation filter_you only look once, KCF_YOLO) is proposed.
Method
In the training phase of tracking, a correlation filter tracker is used to realize fast tracking; in the detection phase, the YOLO (you only look once) V3 neural network is used. An adaptive template-updating strategy is also designed: the similarity between each detected object and the target template, obtained by fusing color features with image-fingerprint features, is compared to judge whether the target is occluded, and on that basis the algorithm decides whether to update the target template in the current frame.
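The detect-then-track control flow described in this section can be sketched as follows. This is a minimal sketch, not the authors' implementation: `track_step`, `detect_people`, and `similarity` are hypothetical stand-ins for the KCF tracker, the YOLO V3 person detector, and the fused color/fingerprint similarity, injected as callables so the loop itself stays self-contained.

```python
# Minimal sketch of the KCF_YOLO detect-then-track loop.
# track_step / detect_people / similarity are hypothetical stand-ins
# for the KCF tracker, the YOLO V3 detector, and the fused similarity.

DETECT_INTERVAL = 20  # the paper re-runs detection every 20 frames

def kcf_yolo(frames, init_box, track_step, detect_people, similarity):
    """Return one tracking box per frame.

    Every DETECT_INTERVAL frames, detected candidates are scored
    against the current template; the template (and the box scale)
    is updated only when the best score beats the average score of
    the previous detection round. Otherwise the target is judged
    occluded and the template is kept unchanged.
    """
    template = init_box
    prev_scores = []          # similarity scores from the last round
    boxes = []
    for idx, frame in enumerate(frames):
        box = track_step(frame, template)            # fast KCF tracking
        if idx > 0 and idx % DETECT_INTERVAL == 0:   # detection stage
            candidates = detect_people(frame)
            if candidates:
                scores = [similarity(c, template) for c in candidates]
                best = max(range(len(scores)), key=scores.__getitem__)
                avg = sum(prev_scores) / len(prev_scores) if prev_scores else 0.0
                if scores[best] > avg:               # target visible: update
                    template = candidates[best]      # adopt detected scale
                    box = template
                # else: judged occluded -> keep the old template
                prev_scores = scores
        boxes.append(box)
    return boxes
```

With real components, `track_step` would wrap a KCF tracker (e.g., OpenCV's `cv2.TrackerKCF`) and `detect_people` a YOLO V3 forward pass; here only the structure of the loop mirrors the paper's description.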
Result
To demonstrate the effectiveness of the proposed method, experiments are conducted on 11 video sequences from the OTB (object tracking benchmark) 2015 dataset that are representative of scale mutation; across these sequences, the object scale changes by a factor of 0.1 to 9.2. The results show an average tracking precision of 0.955 and an average tracking speed of 36 frames/s; compared with classical scale-adaptive tracking algorithms, precision is improved by 31.74% on average.
Conclusion
By combining correlation filtering and a neural network in a detect-then-track scheme, the proposed method improves the algorithm's adaptability to scale mutation during tracking. The experimental results verify that the added detection strategy effectively corrects the tracking drift caused by subsequent scale mutations of the target, and they confirm the effectiveness of the adaptive template-updating strategy.
Objective
Video-based object detection and tracking has long been a topic of great concern in the academic field of computer vision. Video object tracking has important research significance and broad application prospects in intelligent monitoring, human-computer interaction, robot vision navigation, and other areas. Although theoretical research on video object tracking has made considerable progress and several achievements have entered practical use, the technology still faces tremendous challenges, such as scale change, illumination change, motion blur, object deformation, and object occlusion, all of which cause difficulties in visual tracking. Object scale mutation within a short time is particularly problematic: it leads to the loss of tracking elements, and the accumulation of tracking errors then leads to tracking drift, while keeping the tracking box at a fixed scale loses considerable scale information. Scale mutation is thus a challenging task in object tracking. To solve this problem, this study proposes an adaptive scale-mutation tracking algorithm (kernelized correlation filter_you only look once, KCF_YOLO).
Method
The algorithm uses a correlation filter tracker to realize fast tracking in the training phase and the you only look once (YOLO) V3 neural network in the detection phase. An adaptive template-updating strategy is also designed: the color features and the image-fingerprint features of each detected object are fused and compared with those of the object template to determine whether occlusion has occurred and whether the object template should be updated in the current frame. In the first frame of the video, the object is selected, assuming that the category of the object to be tracked is human, and the object area is stored as the object template T. The algorithm then enters the training stage, in which the KCF algorithm is used for tracking. KCF extracts multichannel histogram of oriented gradients (HOG) features from the object template. In the tracking process, a region 1.5 times the size of the object template is selected as the search range for the next frame, considerably reducing the search range and remarkably improving tracking speed. When the frame number is a multiple of 20, the algorithm enters the detection stage and uses YOLO V3 for object detection. YOLO V3 identifies all the people (P1, P2, P3, …, Pn) in the current frame image. Each detected person is compared with the object template stored 20 frames earlier: color and image-fingerprint features are extracted, and the similarity is computed (an image-fingerprint algorithm and color features are combined). If the greatest similarity exceeds the average similarity of the previous 20 frames, the object template is updated to the person with the greatest similarity, and the scale of the tracking box is simultaneously updated in accordance with the YOLO detection to achieve scale adaptation; otherwise, the object is judged to be occluded and the template is not updated. In the tracking phase, the template, whether updated or not, serves as the latest state of the object for subsequent tracking. The preceding steps are repeated every 20 frames until the video and the tracking end. The color and fingerprint features are complementary: the image-fingerprint feature uses the perceptual hash (PHash) algorithm, in which, after the discrete cosine transform, most of the image's information is concentrated in the low-frequency area, so the calculation is restricted to that area, although color information is lost; the color feature, by contrast, captures the distribution of colors over the entire image. Their combination ensures the accuracy of the similarity measure.

A total of 11 video sequences representative of scale mutation in the object tracking benchmark (OTB)-2015 dataset are tested to prove the effectiveness of the proposed method. The results show that the average tracking accuracy of the algorithm is 0.955 and the average tracking speed is 36 frames/s. On self-made data in which the object is completely occluded for 130 frames and then reappears, tracking accuracy is 0.9, proving the validity of the algorithm combining kernelized correlation filtering and the YOLO V3 network. Compared with classical scale-adaptive tracking algorithms, accuracy is improved by 31.74% on average.
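The fused similarity above (a PHash over the DCT low-frequency block, combined with a color histogram) might be sketched as below. The 32×32 patch size, 8-bins-per-channel histogram, histogram-intersection measure, and equal fusion weight are assumptions for illustration, not values taken from the paper.

```python
# Sketch of the fused similarity: PHash over the DCT low-frequency
# block plus a colour-histogram intersection. Patch size, bin count,
# and the 0.5 fusion weight are assumed, not taken from the paper.
import numpy as np

def dct2(a):
    """2-D DCT-II built from the 1-D cosine basis (no SciPy needed)."""
    n = a.shape[0]
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * (np.arange(n)[None, :] + 0.5) * k / n)
    return basis @ a @ basis.T

def phash_bits(gray32):
    """63-bit perceptual hash of a 32x32 grayscale patch: keep the
    8x8 low-frequency corner of the DCT, drop the DC term, and
    threshold at the median."""
    low = dct2(gray32.astype(float))[:8, :8].ravel()[1:]
    return low > np.median(low)

def color_hist(rgb, bins=8):
    """Normalised joint RGB histogram with bins**3 cells."""
    q = (rgb.astype(int) // (256 // bins)).reshape(-1, 3)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    h = np.bincount(idx, minlength=bins ** 3).astype(float)
    return h / h.sum()

def fused_similarity(patch_a, patch_b, w=0.5):
    """Weighted fusion of fingerprint and colour similarity for two
    32x32x3 uint8 patches; w = 0.5 is an assumed weight."""
    bits_a = phash_bits(patch_a.mean(axis=2))
    bits_b = phash_bits(patch_b.mean(axis=2))
    hash_sim = 1.0 - np.count_nonzero(bits_a != bits_b) / bits_a.size
    hist_sim = np.minimum(color_hist(patch_a), color_hist(patch_b)).sum()
    return w * hash_sim + (1.0 - w) * hist_sim
```

An identical patch scores 1.0, while an intensity-inverted patch scores much lower on the fingerprint term even when its color histogram is similar, which is the complementarity between the two features that the abstract relies on.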
Conclusion
In this study, we adopt the idea of combining correlation filtering and a neural network to first detect and then track the target, improving the algorithm's adaptability to scale mutation in the object tracking process. The experimental results show that the detection strategy corrects the tracking drift caused by subsequent scale mutations and confirm the effectiveness of the adaptive template-updating strategy. To address the inability of the traditional kernelized correlation filter to deal with a sudden change in object scale within a short time, as well as the slow tracking speed of a neural network, this work builds a bridge between correlation filters and neural networks; a tracker that combines the two opens a new way forward.