Dai Bo, Hou Zhiqiang, Yu Wangsheng, Hu Dan, Fan Shunyi. Robust visual tracking via fast deep learning[J]. Journal of Image and Graphics, 2016, 21(12): 1662. DOI: 10.11834/jig.20161211.
Deep learning-based trackers can achieve high tracking precision and strong adaptability in different scenarios. However, because the number of parameters is large and fine-tuning is challenging, their time complexity is high. To improve efficiency, we propose a tracker based on fast deep learning, built by constructing a new network with less redundancy. The feature extractor plays the most important role in a visual tracking system. Based on the theory of deep learning, we propose a deep neural network that describes the essential features of images. Fast deep learning is achieved by restricting the network size, and with the help of a GPU (graphics processing unit), the time complexity of network training is alleviated to a large extent. Under the particle filter framework, the proposed method combines the deep feature extractor with a support vector machine (SVM) scorer to distinguish the target from the background. The condensed network structure reduces the complexity of the model. Compared with other deep learning-based trackers, the proposed method achieves higher efficiency, maintaining an average frame rate of 22 frames per second. Experiments on an open tracking benchmark demonstrate that both the robustness and timeliness of the proposed tracker are promising when target appearance changes include translation, rotation, and scaling, or when interference includes illumination variation, occlusion, and cluttered background. However, the tracker is not robust enough when the target moves fast, motion blur occurs, or similar objects are present.
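The abstract gives no implementation details, so the following is only a minimal sketch of the general scheme it names: a particle filter whose particles are scored by a linear SVM-style decision function over extracted features. Everything concrete here is an assumption for illustration: the feature extractor is a normalized-patch stand-in for the paper's compact deep network, the SVM weights `w` and bias `b` are hypothetical, and simple multinomial resampling replaces whatever resampling strategy the authors used.

```python
import numpy as np

def extract_features(patch):
    # Stand-in for the paper's condensed deep network: a flattened,
    # L2-normalized patch. The real tracker would run a CNN forward pass.
    v = patch.ravel().astype(float)
    return v / (np.linalg.norm(v) + 1e-8)

def svm_score(features, w, b):
    # Decision value of a linear SVM (hypothetical weights w, bias b).
    return features @ w + b

def particle_filter_step(frame, particles, w, b, rng, sigma=2.0, psize=8):
    # 1) Propagate particles (x, y top-left corners) with a random-walk model.
    particles = particles + rng.normal(0.0, sigma, particles.shape)
    # 2) Score each particle's image patch with the SVM on extracted features.
    h, w_img = frame.shape
    scores = []
    for x, y in particles:
        xi = int(np.clip(x, 0, w_img - psize))
        yi = int(np.clip(y, 0, h - psize))
        patch = frame[yi:yi + psize, xi:xi + psize]
        scores.append(svm_score(extract_features(patch), w, b))
    scores = np.asarray(scores)
    # 3) Turn scores into normalized weights and resample (multinomial,
    #    for brevity; systematic resampling is common in practice).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    # 4) State estimate: mean of the resampled particle positions.
    return particles, particles.mean(axis=0)
```

A toy usage: with a bright 16x16 square placed in an otherwise dark frame and SVM weights that simply sum the normalized patch values, repeated calls to `particle_filter_step` concentrate the particle cloud on the bright region, mimicking how the scorer pulls particles toward the target.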