Abstract: Objective Visual object tracking is one of the key research directions in computer vision, with wide applications in intelligent transportation, human-computer interaction, and other areas. Although correlation filter-based methods have recently achieved remarkable progress in this field owing to their efficiency and robustness, feature selection and representation remain the primary considerations when building the target appearance model during tracking. To improve the robustness of the appearance model, an increasing number of trackers introduce gradient features, color features, or other combined features in place of the original single grayscale feature, but these methods do not consider, in light of the features themselves, the proportion each feature should occupy in the model. Method Building on the above work, this paper focuses on feature selection and fusion, introducing a weight vector to fuse features and designing a tracker based on a weighted multi-feature appearance model. According to the way each feature is computed, we construct a linear equation in two unknowns that converts solving for the weight vector into determining the proportional coefficients of the features; combined with the dimensional information of the features themselves, this yields a finite set of integer solutions to the equation. The final proportional coefficients are determined by experimental verification and normalized to obtain the weight vector, from which a new weighted hybrid feature model of the target appearance is built. Result On 100 video sequences from OTB-100, the proposed algorithm is compared with seven other mainstream algorithms, including five correlation filter-based methods, using precision, average center error, and real-time performance as evaluation metrics. While maintaining real-time performance, the proposed algorithm achieves better tracking results on multiple sequences, including Basketball, DragonBaby, Panda, and Lemming. Averaged over the 100 sequences, precision improves by 1.2% compared with the scale-adaptive tracker based on multi-feature fusion. Conclusion This paper introduces a weight vector into the appearance description within a correlation filter tracking framework and proposes a weighted multi-feature fusion tracker, which sustains tracking longer in complex dynamic scenes and improves the robustness of the algorithm.
Abstract: Objective Visual tracking is one of the important research directions in the field of computer vision, with a wide range of applications in intelligent transportation, human-computer interaction, and other areas. Correlation filter-based trackers (CFTs) have recently achieved excellent performance in the tracking field owing to their efficiency and robustness. However, due to the influence of lighting, fast motion, background interference, target rotation, scale change, occlusion, and other factors, designing a robust tracking algorithm for complex dynamic scenes remains challenging. In addition, the selection and representation of features have always been the primary considerations when establishing the target appearance model during tracking. To improve the robustness of the appearance model, more and more trackers introduce gradient features, color features, or other combined features instead of a single gray feature. However, they do not discuss the role each feature plays in the model or the relationships among the features themselves. Method Research on correlation filter theory has achieved great improvements. Building on this work, the appearance model is used to represent the target and verify observations; it is the most important part of any tracking algorithm. Moreover, in representing the appearance, features are the most fundamental and difficult element. Therefore, this paper mainly focuses on how to select and combine features. First, the gradient feature, color feature, and raw pixels are discussed in turn, as in previous work. As a common descriptor of shape and edges, the gradient feature is invariant to translation and illumination, and performs well in tracking scenes with deformation, illumination change, and partial occlusion. However, when there is more noise in the background, or when the target rotates or becomes blurred, the gradient feature of the target is no longer distinctive and its descriptive power is weakened.
In contrast, the colors of the target and the background usually differ, which makes them distinguishable. Based on the above, a new tracking method called the weighted multi-feature fusion (WMFF) tracker is proposed, which introduces a weight vector to fuse multiple features in the appearance model. The model is dominated by the gradient feature and supplemented by the color feature and the raw pixels, which compensates for the inadequacy of a single gradient feature and gives full play to the color features, making the features complementary to each other. In detail, according to the way each feature is calculated, this paper constructs a three-variable linear equation over the weights. There is no need to solve for the specific values in this equation; only their proportional relationship is required. Taking the gradient feature as the reference, solving for the weight vector can be transformed into determining the proportional coefficients of each feature, so the equation reduces to a linear equation in two unknowns. Moreover, taking the dimension information of the feature computation into account, the equation has a finite integer solution set, and the final proportional coefficients are determined by experimental verification on test sequences. Finally, the method normalizes the proportional coefficients into a weight vector and builds a new weighted hybrid feature model of the target appearance. The WMFF tracker adopts a detection-based tracking framework, including feature extraction, model construction, filter training, target center detection, and model update. Result 100 video sequences from the object tracking benchmark (OTB-100) dataset were adopted in the experiments to compare performance with seven other state-of-the-art trackers, including five CFTs. The video sequences are annotated with 11 different attributes, including illumination, occlusion, and scale variation.
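The abstract does not give implementation details for the fusion step; the following is a minimal sketch, assuming hypothetical channel counts (31 gradient channels, 10 color channels, 1 gray channel) and hypothetical integer proportional coefficients, of how normalized coefficients could weight and stack the per-channel feature maps:

```python
import numpy as np

def fuse_features(hog, color, gray, coeffs=(4, 2, 1)):
    """Scale each feature group by its weight and stack the channels.

    hog, color, gray: (H, W, C_i) feature maps extracted from the search
    patch. The integer proportional coefficients in `coeffs` are
    placeholders; the paper determines the coefficients experimentally
    and then normalizes them into a weight vector.
    """
    w = np.asarray(coeffs, dtype=np.float64)
    w = w / w.sum()  # normalize the coefficients into a weight vector
    return np.concatenate([w[0] * hog, w[1] * color, w[2] * gray], axis=2)

# Toy example: 31 gradient channels, 10 color channels, 1 gray channel.
h, wdt = 50, 50
fused = fuse_features(
    np.random.rand(h, wdt, 31),
    np.random.rand(h, wdt, 10),
    np.random.rand(h, wdt, 1),
)
print(fused.shape)  # (50, 50, 42)
```

The fused map can then be fed to the correlation filter exactly as a single-feature map would be, so the rest of the tracking pipeline is unchanged.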
Comparisons and analysis are performed for these trackers using precision, average center error, average Pascal VOC overlap ratio, and median frames per second as evaluation standards. Precision plots and success plots on different datasets are also presented, and the performance under different attributes is discussed. Experimental results on the OTB-100 benchmark demonstrate that our tracker achieves both real-time operation and better performance than the other methods, especially on the Basketball, DragonBaby, Panda, and Lemming sequences. When the target is occluded, deformed, or blurred by motion, its edge contours and even its gradient information become indistinct; an appearance model constructed from the gradient feature alone then cannot accurately distinguish the target from the background, easily causing tracking failure. The WMFF tracker, in contrast, can draw on the color feature in time as a supplement when constructing the appearance model, obtaining a more robust tracking result when the gradient feature fails. This shows that the color feature is as important as the gradient feature and that an effective feature combination is achieved. The proposed method outperforms the other algorithms on multiple datasets, and the average results on the OTB-100 dataset show that precision improves by 1.2% compared with the scale adaptive kernel correlation filter tracker with feature integration (SAMF). Conclusion In this paper, a weight vector is introduced to combine features when describing the appearance of the target, and the weighted multi-feature fusion tracker is then proposed within the correlation filter tracking framework. The new hybrid feature, HCG, is dominated by the gradient feature and supplemented by the color and gray features, and is used to model the target's appearance. This model makes up for the deficiency of a single feature and gives full play to the function of each feature.
It not only makes the features complement each other, but also lets the appearance model adapt to multiple complex scenes. The WMFF tracker sustains tracking longer than the other trackers in complex dynamic scenes and improves the robustness of the algorithm.
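The evaluation metrics named in the Result section follow the standard OTB protocol: precision is the fraction of frames whose predicted center lies within a distance threshold (20 pixels by default) of the ground-truth center, and the overlap ratio is the Pascal VOC intersection-over-union of the predicted and ground-truth boxes. A minimal sketch of these metrics, with toy bounding boxes in (x, y, w, h) form:

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers.

    pred, gt: (N, 4) arrays of (x, y, w, h) boxes, one row per frame.
    """
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def precision(pred, gt, threshold=20.0):
    """Fraction of frames with center error below `threshold` (OTB default: 20 px)."""
    return float(np.mean(center_error(pred, gt) <= threshold))

def overlap(pred, gt):
    """Per-frame Pascal VOC overlap ratio (intersection over union)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

# Toy example with two frames: one hit and one miss at the 20 px threshold.
pred = np.array([[10.0, 10.0, 40.0, 40.0], [100.0, 100.0, 40.0, 40.0]])
gt   = np.array([[12.0, 10.0, 40.0, 40.0], [160.0, 100.0, 40.0, 40.0]])
print(precision(pred, gt))  # 0.5
```

Sweeping the threshold over a range of pixel distances (or IoU levels) and plotting the resulting fractions yields the precision plot and success plot referred to above.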