Adaptive scale sudden change object tracking

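Ren Junli1, Guo Hao1, Dong Yafei2, Liu Ru1, An Jubai1, Wang Yan1 (1. School of Information Science and Technology, Dalian Maritime University, Dalian 116026; 2. School of Software, Beihang University, Beijing 100191)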

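Abstract
Objective Sudden scale change is one of the most challenging problems in object tracking. When the scale of the object changes abruptly within a short time, tracking cues are lost and accumulated tracking errors lead to drift. To better address this problem, a detect-then-track algorithm that adapts to sudden scale change is proposed (kernelized correlation filter_you only look once, KCF_YOLO). Method A correlation filter tracker is used in the training stage of tracking to achieve fast tracking, and the YOLO (you only look once) V3 neural network is used in the detection stage. An adaptive template update strategy is also designed: the similarity between each detected object and the object template is computed from fused color and image fingerprint features and compared to judge whether the object is occluded, which determines whether the object template is updated in the current frame. Result To demonstrate the effectiveness of the proposed method, experiments were run on 11 video sequences from the OTB (object tracking benchmark) 2015 dataset that are representative of sudden scale change, in which the object scale varies by a factor of 0.1 to 9.2. The results show an average tracking precision of 0.955 and an average tracking speed of 36 frames/s; compared with classical scale-adaptive tracking algorithms, precision improves by 31.74% on average. Conclusion By combining correlation filtering with a neural network in a detect-then-track scheme, the method improves the adaptability of the tracker to sudden scale change. The experiments confirm that adding the detection strategy effectively corrects the tracking drift caused by subsequent sudden scale changes, and that the adaptive template update strategy is effective.
Keywords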
Adaptive scale sudden change object tracking

Ren Junli1, Guo Hao1, Dong Yafei2, Liu Ru1, An Jubai1, Wang Yan1(1.School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China;2.School of Software, Beihang University, Beijing 100191, China)

Abstract
Objective Video-based object detection and tracking are long-standing research topics in computer vision. Video object tracking has broad application prospects in intelligent surveillance, human-computer interaction, robot visual navigation, and related areas. Although the theory of video object tracking has advanced considerably and several results have reached practical use, the technology still faces major challenges, such as scale change, illumination change, motion blur, object deformation, and occlusion. Sudden scale change of the object within a short time is particularly difficult: tracking cues are lost, and accumulated tracking errors lead to drift. If the tracked scale is instead kept fixed, considerable scale information is lost. To address this problem, this study proposes a tracking algorithm that adapts to sudden scale change (kernelized correlation filter_you only look once, KCF_YOLO). Method The algorithm uses a correlation filter tracker for fast tracking in the training phase and the you only look once (YOLO) V3 neural network in the detection phase. An adaptive template updating strategy is also designed. It compares the similarity between each detected object and the object template, computed from fused color and image fingerprint features, to determine whether occlusion has occurred and whether the object template should be updated in the current frame. In the first frame of the video, the object is selected (the category of the tracked object is assumed to be human) and the object area is stored as the object template T. Tracking then enters the training stage, in which the KCF algorithm is used.
KCF extracts multichannel histogram of oriented gradients (HOG) features from the object template. During tracking, a region 1.5 times the size of the object template is used as the search range in the next frame, which considerably reduces the search range and markedly improves tracking speed. When the frame number is a multiple of 20, the algorithm enters the detection stage and uses YOLO V3 for object detection. YOLO V3 identifies all the people (P1, P2, P3, …, Pn) in the current frame. Each detected person is compared with the object template stored 20 frames earlier: color and image fingerprint features are extracted and combined into a single similarity score. If the highest similarity is greater than the average similarity over the previous 20 frames, the object template is updated to the person with the greatest similarity, and the scale of the tracking box is updated in accordance with the YOLO detection to achieve scale adaptation; otherwise, the object is judged to be occluded and the template is not updated. In the tracking phase, the object template, whether updated or not, is used as the latest state of the object for subsequent tracking. These steps are repeated every 20 frames until the video ends. The color and image fingerprint features are complementary. The image fingerprint feature uses the perceptual hash (PHash) algorithm. After the discrete cosine transform, most of the image information is concentrated in the low-frequency region, so the computation can be restricted to that region, but color information is lost. The color feature captures the distribution of colors over the entire image. Combining the two improves the reliability of the similarity measure.
Result A total of 11 video sequences representative of sudden scale change in the object tracking benchmark (OTB)-2015 dataset are tested to evaluate the proposed method. The results show that the average tracking accuracy of the algorithm is 0.955 and the average tracking speed is 36 frames/s. On a self-made sequence in which the object is completely occluded for 130 frames and then reappears, tracking accuracy is 0.9, which supports the validity of combining the kernelized correlation filter with the YOLO V3 network. Compared with classical scale-adaptive tracking algorithms, accuracy is improved by 31.74% on average. Conclusion This study combines correlation filtering and a neural network in a detect-then-track scheme, improving the adaptability of the algorithm to sudden scale change during tracking. The experimental results show that the detection strategy corrects the tracking drift caused by subsequent sudden scale changes and confirm the effectiveness of the adaptive template updating strategy. The traditional kernelized correlation filter cannot handle a sudden change in object scale within a short time, and neural network trackers are slow; this work bridges the two, and combining a correlation filter with a neural network opens a new direction for tracker design.
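The similarity check and template update rule described in the Method part can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, and several details are assumptions that the abstract does not specify: the 32x32 resize with an 8x8 low-frequency block and a median threshold for PHash, histogram intersection as the color similarity, an equal-weight fusion of the two scores, and a `history` list holding the similarity scores of recent frames. Candidate crops would come from a person detector such as YOLO V3, which is not included here.

```python
import numpy as np

def to_gray(img):
    # img: H x W x 3 uint8 RGB array -> H x W float luminance
    return img.astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def resize_nearest(gray, size=32):
    # Nearest-neighbour resize; crude but enough for a sketch
    h, w = gray.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return gray[np.ix_(rows, cols)]

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def phash_bits(img, size=32, low=8):
    # Keep only the low-frequency DCT block (assumed 8x8) and threshold
    # it at its median to get a 64-bit fingerprint; color is discarded.
    d = dct_matrix(size)
    coeffs = d @ resize_nearest(to_gray(img), size) @ d.T
    block = coeffs[:low, :low].ravel()
    return block > np.median(block)

def phash_similarity(a, b):
    # 1 minus the normalised Hamming distance between fingerprints
    return 1.0 - float(np.mean(phash_bits(a) != phash_bits(b)))

def color_hist(img, bins=8):
    # Per-channel histogram over the whole image, normalised to sum to 1
    hist = np.concatenate(
        [np.histogram(img[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    ).astype(np.float64)
    return hist / hist.sum()

def color_similarity(a, b):
    # Histogram intersection (an assumed choice of color measure)
    return float(np.minimum(color_hist(a), color_hist(b)).sum())

def fused_similarity(a, b, weight=0.5):
    # Equal-weight fusion is an assumption, not taken from the paper
    return weight * phash_similarity(a, b) + (1.0 - weight) * color_similarity(a, b)

def select_update(template, candidates, history):
    # candidates: list of (box, crop) pairs from the detector.
    # Returns the (box, crop) to adopt as the new template and scale,
    # or None when the object is treated as occluded.
    if not candidates:
        return None
    scores = [fused_similarity(template, crop) for _, crop in candidates]
    best = int(np.argmax(scores))
    threshold = float(np.mean(history)) if history else 0.0
    return candidates[best] if scores[best] > threshold else None
```

In a tracking loop, `select_update` would be called only on detection frames (every 20th frame in the abstract); when it returns `None`, the tracker keeps the old template and box size.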
Keywords
