Single-object vehicle tracking in satellite videos with fused features

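Han Mingfei1,2, Li Shengyang1,2,3, Wan Xue1,2, Xuan Shiyu1,2, Zhao Zifei1,2,3, Tan Hong1,2, Zhang Wanfeng1,2 (1. Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094; 2. Key Laboratory of Space Utilization, Chinese Academy of Sciences, Beijing 100094; 3. University of Chinese Academy of Sciences, Beijing 100049)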

Abstract
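Objective As an emerging type of remote sensing data, satellite video provides high-resolution spatial detail and rich temporal information about the observed area, offering a viewpoint different from that of conventional video for applications such as traffic monitoring and the tracking of specific vehicle targets. Compared with conventional video data, vehicle targets in satellite video have low resolution and small scale and carry limited information. Consequently, when the target boundary is unclear, the target is partially occluded, or the appearance of the surroundings is blurred, existing object trackers often suffer from severe target loss. To address this, this paper proposes a feature-fusion-based kernelized correlation tracking method for vehicles in satellite video. Method Raw pixels and the histogram of oriented gradient (HOG) are used to extract features of the vehicle target with complementary discriminative ability, and kernelized correlation trackers produce response maps that provide invariance and discriminative ability, respectively; the complementary information of the two features is combined by fusing the response maps to obtain the target position; the response distribution criterion (RDC) is used to judge the stability of the current target features and to decide whether to update the tracker's representation model. The correlation filter method used here has a low computational cost and runs fast, so it can be extended to tracking multiple vehicle targets. Result The method is compared with six mainstream correlation filter trackers on eight satellite video sequences. The experimental data cover illumination change, quick turns, partial occlusion, shadow interference, road color change, and nearby similar targets, and the area under curve (AUC) of the precision plot and of the success plot is used to evaluate vehicle tracking accuracy. The results show that the proposed method balances well the discriminative abilities of the base trackers using different features (ranked second in performance): the precision plot AUC increased by 2.9% and the success plot AUC decreased by 4.1%, and the vehicle target was tracked successfully without being lost, which demonstrates the advancement and effectiveness of the proposed method. Conclusion The proposed feature-fusion kernelized correlation tracking method for vehicles in satellite video balances the complementary information of different feature extractors, largely resolves the target loss caused by insufficient vehicle target information in satellite video, and improves accuracy.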
Keywords
Integrating multiple features for tracking vehicles in satellite videos

Han Mingfei1,2, Li Shengyang1,2,3, Wan Xue1,2, Xuan Shiyu1,2, Zhao Zifei1,2,3, Tan Hong1,2, Zhang Wanfeng1,2(1.Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China;2.Key Laboratory of Space Utilization, Chinese Academy of Sciences, Beijing 100094, China;3.University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract
Objective Satellite video is a new type of remote sensing system that can capture both dynamic video and conventional images. Compared with conventional very-high-resolution (VHR) remote sensing systems, a video satellite observes the Earth with real-time temporal resolution, which has led to studies in traffic density estimation, object detection, and 3D reconstruction. Because of its high temporal resolution, satellite video has strong potential for monitoring traffic, animal migration, and ships entering and leaving ports. Despite much research on conventional video, relatively little work has addressed object tracking in satellite video. Existing object tracking methods primarily emphasize relatively large objects, such as trains and planes. Several researchers have explored replacing or fusing the motion feature for a more accurate prediction of object position. However, few studies have focused on the problem caused by the insufficient information available for smaller objects, such as vehicles. Tracking vehicles in satellite video poses three main challenges. The first challenge is the small size of the target. While a single frame can be as large as 12 000×4 000 pixels, moving targets such as cars can be very small and occupy only 10 to 30 pixels. The second challenge is the lack of clear texture, because vehicle targets contain limited and/or confusing information. The third challenge is that, unlike aircraft and ships, vehicles are more likely to appear against complex backgrounds, which makes them harder to track. For instance, a vehicle may make quick turns, be partially occluded, or undergo sudden changes in illumination. Selecting or constructing a single image feature that can handle all of these situations is difficult.
To tackle these challenges, we propose using multiple complementary image features, merged into a unified framework based on a lightweight kernelized correlation filter. Method First, two complementary features with certain invariance and discriminative ability, histogram of oriented gradients (HOG) and raw pixels, are used as descriptors of the target image patch. HOG is tied to the edge information of vehicles, such as orientations, which offers some discriminative ability. A HOG-based tracker can distinguish targets even when partial occlusion occurs or when illumination or road color changes. However, because the target carries insufficient information, it cannot reliably separate the target from similar shapes in its surroundings. In contrast, the raw pixel feature describes all contents of the image patch without processing, so more information is retained, which matters given the small size of vehicles. It is invariant to in-plane motion of a rigid object with little texture, so it can follow vehicles through orientation changes. However, it fails to track vehicles that are partially occluded or that undergo changes in road color and illumination. Second, a response map merging strategy is proposed to fuse the complementary image features by maintaining two trackers, one using the HOG feature to discriminate the target and the other using the raw pixel feature to improve invariance. In the merged map, a peak response may arise at a new position that reflects both invariance and discriminative ability. Finally, because the target carries insufficient information and the observation model has limited discriminative ability, responses usually show a multipeak pattern when a disturbance exists. A model updater based on a response distribution criterion is therefore used to measure the distribution of the merged responses and decide whether to update the model. Using a correlation filter also facilitates multiple vehicle tracking because of its calculation speed and online training mechanism.
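The merging and update-gating steps described above can be illustrated with a small sketch. It is not the authors' implementation: the equal fusion weight, the min-max normalization, and the function names are assumptions made for illustration. Because the abstract does not define the response distribution criterion, the peak-to-sidelobe ratio (PSR), a common confidence measure for correlation filter responses, is used as a stand-in, with a hypothetical threshold of 5.0.

```python
from statistics import mean, pstdev


def normalize(resp):
    """Rescale a 2D response map (list of rows) to the range [0, 1]."""
    flat = [v for row in resp for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0] * len(row) for row in resp]
    return [[(v - lo) / (hi - lo) for v in row] for row in resp]


def merge_responses(resp_hog, resp_raw, weight_hog=0.5):
    """Weighted sum of two normalized, equally sized response maps.

    weight_hog is an assumed parameter, not a value from the paper.
    """
    hog, raw = normalize(resp_hog), normalize(resp_raw)
    return [
        [weight_hog * h + (1.0 - weight_hog) * r for h, r in zip(row_h, row_r)]
        for row_h, row_r in zip(hog, raw)
    ]


def peak_location(resp):
    """Return (row, col) of the largest response."""
    _, row, col = max(
        (v, r, c) for r, line in enumerate(resp) for c, v in enumerate(line)
    )
    return row, col


def peak_to_sidelobe_ratio(resp, exclude=1):
    """PSR (stand-in for RDC): (peak - sidelobe mean) / sidelobe std.

    Cells within `exclude` of the peak are left out of the sidelobe.
    """
    pr, pc = peak_location(resp)
    peak = resp[pr][pc]
    sidelobe = [
        v
        for r, line in enumerate(resp)
        for c, v in enumerate(line)
        if abs(r - pr) > exclude or abs(c - pc) > exclude
    ]
    if not sidelobe:
        return float("inf")
    sigma = pstdev(sidelobe)
    if sigma == 0:
        return float("inf") if peak > mean(sidelobe) else 0.0
    return (peak - mean(sidelobe)) / sigma


def should_update(resp, threshold=5.0):
    """Update the model only when the merged response has one clear peak."""
    return peak_to_sidelobe_ratio(resp) >= threshold


# Toy example: the two trackers peak at different cells, and the merged
# map peaks at a third cell that both trackers rate highly.
hog = [[0.0] * 5 for _ in range(5)]
raw = [[0.0] * 5 for _ in range(5)]
hog[2][2], hog[2][3] = 1.0, 0.9
raw[2][4], raw[2][3] = 1.0, 0.9
merged = merge_responses(hog, raw)
print(peak_location(merged))  # (2, 3)
```

In a full tracker the two response maps would come from the HOG-based and raw-pixel kernelized correlation filters, and the model update would be skipped for frames where the gate returns False.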
Result Our model is compared with six state-of-the-art correlation filter-based models. Experiments are performed on eight satellite videos captured in different locations worldwide under challenging situations, such as illumination variance, quick turns, partial occlusion, and road color change. Precision plots and success plots are adopted for evaluation. Ablation experiments are performed to demonstrate the effectiveness of the proposed method, and quantitative assessments show that it strikes an effective balance between the two base trackers. Visualization results on three videos illustrate how this balance is achieved. Our method outperforms all six state-of-the-art methods. Conclusion In this paper, a new tracker that fuses complementary image features is proposed for vehicle tracking in satellite videos. To overcome the difficulties posed by the small size of the target, the lack of texture, and the complex background in satellite video tracking, HOG and raw pixel features are combined by merging the response maps of the two trackers, which improves both discriminative ability and invariance. Experiments on eight satellite videos under challenging circumstances demonstrate that our method outperforms other state-of-the-art algorithms in precision plots and success plots.
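The evaluation by precision and success plots can be sketched as follows. This follows the common single-object tracking benchmark convention and is only an illustration: the threshold ranges (0 to 20 pixels for center error, 21 overlap thresholds from 0 to 1), the (x, y, w, h) box format, and the function names are assumptions, since the abstract does not state the settings used in the paper.

```python
from math import hypot


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    return hypot(
        (a[0] + a[2] / 2) - (b[0] + b[2] / 2),
        (a[1] + a[3] / 2) - (b[1] + b[3] / 2),
    )


def precision_auc(pred, truth, max_px=20):
    """Mean of the precision curve over thresholds 0..max_px pixels.

    Each curve point is the fraction of frames whose center error
    does not exceed the threshold. max_px is an assumed setting.
    """
    errors = [center_error(p, t) for p, t in zip(pred, truth)]
    curve = [
        sum(e <= th for e in errors) / len(errors) for th in range(max_px + 1)
    ]
    return sum(curve) / len(curve)


def success_auc(pred, truth, steps=21):
    """Mean of the success curve over `steps` overlap thresholds in [0, 1].

    Each curve point is the fraction of frames whose IoU is strictly
    greater than the threshold.
    """
    overlaps = [iou(p, t) for p, t in zip(pred, truth)]
    curve = [
        sum(o > i / (steps - 1) for o in overlaps) / len(overlaps)
        for i in range(steps)
    ]
    return sum(curve) / len(curve)
```

Both functions take per-frame predicted and ground-truth boxes for one sequence. Because vehicles in satellite video occupy only a few pixels, a small offset already lowers the overlap sharply, so the two scores can move in different directions for the same tracker.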
Keywords
