Published: 2021-11-16 | DOI: 10.11834/jig.200094 | 2021, Volume 26, Number 11 | Remote Sensing Image Processing

Received: 2020-04-03; Revised: 2020-12-17; Preprint: 2020-12-24
Funding: National Natural Science Foundation of China (41701468, 41971329); Foundation of the National Key Laboratory of Remote Sensing Information and Image Analysis Technology (Y8180711WN)
About the authors: Han Mingfei, born in 1994, male, Ph.D. candidate; research interests: video object detection and object tracking. E-mail: hmf282@gmail.com. Li Shengyang, corresponding author, male, professor; research interests: intelligent image/video analysis and understanding, scientific big-data mining and analysis, and space ground data system technology. E-mail: shyli@csu.ac.cn. Wan Xue, female, professor; research interests: machine vision, aerospace image processing, and 3D scene reconstruction. E-mail: wanxue@csu.ac.cn. Xuan Shiyu, male, Ph.D. candidate; research interests: video object tracking, intelligent image processing, and deep learning applications. E-mail: shiyu_xuan@stu.pku.edu.cn. Zhao Zifei, male, Ph.D. candidate; research interests: intelligent remote sensing image processing and applications. E-mail: zhaozifei18@csu.ac.cn. Tan Hong, male, associate professor; research interests: aerospace remote sensing image processing, calibration, and quality assessment. E-mail: tanhong@csu.ac.cn. Zhang Wanfeng, male, associate professor; research interests: parallel processing of space data and high-performance computing. E-mail: wfzhang@csu.ac.cn.
*Corresponding author: Li Shengyang, shyli@csu.ac.cn
CLC number: P23; Document code: A; Article ID: 1006-8961(2021)11-2741-10


Integrating multiple features for tracking vehicles in satellite videos
Han Mingfei1,2, Li Shengyang1,2,3, Wan Xue1,2, Xuan Shiyu1,2, Zhao Zifei1,2,3, Tan Hong1,2, Zhang Wanfeng1,2
1. Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China;
2. Key Laboratory of Space Utilization, Chinese Academy of Sciences, Beijing 100094, China;
3. University of Chinese Academy of Sciences, Beijing 100049, China
Supported by: National Natural Science Foundation of China (41701468, 41971329)

# Abstract

Objective Satellite video is a new type of remote sensing system capable of capturing both dynamic video and conventional still images. Compared with conventional very-high-resolution (VHR) remote sensing systems, a video satellite observes the Earth with near-real-time temporal resolution, which has enabled studies in traffic density estimation, object detection, and 3D reconstruction. Owing to its high temporal resolution, satellite video has strong potential for monitoring traffic, animal migration, and ships entering and leaving ports. Despite extensive research on conventional video, relatively little work has been done on object tracking in satellite video. Existing methods primarily address relatively large objects, such as trains and planes. Several researchers have explored replacing or fusing motion features for more accurate prediction of object position, but few studies have tackled the problems caused by the insufficient information carried by smaller objects, such as vehicles. Tracking vehicles in satellite video poses three main challenges. The first is the small size of the target: while a single frame can be as large as 12 000×4 000 pixels, a moving target such as a car may occupy only 10-30 pixels. The second is the lack of clear texture, because vehicle targets contain limited and/or confusing information. The third is that, unlike aircraft and ships, vehicles tend to appear against complex backgrounds, which makes tracking harder: a vehicle may make quick turns, become partially occluded, or undergo sudden changes in illumination. Selecting or constructing a single image feature that can handle all of these situations is difficult.
Method To tackle these challenges, we propose fusing multiple complementary image features within a unified framework based on a lightweight kernelized correlation filter. First, two complementary features with certain invariance and discriminative ability, histogram of oriented gradients (HOG) and raw pixels, are used as descriptors of the target image patch. HOG captures edge information of vehicles, such as orientation, and offers some discriminative ability: a HOG-based tracker can distinguish the target even under partial occlusion or changes in illumination or road color. However, it may fail to separate the target from similarly shaped objects in its surroundings, suffering from the insufficient-information problem. The raw pixel feature, in contrast, describes the entire content of the image patch without processing; given the small size of vehicles, it preserves more information. It is invariant to the in-plane motion of a rigid object even under low texture, which suits tracking vehicles through orientation changes, but it fails when the vehicle is partially occluded or when road color and illumination change. A response-map merging strategy is therefore proposed to fuse the complementary features by maintaining two trackers: one uses the HOG feature to discriminate the target, and the other uses the raw pixel feature to improve invariance. In this manner, the merged response peak may arise at a new position, combining invariance and discriminative ability. Finally, because of the limited information of the target and the restricted discriminative ability of the observation model, responses usually show a multipeak pattern when a disturbance exists; a model updater based on a response distribution criterion is exploited to measure the distribution of the merged responses. Using a correlation filter also facilitates multi-vehicle tracking thanks to its computational speed and online training mechanism.
Result Our model is compared with six state-of-the-art correlation filter-based trackers. Experiments are performed on eight satellite videos captured at different locations worldwide under challenging situations such as illumination variation, quick turns, partial occlusion, and road color change. Precision plots and success plots are adopted for evaluation. Ablation experiments demonstrate the effectiveness of the proposed method, and quantitative assessments show that it strikes an effective balance between the two base trackers; visualization results on three videos illustrate how this balance is achieved. Our method outperforms all six state-of-the-art methods. Conclusion In this paper, a new tracker fusing complementary image features is proposed for vehicle tracking in satellite videos. To overcome the difficulties posed by the small target size, lack of texture, and complex background in satellite video tracking, we combine HOG and raw pixel features by merging the response maps of the two trackers, increasing both discriminative ability and invariance. Experiments on eight satellite videos under challenging circumstances demonstrate that our method outperforms other state-of-the-art algorithms in precision and success plots.

# Key words

object tracking; satellite video; kernelized correlation filter; feature fusion; vehicle tracking

# 1.1 Feature extraction

HOG is a histogram of gradient orientations over local regions and is closely tied to the description of vehicle edge information, such as vehicle orientation. The scale-invariant feature transform (SIFT), which also captures gradient information of the target, is unsuitable for satellite video object tracking because it is computationally expensive and has limited ability to extract corner points from blurred images. HOG has a certain discriminative ability but suffers from the insufficient-information problem. For example, after a vehicle in the video makes a sudden turn, HOG cannot correctly distinguish the target from surrounding objects with similar shapes and orientations. However, a HOG-based tracker can correctly separate the target from background disturbances under partial occlusion, illumination changes, or road color changes.
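The two descriptors can be sketched in a few lines of numpy. This is an illustrative sketch, not the paper's implementation: `hog_feature` computes a single magnitude-weighted orientation histogram over the whole patch rather than the full block-normalized HOG descriptor, and `raw_pixel_feature` is simply the normalized patch; all function names are ours.

```python
import numpy as np

def raw_pixel_feature(patch):
    """Raw pixel feature: the mean-subtracted, variance-normalized patch
    itself, keeping all information of a small target without processing."""
    p = patch.astype(np.float64)
    return (p - p.mean()) / (p.std() + 1e-8)

def hog_feature(patch, n_bins=9):
    """Simplified HOG: one orientation histogram over the whole patch,
    weighted by gradient magnitude (no cells or block normalization)."""
    p = patch.astype(np.float64)
    gy, gx = np.gradient(p)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

patch = np.random.default_rng(0).random((24, 24))
print(raw_pixel_feature(patch).shape, hog_feature(patch).shape)
```

Note that the raw pixel feature keeps the full spatial layout of the patch, while the orientation histogram discards position but summarizes edge directions, which is what makes the two complementary.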

# 1.2 Observation model

 $\mathop {\min }\limits_{\boldsymbol{a}} {\left({\boldsymbol{y} - \boldsymbol{Ka}} \right)^{\rm{T}}}\left({\boldsymbol{y} - \boldsymbol{Ka}} \right) + \lambda {\boldsymbol{a}^{\rm{T}}}\boldsymbol{Ka}$ (1)

 $\boldsymbol{F}(\boldsymbol{a})=\frac{\boldsymbol{F}(\boldsymbol{y})}{\boldsymbol{F}\left(\boldsymbol{k}^{xx}\right)+\lambda}$ (2)

 $\boldsymbol{m} = {\boldsymbol{F}^{ - 1}}\left({\boldsymbol{F}\left({\boldsymbol{k}^{\tilde{\boldsymbol{x}}\boldsymbol{z}}} \right) \cdot \boldsymbol{F}\left(\boldsymbol{a} \right)} \right)$ (3)
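Equations (1)-(3) can be realized compactly with the FFT. The sketch below is a minimal single-channel kernelized correlation filter with a Gaussian kernel; function names and parameter values (`sigma`, `lam`, the 32×32 patch) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    """Kernel correlation k^{xz} for all cyclic shifts at once, via the
    FFT -- the trick that makes the correlation filter lightweight."""
    c = np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(z))).real
    d = (x ** 2).sum() + (z ** 2).sum() - 2.0 * c  # ||x - shifted z||^2
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2 * x.size))

def train(x, y, lam=1e-4):
    """Eq. (2): dual coefficients a of kernel ridge regression, solved
    element-wise in the Fourier domain."""
    return np.fft.fft2(y) / (np.fft.fft2(gaussian_correlation(x, x)) + lam)

def detect(alpha_f, x_model, z):
    """Eq. (3): response map m over all cyclic shifts of the search patch z."""
    kxz = gaussian_correlation(x_model, z)
    return np.fft.ifft2(np.fft.fft2(kxz) * alpha_f).real

# Train on a random patch with a Gaussian label peaked at the centre, then
# detect on the same patch: the response peak should sit at the label peak.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
gy, gx = np.mgrid[0:32, 0:32]
y = np.exp(-((gy - 16) ** 2 + (gx - 16) ** 2) / (2 * 2.0 ** 2))
alpha_f = train(x, y)
resp = detect(alpha_f, x, x)
peak = tuple(map(int, np.unravel_index(np.argmax(resp), resp.shape)))
print(peak)
```

Because training and detection are element-wise operations in the Fourier domain, the per-frame cost is dominated by a few FFTs, which is what allows several vehicles to be tracked online.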

# 1.3 Response map fusion

 $\boldsymbol{m}_{\mathrm{M}}(x, y)=\frac{\boldsymbol{m}_{\mathrm{H}}(x, y)+\boldsymbol{m}_{\mathrm{G}}(x, y)}{2}$ (4)

 $po{s_t} = \arg\max \left({{\boldsymbol{m}_{\rm{M}}}} \right)$ (5)
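The fusion step itself is a simple average of the two response maps, with the new target position taken at the merged peak. A minimal sketch (the toy 5×5 maps and peak values below are illustrative, not data from the paper):

```python
import numpy as np

def fuse_responses(m_hog, m_raw):
    """Eq. (4): average the HOG and raw-pixel response maps; eq. (5): the
    new target position is the location of the merged peak."""
    m_merged = 0.5 * (m_hog + m_raw)
    pos = tuple(map(int, np.unravel_index(np.argmax(m_merged), m_merged.shape)))
    return m_merged, pos

# The HOG tracker is distracted by a similar shape at (1, 1), while the
# raw-pixel tracker favours the true target at (3, 3); the merged map
# recovers the true position.
m_hog = np.zeros((5, 5)); m_hog[1, 1] = 1.0; m_hog[3, 3] = 0.8
m_raw = np.zeros((5, 5)); m_raw[3, 3] = 0.9; m_raw[1, 1] = 0.3
m_merged, pos = fuse_responses(m_hog, m_raw)
print(pos)  # → (3, 3)
```

Averaging suppresses a peak that only one tracker produces, so a distractor must fool both the discriminative and the invariant feature before it can hijack the track.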

# 1.4 Model update

 $\boldsymbol{F}\left(\boldsymbol{a}_{t}\right)= \eta \frac{\boldsymbol{F}(\boldsymbol{y})}{\boldsymbol{F}\left(\boldsymbol{k}^{zz}\right)+\lambda}+(1-\eta) \boldsymbol{F}\left(\boldsymbol{a}_{t-1}\right)$ (6)

 $\tilde{\boldsymbol{x}}_{t}=\eta \boldsymbol{z}+(1-\eta) \tilde{\boldsymbol{x}}_{t-1}$ (7)

 $RDC = \sqrt {\sum\limits_{i = 1}^t {{{\left({{S_m}(i) - \mu } \right)}^2}} }$ (8)

 $\eta = \begin{cases} \zeta & RDC > r\\ 0 & \text{otherwise} \end{cases}$ (9)
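The update scheme can be sketched as follows. This is an interpretive sketch with two stated assumptions: that $S_m(i)$ in eq. (8) ranges over the values of the merged response map, and that, consistently with eq. (6), $\eta$ weights the newly extracted sample. The constants `ZETA` and `R_THRESH` are illustrative placeholders, not the paper's values.

```python
import numpy as np

ZETA, R_THRESH = 0.012, 0.15  # illustrative values, not from the paper

def update_rate(response, r=R_THRESH, zeta=ZETA):
    """Eqs. (8)-(9): the response distribution criterion RDC is the root of
    the summed squared deviations of the response values from their mean;
    the model is updated (eta = zeta) only when RDC exceeds the threshold r,
    i.e. when the merged response shows one clear peak rather than a flat
    or multipeak pattern caused by a disturbance."""
    s = response.ravel()
    rdc = np.sqrt(np.sum((s - s.mean()) ** 2))
    return zeta if rdc > r else 0.0

def update_model(alpha_f_new, alpha_f_old, x_old, z, eta):
    """Eqs. (6)-(7): linear interpolation of the filter coefficients in the
    Fourier domain and of the appearance template."""
    alpha_f = eta * alpha_f_new + (1.0 - eta) * alpha_f_old
    x_tilde = eta * z + (1.0 - eta) * x_old
    return alpha_f, x_tilde

peaked = np.zeros((10, 10)); peaked[5, 5] = 1.0  # clean single peak: update
flat = np.full((10, 10), 0.5)                    # ambiguous response: freeze
print(update_rate(peaked), update_rate(flat))  # → 0.012 0.0
```

Freezing the update ($\eta = 0$) when the response distribution is ambiguous keeps occlusions and background clutter from contaminating the learned template.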

# 2.1 Dataset

Table 1 Situations contained in the videos of Fig. 2

Video / illumination change / quick turn / partial occlusion / shadow / road color change / similar vehicles: Fig. 2(a) √; Fig. 2(b) √ √ √; Fig. 2(c) √ √; Fig. 2(d) √ √; Fig. 2(e) √ √; Fig. 2(f) √ √; Fig. 2(g) √ √ √; Fig. 2(h) √ √. Note: "√" indicates that the video contains this type of situation.

# 2.3 Results and analysis

Table 2 Comparison of accuracy among different methods

Method / AUC/% / CLE / VOR: MoFusion-HOG 91.42, 68.38; MoFusion-Raw 63.99, 48.21; Ours 94.35, 64.26. Note: bold font indicates the best result in each column.
