the state of the target in every video frame is linearly represented using several online learned templates. Because of the complex interference factors caused by the target itself or by the scene, the modeling ability of the tracker depends heavily on the generalizability of the template data and on the precision of its error estimation. Many existing algorithms represent the samples in vector form and thereby artificially alter the original data structure, so that the natural spatial relationships among the pixels of a sample are severely damaged. In addition,
such a data representation mechanism may enlarge the data dimensionality, which significantly increases the computational complexity and wastes resources. This paper investigates the data representation and observation modeling mechanisms of the video tracking framework and provides a more compact and effective solution based on multilinear analysis. In our framework,
the candidate samples and their reconstructed signals are expressed in tensor form to maintain the original structure of the data.
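As a rough illustration of this representation choice (the patch size, candidate count, and variable names below are illustrative assumptions, not values from the paper), keeping the candidates in tensor form amounts to stacking the 2-D patches along a third mode instead of flattening each patch into a column vector:

```python
import numpy as np

# Illustrative sketch only: sizes and names are assumptions, not the paper's.
patches = [np.random.rand(32, 32) for _ in range(50)]   # 50 candidate patches

# Vector form: each 32x32 patch becomes a 1024-dimensional column,
# destroying pixel adjacency within the patch.
X_vec = np.stack([p.ravel() for p in patches], axis=1)   # shape (1024, 50)

# Tensor form: rows x columns x candidates, pixel neighborhoods kept intact.
X_tensor = np.stack(patches, axis=2)                      # shape (32, 32, 50)
```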
When the tracker outputs the candidate appearance models, the modeling tasks of the tracking system are organized by exploiting the multilinear properties of the tensor structures. The objective function is regularized with the tensor nuclear norm and the L norm in order to fully exploit the independence and interdependence of the observation models under a multitask state learning assumption. The structured tensor form used for the data prototypes and observation models effectively addresses the data representation problems and the computational complexity of the tracking system, and it also provides a simpler and more effective solution for the multitask joint learning of the candidate appearance models.
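The abstract does not give the explicit formulation; a generic sketch of such a regularized multitask objective, with the unspecified L norm left abstract, could take the following form (our assumption, not the paper's exact model):

\[
\min_{\mathcal{Z},\,\mathcal{E}} \;\tfrac{1}{2}\,\bigl\|\mathcal{X}-\mathcal{Z}-\mathcal{E}\bigr\|_F^2 \;+\; \lambda_1\,\|\mathcal{Z}\|_{*} \;+\; \lambda_2\,\|\mathcal{E}\|_{L},
\]

where \(\mathcal{X}\) stacks the candidate observations in tensor form, \(\mathcal{Z}\) is the jointly learned low-rank reconstruction shared across the candidate tasks, \(\mathcal{E}\) collects the estimation errors, \(\|\cdot\|_{*}\) denotes the tensor nuclear norm, \(\|\cdot\|_{L}\) stands for the L norm mentioned above, and \(\lambda_1,\lambda_2\) balance the two regularizers.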
When the tracker encounters destructive noise interference, the tensor-nuclear-norm-constrained error estimation mechanism in the multitask joint learning framework fully exploits the comprehensive information of the target,
thereby allowing the tracker to adapt to various changes in visual information caused by intrinsic or extrinsic factors. Extensive experiments on several challenging image sequences demonstrate that the proposed method achieves more robust object model representation: averaged over all image sequences, it reaches a center location error of 4.2 and an overlap rate of 0.82, better than several state-of-the-art tracking algorithms. The tensor nuclear norm regression model and the error estimation mechanism of our algorithm obtain candidate states that closely match the actual object states in real time, and the tracker strictly identifies the true state of each candidate in the multitask learning framework,
thereby providing a better solution to the model degradation and drifting problems.
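As a concrete reference for the tensor nuclear norm used above, one widely used surrogate is the sum of the nuclear norms of the mode-n unfoldings; the following sketch assumes this surrogate and illustrative tensor sizes, neither of which is confirmed by the abstract:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move the given mode to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def sum_of_nuclear_norms(tensor):
    """Sum-of-nuclear-norms surrogate for the tensor nuclear norm
    (an assumed choice, not necessarily the paper's): the matrix nuclear
    norm of each mode unfolding, summed over all modes."""
    return sum(
        np.linalg.norm(unfold(tensor, mode), ord="nuc")
        for mode in range(tensor.ndim)
    )

# Example: the regularizer favors candidates whose stacked reconstructions
# form a low-rank, mutually consistent tensor.
X_tensor = np.random.rand(32, 32, 50)
print(sum_of_nuclear_norms(X_tensor))
```

Because permuting the columns of an unfolding does not change its singular values, the value returned is independent of the particular unfolding convention.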