Object tracking algorithm based on matrix low-rank representation

Musa Yasin; Kerim Muhtar

doi:10.11834/jig.170083

Image Analysis and Recognition

Journal of Image and Graphics, 2018, Vol. 23, No. 5, pp. 674-687
Received: 2017-03-22; Revised: 2017-11-14; Published in print: 2018-05-16

Musa Yasin, Kerim Muhtar. Object tracking algorithm based on matrix low-rank representation[J]. Journal of Image and Graphics, 2018, 23(5): 674-687. DOI: 10.11834/jig.170083.

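Abstract (translated from the Chinese)

Objective

In object tracking, disturbances such as occlusion, strong illumination, and motion blur strongly affect tracking accuracy and make it difficult to model observations of the target's appearance accurately. In addition, many existing algorithms represent sample data as vectors in the observation model, which alters the original structure of the sample data and the latent relationships among its pixels, and thereby raises the data dimensionality and computational complexity of the observation model.

Method

This paper studies the observation modeling problem of the tracking framework in depth and proposes a novel observation modeling method based on matrix low-rank representation, together with its corresponding likelihood measurement function. This lets the tracker fully exploit the latent feature structure of the sample data and detect more precisely how the target's appearance changes under complex disturbances such as occlusion or strong illumination. Expressing sample signals in matrix form also preserves the spatial distribution of their visual features and effectively reduces data dimensionality and computational complexity.

Result

The proposed tracker performs more robustly in tracking environments with challenging disturbances and copes well with the model degradation and drift caused by occlusion or strong illumination. On 10 classic test videos, the proposed tracker attains an average center point error of 5.29 pixels, an average overlap rate of 78%, and an average success rate of 98.28%, all better than the other algorithms of the same kind.

Conclusion

Using 2D matrix data prototypes as the carrier, this paper proposes a new multitask observation modeling framework and a maximum likelihood estimation model. Qualitative and quantitative analysis of the experimental data shows that, compared with several strong algorithms of the same kind, the proposed algorithm reaches the same or higher tracking and modeling accuracy.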

Abstract

Objective

Visual object tracking is a significant computer vision task with applications in many domains, such as military, robotics, intelligent visual surveillance, human-computer interaction, and medical diagnosis. Many trackers proposed in the literature over the past decades have delivered satisfactory performance. Despite this progress, visual object tracking still has difficulty handling complex changes in object appearance caused by factors such as illumination, partial occlusion, shape deformation, background clutter, low contrast, specularities, camera motion, and at least seven more aspects. Generally, visual tracking is a search (or classification) problem that continuously infers the state of a target in video sequences, identifies the candidate that matches the target template most accurately, and returns it as the tracking result. Constructing an effective, high-performance tracker involves two core issues: the first is representative feature learning and high-level modeling, and the second is filtering and efficient searching. Given that the target state in every video frame is represented using several feature templates learned online, the modeling capability of the tracker depends heavily on how well the template data generalize and on accurate model representation with precise error estimation, because of the complex interference caused by the target itself or by scene conditions. In addition, most existing algorithms force the sample data into vector form, which changes the original data structure and significantly damages the relationships among pixels. The resulting high data dimensionality also increases computational complexity. Therefore, designing an effective mechanism for representing the 2D appearance of moving objects with an appropriate data expression is key to the success of a visual tracker.
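The point about vector form discarding pixel relationships can be illustrated with a small NumPy sketch. It is not taken from the paper: the 32x32 block-shaped residual, the helper `norms`, and all variable names are hypothetical. The sketch shuffles the pixels of a block-shaped residual and shows that the entrywise L1 and L2 norms cannot tell the two layouts apart, whereas the nuclear norm and the rank can.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 32x32 residual: a block-shaped occlusion on a zero background.
residual = np.zeros((32, 32))
residual[8:20, 10:26] = 1.0

# Same pixel values, but the spatial layout is destroyed by shuffling.
shuffled = rng.permutation(residual.ravel()).reshape(residual.shape)

def norms(m):
    """Return the (L1, L2, nuclear) norms of a 2D array."""
    l1 = np.abs(m).sum()                 # entrywise L1 norm
    l2 = np.linalg.norm(m)               # Frobenius norm = L2 norm of the vectorized data
    nuc = np.linalg.norm(m, ord="nuc")   # nuclear norm = sum of singular values
    return l1, l2, nuc

l1_a, l2_a, nuc_a = norms(residual)
l1_b, l2_b, nuc_b = norms(shuffled)
rank_a = np.linalg.matrix_rank(residual)
rank_b = np.linalg.matrix_rank(shuffled)

print(f"block:    L1={l1_a:.1f} L2={l2_a:.3f} nuclear={nuc_a:.3f} rank={rank_a}")
print(f"shuffled: L1={l1_b:.1f} L2={l2_b:.3f} nuclear={nuc_b:.3f} rank={rank_b}")
```

The L1 and L2 values agree for both layouts because they depend only on the multiset of pixel values. The nuclear norm equals the Frobenius norm only for matrices of rank at most one, so it is smallest for the intact block and grows once the structure is scrambled.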

Method

In this study, the appearance model representation problem of generative-model-based visual object tracking is investigated in depth. In prior work, we formulated the observation model via tensor (3D array) nuclear norm regularization. That tracker, called the tensor nuclear norm regression-based tracker (TNRT), achieved favorable results in many tracking environments. However, TNRT places high demands on hardware and on graphics processing unit computing, which leads to slow tracking when a practical application must run on low-end hardware. Therefore, we design a novel observation model based on matrix low-rank representation, together with its corresponding likelihood measurement function, while retaining several good properties of the TNRT algorithm, such as multitask joint learning, nuclear norm regularization-based model representation, and preservation of the original data structure of sample signals. In the proposed tracking framework, several critical feature templates (a dictionary or subspace) are learned from online data using the incremental principal component analysis algorithm. Then, in accordance with the appearance information of an incoming video frame, the proposed appearance modeling mechanism uses the feature templates to represent the target candidates linearly under independent and identically distributed Gaussian-Laplacian mixture noise by adopting the multitask joint learning strategy. Subsequently, a joint maximum likelihood function based on the matrix nuclear norm and the weighted $$ {\rm L}_1$$-norm carefully measures the distances between the target candidates and the feature subspace. Because the matrix form preserves the intrinsic data structure of the samples and keeps the spatial distribution of visual features intact, the proposed multitask observation model with a matrix low-rank regularization-based objective function represents sample signals more accurately and flexibly than $$ {\rm L}_1$$, $$ {\rm L}_2$$, or other hybrid regularization-based model representation methods. In every frame, the same likelihood measurement function evaluates each candidate sample, so the candidates are directly comparable. As a result, the tracker can fully explore the latent characteristics of the sample data and detect complex appearance changes of the target under challenging disturbances, such as occlusion or strong illumination. Meanwhile, the observation model, which is formulated on matrix-form data prototypes, can improve the tracking speed remarkably thanks to its reduced data dimensionality and low computational complexity.
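The abstract does not spell out how the nuclear norm and L1-norm terms are optimized. As a hedged illustration only, the sketch below shows the two standard closed-form shrinkage steps associated with these penalties: singular value thresholding for the nuclear norm and entrywise soft thresholding for the L1 norm. Whether the paper's solver uses exactly these steps is an assumption, and the function names `svt` and `soft_threshold`, the toy block image, and the threshold values are all hypothetical.

```python
import numpy as np

def svt(m, tau):
    """Singular value thresholding: minimizer of tau*||X||_* + 0.5*||X - m||_F^2."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # shrink every singular value by tau
    return (u * s_shrunk) @ vt            # rebuild the matrix from the shrunken spectrum

def soft_threshold(m, tau):
    """Entrywise shrinkage: minimizer of tau*||X||_1 + 0.5*||X - m||_F^2."""
    return np.sign(m) * np.maximum(np.abs(m) - tau, 0.0)

# Toy data (assumption): a block-shaped pattern plus small dense Gaussian noise.
rng = np.random.default_rng(1)
block = np.zeros((32, 32))
block[8:20, 10:26] = 1.0
noisy = block + 0.05 * rng.standard_normal(block.shape)

low_rank = svt(noisy, tau=1.0)
rank_before = np.linalg.matrix_rank(noisy)
rank_after = np.linalg.matrix_rank(low_rank)
print("rank before:", rank_before, "rank after:", rank_after)
```

In this toy setting the block contributes one large singular value while the noise contributes many small ones, so thresholding the spectrum keeps the block-shaped structure and discards the noise directions. Soft thresholding, by contrast, acts on each pixel independently and does not use the spatial layout.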

Result

Although the pixels of residual data often show similar grayscale intensities and share some spatial information with the 2D data prototypes, such as block-shaped connected areas, conventional observation models using $$ {\rm L}_1$$, $$ {\rm L}_2$$, or other hybrid regularization-based model representation methods cannot fully examine the latent structure of the residual data. Compared with these traditional methods, the matrix low-rank regression model (MLRM) explores the residual data more precisely and detects the spatial characteristics of the reconstruction error. In other words, the MLRM captures the low-rank characteristics of the residual matrix. We evaluate the proposed tracking algorithm systematically and experimentally on 10 public video sequences that cover the challenging disturbances mentioned above and compare it with several state-of-the-art algorithms commonly cited in influential literature. Each tracker can be evaluated objectively using survival curves, such as average center point error (ACE), average overlap rate (AOR), and average success rate (ASR). Our tracking algorithm shows favorable robustness in these noisy environments and obtains the best results in each video sequence, with ACE, AOR, and ASR of 5.29 pixels, 78%, and 98.28%, respectively.
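The three measures can be computed from per-frame bounding boxes as in the following sketch. The abstract does not define them precisely, so the sketch makes several assumptions: boxes are axis-aligned and stored as (x, y, width, height), the overlap rate is intersection over union, and a frame counts as a success when its overlap exceeds 0.5. The function names and the toy boxes are hypothetical.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pred_center = pred[:, :2] + pred[:, 2:] / 2.0
    gt_center = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_center - gt_center, axis=1)

def overlap_rate(pred, gt):
    """Intersection over union of axis-aligned boxes, one value per frame."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def summarize(pred, gt, threshold=0.5):
    """Return (ACE, AOR, ASR); the 0.5 success threshold is an assumption."""
    err = center_error(pred, gt)
    iou = overlap_rate(pred, gt)
    return err.mean(), iou.mean(), (iou > threshold).mean()

# Toy three-frame sequence (hypothetical values): exact hit, near miss, lost target.
gt = np.array([[10, 10, 20, 20], [12, 10, 20, 20], [14, 10, 20, 20]], dtype=float)
pred = np.array([[10, 10, 20, 20], [15, 14, 20, 20], [40, 40, 20, 20]], dtype=float)

ace, aor, asr = summarize(pred, gt)
print(f"ACE={ace:.2f} px, AOR={aor:.2%}, ASR={asr:.2%}")
```

Averaging the per-frame values over a sequence, and then over sequences, gives figures of the same kind as those reported above. A lower ACE and a higher AOR and ASR indicate better tracking.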

Conclusion

In this study, a novel multitask matrix low-rank model representation method and its corresponding maximum likelihood estimation function are designed. Analyzing a wide variety of circumstances in several public video sequences provides objective insight into the strengths and weaknesses of each tracker. The appearance modeling mechanism and the maximum likelihood estimation function of the proposed MLRM algorithm play critical roles in achieving favorable tracking results in several challenging video sequences. Qualitative and quantitative experimental evaluations in a number of challenging noisy environments indicate that the proposed MLRM algorithm shows the best robustness in alleviating the model degradation and drift problems caused by occlusion and strong illumination, and that it achieves the same or even better results compared with several state-of-the-art algorithms.

