Object tracking algorithm based on matrix low-rank representation

Yasin Musa, Muhtar Kerim (School of Mechanical Engineering, Xinjiang University, Urumqi 830046)

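Abstract
Objective In object tracking, disturbances such as occlusion, strong illumination, and motion blur strongly affect tracking accuracy and make it difficult to model the target's appearance accurately. In addition, many existing algorithms represent sample data in vector form during observation modeling, which deliberately alters the original structure of the sample data and the latent relationships among its pixels and thereby raises the data dimensionality and computational complexity of the observation model. Method By studying the observation modeling problem of the tracking framework in depth, this paper proposes a novel observation modeling method based on matrix low-rank representation, together with its corresponding likelihood measurement function, so that the tracking algorithm can fully exploit the latent feature structure of the sample data and more accurately detect changes in the target's appearance under complex disturbances such as occlusion or strong illumination. Meanwhile, expressing sample signals in matrix form keeps the spatial distribution of their visual features intact and effectively reduces data dimensionality and computational complexity. Result The proposed tracking algorithm shows more robust performance in environments with challenging disturbances and handles the model degradation and drift caused by occlusion or strong illumination well. On 10 classic test videos, the proposed algorithm achieves an average center point error of 5.29 pixels, an average overlap rate of 78%, and an average success rate of 98.28%, all better than those of comparable algorithms. Conclusion Using the 2D matrix data prototype as the carrier, this paper proposes a new multitask observation modeling framework and a maximum likelihood estimation model. Qualitative and quantitative analysis of the experimental data shows that, compared with several excellent algorithms of the same kind, the proposed algorithm reaches the same or even higher tracking modeling accuracy.
Keywords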
Object tracking algorithm based on matrix low-rank representation

Yasin Musa, Muhtar Kerim (School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China)

Abstract
Objective Visual object tracking is an important computer vision task with applications in many domains, such as military systems, robotics, intelligent visual surveillance, human-computer interaction, and medical diagnosis. Many trackers proposed over the past decades deliver satisfactory performance. Despite this progress, visual object tracking still struggles with complex changes in object appearance caused by illumination, partial occlusion, shape deformation, background clutter, low contrast, specularities, camera motion, and at least seven other factors. Generally, visual tracking is a search (or classification) problem: the tracker continuously infers the state of a target in a video sequence, identifies the candidate that best matches the target template, and returns it as the tracking result. Constructing an effective, high-performance tracker involves two core issues. The first is representative feature learning and high-level modeling. The second is filtering and efficient searching. Because the target state in every video frame is represented using several feature templates learned online, and because the target itself and the scene conditions introduce complex interference, the modeling capability of the tracker depends strongly on how well the template data generalize and on how accurately the model represents the target and estimates the error. In addition, most existing algorithms force the sample data into vector form, which changes the original data structure and damages the relationships among pixels. Vectorization also increases data dimensionality and computational complexity.
Therefore, designing an effective representation of the 2D appearance of moving objects, with an appropriate data format, is key to the success of a visual tracker. Method This study investigates in depth the appearance model representation problem of generative-model-based visual object tracking. In prior work, we formulated the observation model via tensor (3D array) nuclear norm regularization. The resulting tracker, called the tensor nuclear norm regression-based tracker (TNRT), achieved favorable results in many tracking environments. However, the TNRT places high demands on hardware and graphics processing unit computing, which leads to slow tracking when an application must run on low-end hardware. We therefore design a novel observation model based on matrix low-rank representation, together with its corresponding likelihood measurement function, while retaining several good properties of the TNRT algorithm, such as multitask joint learning, nuclear norm regularization-based model representation, and preservation of the original data structure of sample signals. In the proposed tracking framework, several critical feature templates (a dictionary or subspace) are learned from online data using the incremental principal component analysis algorithm. Then, given the appearance information of an incoming video frame, the proposed appearance modeling mechanism uses the feature templates to represent each target candidate linearly under independent and identically distributed Gaussian-Laplacian mixture noise, adopting the multitask joint learning strategy. Subsequently, a joint maximum likelihood function based on the matrix nuclear norm and the weighted L1 norm measures the distance between each target candidate and the feature subspace.
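The scoring step described above can be illustrated with a minimal sketch. It is not the authors' solver: the coefficients here come from ordinary least squares instead of the joint nuclear-norm and weighted L1 objective, the weighted L1 term and the multitask coupling are omitted, and the names `candidate_likelihood` and `gamma` are hypothetical. The sketch only shows the general idea of keeping the residual in matrix form and turning its nuclear norm into a likelihood-style score.

```python
import numpy as np

def nuclear_norm(m):
    """Sum of the singular values of a 2D array."""
    return float(np.linalg.svd(m, compute_uv=False).sum())

def candidate_likelihood(candidate, templates, gamma=0.1):
    """Score a candidate patch against matrix-form templates.

    Assumption: least-squares coefficients stand in for the joint
    nuclear-norm / weighted-L1 optimization described in the text.
    """
    # Templates are flattened only to solve for the linear coefficients.
    basis = np.stack([t.ravel() for t in templates], axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, candidate.ravel(), rcond=None)
    # The residual stays 2D so that its rank structure is preserved.
    reconstruction = np.tensordot(coeffs, np.stack(templates), axes=1)
    residual = candidate - reconstruction
    # Smaller nuclear norm of the residual gives a score closer to 1.
    return float(np.exp(-gamma * nuclear_norm(residual)))

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
templates = [rng.random((16, 16)) for _ in range(3)]
clean = 0.5 * templates[0] + 0.3 * templates[1] + 0.2 * templates[2]
occluded = clean.copy()
occluded[4:10, 4:10] = 0.0  # block-shaped occlusion

score_clean = candidate_likelihood(clean, templates)
score_occluded = candidate_likelihood(occluded, templates)
print(score_clean > score_occluded)
```

Because the clean candidate lies in the span of the templates, its residual is numerically zero and its score is close to 1, while the occluded candidate scores lower.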
Because the matrix form preserves the intrinsic data structure of the samples and keeps the spatial distribution of visual features intact, the proposed multitask observation model, whose objective function uses matrix low-rank regularization, represents sample signals more accurately and flexibly than L1, L2, or other hybrid regularization-based methods. In every frame, the same likelihood measurement function scores each candidate sample, so the scores are directly comparable. As a result, the tracker can fully explore the latent characteristics of the sample data and detect complex appearance changes of the target under challenging disturbances such as occlusion or strong illumination. Meanwhile, because the observation model works on matrix-form data prototypes, its reduced data dimensionality and low computational complexity markedly improve tracking speed. Result The pixels of the residual data often show similar grayscale intensities and share spatial structure with the 2D data prototypes, such as block-shaped connected areas. Conventional observation models based on L1, L2, or other hybrid regularization cannot fully exploit this latent structure. In comparison, the matrix low-rank regression model (MLRM) explores the residual data more precisely and captures the spatial characteristics of the reconstruction error; in other words, it exploits the low-rank characteristics of the residual matrix. We evaluate the proposed tracking algorithm systematically on 10 public video sequences that cover the challenging factors mentioned above and compare it with several state-of-the-art algorithms commonly cited in the literature.
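The point about block-shaped residuals can be checked with a small numerical sketch on synthetic data (the two residual matrices below are made up for illustration and do not come from the paper's experiments). Both residuals contain the same 16 unit-intensity pixels, so their L1 and L2 (Frobenius) norms are identical, yet the nuclear norm separates the contiguous block from the spatially unstructured pattern.

```python
import numpy as np

h, w = 16, 16

# Residual A: one 4x4 block of equal intensity, like a partial occlusion.
block = np.zeros((h, w))
block[4:8, 6:10] = 1.0

# Residual B: 16 unit-intensity pixels placed along the diagonal,
# i.e. the same pixel count and intensity but no block structure.
diagonal = np.eye(h, w)

def l1(m):
    """Entrywise L1 norm, as used on vectorized residuals."""
    return float(np.abs(m).sum())

def frobenius(m):
    """Entrywise L2 norm of the residual."""
    return float(np.sqrt((m ** 2).sum()))

def nuclear(m):
    """Sum of singular values; small when the matrix is low-rank."""
    return float(np.linalg.svd(m, compute_uv=False).sum())

print(l1(block), l1(diagonal))                            # 16.0 16.0
print(frobenius(block), frobenius(diagonal))              # 4.0 4.0
print(round(nuclear(block), 6), round(nuclear(diagonal), 6))  # 4.0 16.0
```

The block residual has rank 1, so its nuclear norm equals its Frobenius norm, whereas the diagonal residual has full rank and a nuclear norm four times larger. A vector norm sees the two residuals as equal; a rank-sensitive measure does not.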
Each tracker is evaluated objectively using survival curves and measures such as average center point error (ACE), average overlap rate (AOR), and average success rate (ASR). Our tracking algorithm is robust in these noisy environments and obtains the best average results over the 10 test sequences, with an ACE of 5.29 pixels, an AOR of 78%, and an ASR of 98.28%. Conclusion This study designs a novel multitask matrix low-rank model representation method and its corresponding maximum likelihood estimation function. Analyzing a wide variety of circumstances in several public video sequences provides objective insight into the strengths and weaknesses of each tracker. The appearance modeling mechanism and the maximum likelihood estimation function play critical roles in the proposed MLRM algorithm, which achieves favorable tracking results in several challenging video sequences. Qualitative and quantitative experimental evaluations in a number of challenging noisy environments indicate that the proposed MLRM algorithm is robust enough to alleviate the model degradation and drift caused by occlusion and strong illumination, and that it achieves results equal to or better than those of several state-of-the-art algorithms.
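The abstract does not define the three measures, so the following is a sketch under commonly used definitions, all of which are assumptions here: boxes are `(x, y, width, height)` tuples, overlap is intersection over union, and a frame counts as a success when the overlap exceeds a threshold of 0.5. The function names are hypothetical.

```python
def center_error(box_a, box_b):
    """Euclidean distance in pixels between the centers of two boxes."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def overlap_rate(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    bottom = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def summarize(tracked, truth, threshold=0.5):
    """Average the per-frame measures over one sequence.

    Assumption: success means overlap strictly above `threshold`.
    """
    errors = [center_error(t, g) for t, g in zip(tracked, truth)]
    overlaps = [overlap_rate(t, g) for t, g in zip(tracked, truth)]
    n = len(truth)
    return {
        "ace": sum(errors) / n,
        "aor": sum(overlaps) / n,
        "asr": sum(o > threshold for o in overlaps) / n,
    }

# Two made-up frames: a perfect hit and a box shifted by 5 pixels.
truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
tracked = [(0, 0, 10, 10), (5, 5, 10, 10)]
result = summarize(tracked, truth)
print(result)
```

In this toy case the second frame overlaps the ground truth by 25/175, so only one of the two frames counts as a success.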
Keywords
