Published: 2020-04-16
DOI: 10.11834/jig.190320
2020 | Volume 25 | Number 4

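Image Analysis and Recognition

Multi-object tracking based on adaptive online discriminative appearance learning and hierarchical association

Fang Lan, Yu Fengqin

School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China

Received: 2019-07-03; Revised: 2019-09-22; Preprint: 2019-09-29

Funding: National Natural Science Foundation of China (61573168); Fundamental Research Funds for the Central Universities (JUSRP51733B)

First author: Fang Lan, born in 1995, female, master's student; main research interests: signal and image processing. E-mail: 18861820969@163.com;
Yu Fengqin, female, professor; main research interests: image tracking and recognition, time-frequency analysis of speech signals. E-mail: yufq@jiangnan.edu.cn.

CLC number: TP391

Document code: A

Article ID: 1006-8961(2020)04-0708-13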

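Abstract

Objective In complex scenes, frequent and long-term occlusion of targets and identity switches between targets of similar appearance pose many challenges for multi-object tracking. To address the identity switches and track fragmentation that long-term occlusion causes in complex scenes, a hierarchical-association multi-object tracking algorithm based on adaptive online discriminative appearance learning is proposed. Method Track confidence is used to divide multi-object tracking into two levels, local association and global association. In local association, reliable tracks with high confidence are associated with the detections of the current frame using appearance and position-size similarity. In global association, unreliable tracks with low confidence are further associated with fragmented tracks by introducing a motion model and a valid association range. Incremental linear discriminant analysis is introduced when extracting appearance features to address identity switches, and the target appearance model is updated adaptively according to the appearance difference between a new sample and the mean of the target's samples. Result On three sequences of the public 2D MOT2015 dataset (PETS09-S2L1, TUD-Stadtmitte, and Town-Center), the method is compared with 10 current multi-object tracking algorithms and reduces identity switches and track fragmentation on every sequence. On Town-Center, identity switches decrease by 60, track fragments decrease by 84, and tracking accuracy improves by at least 5.2%. Conclusion The proposed method tracks multiple targets stably and effectively in complex scenes and reduces track fragmentation, and the online linear discriminant appearance learning it introduces is effective against the identity switches caused by occlusion.

Keywords

multi-object tracking; local association; global association; track confidence; incremental linear discriminant analysis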

Multi-object tracking based on adaptive online discriminative appearance learning and hierarchical association

Fang Lan, Yu Fengqin

School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China

Supported by: National Natural Science Foundation of China (61573168)

Abstract

Objective Multi-object tracking is an important research topic in computer vision. Although previous studies have addressed a variety of specific problems in multi-object tracking, many challenges remain, such as false detections, missed detections, frequent and long-term occlusion of objects in complex scenes, and identity switches between tracked objects of similar appearance, all of which easily lead to trajectory drift or tracking interruption. With the improvement of object detection, tracking-by-detection methods show good performance. The key to tracking-by-detection is the data association between detection points, which is of two main types: frame-by-frame association and multi-frame association. Frame-by-frame data association links detection points in two consecutive frames according to properties of the detection points, such as appearance, location, and size. Because it uses only the information of two adjacent frames, tracking drift or failure is likely when an object is occluded or falsely detected, or when objects of similar appearance exist. Multi-frame data association establishes a relational model from the detection information of multiple frames rather than of two adjacent frames only, which effectively reduces wrong associations and helps deal with occlusion. However, if the occlusion lasts longer than the time segment needed for multi-frame data association, the detection points before and after it still cannot be associated, and tracking is interrupted. Moreover, this approach needs all detection information before tracking and therefore cannot meet the real-time requirement.
To address the ID switches and trajectory fragmentation caused by long-term occlusion, an online multi-object tracking algorithm based on adaptive online discriminative appearance learning and hierarchical association is proposed for complex scenes. It combines the low-level appearance and position-size characteristics used in local association with the high-level motion model established in global association, and can meet the real-time tracking requirement. Method In this study, multi-object tracking is divided into two stages according to track confidence: local association and global association. A robust object appearance model is the key to both stages. To address identity switches, an online incremental linear discriminant analysis (ILDA) method is introduced to discriminate the appearances of objects, and the object appearance models are updated adaptively according to the difference between a new sample and the mean of the object's samples. In the local association stage, reliable tracklets with high confidence are associated with the detections of the current frame using low-level properties of the detection points, namely appearance and position-size similarity, which allows reliable trajectories to grow continuously. In the global association stage, unreliable tracklets with low confidence, which result from long-term occlusion, are further associated. In this stage there are two kinds of candidates: detection points that were not associated in local association, and continuous high-confidence trajectories that meet the time condition. The end time of the trajectory is before the current time. When detection points that reappear after long-term occlusion are associated, only appearance similarity within a validation range is used, without the position-size property, because the motion dynamics of unreliable objects are unreliable.
At the same time, the valid association range is tied to the trajectory confidence. When the track confidence falls, the valid association range is increased, because the distance between a drifting track and the corresponding object can grow large if the drift persists. This allows drifting tracks to be reassigned to detections of reappearing objects, even when these are distant from the corresponding tracks. When two track fragments are associated, a motion model is introduced to determine whether the two trajectories belong to the same object. If the angle between the average velocity vectors of the two track fragments is larger than a threshold, the pair may include an unreliable track, and only the appearance similarity between the pair is considered. Otherwise, the appearance, position-size, and motion similarities are combined to associate the pair. If two track fragments are associated successfully, linear interpolation is used to fill the interval in which the object was lost, so that the two trajectory fragments are connected effectively. Result We compared our method with 10 state-of-the-art multi-object tracking algorithms, including five offline tracking approaches and five online tracking methods, on three public datasets, namely, PETS09-S2L1, TUD-Stadtmitte, and Town-Center. The quantitative evaluation metrics were multi-object tracking accuracy (MOTA), multi-object tracking precision (MOTP), the number of identity switches (IDS), the ratio of mostly tracked trajectories (MT), the ratio of mostly lost trajectories (ML), and the number of track fragments (Frag). The experimental results show that our tracking method outperforms the selected online multi-object tracking methods, which include two approaches based on hierarchical association, in MOTA and MOTP. In addition, the proposed approach performs almost the same as, or even better than, the offline tracking methods.
On the PETS09-S2L1 dataset, the proposed approach is superior to the other methods in MOTP, IDS, and Frag: MOTP increased by 6.1%, IDS was reduced by 5, and Frag was reduced by 21. On the TUD-Stadtmitte dataset, IDS was reduced by 4, and compared with the online tracking approaches, MOTP and MOTA increased by 36.3% and 11.1%, respectively. On the Town-Center dataset, MOTA and MT increased by 5.2% and 16.9%, respectively, IDS and Frag were reduced by 60 and 84, respectively, and ML decreased by 1.5%. Conclusion In this study, we adopt the idea of hierarchical data association and propose a multi-object tracking method based on adaptive online discriminative appearance learning and hierarchical association. The experimental results indicate that our method effectively addresses the ID switches and trajectory fragmentation caused by long-term occlusion in complex scenes.

Key words

multi-object tracking; local association; global association; track confidence; incremental linear discriminant analysis(ILDA)

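0 Introduction

Multi-object tracking is an important research topic in computer vision, with key applications in video surveillance, motion analysis, and robot navigation. Although a large body of work exists, many challenges remain: false and missed detections, frequent and long-term occlusion of targets in complex scenes, and identity switches between targets of similar appearance all tend to cause track drift or tracking interruption.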

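As object detection has improved, tracking-by-detection methods have shown good performance. The key to tracking-by-detection is data association between detections, for which there are two main families of methods: frame-by-frame association and multi-frame association. Frame-by-frame association links detections in two consecutive frames according to their own properties, such as appearance, position, and size. Because it uses information from only two adjacent frames, it is prone to drift or failure when targets are occluded, falsely detected, or accompanied by similar-looking objects. Khan et al. (2005) replaced the importance sampling step of the conventional particle filter with a Markov chain Monte Carlo (MCMC) sampling step; the resulting particle filter helps track large numbers of targets in complex scenes. Shu et al. (2012) used person-specific part-based support vector machine (SVM) classifiers, which capture the articulations of the human body under dynamically changing appearance and background and handle partial occlusion in both the detection and tracking stages. Huang et al. (2013) linked detection responses between consecutive frames with a conservative dual-threshold strategy to form initial tracklets, formulated the problem as maximum a posteriori (MAP) estimation, and then associated the highly fragmented tracklets, outperforming earlier state-of-the-art algorithms in tracking accuracy. McLaughlin et al. (2013) proposed a two-stage online tracking framework: the first stage uses an occlusion-robust appearance similarity measure, and the second stage links fragmented tracks with an online-learned motion model, so that the framework handles partial occlusion and reconnects interrupted tracks.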

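Multi-frame association builds a relational model from the detections of multiple frames at once rather than from two adjacent frames only. This effectively reduces wrong associations and helps handle occlusion. However, if an occlusion lasts longer than the time segment used for multi-frame association, the detections before and after it still cannot be associated and tracking is interrupted. In addition, the approach needs all detections before tracking starts and therefore cannot meet online requirements. To address this, Zhang et al. (2008) performed data association within a sliding window to achieve online tracking. Zamir et al. (2012) combined motion and appearance in a global manner, incorporating the whole time span and solving data association for one object at a time while implicitly incorporating the other objects. Yang and Nevatia (2012) cast multi-object tracking as an optimization problem, building a global energy function and obtaining target tracks by iteratively minimizing the cost, but did not resolve identity switches between targets that look alike and are close together. Qi et al. (2017) treated association as a generalized maximum clique problem on a graph in which every detection is a node, and solved data association by iteratively searching for the minimum-energy subgraph, but did not analyze occlusion or missed detections. Zhang and Zhou (2018) proposed hierarchical-association multi-object tracking based on network flows, which splits tracking into two stages: detection responses are first linked into initial tracklets by a dual-threshold method, and the linking of tracklets is then converted into a problem on a directed acyclic graph and solved by min-cost flow.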

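Building on these methods, this paper adopts the idea of hierarchical data association and uses track confidence to split tracking into two levels, performing detection-to-track association level by level. A robust target appearance model is the key to both local and global association. To address identity switches, online incremental linear discriminant analysis (ILDA) appearance learning is adopted, and the target appearance model is updated adaptively according to the appearance difference between a new sample and the mean of the target's samples. The local association stage associates reliable, high-confidence tracks with the detections of the current frame; the association cost matrix is computed from appearance and position-size similarity, and a successful association lets reliable tracks grow. The global association stage associates unreliable, low-confidence tracks with the detections left unassociated by the local stage and with continuous high-confidence tracks that satisfy a time condition. This stage handles the track fragmentation caused by long-term occlusion.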

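1 Principle of hierarchical-association multi-object tracking with adaptive online ILDA appearance learning

1.1 Track confidence

Track confidence measures how well a constructed track matches the true trajectory of the target. A reliable, high-confidence track should satisfy the following: 1) short tracks tend to be unreliable, whereas long tracks are more likely to be correct; 2) a track heavily occluded by other tracks is not suitable as a reliable track; 3) high similarity between a track and its associated detections indicates that the track is reliable. Accordingly, the confidence is computed from the appearance similarity and the observation continuity of the track. The confidence ${conf}\left({r}_{i}^{t}\right)$ of track $r_i$ at frame $t$ is defined as the mean, over the most recent $T_L$ frames, of the product of the appearance confidence and the observation-continuity confidence: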

${conf}\left(r_{i}^{t}\right)=\frac{1}{T_{L}} \sum\limits_{k=t-T_{L}+1}^{t} {conf}_{{\rm a}}\left(r_{i}^{k}\right) \cdot {conf}_{{\rm o}}\left(r_{i}^{k}\right)$

(1)

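where ${conf}_{{\rm a}}\left(r_{i}^{k}\right)$ and ${conf}_{{\rm o}}\left(r_{i}^{k}\right)$ are the appearance confidence and the observation-continuity confidence of track $r_{i}$ at frame $k$.

The appearance confidence ${conf}_{{\rm a}}\left(r_{i}^{k}\right) \in [0, 1]$ is defined as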

${conf}_{\mathrm{a}}\left(r_{i}^{t}\right)=\left[1+\exp \left\{-\beta\left(\psi_{\mathrm{a}}\left(r_{i}^{t}\right)-\tau_{\mathrm{a}}\right)\right\}\right]^{-1}$

(2)

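where $β$ is the slope parameter of the sigmoid function ${conf}_{\mathrm{a}}$ and $\psi_{\mathrm{a}}(·)$ is the appearance similarity score of the track; the higher the score, the higher the confidence. The computation of $\psi_{\mathrm{a}}(·)$ and the threshold $\tau_{\mathrm{a}}$ are described in Section 1.3. Thus, when a track is correctly associated or predicted ($\psi_{\mathrm{a}}≥\tau_{\mathrm{a}}$), its appearance confidence ${conf}_{\mathrm{a}}$ is close to 1; otherwise ${conf}_{\mathrm{a}}$ decreases rapidly owing to the shape of the sigmoid function.

The observation-continuity confidence ${conf}_{{\rm o}}\left(r_{i}^{t}\right) \in [0, 1]$ is defined as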

${conf}_{\mathrm{o}}\left(r_{i}^{t}\right)=\left[1+\exp \left(\eta_{r_{i}}^{t}-\delta_{\mathrm{o}}\right)\right]^{-1}$

(3)

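where ${\eta}_{r_{i}}^{t}$ is the number of consecutive frames up to frame $t$ in which track $r_{i}$ has had no observation (detection response), and $\delta_{\mathrm{o}}$ is the tolerance for short-term occlusion. Hence, once track $r_{i}$ has missed observations for more than $\delta_{\mathrm{o}}$ consecutive frames, the observation-continuity confidence drops quickly.

Track confidence takes values in [0, 1]. A track with ${conf}\left(r_{i}^{t}\right) \geqslant t h_{\mathrm{conf}}$ ($t h_{\mathrm{conf}}=0.7$ in this paper) is regarded as a reliable track ${r}_{h i}$; otherwise it is regarded as an unreliable track ${r}_{l i}$ that may have drifted or been interrupted. Fig. 1 shows how the confidence of target 3 in PETS 2009 S2L1 changes under occlusion: the confidence falls while the target is occluded by a road sign and gradually rises again through association once the target leaves the occluder (after frame 113).

Fig. 1 Confidence variation of track ID3 under occlusion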
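As an illustration of Eqs. (1)-(3), the following Python sketch computes a track confidence from a short history. The slope `beta`, the occlusion tolerance `delta_o`, and the window length are hypothetical values chosen for the example (the paper does not report them); `tau_a = 0.5` and the reliability threshold 0.7 are the values stated in the paper. Dividing by the full window even when the history is shorter is an assumption of this sketch, not a detail confirmed by the text.

```python
import math

TH_CONF = 0.7  # reliability threshold stated in the paper


def appearance_confidence(psi_a, tau_a=0.5, beta=10.0):
    """Eq. (2): sigmoid of the appearance score centred on tau_a (beta is assumed)."""
    return 1.0 / (1.0 + math.exp(-beta * (psi_a - tau_a)))


def observation_confidence(missed_frames, delta_o=3):
    """Eq. (3): falls quickly once consecutive misses exceed delta_o (assumed value)."""
    return 1.0 / (1.0 + math.exp(missed_frames - delta_o))


def track_confidence(psi_history, missed_history, window=5):
    """Eq. (1): average of conf_a * conf_o over the last `window` frames."""
    recent = list(zip(psi_history, missed_history))[-window:]
    products = [appearance_confidence(p) * observation_confidence(m)
                for p, m in recent]
    return sum(products) / window


# A well-matched track versus a track that has been occluded for five frames.
visible = track_confidence([0.9] * 5, [0] * 5)
occluded = track_confidence([0.2] * 5, [1, 2, 3, 4, 5])
print(visible >= TH_CONF, occluded >= TH_CONF)
```

With these example inputs the first track is classified as reliable and the second as unreliable, which mirrors the drop and recovery described for Fig. 1.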

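1.2 Adaptive online ILDA appearance learning

The appearance model is crucial in both local and global association. When tracked targets look very similar, for example the players of a football team or vehicles, they are hard to tell apart and identities are easily swapped after an occlusion. To discriminate target appearances better and to keep tracking the right target after long-term occlusion, this paper uses online incremental linear discriminant analysis to build the target appearance model for both local and global association.

Pang et al. (2005) proposed incremental linear discriminant analysis, which finds an optimal projection matrix $\boldsymbol{W}$ that separates the data into multiple targets after linear projection. It does not require all samples of a target in advance to train the target template; instead, the template is trained online from a small number of tracking results. Suppose there are $k$ targets, target $i (1≤i≤k)$ has $n_i$ samples, and its feature vectors are $\boldsymbol{f}_{i}^{j}\left(1 \leqslant j \leqslant n_{i}\right)$. The projection matrix that maximizes class separation on the training set is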

$\hat{\boldsymbol{W}}=\arg \max\limits_{\boldsymbol{W}} \frac{\left|\boldsymbol{W}^{\mathrm{T}} \boldsymbol{S}_{\mathrm{B}} \boldsymbol{W}\right|}{\left|\boldsymbol{W}^{\mathrm{T}} \boldsymbol{S}_{\mathrm{W}} \boldsymbol{W}\right|}$

(4)

where $\hat{\boldsymbol{W}}$ is the projection matrix that maximizes class separation, and $\boldsymbol{S}_{\mathrm{B}}$ and $\boldsymbol{S}_{\mathrm{W}}$ are the between-class scatter matrix and the within-class scatter matrix of the targets, computed as

$\boldsymbol{S}_{\mathrm{B}}=\sum\limits_{i=1}^{k} n_{i}\left(\boldsymbol{m}_{i}-\boldsymbol{\mu}\right)\left(\boldsymbol{m}_{i}-\boldsymbol{\mu}\right)^{\mathrm{T}}$

(5)

$\boldsymbol{S}_{\mathrm{W}}=\sum\limits_{i=1}^{k} \sum\limits_{j=1}^{n_{i}}\left(\boldsymbol{f}_{i}^{j}-\boldsymbol{m}_{i}\right)\left(\boldsymbol{f}_{i}^{j}-\boldsymbol{m}_{i}\right)^{\mathrm{T}}$

(6)

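where $\boldsymbol{m}_{i}=\frac{1}{n_{i}} \sum\limits_{j=1}^{n_{i}} \boldsymbol{f}_{i}^{j}$ is the mean vector of target $i$ and $\boldsymbol{\mu}=\frac{1}{n} \sum\limits_{i=1}^{k} n_{i} \boldsymbol{m}_{i}$, with $n=\sum\limits_{i=1}^{k} n_{i}$, is the mean vector over all targets. The projection matrix $\boldsymbol{W}$ is obtained by eigen-decomposing the matrix $\boldsymbol{S}_{\mathrm{W}}^{-1} \boldsymbol{S}_{\mathrm{B}}$: the eigenvalues are sorted in descending order and the eigenvectors of the first $k-1$ eigenvalues form $\boldsymbol{W}$. Eigenvectors with small eigenvalues are discarded in the experiments to reduce dimensionality and noise, which helps separate the targets.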
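For two targets ($k=2$), maximizing the criterion in Eq. (4) has the well-known closed-form solution $\boldsymbol{w} \propto \boldsymbol{S}_{\mathrm{W}}^{-1}(\boldsymbol{m}_{1}-\boldsymbol{m}_{2})$, so a small example needs no general eigen-solver. The Python sketch below uses that special case on made-up 2-D features; the sample values and helper names are illustrative assumptions, and the paper's experiments use higher-dimensional HSV histogram features.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def within_scatter(classes):
    """Eq. (6) for 2-D features: summed outer products of centred samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for samples in classes:
        m = mean(samples)
        for f in samples:
            d = [f[0] - m[0], f[1] - m[1]]
            for r in range(2):
                for c in range(2):
                    s[r][c] += d[r] * d[c]
    return s


def fisher_direction(class_a, class_b):
    """Two-class special case of Eq. (4): w = S_W^{-1} (m_a - m_b)."""
    s = within_scatter([class_a, class_b])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean(class_a), mean(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, f):
    """Scalar projection of feature f onto direction w."""
    return w[0] * f[0] + w[1] * f[1]


# Hypothetical appearance features of two targets.
target_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.5]]
target_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]]
w = fisher_direction(target_a, target_b)
print(project(w, mean(target_a)), project(w, mean(target_b)))
```

Because $\boldsymbol{S}_{\mathrm{W}}$ is positive definite for these samples, the projected mean of the first target is always larger than that of the second, so the two targets are separated along $\boldsymbol{w}$.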

To cope with changes in target appearance, the projection matrix $\boldsymbol{W}$ must be updated continually with new incremental samples. Suppose $L$ incremental samples $\boldsymbol{Y}=\left\{\boldsymbol{y}_{i}\right\}(i=1, \cdots, L)$ arrive for $P$ targets, of which $l_i$ belong to target $C_{i}(i=1, \cdots, P)$. Because new targets may be introduced, let $\boldsymbol{m}_{i}^{y}=\frac{1}{l_{i}} \sum\limits_{j=1}^{l_{i}} \boldsymbol{y}_{j}^{(i)}$ and $\boldsymbol{\mu}^{y}=\frac{1}{L} \sum\limits_{i=1}^{P} l_{i} \boldsymbol{m}_{i}^{y}$ denote the mean of the new samples of target $C_i$ and the mean of all new samples, where $\boldsymbol{y}_{j}^{(i)}$ is a new sample belonging to target $C_i$. The between-class scatter matrix of the new samples is then $\boldsymbol{S}_{\mathrm{B}}^{y}=\sum\limits_{i=1}^{P} l_{i}\left(\boldsymbol{m}_{i}^{y}-\boldsymbol{\mu}^{y}\right)\left(\boldsymbol{m}_{i}^{y}-\boldsymbol{\mu}^{y}\right)^{\mathrm{T}}$ and the within-class scatter matrix is $\boldsymbol{S}_{\mathrm{W}}^{y}=\sum\limits_{i=1}^{P} \sum\limits_{j=1}^{l_{i}}\left(\boldsymbol{y}_{j}^{(i)}-\boldsymbol{m}_{i}^{y}\right)\left(\boldsymbol{y}_{j}^{(i)}-\boldsymbol{m}_{i}^{y}\right)^{\mathrm{T}}$.

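To update the ILDA model parameters economically and effectively, this paper proposes an adaptive update mechanism for the projection matrix and related parameters: the target template is updated only when the projected feature $\boldsymbol{W} \boldsymbol{f}\left(z_{j}^{t}\right)$ of the detection $z_{j}^{t}$ associated in the current frame differs from the target mean vector $\boldsymbol{m}_{i}$ by more than $\varDelta$, where $\varDelta$ is the difference threshold; otherwise no update is made. The full target set $\boldsymbol{\varOmega}$ is divided into three classes: the set $\boldsymbol{\varPhi}$ of targets that receive new samples and need updating, the set $\boldsymbol{\varPsi}$ of targets that receive no new samples and need no updating, and the set $\boldsymbol{\varGamma}$ of newly introduced targets. Suppose the number of targets grows from $M$ to $T(T≥M, T≥P)$ and the sample count of each target becomes $N_{i}^{\prime}=N_{i}+l_{i}(i=1, \cdots, T)$. If $C_i∈\boldsymbol{\varPsi}$, the number of new samples is $l_i=0$; if $C_i∈\boldsymbol{\varGamma}$, the number of original samples is $N_{i}=0$. Under these assumptions, the mean $\boldsymbol{m}_{i}^{\prime}$ of target $C_i$ and the mean $\boldsymbol{\mu}^{\prime}$ of all samples are updated as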

$\boldsymbol{m}_{i}^{\prime}=\frac{N_{i} \boldsymbol{m}_{i}+l_{i} \boldsymbol{m}_{i}^{y}}{N_{i}^{\prime}}, i=1, \cdots, T$

(7)

$\boldsymbol{\mu}^{\prime}=\frac{N \boldsymbol{\mu}+L \boldsymbol{\mu}^{y}}{N+L}$

(8)

Substituting $\boldsymbol{m}_{i}^{\prime}$ and $\boldsymbol{\mu}^{\prime}$ into Eqs. (5) and (6) gives the updated between-class scatter matrix $\boldsymbol{S}_{\mathrm{B}}^{\prime}=\sum\limits_{i=1}^{T} N_{i}^{\prime}\left(\boldsymbol{m}_{i}^{\prime}-\boldsymbol{\mu}^{\prime}\right)\left(\boldsymbol{m}_{i}^{\prime}-\boldsymbol{\mu}^{\prime}\right)^{\mathrm{T}}$ and within-class scatter matrix $\boldsymbol{S}_{\mathrm{W}}^{\prime}=\sum\limits_{i=1}^{T} \sum\limits_{j=1}^{N_{i}^{\prime}}\left(\boldsymbol{x}_{j}^{\prime(i)}-\boldsymbol{m}_{i}^{\prime}\right)\left(\boldsymbol{x}_{j}^{\prime(i)}-\boldsymbol{m}_{i}^{\prime}\right)^{\mathrm{T}}$, where $\boldsymbol{x}_{j}^{\prime(i)}$ is the $j$-th sample of target $C_i$ and the training set is $\boldsymbol{X}^{\prime}=\left\{\boldsymbol{x}_{i}^{\prime}\right\}_{i=1}^{N+L}=\left\{\boldsymbol{x}_{i}\right\}_{i=1}^{N} \cup\left\{\boldsymbol{y}_{i}\right\}_{i=1}^{L}$. The updated projection matrix $\boldsymbol{W}^{\prime}$ is obtained from the eigenvectors given by the eigen-decomposition of $\boldsymbol{S}_{\mathrm{W}}^{\prime-1} \boldsymbol{S}_{\mathrm{B}}^{\prime}$. Fig. 2 shows a simulation result for ILDA. In Fig. 2(b) the two targets exchange positions but keep their IDs, which demonstrates that ILDA is effective in reducing identity switches.

Fig. 2 Simulation experiment results of the effect of ILDA

((a) with ILDA; (b) without ILDA)
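The incremental mean update of Eq. (7) and the adaptive update rule can be sketched in Python as follows. The paper does not state how the "difference" between a projected feature and the target mean is measured, so the relative Euclidean distance used in `needs_update` is an assumption of this sketch; the 10% threshold is the value reported in the paper.

```python
import math


def update_class_mean(m_i, n_i, new_samples):
    """Eq. (7): merge the stored mean of n_i samples with l_i new samples."""
    l_i = len(new_samples)
    if l_i == 0:                      # target in the "no new samples" set
        return list(m_i), n_i
    dim = len(new_samples[0])
    m_y = [sum(s[d] for s in new_samples) / l_i for d in range(dim)]
    if n_i == 0:                      # newly introduced target
        return m_y, l_i
    n_new = n_i + l_i
    merged = [(n_i * m_i[d] + l_i * m_y[d]) / n_new for d in range(dim)]
    return merged, n_new


def needs_update(feature, m_i, delta=0.10):
    """Adaptive rule: update only if the relative difference exceeds delta.

    The relative Euclidean distance is an assumed difference measure.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, m_i)))
    norm = math.sqrt(sum(b ** 2 for b in m_i))
    return diff / norm > delta


# Stored mean of two samples [1, 2] and [3, 4], then one new sample [5, 6].
merged, count = update_class_mean([2.0, 3.0], 2, [[5.0, 6.0]])
print(merged, count)
```

The merged mean equals the batch mean of all three samples, which is the property that lets the model be updated without storing earlier samples.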

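1.3 Local association of reliable tracks

In the local association stage, the reliable high-confidence tracks $r_{hi}$ are associated in turn with the set of detections $\boldsymbol{Z}_{t}$ of the current frame $t$. The reliable tracks keep growing into robust continuous tracks, which greatly reduces ambiguity and computational cost in the subsequent global association.

In the current frame $t$, the input pair of this association is $\left(r_{h i}^{t-1}, z_{j}^{t}\right)$. The local similarity $\varLambda_{1}\left(r_{h i}^{t-1}, z_{j}^{t}\right)$ is the product of the position-size similarity $\varLambda_{\mathrm{ps}}\left(r_{h i}^{t-1}, z_{j}^{t}\right)$ and the appearance similarity $\varLambda_{\mathrm{a}}\left(r_{h i}^{t-1}, z_{j}^{t}\right)$: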

${\varLambda}_{1}\left({r}_{h i}^{t-1}, z_{j}^{t}\right)={\varLambda}_{\mathrm{ps}}\left({r}_{h i}^{t-1}, z_{j}^{t}\right) {\varLambda}_{\mathrm{a}}\left({r}_{h i}^{t-1}, z_{j}^{t}\right)$

(9)

where the position-size similarity $\varLambda_{\mathrm{ps}}\left(r_{h i}, z_{j}\right)$ is defined as

$\varLambda_{\mathrm{ps}}\left(r_{h i}, z_{j}\right)=\frac{\boldsymbol{B}\left(r_{h i}\right) \cap \boldsymbol{B}\left(z_{j}\right)}{\boldsymbol{B}\left(r_{h i}\right) \cup \boldsymbol{B}\left(z_{j}\right)}$

(10)

where $\boldsymbol{B}(\cdot)=(x, y, w, h)$ is the bounding box of a track or a detection, and $\boldsymbol{B}\left(r_{h i}\right) \cap \boldsymbol{B}\left(z_{j}\right)$ and $\boldsymbol{B}\left(r_{h i}\right) \cup \boldsymbol{B}\left(z_{j}\right)$ denote the intersection and the union of the track and detection bounding boxes.
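Eq. (10) is the intersection-over-union of two boxes. A minimal Python sketch, assuming that $(x, y)$ is the top-left corner of the box (the paper does not say which corner or centre convention it uses):

```python
def iou(box_a, box_b):
    """Eq. (10): intersection area over union area of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# A track box and a detection shifted by half a box width.
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))
```

The shifted box overlaps half of the track box, giving an intersection of 50 and a union of 150.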

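The target appearance model is built from appearance features inside the detection bounding box, which is resized to 96×32 pixels. The HSV (hue, saturation, value) color histogram of the target is used as the appearance feature, and the Bhattacharyya coefficient $\rho(\cdot, \cdot)$ measures the similarity of two templates. With $\boldsymbol{f}_{z_{j}^{t}}$ the feature vector of detection $z_{j}^{t}$ and $\boldsymbol{m}_{i\left(r_{h i}\right)}$ the mean vector of reliable track $r_{h i}$, the appearance similarity $\varLambda_{\mathrm{a}}\left(r_{h i}, z_{j}\right)$ between detection $z_{j}^{t}$ and reliable track $r_{h i}$ is defined as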

$\varLambda_{\mathrm{a}}\left(r_{h i}, z_{j}\right)=\left\{\begin{array}{ll}\psi_{\mathrm{a}}\left(r_{h i}, z_{j}\right) & \psi_{\mathrm{a}}\left(r_{h i}, z_{j}\right) \geqslant \tau_{\mathrm{a}} \\ 0 & \text{otherwise}\end{array}\right.$

(11)

$\psi_{\mathrm{a}}\left(r_{h i}, z_{j}\right)=\rho\left(\boldsymbol{W} \boldsymbol{m}_{i\left(r_{h i}\right)}, \boldsymbol{W} \boldsymbol{f}_{z_{j}^{t}}\right)$

(12)

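where $\boldsymbol{W}$ is the projection matrix obtained from Eq. (4) in Section 1.2, $\boldsymbol{W} \boldsymbol{m}_{i\left(r_{h i}\right)}$ is the mean vector $\boldsymbol{m}_{i\left(r_{h i}\right)}$ of the reliable track in the projection space, and $\boldsymbol{W} \boldsymbol{f}_{z_{j}^{t}}$ is the HSV feature $\boldsymbol{f}_{z_{j}^{t}}$ of detection $z_{j}^{t}$ after projection. $\tau_{\mathrm{a}}$ is a threshold that prevents wrong associations by keeping only sufficiently similar pairs of reliable tracks and detections.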
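The thresholded similarity of Eqs. (11) and (12) can be sketched in Python as below. For brevity the sketch compares raw histograms directly, that is, it omits the ILDA projection $\boldsymbol{W}$ (equivalent to assuming an identity projection), and the 4-bin histograms are made-up values; `tau_a = 0.5` is the value reported in the paper.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two histograms, normalized to sum to 1."""
    sp, sq = sum(p), sum(q)
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))


def appearance_similarity(track_hist, det_hist, tau_a=0.5):
    """Eq. (11): keep the score psi_a only when it reaches tau_a."""
    psi = bhattacharyya(track_hist, det_hist)
    return psi if psi >= tau_a else 0.0


# Hypothetical 4-bin color histograms of a track template and two detections.
template = [4, 4, 0, 0]
print(appearance_similarity(template, [4, 0, 4, 0]))   # half of the mass overlaps
print(appearance_similarity(template, [0, 0, 1, 7]))   # no overlapping bins
```

Setting the similarity to zero below the threshold makes the corresponding association cost infinite, so such pairs can never be matched.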

When there are $h$ reliable tracks and $m$ detections at frame $t$, the association cost matrix $\boldsymbol{S}_{h \times m}$ is

$\boldsymbol{S}=\left[s_{i j}\right]_{h \times m}, \quad s_{i j}=-\ln \left(\varLambda_{1}\left(r_{h i}, z_{j}^{t}\right)\right), \quad z_{j}^{t} \in \boldsymbol{Z}_{t}$

(13)

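Eq. (13) shows that the larger the similarity, the smaller the association cost. A threshold $-\ln \left(\theta_{1}\right)$ is set: a reliable track $r_{h i}$ and a detection $z_{j}^{t}$ are considered associable when their cost $s_{ij}$ is below this threshold, and the optimal bipartite matching is then found with the Hungarian algorithm (Kuhn, 1955). The detections still unmatched after local association are stored in the set $\boldsymbol{Z}_{u}^{t}$.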
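The gated assignment can be illustrated with a small Python sketch. To stay dependency-free it searches all assignments by brute force with `itertools.permutations`, which is only a stand-in for the Hungarian algorithm and is practical for a handful of tracks; padding the cost matrix with dummy columns of cost $-\ln(\theta_1)$ so that a track may stay unmatched is a design choice of this sketch rather than a detail given in the paper.

```python
import math
from itertools import permutations


def cost_matrix(similarity):
    """Eq. (13): s_ij = -ln(similarity); zero similarity gives infinite cost."""
    return [[-math.log(v) if v > 0 else math.inf for v in row]
            for row in similarity]


def associate(similarity, theta=0.7):
    """Return {track_index: detection_index} minimizing the total gated cost."""
    cost = cost_matrix(similarity)
    gate = -math.log(theta)
    h = len(cost)
    m = len(cost[0]) if h else 0
    # One dummy column per track: choosing it means "leave the track unmatched".
    padded = [row + [gate] * h for row in cost]
    best, best_total = {}, math.inf
    for perm in permutations(range(m + h), h):
        total = sum(padded[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total = total
            best = {i: j for i, j in enumerate(perm)
                    if j < m and padded[i][j] < gate}
    return best


# Two reliable tracks, two detections (hypothetical similarities).
matches = associate([[0.9, 0.1], [0.2, 0.8]])
unmatched = [j for j in range(2) if j not in matches.values()]
print(matches, unmatched)
```

In this example both tracks are matched and no detection is left over; a pair whose similarity is below $\theta_1 = 0.7$ would instead be left unmatched and passed on to the global stage.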

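1.4 Global association of unreliable tracks

In the global association stage, the unreliable low-confidence tracks $r_{li}$, which are prone to fragmentation, are associated with high-confidence tracks that satisfy the time condition (the last frame of the low-confidence track precedes the first frame of the high-confidence track) and with detections of the current frame. Because association events are mutually exclusive, the detections considered in the global stage are those left unmatched by the local stage, $y_{j}^{t} \in \boldsymbol{Z}_{u}^{t}$. Suppose there are $h$ high-confidence tracks, $l$ low-confidence tracks, and $n$ detections. Three events are considered: A, a low-confidence track ${r}_{l i}$ is associated with an unmatched detection $y_{j}^{t}$; B, a low-confidence track ${r}_{l i}$ is associated with a high-confidence track $r_{hj}$; C, a low-confidence track ${r}_{l i}$ is terminated. The association cost matrix of each event is discussed below.

Event A. In complex situations, a target may change its motion while it is occluded for a long time, and online trackers based on a simple conventional motion model (such as constant velocity) then tend to drift. This paper therefore treats the motion model of an unreliable track as unreliable: when an unreliable track is associated with a detection $y_{j}^{t} \in \boldsymbol{Z}_{u}^{t}$ left unmatched by the local stage, the similarity considers only the appearance term of the pair, and only within a valid range: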

$\varLambda_{g_{1}}\left(r_{l i}^{t}, y_{j}^{t}\right)=\left\{\begin{array}{ll}\varLambda_{\mathrm{a}}\left(r_{l i}^{t}, y_{j}^{t}\right) & {dist}\left(r_{l i}^{t}, y_{j}^{t}\right) \leqslant \varLambda_{r_{l i}}^{t} \\ 0 & \text{otherwise}\end{array}\right.$

(14)

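where ${dist}(\cdot)$ is the distance between unreliable track $r_{li}^{t}$ and detection $y_{j}^{t}$, and $\varLambda_{r_{l i}}^{t}=\alpha \cdot \frac{1}{w_{r_{i}}^{t}} \cdot\left(1-{conf}\left(r_{i}^{t}\right)\right)$ is the valid association range of $r_{li}^{t}$. The smaller the track width $w_{ri}^{t}$, the larger the valid association range, because the closer a moving target is to the camera, the larger its apparent displacement. The valid association range also grows as the track confidence $conf(r_{i}^{t})$ falls, because the distance between a drifting track and its target increases while the drift persists. Eq. (14) therefore allows a drifting track to be reassigned to the detection of a reappearing target even when that detection is far from the track. The association cost matrix of event A is $\boldsymbol{A}=\left[a_{i j}\right], a_{i j}=-\ln \left(\varLambda_{g_{1}}\left(r_{l i}, y_{j}^{t}\right)\right), y_{j}^{t} \in \boldsymbol{Z}_{u}^{t}$.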
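The confidence-dependent gate can be sketched in Python as follows. The sketch follows the range formula exactly as written in the text and assumes that a detection qualifies when it lies within the valid range, as the surrounding description states; the scale factor `alpha` is a hypothetical value because the paper does not report it, and positions are assumed to be box centres.

```python
import math


def valid_range(track_width, track_conf, alpha):
    """Valid association range: alpha * (1 / w) * (1 - conf)."""
    return alpha * (1.0 / track_width) * (1.0 - track_conf)


def gated_appearance(track_pos, det_pos, appearance_sim,
                     track_width, track_conf, alpha):
    """Event A similarity: appearance term inside the valid range, else 0."""
    dist = math.hypot(track_pos[0] - det_pos[0], track_pos[1] - det_pos[1])
    if dist <= valid_range(track_width, track_conf, alpha):
        return appearance_sim
    return 0.0


ALPHA = 4000.0  # hypothetical scale factor
# The same detection, 60 pixels away, for a low- and a higher-confidence track.
low_conf = gated_appearance((0, 0), (60, 0), 0.8, 40.0, 0.2, ALPHA)
high_conf = gated_appearance((0, 0), (60, 0), 0.8, 40.0, 0.6, ALPHA)
print(low_conf, high_conf)
```

With these numbers the range is about 80 pixels at confidence 0.2 and about 40 pixels at confidence 0.6, so only the lower-confidence track can still be linked to the distant detection.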

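Event B. When an unreliable track is associated with another high-confidence track, the unreliable track has usually been fragmented by long-term occlusion. To link these fragments, the similarity $\varLambda_{g_{2}}\left(r_{l i}^{t}, r_{h j}^{t}\right)$ combines three terms, appearance, position-size, and motion: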

$ \begin{array}{c} \varLambda_{g_{2}}\left(r_{l i}^{t}, r_{h j}^{t}\right)=\\ \left\{\begin{array}{ll} \varLambda_{\mathrm{a}}\left(r_{l i}^{t}, r_{h j}^{t}\right) & \cos \left\langle\overline{\boldsymbol{v}}_{r_{l i}}^{t}, \overline{\boldsymbol{v}}_{r_{h j}}^{t}\right\rangle<\delta \\ \varLambda_{\mathrm{ps}}\left(r_{l i}^{t}, r_{h j}^{t}\right) \varLambda_{\mathrm{a}}\left(r_{l i}^{t}, r_{h j}^{t}\right) \varLambda_{\mathrm{m}}\left(r_{l i}^{t}, r_{h j}^{t}\right) & \text{otherwise} \end{array}\right. \end{array} $

(15)

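where $\varLambda_{\mathrm{a}}\left(r_{l i}^{t}, r_{h j}^{t}\right)$ is the appearance similarity between the detection at the tail $r_{l i}^{\mathrm{T}}$ of the unreliable track $r_{li}^{t}$ and the detection at the head $r_{h j}^{\mathrm{H}}$ of the reliable track $r_{h j}^{t}$, and $\cos \left\langle\overline{\boldsymbol{v}}_{r_{l i}}^{t}, \overline{\boldsymbol{v}}_{r_{h j}}^{t}\right\rangle$ is the cosine of the angle between the average velocity vectors of the two tracks; the threshold is set to $\delta=\sqrt{3} / 2$. When the cosine is below the threshold, the motion of the two fragments differs considerably: they may belong to different targets, or the same target may have changed its motion during a long occlusion. As with the unreliable motion model in event A, only the appearance term is used in this case. $\varLambda_{\mathrm{m}}$ is the motion similarity between tracks; for tracks $r_{li}^{t}$ and $r_{hj}^{t}$ it is defined as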

$\varLambda_{\mathrm{m}}\left(r_{l i}^{t}, r_{h j}^{t}\right)=\frac{1}{2} \exp \left(\frac{-d}{\sigma}\right)$

(16)

$d=e\left(r_{l i}^{t}, r_{h j}^{t}\right)+e\left(r_{h j}^{t}, r_{l i}^{t}\right)$

(17)

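where $e\left(r_{l i}^{t}, r_{h j}^{t}\right)=\left\|p_{r_{l i}^{t}}^{\mathrm{T}}+v_{r_{l i}^{t}}^{\mathrm{F}} \varTheta-p_{r_{h j}^{t}}^{\mathrm{H}}\right\|_{2}$ is the forward error, $p_{r_{l i}^{t}}^{\mathrm{T}}$ is the position of the tail $r_{l i}^{\mathrm{T}}$ of the unreliable track, and $v_{r_{l i}^{t}}^{\mathrm{F}}$ is the forward velocity of the unreliable track ${r}_{l i}^{t}$. Likewise, $e\left(r_{h j}^{t}, r_{l i}^{t}\right)=\left\|p_{r_{h j}^{t}}^{\mathrm{H}}+v_{r_{h j}^{t}}^{\mathrm{B}} \varTheta-p_{r_{l i}^{t}}^{\mathrm{T}}\right\|_{2}$ is the backward error, $p_{r_{h j}^{t}}^{\mathrm{H}}$ is the position of the head ${r}_{h j}^{\mathrm{H}}$ of the reliable track, $v_{r_{h j}^{t}}^{\mathrm{B}}$ is the backward velocity of the reliable track $r_{h j}^{t}$, $\varTheta$ is the time gap, and $σ=18$. When computing the position-size similarity $\varLambda_{\mathrm{ps}}\left(r_{l i}^{t}, r_{h j}^{t}\right)$, the current state of the lost track $r_{li}^{t}$ is estimated from its position at its end frame $t_{e}^{i}$ and its average velocity. When two fragments $r_{li}^{t}$ and $r_{hj}^{t}$ can be associated they are regarded as one track, and the missing part of the track is filled in by linear interpolation. The association cost matrix of event B is $\boldsymbol{B}=\left[b_{i j}\right]_{l \times h}, b_{i j}=-\ln \left(\varLambda_{g_{2}}\left(r_{l i}, r_{h j}\right)\right)$.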
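Eqs. (16) and (17) and the cosine test of Eq. (15) can be sketched in Python as follows. Positions and velocities are plain 2-D tuples with made-up values, and the function names are illustrative; `sigma = 18` and $\delta=\sqrt{3}/2$ are the values given in the paper.

```python
import math

DELTA = math.sqrt(3) / 2  # cosine threshold from the paper


def prediction_error(start, velocity, gap, target):
    """|| start + velocity * gap - target ||_2, used for both directions."""
    px = start[0] + velocity[0] * gap
    py = start[1] + velocity[1] * gap
    return math.hypot(px - target[0], py - target[1])


def motion_similarity(tail_pos, forward_vel, head_pos, backward_vel,
                      gap, sigma=18.0):
    """Eqs. (16)-(17): 0.5 * exp(-(forward error + backward error) / sigma)."""
    d = (prediction_error(tail_pos, forward_vel, gap, head_pos)
         + prediction_error(head_pos, backward_vel, gap, tail_pos))
    return 0.5 * math.exp(-d / sigma)


def cosine(v1, v2):
    """Cosine of the angle between two non-zero velocity vectors."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return dot / (math.hypot(*v1) * math.hypot(*v2))


# A fragment ending at (0, 0) moving right, and one starting 5 frames later.
consistent = motion_similarity((0, 0), (2, 0), (10, 0), (-2, 0), gap=5)
inconsistent = motion_similarity((0, 0), (2, 0), (10, 18), (-2, 0), gap=5)
print(consistent, inconsistent)
print(cosine((2, 0), (2, 0)) >= DELTA, cosine((2, 0), (0, 2)) >= DELTA)
```

When both predictions land exactly on the other fragment the similarity reaches its maximum of 0.5, and it decays exponentially with the summed forward and backward errors.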

The association cost matrix of event C is $\boldsymbol{C}=\operatorname{diag}\left[c_{1}, \cdots, c_{l}\right], c_{i}=-\ln \left(1-{conf}\left(r_{l i}\right)\right)$. The association cost matrix $\boldsymbol{G}$ covering all three events is defined as

$\boldsymbol{G}_{(l+n) \times(h+l)}=\left[\begin{array}{ll}\boldsymbol{B}_{l \times h} & \boldsymbol{C}_{l \times l} \\ -\ln \left({\theta}_{2}\right)_{n \times h} & \boldsymbol{A}_{n \times l}\end{array}\right]$

(18)

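where the threshold $θ_2$ equals $θ_1$ of the local association stage; the Hungarian algorithm is again used to find the optimal matching in global association. Detections that remain unmatched after global association are stored in the candidate set $\boldsymbol{Z}_{c}^{t}$ and treated as candidate new targets. A candidate that is matched in 5 consecutive subsequent frames is confirmed as a new target; otherwise it is deleted.

After global association, the position and velocity of each associated track are updated with its associated detection $z_{j}^{t}$, the target size is updated with the mean size of the detections associated over the past few frames, and the track confidence is updated from the associated detection $z_{j}^{t}$ according to Eq. (1). A Kalman filter (Kalman, 1960) with a constant-velocity motion model predicts the next-frame state of unassociated tracks. The prediction assumes uniform linear motion over a few adjacent frames, which holds in simple cases.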
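The prediction step of a constant-velocity Kalman filter can be sketched in pure Python as below. The state layout `[x, y, vx, vy]`, the unit time step, and the simple process noise `q * I` are assumptions made for this illustration; the paper does not give its filter parameters.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def predict(state, cov, dt=1.0, q=1.0):
    """One Kalman prediction step: x' = F x, P' = F P F^T + q I."""
    f = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    new_state = [sum(f[r][c] * state[c] for c in range(4)) for r in range(4)]
    fpft = matmul(matmul(f, cov), transpose(f))
    new_cov = [[fpft[r][c] + (q if r == c else 0.0) for c in range(4)]
               for r in range(4)]
    return new_state, new_cov


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
# An unassociated track at (10, 20) moving 2 px right and 1 px up per frame.
state, cov = predict([10.0, 20.0, 2.0, -1.0], identity)
print(state)
```

The predicted position simply advances by the velocity, while the covariance grows, which reflects the increasing uncertainty of a track that has no detection to correct it.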

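2 Algorithm steps

The steps of the proposed algorithm are as follows.

Input: video frame sequence.

Output: multi-object tracking results.

1) Track preprocessing. Read the video and preprocess the first 5 frames of the sequence to form initial tracks.

2) Train the ILDA appearance model. Use the samples of the initial tracks to train the initial ILDA appearance model parameters: the target mean vectors $\boldsymbol{m}_{i}$, the overall mean vector $\boldsymbol{\mu}$, the between-class scatter matrix $\boldsymbol{S}_{\mathrm{B}}$, the within-class scatter matrix $\boldsymbol{S}_{\mathrm{W}}$, and the projection matrix $\boldsymbol{W}$.

3) Track confidence test. Compute the track confidence $conf(r_{i}^{t})$ and compare it with the threshold $th_{{\rm conf}}=0.7$. If the confidence is not below the threshold, go to step 4); otherwise go to step 5).

4) Local association. Take the reliable tracks $r_{h i}$ and the detections $z_{j}^{t}$ of the current frame as input and perform local association. Compute the association cost matrix with Eq. (13), obtain the matching with the Hungarian algorithm, and store the unassociated detections in the set $\boldsymbol{Z}_{u}^{t}$.

5) Global association. Take the unreliable tracks, the high-confidence tracks that satisfy the time condition, and the detections $y_{j}^{t}∈\boldsymbol{Z}_{u}^{t}$ left unassociated by the local stage as input and perform global association. Compute the global association cost matrix with Eq. (18), find the optimal matching with the Hungarian algorithm again, and store the detections that are still unassociated in the candidate set $\boldsymbol{Z}_{c}^{t}$.

6) Check whether the current frame is the last frame. If so, stop; otherwise go to step 7).

7) Track update. Update the state (position, size, velocity) and the confidence of each associated track with its associated detection, and use the Kalman filter to predict the next-frame state of unassociated tracks.

8) ILDA appearance model update. Compare the projected feature $\boldsymbol{W} \boldsymbol{f}\left(z_{j}^{t}\right)$ of the latest associated detection with the target template mean $\boldsymbol{m}_{i}$; if the difference exceeds 10%, update the target template with this detection. Go to step 3).

The flow chart of the algorithm is shown in Fig. 3.

Fig. 3 Flow chart of the proposed algorithm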

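3 Simulation results and analysis

The experiments were run on 64-bit Windows 10 with an Intel Xeon E5 CPU, 16 GB of memory, and MATLAB R2016a. The algorithm was evaluated on the public 2D MOT 2015 dataset of the Multiple Object Tracking Benchmark. Targets were detected with the aggregate channel feature (ACF) detector (Dollár et al., 2014). Eight test videos were selected from the dataset. They include both moving-camera and static-camera footage, cover high, medium, and low viewpoints, and contain changes in scene brightness, nonlinear target motion, occlusion, and target deformation.

Five parameters are involved: the track confidence threshold $th_{{\rm conf}}$, the threshold $\tau_{{a}}$ in Eq. (11) that prevents wrong associations between reliable tracks and detections, the similarity threshold $-\ln \left(\theta_{1}\right)$ of the local association cost matrix, the similarity threshold $-\ln \left(\theta_{2}\right)$ of the global association cost matrix, and the difference threshold $\varDelta$ for ILDA model updates. They are set to $th_{{\rm conf}}=0.7$, $\tau_{{a}}=0.5$, $θ_1=θ_2=0.7$, and $\varDelta=10\%$. All parameters were determined experimentally and kept fixed for all datasets.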

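3.1 Qualitative analysis

The tracker has two levels, local association and global association. The first level, local association, links reliable tracks to their matching detections in the current frame and directly extends the reliable tracks. Global association links unreliable tracks to the detections left unassociated by local association and joins pairs of fragmented tracks, which reduces track fragmentation. A hierarchical-association experiment was run on PETS09-S2L1; the results are shown in Fig. 4. Target 3 is occluded several times and is re-detected as a new target each time it reappears, so its track is fragmented, as shown in Fig. 4(a). In Fig. 4(b) the fragments are joined by global association, giving a long track of target 3 spanning 144 frames.

Fig. 4 Simulation experiment results of hierarchical association(PETS09-S2L1)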

((a) local association; (b) global association)

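Fig. 5 shows sample tracking results on 7 videos from 2D MOT 2015. The frame number is in the lower-left corner, box color from red to blue indicates track confidence from high to low, and the target ID is shown at the box center. Fig. 5(a) is PETS09-S2L1: the camera is static with a high viewpoint and the scene is sparse, but targets move nonlinearly; targets 3 and 12, for example, change direction several times in a short period. At frame 281 target 13 is occluded by a pole and its track confidence falls. By frame 324 targets 12, 3, and 13 have all moved nonlinearly several times; target 13 is still tracked with the same ID after the occlusion and its confidence rises again. Fig. 5(b) is ETH-Bahnhof: the camera moves and target density is medium. Target 1 is tracked from frame 8 and is fully occluded by target 3 at frame 21, where its confidence falls; when it reappears at frame 27 it is still tracked accurately with the same ID. Fig. 5(c) is ETH-Jelmoli: the camera moves, target density is medium, and the outdoor scene is bright. Target 8 is tracked from frame 27; at frame 37 it passes through a dense crowd with heavy mutual occlusion and keeps its ID; at frame 45 it leaves the crowd and is still tracked accurately with the same ID. Fig. 5(d) is TUD-Stadtmitte: the camera is static and crowd density is medium. Target 7 is stationary. Target 2 passes through a dense crowd at frame 25 and its confidence falls because of occlusion; at frame 70, after the occlusion, it is still tracked accurately and its confidence rises. Targets 4, 5, and 7 are tracked stably throughout. Fig. 5(e) is ADL-Rundle-6: static camera, low viewpoint, high target density, and heavy mutual occlusion; all targets are tracked stably and no identities are swapped during mutual occlusion. Fig. 5(f) is ADL-Rundle-8: moving camera, high target density, night scene; target 32 is tracked stably as it moves from far to near. Fig. 5(g) is Venice-1: static camera, high target density, dark scene, and small targets; the targets are tracked stably, but the lighting causes false detections (targets 17 and 18 in Fig. 5(g)).

Fig. 5 Simulation experiment results of 2D MOT 2015 ((a) PETS09-S2L1; (b) ETH-Bahnhof; (c) ETH-Jelmoli; (d) TUD-Stadtmitte; (e) ADL-Rundle-6; (f) ADL-Rundle-8; (g) Venice-1)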

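The results show that the algorithm tracks the same target stably across different scenes, keeps tracking targets with unchanged IDs when they reappear after occlusion (by the background or by other targets), and does not swap IDs when nearby targets occlude each other, so identity switches are reduced. Because the algorithm is tracking-by-detection, however, false and missed detections in complex scenes cause a small number of tracking errors.

3.2 Quantitative analysis

The public CLEAR MOT standard (Bernardin and Stiefelhagen, 2008) is used for quantitative evaluation. It includes multi-object tracking precision (MOTP↑) and multi-object tracking accuracy (MOTA↑). In addition, several common metrics from Wu and Nevatia (2006) are used: the number of identity switches of tracked trajectories (identity switch, IDS↓); the ratio of trajectories tracked in more than 80% of the frames of the whole sequence compared with the ground truth (mostly tracked, MT↑); the ratio of trajectories tracked in fewer than 20% of the frames (mostly lost, ML↓); and the number of track fragments (fragmentation, Frag↓). ↑ means that higher is better and ↓ means that lower is better. Table 1 compares the proposed algorithm with five online trackers, Zhu et al. (2016), Tesfaye et al. (2016), Possegger et al. (2014), Wang et al. (2016), and Yamaguchi et al. (2011), and five offline trackers, Zhang and Zhou (2018), Qi et al. (2017), Pirsiavash et al. (2011), Li et al. (2009), and Milan et al. (2014). The results of the compared methods are taken from the respective papers.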

Table 1 Comparison of simulation experiment results of algorithms

Sequence	Type	Algorithm	MOTP/%	MOTA/%	IDS	MT/%	ML/%	Frag
PETS09-S2L1	Online	Zhu et al. (2016)	-	-	35	94.7	0.0	25
PETS09-S2L1	Online	Tesfaye et al. (2016)	56.8	90.0	15	89.5	0.0	-
PETS09-S2L1	Offline	Zhang and Zhou (2018)	75.6	95.6	12	100.0	0.0	3
PETS09-S2L1	Offline	Milan et al. (2014)	80.2	90.6	11	91.3	4.3	-
PETS09-S2L1	Online	Proposed	86.3	93.6	6	94.7	0.0	3
TUD-Stadtmitte	Online	Zhu et al. (2016)	-	-	13	40.0	0.0	19
TUD-Stadtmitte	Online	Tesfaye et al. (2016)	52.6	72.4	10	60.0	0.0	-
TUD-Stadtmitte	Offline	Zhang and Zhou (2018)	90.7	89.3	10	90.0	0.0	2
TUD-Stadtmitte	Offline	Wang et al. (2016)	72.6	67.6	96	50.0	0.0	-
TUD-Stadtmitte	Online	Proposed	88.9	83.5	6	90.0	0.0	4
Town-Center	Online	Possegger et al. (2014)	68.6	70.7	157	56.3	7.4	321
Town-Center	Online	Yamaguchi et al. (2011)	74.5	67.2	146	65.8	6.5	173
Town-Center	Online	Li et al. (2009)	71.7	66.6	302	58.1	6.5	492
Town-Center	Offline	Qi et al. (2017)	70.6	79.7	-	-	-	-
Town-Center	Online	Proposed	72.6	84.9	86	82.7	4.0	89
Note: "-" indicates that the original paper does not report the value.

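Table 1 shows that the proposed algorithm outperforms the five online trackers in MOTP and MOTA and performs about as well as, or better than, the four offline trackers. On PETS09-S2L1, where pedestrian density is low but there is much nonlinear motion, the algorithm leads in MOTP, IDS, and Frag, while its MOTA is 2% lower than that of Zhang and Zhou (2018). On TUD-Stadtmitte, a complex outdoor scene, false and missed detections reduce the accuracy of the detection-based online tracker, so its MOTP and MOTA are 1.8% and 5.8% lower than those of Zhang and Zhou (2018), but it keeps its advantage in IDS, with at least 4 fewer identity switches. On Town-Center, which has many targets close together with frequent overlaps and occlusions, MOTA improves markedly over the other algorithms, by at least 5.2%, and IDS, MT, ML, and Frag are all the best, with at least 60 fewer identity switches and at least 84 fewer fragments. Notably, these gains are achieved without using future frames and without combining multiple features (color, shape, texture). Over all 8 test sequences (5 483 frames in total), the average tracking accuracy of the algorithm is 83.4% and the average tracking precision is 81.1%.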

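Table 2 gives the results of the algorithm under different difference thresholds for the ILDA appearance model update. The sequence is ETH-Bahnhof, with 1 000 frames, a resolution of 640×480 pixels, and a frame rate of 14 frames/s. With a threshold of 10%, the processing time falls to 0.281 s/frame, a saving of 29.9% relative to updating without a threshold, which is the largest step reduction among the tested thresholds. The robustness metrics (MOTP, MOTA, IDS) change very little at a threshold of 10%, whereas at 30% MOTP falls by 6.71%, MOTA falls by 9.57%, and IDS rises by 44.1%, a clear loss of robustness. The difference threshold is therefore set to 10%.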

Table 2 Simulation experiment results under different ILDA updating threshold $\varDelta$(ETH-Bahnhof)

Threshold/%	Speed/(s/frame)	MOTP/%	MOTA/%	IDS
None	0.401	79.0	80.4	14
10	0.281	78.8	80.4	17
20	0.277	74.0	74.3	21
30	0.263	71.1	71.8	29

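Table 3 lists the running time of the algorithm on different datasets, where target density is the average number of targets per frame. Tracking speed depends on the number of detections: the average tracking time is 0.261 s/frame on the low-density datasets (ETH-Bahnhof, PETS09-S2L1, ETH-Jelmoli, TUD-Stadtmitte) and 0.545 s/frame on the high-density datasets (ADL-Rundle-6, Venice-1, ADL-Rundle-8, Town-Center), with appearance modeling accounting for 30% of the computation time. Overall, the results indicate that the algorithm tracks multiple targets stably in complex scenes and is suitable for real-time applications.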

Table 3 The running time of proposed algorithm under different data sets

Dataset	ETH-Bahnhof	PETS09-S2L1	ETH-Jelmoli	TUD-Stadtmitte	ADL-Rundle-6	Venice-1	ADL-Rundle-8	Town-Center
Target density	5.4	5.6	5.8	6.5	9.5	10.1	10.4	15.9
Detection/s	0.027	0.194	0.085	0.116	0.263	0.313	0.223	0.328
Tracking/s	0.281	0.245	0.232	0.285	0.420	0.536	0.575	0.649
Total/s	0.308	0.439	0.317	0.401	0.683	0.849	0.798	0.977
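The two average tracking times quoted for Table 3 can be reproduced from the per-dataset tracking times with a few lines of Python. The dictionary simply restates the tracking row of the table; the grouping into low- and high-density datasets follows the text.

```python
# Tracking time per frame in seconds, copied from the tracking row of Table 3.
tracking_time = {
    "ETH-Bahnhof": 0.281, "PETS09-S2L1": 0.245,
    "ETH-Jelmoli": 0.232, "TUD-Stadtmitte": 0.285,
    "ADL-Rundle-6": 0.420, "Venice-1": 0.536,
    "ADL-Rundle-8": 0.575, "Town-Center": 0.649,
}
low_density = ["ETH-Bahnhof", "PETS09-S2L1", "ETH-Jelmoli", "TUD-Stadtmitte"]
high_density = ["ADL-Rundle-6", "Venice-1", "ADL-Rundle-8", "Town-Center"]


def average(names):
    """Mean tracking time over the named datasets."""
    return sum(tracking_time[n] for n in names) / len(names)


low_avg = average(low_density)
high_avg = average(high_density)
print(f"{low_avg:.5f} {high_avg:.5f}")
```

The means come out at about 0.26075 s/frame and 0.545 s/frame, consistent with the rounded 0.261 s/frame and 0.545 s/frame quoted in the text.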

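In summary, the algorithm outperforms the other algorithms in handling identity switches and track fragmentation, for three reasons. First, ILDA appearance learning builds and adaptively updates the appearance model, which copes well with targets that are close together and occluded for a long time, so identity switches are less likely. Second, when two nearby targets occlude each other, their track confidence falls and they become unreliable tracks; a motion model is added when associating unreliable tracks, and because two targets of similar appearance still differ in motion, comparing similarity with both motion and appearance effectively reduces identity switches. Third, in global association, for a target that reappears after long-term occlusion the algorithm does not predict with a simple constant-velocity linear motion model, since the motion model of such a target is unreliable; instead it uses track confidence. When confidence is low the valid association range grows, so the track can be associated with a target far from it, which handles the track fragmentation caused by long-term occlusion better.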

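4 Conclusion

This paper proposed a hierarchical-association multi-object tracking method with adaptive online ILDA appearance learning. Tracks are divided into reliable and unreliable tracks according to track confidence, and detection-to-track association is performed sequentially through local association of reliable tracks and global association of unreliable tracks. In the global association stage a motion model is introduced to further associate highly fragmented tracks, which effectively reduces track fragmentation. The target appearance model is built with ILDA appearance learning, which discriminates between interacting objects in complex scenes and reduces identity switches. Simulation experiments on the 2D MOT 2015 dataset show that, compared with most current multi-object tracking algorithms, the algorithm tracks targets effectively and stably and substantially reduces identity switches and track fragmentation.

Because the algorithm is tracking-by-detection, future work will focus on further improving tracking accuracy in complex scenes and on extending the method to multi-camera multi-object tracking.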

References

Bernardin K, Stiefelhagen R. 2008. Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008(1): 1-10 [DOI:10.1155/2008/246309]

Dollár P, Appel R, Belongie S, Perona P. 2014. Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8): 1532-1545 [DOI:10.1109/TPAMI.2014.2300479]

Huang C, Li Y, Nevatia R. 2013. Multiple target tracking by learning-based hierarchical association of detection responses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4): 898-910 [DOI:10.1109/TPAMI.2012.159]

Kalman R E. 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1): 35-45 [DOI:10.1115/1.3662552]

Khan Z, Balch T, Dellaert F. 2005. MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(11): 1805-1819 [DOI:10.1109/TPAMI.2005.223]

Kuhn H W. 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1/2): 83-97 [DOI:10.1002/nav.3800020109]

Li Y, Huang C and Nevatia R. 2009. Learning to associate: hybrid boosted multi-target tracker for crowded scene//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE: 2953-2960[DOI: 10.1109/CVPR.2009.5206735]

McLaughlin N, Del Rincon J M and Miller P. 2013. Online multiperson tracking with occlusion reasoning and unsupervised track motion model//Proceedings of the 10th IEEE International Conference on Advanced Video and Signal Based Surveillance. Krakow, Poland: IEEE: 37-42[DOI: 10.1109/AVSS.2013.6636613]

Milan A, Roth S, Schindler K. 2014. Continuous energy minimization for multitarget tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1): 58-72 [DOI:10.1109/tpami.2013.103]

Pang S N, Ozawa S, Kasabov N. 2005. Incremental linear discriminant analysis for classification of data streams. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 35(5): 905-914 [DOI:10.1109/TSMCB.2005.847744]

Pirsiavash H, Ramanan D and Fowlkes C C. 2011. Globally-optimal greedy algorithms for tracking a variable number of objects//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE: 1201-1208[DOI: 10.1109/CVPR.2011.5995604]

Possegger H, Mauthner T, Roth P M and Bischof H. 2014. Occlusion geodesics for online multi-object tracking//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE: 1306-1313[DOI: 10.1109/CVPR.2014.170]

Qi M B, Yue Z L, Shu K, Jiang J G. 2017. Multi-object tracking using hierarchical data association based on generalized correlation clustering graphs. Acta Automatica Sinica, 43(1): 152-160 (齐美彬, 岳周龙, 疏坤, 蒋建国. 2017. 基于广义关联聚类图的分层关联多目标跟踪. 自动化学报, 43(1): 152-160) [DOI:10.16383/j.aas.2017.c150519]

Shu G, Dehghan A, Oreifej O, Hand E and Shah M. 2012. Part-based multiple-person tracking with partial occlusion handling//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE: 1815-1821[DOI: 10.1109/CVPR.2012.6247879]

Tesfaye Y T, Zemene E, Pelillo M, Prati A. 2016. Multi-object tracking using dominant sets. IET Computer Vision, 10(4): 289-297 [DOI:10.1049/iet-cvi.2015.0297]

Wang B, Wang L, Shuai B, Zuo Z, Liu T, Chan K L and Wang G. 2016. Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Las Vegas, NV, USA: IEEE: 386-393[DOI: 10.1109/CVPRW.2016.55]

Wu B and Nevatia R. 2006. Tracking of multiple, partially occluded humans based on static body part detection//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 951-958[DOI: 10.1109/CVPR.2006.312]

Yamaguchi K, Berg A C, Ortiz L E and Berg T L. 2011. Who are you with and where are you going?//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE: 1345-1352[DOI: 10.1109/CVPR.2011.5995468]

Yang B and Nevatia R. 2012. An online learned CRF model for multi-target tracking//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE: 2034-2041[DOI: 10.1109/CVPR.2012.6247907]

Zamir A R, Dehghan A and Shah M. 2012. GMCP-Tracker: global multi-object tracking using generalized minimum clique graphs//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer: 343-356[DOI: 10.1007/978-3-642-33709-3_25]

Zhang L, Li Y and Nevatia R. 2008. Global data association for multi-object tracking using network flows//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE: 1-8[DOI: 10.1109/CVPR.2008.4587584]

Zhang L J, Zhou Z P. 2018. Multiple target tracking using hierarchical data association based on network flows. Journal of Computer-Aided Design and Computer Graphics, 30(9): 1670-1677 (张丽娟, 周治平. 2018. 基于网络流的分层关联多目标跟踪. 计算机辅助设计与图形学学报, 30(9): 1670-1677) [DOI:10.3724/SP.J.1089.2018.16906]

Zhu S H, Sun C J, Shi Z. 2016. Multi-target tracking via hierarchical association learning. Neurocomputing, 208: 365-372 [DOI:10.1016/j.neucom.2016.02.071]