Published: 2020-04-16
Image Analysis and Recognition
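Received: 2019-07-03; Revised: 2019-09-22; Preprint: 2019-09-29
Funding: National Natural Science Foundation of China (61573168); Fundamental Research Funds for the Central Universities (JUSRP51733B)
About the authors:
Fang Lan (方岚), born in 1995, female, master's student. Her main research interests are signal and image processing. E-mail: 18861820969@163.com
Yu Fengqin (于凤芹), female, professor. Her main research interests are image tracking and recognition and time-frequency analysis of speech signals. E-mail: yufq@jiangnan.edu.cn
CLC number: TP391
Document code: A
Article ID: 1006-8961(2020)04-0708-13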
摘要
目的 复杂场景下目标频繁且长时间的遮挡、跟踪目标外观相似引起身份转换等问题给多目标跟踪带来许多挑战。针对多目标跟踪在复杂场景中因长时间遮挡引起身份转换和轨迹分段的问题,提出一种基于自适应在线判别外观学习的分层关联多目标跟踪算法。方法 利用轨迹置信度将多目标跟踪分为局部关联和全局关联两个层次。在局部关联中,置信度高的可靠轨迹利用外观、位置-大小相似度与当前帧检测点进行关联;在全局关联中,置信度低的不可靠轨迹引入运动模型和有效关联范围进一步关联分段的轨迹。在提取目标外观特征时引入增量线性可判别分析方法以解决身份转换问题,依据新增样本与目标样本均值的外观特征差异自适应地更新目标外观模型。结果 在公开数据集2D MOT2015中的PETS09-S2L1、TUD-Stadmitte、Town-Center 3个数据集中与当前10种多目标跟踪算法进行比较,该方法对各个数据集身份转换和轨迹分段都有减少,其中在Town-Center数据集中,身份转换减少了60个,轨迹分段减少了84个,跟踪准确度提高了5.2%以上。结论 本文多目标跟踪方法,能够在复杂场景中稳定有效地实现多目标跟踪,减少轨迹分段现象,其中引入的在线线性可判别外观学习对遮挡产生的身份转换具有良好的解决效果。
关键词
多目标跟踪; 局部关联; 全局关联; 轨迹置信度; 增量线性可判别分析
Abstract
Objective Multi-object tracking is an important research topic in computer vision. Although previous studies have addressed many specific problems in multi-object tracking, challenges remain, such as false and missed detections, frequent and long-term occlusion of objects in complex scenes, and identity switches between objects with similar appearance, all of which can easily lead to trajectory drift or tracking interruption. With improvements in object detection, tracking-by-detection methods have shown good performance. The key to a tracking-by-detection algorithm is the data association between detection points, of which there are two main types: frame-by-frame association and multi-frame association. Frame-by-frame association links detection points in two consecutive frames according to their properties, such as appearance, location, and size. Because it uses information from only two frames, tracking drift or failure is likely when an object is occluded or falsely detected, or when objects with similar appearance are present. Multi-frame association builds a relational model from the detection information of multiple frames rather than only two, which effectively reduces false associations and helps handle occlusion. However, if an occlusion lasts longer than the time segment used for multi-frame association, the detection points before and after the occlusion still cannot be associated, and tracking is interrupted. Moreover, this approach needs all detection information before tracking starts and therefore cannot meet real-time requirements.
To address the ID switches and trajectory fragmentation caused by long-term occlusion, an online multi-object tracking algorithm based on adaptive online discriminative appearance learning and hierarchical association is proposed for complex scenes. The algorithm combines the low-level appearance and position-size features used in local association with the high-level motion model established in global association, and it can meet real-time tracking requirements. Method Multi-object tracking is divided into two stages according to track confidence: local association and global association. A robust object appearance model is the key to both stages. To address identity switches, an online incremental linear discriminant analysis (ILDA) method is introduced to discriminate between object appearances, and the object appearance models are updated adaptively according to the difference between a new sample and the mean of the object's samples. In the local association stage, reliable tracklets with high confidence are associated with the detections in the current frame using low-level properties of the detection points, namely appearance and position-size similarity, which allows reliable trajectories to grow continuously. In the global association stage, unreliable tracklets with low confidence, which result from long-term occlusion, are associated further. In this stage, there are two kinds of candidates: detection points that were not associated in local association, and continuous high-confidence trajectories that satisfy the time condition, that is, trajectories that end before the current time. When detection points that reappear after long-term occlusion are associated, only appearance similarity within a validation range is used, and the position-size property is ignored because the motion dynamics of unreliable objects are themselves unreliable.
The valid association range depends on the track confidence. When the track confidence decreases, the valid association range is enlarged, because the distance between a drifting track and its object can grow large if the drift persists. This allows drifting tracks to be reassigned to detections of reappearing objects even when those detections are far from the corresponding tracks. When two track fragments are associated, a motion model is introduced to determine whether they belong to the same object. If the angle between the average velocity vectors of the two fragments exceeds a threshold, the pair may include an unreliable track, so only the appearance similarity between the pair is considered. Otherwise, the appearance, position-size, and motion similarities are combined to associate the pair. If two track fragments are associated successfully, linear interpolation is used to fill the interval in which the object was lost, so that the two fragments are connected. Result The proposed method was compared with 10 state-of-the-art multi-object tracking algorithms, including five offline and five online tracking approaches, on three public datasets: PETS09-S2L1, TUD-Stadtmitte, and Town-Center. The quantitative evaluation metrics were multi-object tracking accuracy (MOTA), multi-object tracking precision (MOTP), the number of identity switches (IDS), the ratio of mostly tracked trajectories (MT), the ratio of mostly lost trajectories (ML), and the number of track fragmentations (Frag). The experimental results show that the proposed method outperforms the selected online multi-object tracking methods, which include two approaches based on hierarchical association, in MOTA and MOTP. In addition, the proposed approach performs almost as well as or better than the offline tracking methods.
On the PETS09-S2L1 dataset, the proposed approach is superior to the compared methods in MOTP, IDS, and Frag: MOTP increased by 6.1%, IDS decreased by 5, and Frag decreased by 21. On the TUD-Stadtmitte dataset, IDS decreased by 4, and compared with the online tracking approaches, MOTP and MOTA increased by 36.3% and 11.1%, respectively. On the Town-Center dataset, MOTA and MT increased by 5.2% and 16.9%, respectively, IDS and Frag decreased by 60 and 84, respectively, and ML decreased by 1.5%. Conclusion This study adopts the idea of hierarchical data association and proposes a multi-object tracking method based on adaptive online discriminative appearance learning and hierarchical association. The experimental results indicate that the method effectively addresses the ID switches and trajectory fragmentation caused by long-term occlusion in complex scenes.
Key words
multi-object tracking; local association; global association; track confidence; incremental linear discriminant analysis(ILDA)
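0 Introduction
Multi-object tracking is an important research topic in computer vision, with applications in video surveillance, motion analysis, and robot navigation. Although it has been studied extensively, many challenges remain: false and missed detections, frequent and long-term occlusion of objects in complex scenes, and identity switches between objects with similar appearance, all of which can easily cause trajectory drift or tracking interruption.
With improvements in object detection, tracking-by-detection methods have shown good performance. The key problem in tracking-by-detection is the data association between detection points, for which there are two main approaches: frame-by-frame association and multi-frame association. Frame-by-frame association links detection points in two consecutive frames according to their own properties, such as appearance, position, and size. Because it uses information from only two frames, it is prone to tracking drift or failure when objects are occluded, falsely detected, or resemble other objects. Khan et al. (2005) replaced the importance sampling step of the traditional particle filter with a Markov chain Monte Carlo (MCMC) sampling step, which helps the particle filter track a large number of objects in complex scenes. Shu et al. (2012) used a part-based, person-specific support vector machine (SVM) classifier that captures the articulation of the human body under dynamically changing appearance and background and handles partial occlusion in both the detection and tracking stages. Huang et al. (2013) linked detection responses in consecutive frames with a conservative dual-threshold strategy to form initial tracklets, formulated the problem as maximum a posteriori (MAP) estimation, and then associated the highly fragmented tracklets, outperforming earlier state-of-the-art algorithms in tracking accuracy. McLaughlin et al. (2013) proposed a two-stage online tracking framework: the first stage uses an occlusion-robust appearance similarity measure, and the second stage links fragmented trajectories with an online-learned motion model, so that the framework can handle partial occlusion and reconnect interrupted trajectories.
Multi-frame association builds a relational model from the detection information of multiple frames simultaneously rather than from only two consecutive frames, which effectively reduces false associations and helps handle occlusion. However, if an occlusion lasts longer than the time segment used for multi-frame association, the detections before and after it still cannot be associated, and tracking is again interrupted. In addition, the method needs all detections before tracking starts and therefore cannot run online. To address this, Zhang et al. (2008) performed data association within a sliding window to achieve online tracking. Zamir et al. (2012) combined motion and appearance globally over the whole time span, solving the data association problem for one object at a time while implicitly incorporating the other objects. Yang and Nevatia (2012) cast multi-object tracking as an optimization problem, building a global energy function and obtaining the trajectories by iteratively minimizing it, but did not resolve identity switches between nearby objects with similar appearance. Qi et al. (2017) treated association as a generalized maximum clique graph problem, with each detection as a graph node, and solved it by iteratively searching for the minimum-energy subgraph, but did not analyze occlusion or missed detections. Zhang and Zhou (2018) proposed hierarchical-association multi-object tracking based on network flows in two stages: the detection responses output by the detector are first linked into initial tracklets with a dual-threshold method, and the linking of tracklets is then formulated on a directed acyclic graph and solved with a minimum-cost flow algorithm.
Building on these studies, this paper adopts the idea of hierarchical data association and uses track confidence to divide tracking into two levels, performing detection-to-track association at each level. A robust object appearance model is the key to both local and global association. To address identity switches, online incremental linear discriminant analysis (ILDA) appearance learning is adopted, and the appearance model of each object is updated adaptively according to the difference in appearance features between a new sample and the mean of the object's samples. In the local association stage, reliable tracks with high confidence are associated with the detections in the current frame; the association cost matrix is computed from appearance and position-size similarity, and a successful association extends the reliable track. In the global association stage, unreliable tracks with low confidence are associated with the detections left unmatched by local association in the current frame and with high-confidence continuous tracks that satisfy a time condition. This stage handles the trajectory fragmentation caused by long-term occlusion.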
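1 Principle of the hierarchical-association multi-object tracking algorithm with adaptive online ILDA appearance learning
1.1 Track confidence
Track confidence measures how well a constructed track matches the true trajectory of its object. A reliable track with high confidence should satisfy the following requirements: 1) short tracks tend to be unreliable, whereas long tracks are more likely to be correct; 2) a track that is heavily occluded by other tracks is not suitable as a reliable track; 3) high similarity between a track and its associated detections indicates that the track is reliable. Based on these requirements, the confidence of a track is computed from its appearance similarity and observation continuity, and the confidence of a track $r_{i}^{t}$ is defined as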
${conf}\left(r_{i}^{t}\right)=\frac{1}{T_{L}} \sum\limits_{k=t-T_{L}+1}^{t} {conf}_{{\rm a}}\left(r_{i}^{k}\right) \cdot {conf}_{{\rm o}}\left(r_{i}^{k}\right)$ | (1) |
where
The appearance confidence is
${conf}_{\mathrm{a}}\left(r_{i}^{t}\right)=\left[1+\exp \left\{-\beta\left(\psi_{\mathrm{a}}\left(r_{i}^{t}\right)-\tau_{\mathrm{a}}\right)\right\}\right]^{-1}$ | (2) |
where
The observation-continuity confidence is
${conf}_{\mathrm{o}}\left(r_{i}^{t}\right)=\left[1+\exp \left(\eta_{r_{i}}^{t}-\delta_{\mathrm{o}}\right)\right]^{-1}$ | (3) |
where
The track confidence in this paper takes values in the range [0, 1], and the track confidence
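Equations (1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the definitions of the symbols are missing from the text above, so `psi_a` is assumed to be the appearance affinity between a track and its associated detection, `eta` is assumed to count the frames in which the track has gone unobserved, and the default values of `tau_a`, `beta`, and `delta_o` are hypothetical.

```python
import math


def appearance_confidence(psi_a, tau_a=0.5, beta=10.0):
    """Eq. (2): logistic function of the appearance affinity psi_a.

    tau_a and beta are hypothetical defaults, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(-beta * (psi_a - tau_a)))


def observation_confidence(eta, delta_o=3.0):
    """Eq. (3): logistic penalty that shrinks as eta grows past delta_o.

    eta is ASSUMED to be the number of frames without an observation.
    """
    return 1.0 / (1.0 + math.exp(eta - delta_o))


def track_confidence(psi_history, eta_history, window):
    """Eq. (1): mean of conf_a * conf_o over the last `window` frames.

    Dividing by `window` even when fewer frames exist keeps short tracks
    at low confidence, in line with requirement 1) in the text.
    """
    recent = list(zip(psi_history, eta_history))[-window:]
    total = sum(
        appearance_confidence(p) * observation_confidence(e) for p, e in recent
    )
    return total / window


if __name__ == "__main__":
    visible = track_confidence([0.9] * 5, [0] * 5, window=5)
    occluded = track_confidence([0.9] * 5, [6] * 5, window=5)
    print(f"visible track: {visible:.3f}, occluded track: {occluded:.3f}")
```

Because both factors are logistic functions, the product and its windowed mean stay within [0, 1], which matches the range stated for the track confidence.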
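1.2 Adaptive online ILDA appearance learning
The appearance model is very important in both local and global association. When the tracked objects look very similar, for example players of a football team or vehicles, they are hard to tell apart, and identity switches easily occur after occlusion. To distinguish object appearances better and to keep tracking objects accurately after long-term occlusion, this paper uses online incremental linear discriminant analysis to build the object appearance models used in local and global association.
Pang et al. (2005) proposed incremental linear discriminant analysis, which seeks an optimal projection matrix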
$\hat{\boldsymbol{W}}=\arg \max\limits_{\boldsymbol{W}} \frac{\left|\boldsymbol{W}^{\mathrm{T}} \boldsymbol{S}_{\mathrm{B}} \boldsymbol{W}\right|}{\left|\boldsymbol{W}^{\mathrm{T}} \boldsymbol{S}_{\mathrm{W}} \boldsymbol{W}\right|}$ | (4) |
where
$\boldsymbol{S}_{\mathrm{B}}=\sum\limits_{i=1}^{k} n_{i}\left(\boldsymbol{m}_{i}-\boldsymbol{\mu}\right)\left(\boldsymbol{m}_{i}-\boldsymbol{\mu}\right)^{\mathrm{T}}$ | (5) |
$\boldsymbol{S}_{\mathrm{W}}=\sum\limits_{i=1}^{k} \sum\limits_{j=1}^{n_{i}}\left(\boldsymbol{f}_{i}^{j}-\boldsymbol{m}_{i}\right)\left(\boldsymbol{f}_{i}^{j}-\boldsymbol{m}_{i}\right)^{\mathrm{T}}$ | (6) |
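where
Because the appearance of a tracked object changes over time, the projection matrix must be updated continually with new incremental samples
To update the ILDA model parameters more economically and efficiently, this paper proposes a mechanism that adaptively updates the projection matrix and the other parameters, namely, only when the detection associated in the current frame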
$\boldsymbol{m}_{i}^{\prime}=\frac{N_{i} \boldsymbol{m}_{i}+l_{i} \boldsymbol{m}_{i}^{y}}{N_{i}^{\prime}}, i=1, \cdots, T$ | (7) |
$\boldsymbol{\mu}^{\prime}=\frac{N \boldsymbol{\mu}+L \boldsymbol{\mu}^{y}}{N+L}$ | (8) |
Substituting, respectively,
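The mean updates of Eqs. (7) and (8) and the adaptive update rule can be illustrated with a minimal Python sketch. Several points are assumptions rather than details taken from the paper: $N_{i}^{\prime}$ is assumed to equal $N_{i}+l_{i}$, the appearance difference is measured here as a relative Euclidean distance to the class mean, and the helper names are hypothetical. The sketch updates only the means; the scatter matrices and the projection matrix would be refreshed in the same step.

```python
import math


def merge_mean(n_old, mean_old, n_new, mean_new):
    """Eqs. (7)/(8): weighted merge of an old mean with the mean of new samples."""
    total = n_old + n_new  # assumed to be N_i' = N_i + l_i
    return [(n_old * a + n_new * b) / total for a, b in zip(mean_old, mean_new)]


def relative_difference(feature, mean):
    """ASSUMED difference measure: ||feature - mean|| / ||mean||."""
    num = math.sqrt(sum((f - m) ** 2 for f, m in zip(feature, mean)))
    den = math.sqrt(sum(m * m for m in mean))
    return num / den if den > 0 else float("inf")


def maybe_update(n_i, m_i, feature, threshold=0.10):
    """Update the class mean only if the new sample differs enough from it.

    Returns (sample_count, mean, updated). The 10% default mirrors the
    difference threshold selected in the experiments.
    """
    if relative_difference(feature, m_i) <= threshold:
        return n_i, m_i, False  # sample adds little information: skip update
    return n_i + 1, merge_mean(n_i, m_i, 1, feature), True


if __name__ == "__main__":
    print(maybe_update(3, [1.0, 2.0], [1.01, 2.0]))  # skipped
    print(maybe_update(3, [1.0, 2.0], [5.0, 6.0]))   # merged
```

Skipping updates for samples that are close to the current mean is what saves computation: such samples barely move the model, so the cost of recomputing the projection is avoided.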
1.3 Local association of reliable tracks
In the local association stage, the reliable tracks with high confidence
In the current frame
${\varLambda}_{1}\left({r}_{h i}^{t-1}, z_{j}^{t}\right)={\varLambda}_{\mathrm{ps}}\left({r}_{h i}^{t-1}, z_{j}^{t}\right) {\varLambda}_{\mathrm{a}}\left({r}_{h i}^{t-1}, z_{j}^{t}\right)$ | (9) |
where the position-size similarity
$\varLambda_{\mathrm{ps}}\left(r_{h i}, z_{j}\right)=\frac{\boldsymbol{B}\left(r_{h i}\right) \cap \boldsymbol{B}\left(z_{j}\right)}{\boldsymbol{B}\left(r_{h i}\right) \cup \boldsymbol{B}\left(z_{j}\right)}$ | (10) |
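where
The object appearance model is built from the appearance features inside the detection bounding box, which is resized to 96×32 pixels. The appearance features are trained on the HSV (hue, saturation, value) color histogram of the object, and the Bhattacharyya coefficient
$\varLambda_{\mathrm{a}}\left(r_{h i}, z_{j}\right)=\left\{\begin{array}{ll}\psi_{\mathrm{a}}\left(r_{h i}, z_{j}\right) & \psi_{\mathrm{a}}\left(r_{h i}, z_{j}\right) \geqslant \tau_{\mathrm{a}} \\ 0 & \text{otherwise}\end{array}\right.$ | (11) |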
$\psi_{\mathrm{a}}\left(r_{h i}, z_{j}\right)=\rho\left(\boldsymbol{W} \boldsymbol{m}_{i\left(r_{h i}\right)}, \boldsymbol{W} \boldsymbol{f}_{z_{j}^{t}}\right)$ | (12) |
where
When, in
$\boldsymbol{S}=\left[s_{i j}\right]_{h \times m}, \quad s_{i j}=-\ln \left(\varLambda_{1}\left(r_{h i}, z_{j}^{t}\right)\right), \quad z_{j}^{t} \in \boldsymbol{Z}_{t}$ | (13) |
Equation (13) shows that the greater the similarity between the two, the smaller the association cost. Setting the threshold
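The local association of Eqs. (9) to (13) can be sketched as follows. This is a simplified illustration, not the authors' code: histograms are compared directly with the Bhattacharyya coefficient instead of first being projected by the ILDA matrix $\boldsymbol{W}$ as in Eq. (12), the assignment is found by brute-force enumeration (practical only for a handful of tracks) rather than with a dedicated assignment solver, and the gating rule (pairs whose cost exceeds $-\ln\theta$ are left unmatched) together with the values of `theta` and `tau_a` are assumptions.

```python
import itertools
import math


def iou(a, b):
    """Eq. (10): intersection over union of boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def appearance_similarity(p, q, tau_a=0.5):
    """Eq. (11): Bhattacharyya coefficient of two normalized histograms,
    set to zero below the threshold tau_a (hypothetical value)."""
    psi = sum(math.sqrt(x * y) for x, y in zip(p, q))
    return psi if psi >= tau_a else 0.0


def associate(tracks, detections, theta=0.4):
    """Eqs. (9) and (13): cost = -ln(IoU * appearance), then the assignment
    with the smallest total cost. Items are (box, histogram) pairs."""
    gate = -math.log(theta)  # ASSUMED gate: costlier pairs stay unmatched
    costs = [
        [-math.log(max(iou(tb, db) * appearance_similarity(th, dh), 1e-12))
         for db, dh in detections]
        for tb, th in tracks
    ]
    n = len(tracks)
    slots = list(range(len(detections))) + [None] * n  # None = unmatched
    best, best_total = None, float("inf")
    for perm in itertools.permutations(slots, n):
        total = 0.0
        for i, j in enumerate(perm):
            c = gate if j is None else costs[i][j]
            if c > gate:
                break  # this pairing violates the gate
            total += c
        else:
            if total < best_total:
                best, best_total = perm, total
    return {i: j for i, j in enumerate(best) if j is not None}


if __name__ == "__main__":
    tracks = [((0, 0, 10, 10), [0.5, 0.5, 0.0]),
              ((50, 50, 10, 10), [0.0, 0.5, 0.5])]
    dets = [((51, 50, 10, 10), [0.0, 0.5, 0.5]),
            ((1, 0, 10, 10), [0.5, 0.5, 0.0]),
            ((200, 200, 10, 10), [1.0, 0.0, 0.0])]
    print(associate(tracks, dets))
```

Taking the negative logarithm turns the product of similarities into a sum of costs, so minimizing the total cost is equivalent to maximizing the product of the pairwise similarities.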
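1.4 Global association of unreliable tracks
In the global association stage, the unreliable tracks with low confidence, which are prone to fragmentation,
Event A: In complex situations, an object may change its motion after being occluded for a long time, and online tracking methods based on a traditional simple motion model (such as constant velocity) are then prone to trajectory drift. This paper therefore takes the unreliable motion model of unreliable tracks into account in this case and, in the global association stage, associates the unreliable tracks with the detections left unmatched by local association
$\varLambda_{g_{1}}\left(r_{l i}^{t}, y_{j}^{t}\right)=\left\{\begin{array}{ll}\varLambda_{\mathrm{a}}\left(r_{l i}^{t}, y_{j}^{t}\right) & {dist}\left(r_{l i}^{t}, y_{j}^{t}\right) \leqslant \varLambda_{r_{l i}}^{t} \\ 0 & \text{otherwise}\end{array}\right.$ | (14) |
where
Event B: When unreliable tracks are associated with other high-confidence tracks, the unreliable tracks are usually fragmented because of long-term occlusion. To link these track fragments, the similarity
$ \begin{array}{c} \varLambda_{g_{2}}\left(r_{l i}^{t}, r_{h j}^{t}\right)=\\ \left\{\begin{array}{ll} \varLambda_{\mathrm{a}}\left(r_{l i}^{t}, r_{h j}^{t}\right) & \cos \left\langle\overline{\boldsymbol{v}}_{r_{l i}}^{t}, \overline{\boldsymbol{v}}_{r_{h j}}^{t}\right\rangle<\delta \\ \varLambda_{\mathrm{ps}}\left(r_{l i}^{t}, r_{h j}^{t}\right) \varLambda_{\mathrm{a}}\left(r_{l i}^{t}, r_{h j}^{t}\right) \varLambda_{\mathrm{m}}\left(r_{l i}^{t}, r_{h j}^{t}\right) & \text{otherwise} \end{array}\right. \end{array} $ | (15) |
where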
$\varLambda_{\mathrm{m}}\left(r_{l i}^{t}, r_{h j}^{t}\right)=\frac{1}{2} \exp \left(\frac{-d}{\sigma}\right)$ | (16) |
$d=e\left(r_{l i}^{t}, r_{h j}^{t}\right)+e\left(r_{h j}^{t}, r_{l i}^{t}\right)$ | (17) |
where
Event C: The association cost matrix
$\boldsymbol{G}_{(l+n) \times(h+l)}=\left[\begin{array}{ll}\boldsymbol{B}_{l \times h} & \boldsymbol{C}_{l \times l} \\ -\ln \left({\theta}_{2}\right)_{n \times h} & \boldsymbol{A}_{n \times l}\end{array}\right]$ | (18) |
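where the threshold
After global association, the position and velocity of each associated track are updated with the associated detection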
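The fragment-linking similarity of Eqs. (15) to (17) and the linear interpolation that fills the gap between two linked fragments (mentioned in the abstract) can be sketched as below. This is an illustration under assumptions: the error terms $e(\cdot,\cdot)$ are passed in as precomputed numbers because their definition is not available in the text above, a zero average velocity is treated as unreliable motion, and the values of `sigma` and `delta` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two 2-D average velocity vectors."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.0  # ASSUMPTION: no usable direction, treat motion as unreliable
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)


def motion_affinity(err_forward, err_backward, sigma=20.0):
    """Eqs. (16)-(17): 0.5 * exp(-d / sigma) with d the sum of both errors."""
    return 0.5 * math.exp(-(err_forward + err_backward) / sigma)


def fragment_similarity(lam_a, lam_ps, lam_m, v_low, v_high, delta=0.5):
    """Eq. (15): appearance only when the velocity directions disagree,
    otherwise the product of position-size, appearance and motion terms."""
    if cosine(v_low, v_high) < delta:
        return lam_a
    return lam_ps * lam_a * lam_m


def interpolate_gap(t0, p0, t1, p1):
    """Linearly interpolate positions for the frames strictly between the
    end of the earlier fragment (t0, p0) and the start of the later (t1, p1)."""
    return {
        t: tuple(a + (b - a) * (t - t0) / (t1 - t0) for a, b in zip(p0, p1))
        for t in range(t0 + 1, t1)
    }


if __name__ == "__main__":
    print(fragment_similarity(0.8, 0.5, 0.4, (1, 0), (-1, 0)))   # opposite directions
    print(fragment_similarity(0.8, 0.5, 0.4, (1, 0), (1, 0.1)))  # similar directions
    print(interpolate_gap(10, (0.0, 0.0), 14, (4.0, 8.0)))
```

Falling back to appearance alone when the directions disagree avoids penalizing a correct link merely because the object changed its motion during the occlusion.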
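2 Implementation steps of the algorithm
The steps of the algorithm are as follows:
Input: video frame sequence.
Output: multi-object tracking results.
1) Track preprocessing. Read the video and preprocess the first 5 frames of the sequence to form the initial tracks.
2) Training the ILDA appearance model. Train the initial ILDA appearance model parameters with the initial track samples: the object mean vectors
3) Track confidence evaluation. Compute the track confidence
4) Local association. Take as input the reliable tracks
5) Global association. Take as input the unreliable tracks, the high-confidence tracks that satisfy the time condition, and the detections left unmatched by local association in the current frame
6) Check whether the current frame is the last frame. If so, stop; otherwise, go to step 7).
7) Track update. Update the state (position, size, velocity) and the confidence of each associated track with its associated detection; for each unassociated track, update it with the next-frame state predicted by a Kalman filter.
8) ILDA appearance model update. The projected features of the newly associated detections
The flowchart of the algorithm is shown in Fig. 3.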
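3 Simulation results and analysis
The experiments were run on 64-bit Windows 10 with an Intel Xeon E5 CPU, 16 GB of memory, and MATLAB R2016a. Simulation experiments on the public 2D MOT 2015 dataset of the Multiple Object Tracking Benchmark verify the performance of the algorithm. For object detection, the aggregate channel feature (ACF) detector (Dollár et al., 2014) is used. Eight test videos are selected from the dataset, covering both moving and static cameras and high, medium, and low viewing angles, with scene illumination changes, nonlinear object motion, occlusion, and object deformation.
The experiments involve five parameters: the track confidence threshold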
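3.1 Qualitative analysis
The tracking algorithm has two levels, local association and global association. Local association links reliable tracks to their matching detections in the current frame and directly extends the reliable tracks. Global association links unreliable tracks to the detections left unmatched by local association and links pairs of track fragments, which reduces trajectory fragmentation. The hierarchical association was tested on the PETS09-S2L1 dataset, and the results are shown in Fig. 4. Object 3 is occluded several times, and when it reappears it is detected as a new object, so its trajectory is fragmented, as shown in Fig. 4(a). In Fig. 4(b), the fragments are linked by global association, which yields a long trajectory of object 3 spanning 144 frames.
Fig. 5 shows some of the tracking results on seven video sequences from the 2D MOT 2015 dataset. The frame number is shown in the lower-left corner, the color of each bounding box runs from red to blue as the track confidence goes from high to low, and the object ID is shown at the center of the box. Fig. 5(a) is the PETS09-S2L1 sequence, captured by a static camera at a high viewing angle. The scene density is low, but objects move nonlinearly; for example, objects 3 and 12 change direction several times within a short period. In frame 281, object 13 is occluded by a pole in the scene and its track confidence drops. By frame 324, objects 12, 3, and 13 have all moved nonlinearly several times, and object 13 is still tracked with the same ID after the occlusion ends, with its track confidence rising again. Fig. 5(b) is the ETH-Bahnhof sequence, captured by a moving camera with medium object density. Object 1 is tracked from frame 8 and is completely occluded by object 3 in frame 21, at which point its track confidence drops; when the occlusion ends in frame 27 and object 1 reappears, it is still tracked accurately with the same ID. Fig. 5(c) is the ETH-Jelmoli sequence, captured by a moving camera with medium object density in a brightly lit outdoor scene. Object 8 is tracked from frame 27; in frame 37 it passes through a dense crowd with severe mutual occlusion and keeps its ID, and by frame 45 it has left the crowd and is still tracked accurately with the same ID. Fig. 5(d) is the TUD-Stadtmitte sequence, captured by a static camera with medium crowd density. Object 7 is stationary. Object 2 passes through a dense crowd in frame 25, and its track confidence drops because of the occlusion; after the occlusion ends, it is still tracked accurately in frame 70 and its track confidence rises, while objects 4, 5, and 7 are tracked stably throughout. Fig. 5(e) is the ADL-Rundle-6 sequence, captured by a static camera at a low angle with high object density and severe mutual occlusion; all objects are tracked stably, and no identity switch occurs during mutual occlusion. Fig. 5(f) is the ADL-Rundle-8 sequence, a night scene captured by a moving camera with high object density; object 32 is tracked stably all the way as it moves from far away toward the camera. Fig. 5(g) is the Venice-1 sequence, captured by a static camera with high object density, a dark scene, and small object scale; the objects are tracked stably, but the scene lighting causes false detections (objects 17 and 18 in Fig. 5(g)).
The simulation results show that the algorithm tracks the same object stably in different scenes and keeps tracking an object with the same ID when it reappears after occlusion, whether by the scene background or by other objects. When objects are close together and occlude one another, their IDs are not exchanged, so identity switches are reduced. However, because the algorithm tracks on the basis of detection results, false and missed detections in complex scenes cause a small number of tracking errors.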
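3.2 Quantitative analysis
The public CLEAR MOT standard (Bernardin and Stiefelhagen, 2008) is used to evaluate the algorithm quantitatively. It includes multi-object tracking precision (MOTP↑) and multi-object tracking accuracy (MOTA↑). In addition, several common metrics provided by Wu and Nevatia (2006) are used: the number of identity switches of tracked trajectories (IDS↓), the ratio of trajectories that are tracked in more than 80% of the frames of the whole sequence compared with the ground truth (mostly tracked, MT↑), the ratio of trajectories that are tracked in fewer than 20% of the frames of the whole sequence (mostly lost, ML↓), and the number of track fragmentations (Frag↓). Here ↑ means that higher is better, and ↓ means that lower is better. Table 1 compares the proposed algorithm with five online tracking algorithms, namely Zhu et al. (2016), Tesfaye et al. (2016), Possegger et al. (2014), Wang et al. (2016), and Yamaguchi et al. (2011), and five offline tracking algorithms, namely Zhang and Zhou (2018), Qi et al. (2017), Pirsiavash et al. (2011), Li et al. (2009), and Milan et al. (2014). The data for the compared algorithms are taken from the corresponding papers.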
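For reference, the way MOTA and the MT/ML ratios are computed can be sketched in Python. MOTA follows its standard CLEAR MOT definition, 1 − (FN + FP + IDS) / GT, summed over all frames, and MT/ML use the 80% and 20% thresholds given in the text. The function names and the numbers in the usage example are illustrative only and are not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDS) / GT, with all counts
    summed over every frame of the sequence."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def mt_ml(coverages):
    """Mostly tracked / mostly lost ratios.

    `coverages` holds, for each ground-truth trajectory, the fraction of
    its frames in which it is tracked. MT counts trajectories covered in
    more than 80% of their frames, ML those covered in fewer than 20%.
    """
    n = len(coverages)
    mt = sum(c > 0.8 for c in coverages) / n
    ml = sum(c < 0.2 for c in coverages) / n
    return mt, ml


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(f"MOTA = {mota(50, 30, 20, 1000):.3f}")
    print("MT, ML =", mt_ml([0.95, 0.85, 0.5, 0.1]))
```

Because identity switches enter MOTA with the same weight as a single missed or false detection, IDS and Frag are reported separately to make association quality visible.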
Table 1
Comparison of simulation results of the algorithms
Sequence | Type | Algorithm | MOTP/% | MOTA/% | IDS | MT/% | ML/% | Frag |
PETS09-S2L1 | Online | Zhu et al. (2016) | - | - | 35 | 94.7 | 0.0 | 25 |
PETS09-S2L1 | Online | Tesfaye et al. (2016) | 56.8 | 90.0 | 15 | 89.5 | 0.0 | - |
PETS09-S2L1 | Offline | Zhang and Zhou (2018) | 75.6 | 95.6 | 12 | 100.0 | 0.0 | 3 |
PETS09-S2L1 | Offline | Milan et al. (2014) | 80.2 | 90.6 | 11 | 91.3 | 4.3 | - |
PETS09-S2L1 | Online | Proposed | 86.3 | 93.6 | 6 | 94.7 | 0.0 | 3 |
TUD-Stadtmitte | Online | Zhu et al. (2016) | - | - | 13 | 40.0 | 0.0 | 19 |
TUD-Stadtmitte | Online | Tesfaye et al. (2016) | 52.6 | 72.4 | 10 | 60.0 | 0.0 | - |
TUD-Stadtmitte | Offline | Zhang and Zhou (2018) | 90.7 | 89.3 | 10 | 90.0 | 0.0 | 2 |
TUD-Stadtmitte | Offline | Wang et al. (2016) | 72.6 | 67.6 | 96 | 50.0 | 0.0 | - |
TUD-Stadtmitte | Online | Proposed | 88.9 | 83.5 | 6 | 90.0 | 0.0 | 4 |
Town-Center | Online | Possegger et al. (2014) | 68.6 | 70.7 | 157 | 56.3 | 7.4 | 321 |
Town-Center | Online | Yamaguchi et al. (2011) | 74.5 | 67.2 | 146 | 65.8 | 6.5 | 173 |
Town-Center | Online | Li et al. (2009) | 71.7 | 66.6 | 302 | 58.1 | 6.5 | 492 |
Town-Center | Offline | Qi et al. (2017) | 70.6 | 79.7 | - | - | - | - |
Town-Center | Online | Proposed | 72.6 | 84.9 | 86 | 82.7 | 4.0 | 89 |
Note: bold indicates the best value in each column; "-" indicates that the original paper does not report this item.
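Table 1 shows that the proposed algorithm outperforms the five online tracking algorithms in MOTP and MOTA. In addition, it performs almost as well as or better than the four offline tracking methods. In the PETS09-S2L1 sequence, the pedestrian density is low but there is a lot of nonlinear motion; the proposed algorithm has the advantage in MOTP, IDS, and Frag, while its MOTA is 2% lower than that of Zhang and Zhou (2018). The TUD-Stadtmitte sequence is a complex outdoor scene; because the proposed algorithm is an online tracker based on detections, false and missed detections reduce the tracking accuracy, so its MOTP and MOTA are 1.8% and 5.8% lower, respectively, than those of Zhang and Zhou (2018), but it keeps its advantage in IDS, with at least 4 fewer identity switches. In the Town-Center sequence, there are many objects that are close together and frequently overlap and occlude one another; the proposed algorithm improves MOTA markedly over the other algorithms, by more than 5.2%, and achieves the best IDS, MT, ML, and Frag, with IDS reduced by more than 60 and Frag reduced by more than 84. It is worth noting that these improvements are obtained without using future frames and without using multiple features (color, shape, texture). Over all 8 test sequences, 5,483 frames in total, the proposed algorithm achieves an average tracking accuracy of 83.4% and an average tracking precision of 81.1%.
Table 2 shows the simulation results of the proposed algorithm under different difference thresholds for updating the ILDA appearance model. The ETH-Bahnhof sequence is used, with 1,000 frames, a resolution of 640×480 pixels, and a frame rate of 14 frame/s. Table 2 shows that with a threshold of 10%, the processing time drops to 0.281 s/frame, a saving of 29.9%, which is the largest drop among the selected thresholds. As for the robustness metrics (MOTP, MOTA, IDS), they change very little at a threshold of 10%, whereas at a threshold of 30%, MOTP falls by 6.71%, MOTA falls by 9.57%, and IDS rises by 44.1%, a clear loss of robustness. The difference threshold is therefore set to 10% in this paper.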
Table 2
Simulation results under different ILDA appearance model update thresholds
Threshold/% | Speed/(s/frame) | MOTP/% | MOTA/% | IDS |
None | 0.401 | 79.0 | 80.4 | 14 |
10 | 0.281 | 78.8 | 80.4 | 17 |
20 | 0.277 | 74.0 | 74.3 | 21 |
30 | 0.263 | 71.1 | 71.8 | 29 |
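The time saving quoted for the 10% threshold can be checked directly from the speed column of Table 2. The short Python sketch below computes the saving relative to the run without a threshold for every row; the variable names are illustrative.

```python
# Per-frame processing time from Table 2 (s/frame).
baseline = 0.401  # no update threshold: the model is updated at every association
timings = {10: 0.281, 20: 0.277, 30: 0.263}

# Relative time saving in percent, compared with the baseline.
savings = {
    threshold: round(100 * (baseline - t) / baseline, 1)
    for threshold, t in timings.items()
}

if __name__ == "__main__":
    for threshold, saving in savings.items():
        print(f"threshold {threshold}%: {saving}% faster than no threshold")
```

Most of the saving is already obtained at the 10% threshold; raising the threshold to 20% or 30% gains only a few more percentage points while MOTP, MOTA, and IDS degrade.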
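Table 3 lists the running time of the proposed algorithm on the different datasets, where the object density is the average number of objects per frame. Table 3 shows that the tracking speed depends on the number of detections: the average tracking time is 0.261 s/frame on the low-density datasets (ETH-Bahnhof, PETS09-S2L1, ETH-Jelmoli, TUD-Stadtmitte) and 0.545 s/frame on the high-density datasets (ADL-Rundle-6, Venice-1, ADL-Rundle-8, Town-Center), with appearance model construction accounting for 30% of the computation time. These simulation results indicate that the proposed algorithm can track multiple objects stably in complex scenes and is suitable for real-time applications.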
Table 3
Running time of the proposed algorithm on different datasets
Dataset | ETH-Bahnhof | PETS09-S2L1 | ETH-Jelmoli | TUD-Stadtmitte | ADL-Rundle-6 | Venice-1 | ADL-Rundle-8 | Town-Center |
Object density | 5.4 | 5.6 | 5.8 | 6.5 | 9.5 | 10.1 | 10.4 | 15.9 |
Detection/s | 0.027 | 0.194 | 0.085 | 0.116 | 0.263 | 0.313 | 0.223 | 0.328 |
Tracking/s | 0.281 | 0.245 | 0.232 | 0.285 | 0.420 | 0.536 | 0.575 | 0.649 |
Total time/s | 0.308 | 0.439 | 0.317 | 0.401 | 0.683 | 0.849 | 0.798 | 0.977 |
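Taken together, the simulation results show that the proposed algorithm handles identity switches and trajectory fragmentation better than the other algorithms. First, ILDA appearance learning is used to build the appearance model and update it adaptively, which copes well with tracked objects that are close together and occluded for a long time, so identity switches are less likely. Second, when two objects are close together and occlude each other, their track confidence drops and their tracks become unreliable; when unreliable tracks are associated, a motion model is added, and because two objects with similar appearance still differ in their motion, comparing similarity with both the motion and appearance models effectively reduces identity switches. Third, in global association, for a tracked object that reappears after long-term occlusion, the algorithm does not predict its position with a simple constant-velocity linear motion model, since its motion model is unreliable, but instead uses the track confidence: when the confidence is low, the valid association range is enlarged so that the track can be associated with an object far from it, which better handles the trajectory fragmentation caused by long-term occlusion.
4 Conclusion
This paper proposes a hierarchical-association multi-object tracking method with adaptive online ILDA appearance learning. Tracks are divided into reliable and unreliable tracks according to their confidence, and detection-to-track association is performed sequentially through local association of reliable tracks and global association of unreliable tracks. In the global association stage, a motion model is introduced to further associate highly fragmented tracks, which effectively reduces trajectory fragmentation. The object appearance model is built with ILDA appearance learning, which effectively distinguishes multiple interacting objects in complex scenes and reduces identity switches. Simulation experiments on the 2D MOT 2015 dataset show that, compared with most current multi-object tracking algorithms, the proposed algorithm tracks objects effectively and stably and substantially reduces identity switches and trajectory fragmentation.
Because the proposed algorithm is based on tracking-by-detection, future work will focus on further improving the tracking accuracy in complex scenes and on extending the method to multi-camera multi-object tracking.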
References
Bernardin K, Stiefelhagen R. 2008. Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008(1): 1-10 [DOI:10.1155/2008/246309]
Dollár P, Appel R, Belongie S, Perona P. 2014. Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8): 1532-1545 [DOI:10.1109/TPAMI.2014.2300479]
Huang C, Li Y, Nevatia R. 2013. Multiple target tracking by learning-based hierarchical association of detection responses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4): 898-910 [DOI:10.1109/TPAMI.2012.159]
Kalman R E. 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1): 35-45 [DOI:10.1115/1.3662552]
Khan Z, Balch T, Dellaert F. 2005. MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(11): 1805-1819 [DOI:10.1109/TPAMI.2005.223]
Kuhn H W. 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1/2): 83-97 [DOI:10.1002/nav.3800020109]
Li Y, Huang C and Nevatia R. 2009. Learning to associate: hybrid boosted multi-target tracker for crowded scene//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE: 2953-2960 [DOI:10.1109/CVPR.2009.5206735]
McLaughlin N, Del Rincon J M and Miller P. 2013. Online multiperson tracking with occlusion reasoning and unsupervised track motion model//Proceedings of the 10th IEEE International Conference on Advanced Video and Signal Based Surveillance. Krakow, Poland: IEEE: 37-42 [DOI:10.1109/AVSS.2013.6636613]
Milan A, Roth S, Schindler K. 2014. Continuous energy minimization for multitarget tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1): 58-72 [DOI:10.1109/TPAMI.2013.103]
Pang S N, Ozawa S, Kasabov N. 2005. Incremental linear discriminant analysis for classification of data streams. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 35(5): 905-914 [DOI:10.1109/TSMCB.2005.847744]
Pirsiavash H, Ramanan D and Fowlkes C C. 2011. Globally-optimal greedy algorithms for tracking a variable number of objects//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE: 1201-1208 [DOI:10.1109/CVPR.2011.5995604]
Possegger H, Mauthner T, Roth P M and Bischof H. 2014. Occlusion geodesics for online multi-object tracking//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE: 1306-1313 [DOI:10.1109/CVPR.2014.170]
Qi M B, Yue Z L, Shu K, Jiang J G. 2017. Multi-object tracking using hierarchical data association based on generalized correlation clustering graphs. Acta Automatica Sinica, 43(1): 152-160 (齐美彬, 岳周龙, 疏坤, 蒋建国. 2017. 基于广义关联聚类图的分层关联多目标跟踪. 自动化学报, 43(1): 152-160) [DOI:10.16383/j.aas.2017.c150519]
Shu G, Dehghan A, Oreifej O, Hand E and Shah M. 2012. Part-based multiple-person tracking with partial occlusion handling//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE: 1815-1821 [DOI:10.1109/CVPR.2012.6247879]
Tesfaye Y T, Zemene E, Pelillo M, Prati A. 2016. Multi-object tracking using dominant sets. IET Computer Vision, 10(4): 289-297 [DOI:10.1049/iet-cvi.2015.0297]
Wang B, Wang L, Shuai B, Zuo Z, Liu T, Chan K L and Wang G. 2016. Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Las Vegas, NV, USA: IEEE: 386-393 [DOI:10.1109/CVPRW.2016.55]
Wu B and Nevatia R. 2006. Tracking of multiple, partially occluded humans based on static body part detection//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 951-958 [DOI:10.1109/CVPR.2006.312]
Yamaguchi K, Berg A C, Ortiz L E and Berg T L. 2011. Who are you with and where are you going?//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE: 1345-1352 [DOI:10.1109/CVPR.2011.5995468]
Yang B and Nevatia R. 2012. An online learned CRF model for multi-target tracking//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE: 2034-2041 [DOI:10.1109/CVPR.2012.6247907]
Zamir A R, Dehghan A and Shah M. 2012. GMCP-Tracker: global multi-object tracking using generalized minimum clique graphs//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer: 343-356 [DOI:10.1007/978-3-642-33709-3_25]
Zhang L, Li Y and Nevatia R. 2008. Global data association for multi-object tracking using network flows//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE: 1-8 [DOI:10.1109/CVPR.2008.4587584]
Zhang L J, Zhou Z P. 2018. Multiple target tracking using hierarchical data association based on network flows. Journal of Computer-Aided Design and Computer Graphics, 30(9): 1670-1677 (张丽娟, 周治平. 2018. 基于网络流的分层关联多目标跟踪. 计算机辅助设计与图形学学报, 30(9): 1670-1677) [DOI:10.3724/SP.J.1089.2018.16906]
Zhu S H, Sun C J, Shi Z. 2016. Multi-target tracking via hierarchical association learning. Neurocomputing, 208: 365-372 [DOI:10.1016/j.neucom.2016.02.071]