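Objective Multi-object tracking is an important research topic in computer vision. In complex scenes, frequent and long-term occlusion of targets and identity switches caused by targets with similar appearance pose many challenges. To address the identity switches and trajectory fragmentation that long-term occlusion causes in complex scenes, a hierarchical-association multi-object tracking algorithm based on adaptive online discriminative appearance learning is proposed. Method Multi-object tracking is first divided into two levels, local association and global association, according to trajectory confidence. In the first level, local association, reliable trajectories with high confidence are associated with the detections of the current frame using appearance and position-size similarity. In the second level, global association, unreliable trajectories with low confidence are further associated with fragmented trajectories by introducing a motion model and a valid association range. Incremental Linear Discriminant Analysis is introduced when extracting target appearance features to resolve identity switches, and the target appearance model is updated adaptively according to the difference in appearance features between a new sample and the mean of the target samples. Result The method was compared with 10 current multi-object tracking algorithms on three sequences from the public 2D MOT2015 dataset: PETS09-S2L1, TUD-Stadtmitte, and Town-Center. It reduced identity switches and trajectory fragments on every sequence; on Town-Center, identity switches fell by 60, trajectory fragments fell by 84, and tracking accuracy improved by more than 5.2%. Conclusion The proposed hierarchical-association online multi-object tracking method with adaptive incremental linear discriminant appearance learning tracks multiple objects stably and effectively in complex scenes and reduces trajectory fragmentation; the online linear discriminant appearance learning it introduces is effective against identity switches caused by occlusion.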
Multi-object tracking based on adaptive online discriminative appearance learning and hierarchical association
Fang Lan, Yu Fengqin (School of Internet of Things Engineering, Jiangnan University)
Objective Multi-object tracking is an important research topic in computer vision. Although much previous research has addressed particular problems in multi-object tracking, many challenges remain, such as object detection errors, missed detections, frequent and long-term occlusion of objects in complex scenes, and identity switches between tracked objects with similar appearance, all of which easily lead to trajectory drift or tracking interruption. As object detection has improved, detection-based tracking methods have shown good performance. The key to a tracking-by-detection algorithm is the data association between detection points, which takes two main forms: frame-by-frame association and multi-frame association. Frame-by-frame data association links detection points in two consecutive frames according to their properties, such as appearance, location, and size. Because it uses information from only two frames, tracking drift or failure is likely when an object is occluded or misdetected, or when objects with similar appearance are present. Multi-frame data association builds a relational model from the detection information of multiple frames rather than only two. This effectively reduces association errors and handles occlusion. However, if an occlusion lasts longer than the time segment used for multi-frame data association, the detection points before and after the occlusion still cannot be associated, and tracking is interrupted. Moreover, this method needs all detection information before tracking begins, so it cannot meet real-time requirements.
To address the ID switches and trajectory fragmentation caused by long-term occlusion, an online multi-object tracking algorithm based on adaptive online discriminative appearance learning and hierarchical association is proposed for complex scenes. It combines the low-level appearance and position-size features used in local association with the high-level motion model established in global association, and it can meet real-time tracking requirements. Method In this study, multi-object tracking is divided into two stages according to track confidence: local association and global association. A robust object appearance model is the key to both stages. To address identity switches, an online incremental linear discriminant analysis (ILDA) method is introduced to discriminate object appearances, and the object appearance models are updated adaptively based on the difference between a new sample and the mean of the object samples. In the local association stage, reliable tracklets with high confidence are associated with the current-frame detections using low-level properties of the detection points, namely appearance and position-size similarity, which allows reliable trajectories to grow continuously. In the global association stage, unreliable tracklets with low confidence, which result from long-term occlusion, are further associated. The candidates in this stage are of two kinds: detection points left unassociated by local association, and continuous high-confidence trajectories that satisfy the time condition, that is, the end time of the trajectory precedes the current time. When detection points that reappear after long-term occlusion are associated, only appearance similarity within a validation range is used, without the position-size property, because the motion dynamics of unreliable objects are themselves unreliable.
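The local association stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the similarity formulas, the confidence threshold, or the assignment solver, so the Gaussian-style position-size term, the cosine appearance term (standing in for ILDA-projected features), the greedy matching, and the names `position_size_similarity`, `appearance_similarity`, `local_association`, `conf_threshold`, and `min_affinity` are all assumptions made for illustration.

```python
import math

def position_size_similarity(track_box, det_box):
    """Assumed form: similarity from centre offset and size difference.
    Boxes are (centre_x, centre_y, width, height)."""
    tx, ty, tw, th = track_box
    dx, dy, dw, dh = det_box
    pos = math.exp(-(((tx - dx) / tw) ** 2 + ((ty - dy) / th) ** 2))
    size = math.exp(-(abs(tw - dw) / (tw + dw) + abs(th - dh) / (th + dh)))
    return pos * size

def appearance_similarity(feat_a, feat_b):
    """Cosine similarity clipped to [0, 1]; a stand-in for comparing
    ILDA-projected appearance features (assumption)."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    na = math.sqrt(sum(a * a for a in feat_a))
    nb = math.sqrt(sum(b * b for b in feat_b))
    if na == 0 or nb == 0:
        return 0.0
    return max(0.0, dot / (na * nb))

def local_association(tracks, detections, conf_threshold=0.5, min_affinity=0.3):
    """Match high-confidence tracks to current-frame detections.
    Greedy matching on descending affinity is used here for brevity;
    an optimal assignment solver could be substituted."""
    candidates = []
    for ti, track in enumerate(tracks):
        if track["confidence"] < conf_threshold:
            continue  # low-confidence tracks are left to global association
        for di, det in enumerate(detections):
            affinity = (appearance_similarity(track["feature"], det["feature"])
                        * position_size_similarity(track["box"], det["box"]))
            if affinity >= min_affinity:
                candidates.append((affinity, ti, di))
    candidates.sort(reverse=True)
    matches, used = {}, set()
    for _, ti, di in candidates:
        if ti in matches or di in used:
            continue
        matches[ti] = di
        used.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used]
    return matches, unmatched

# Hypothetical example: two reliable tracks, one unreliable track.
tracks = [
    {"confidence": 0.9, "box": (100, 100, 40, 80), "feature": [1.0, 0.0]},
    {"confidence": 0.8, "box": (300, 120, 40, 80), "feature": [0.0, 1.0]},
    {"confidence": 0.2, "box": (500, 100, 40, 80), "feature": [1.0, 1.0]},
]
detections = [
    {"box": (302, 121, 40, 80), "feature": [0.1, 1.0]},
    {"box": (101, 102, 40, 80), "feature": [1.0, 0.1]},
    {"box": (500, 100, 40, 80), "feature": [1.0, 1.0]},
]
matches, unmatched = local_association(tracks, detections)
```

In this sketch the detection left unmatched is exactly the kind of candidate that the global association stage would then consider for the low-confidence track.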
In addition, a valid association range tied to the track confidence is introduced: when the track confidence decreases, the valid association range increases, because the distance between a drifting track and its object can grow if the drift persists. This allows drifting tracks to be reassigned to detections of reappearing objects even when those detections are far from the corresponding tracks. When two track fragments are associated, a motion model is introduced to determine whether they belong to the same object. If the angle between the average velocity vectors of the two fragments exceeds a threshold, which indicates that an unreliable track may be involved, only the appearance similarity of the pair is considered; otherwise, appearance, position-size, and motion similarity are combined to associate the pair. If two track fragments are associated successfully, linear interpolation fills the interval in which the object was lost, so that the two fragments are connected effectively. Result We compared our method with 10 state-of-the-art multi-object tracking algorithms, five offline and five online, on three public datasets: PETS09-S2L1, TUD-Stadtmitte, and Town-Center. The quantitative evaluation metrics were multi-object tracking accuracy (MOTA), multi-object tracking precision (MOTP), the number of identity switches (IDS), the ratio of mostly tracked trajectories, those tracked in more than 80% of their frames (MT), the ratio of mostly lost trajectories, those tracked in fewer than 20% of their frames (ML), and the number of trajectory fragments (FG). The experimental results show that our method outperforms the selected online multi-object tracking methods, two of which are based on hierarchical association, in MOTA and MOTP.
In addition, the proposed approach performs almost as well as, or even better than, the offline tracking methods. On the PETS09-S2L1 dataset, the proposed approach is superior to the other methods in MOTP, IDS, and FG: MOTP increased by 6.1%, IDS fell by 5, and FG fell by 21. On the TUD-Stadtmitte dataset, IDS fell by 4. Compared with the online tracking approaches, MOTP and MOTA increased by 36.3% and 11.1%, respectively. On the Town-Center dataset, MOTA and MT increased by 5.2% and 16.9%, respectively, IDS and FG fell by 60 and 84, respectively, and ML decreased by 1.5%. Conclusion In this study, we adopt the idea of hierarchical data association and propose a multi-object tracking method based on adaptive online discriminative appearance learning and hierarchical association. The experimental results indicate that the method effectively addresses the ID switches and trajectory fragmentation caused by long-term occlusion in complex scenes.
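The fragment-linking step of the global association stage (the velocity-angle test followed by linear interpolation of the lost interval) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the abstract gives neither the angle threshold nor the exact definition of average velocity, so `max_angle`, the net-displacement velocity estimate, and the function names `mean_velocity`, `velocity_angle`, `use_motion_cue`, and `interpolate_gap` are hypothetical. Fragments are represented here as lists of `(frame, x, y)` centre positions.

```python
import math

def mean_velocity(points):
    """Net displacement divided by the frame span of a fragment,
    given as [(frame, x, y), ...] in increasing frame order."""
    f0, x0, y0 = points[0]
    f1, x1, y1 = points[-1]
    span = f1 - f0
    if span <= 0:
        return (0.0, 0.0)
    return ((x1 - x0) / span, (y1 - y0) / span)

def velocity_angle(v1, v2):
    """Angle in radians between two velocity vectors (0 if either is zero)."""
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos)))

def use_motion_cue(frag_a, frag_b, max_angle=math.pi / 4):
    """True when the fragments move in similar directions, so that motion and
    position-size cues can be combined with appearance; False means that
    only appearance similarity should be trusted. The threshold is assumed."""
    angle = velocity_angle(mean_velocity(frag_a), mean_velocity(frag_b))
    return angle <= max_angle

def interpolate_gap(frag_a, frag_b):
    """Linearly interpolate positions for the frames between the end of
    frag_a and the start of frag_b."""
    fa, xa, ya = frag_a[-1]
    fb, xb, yb = frag_b[0]
    filled = []
    for f in range(fa + 1, fb):
        t = (f - fa) / (fb - fa)
        filled.append((f, xa + t * (xb - xa), ya + t * (yb - ya)))
    return filled

# Hypothetical example: an object lost during frames 4 to 6.
frag_a = [(1, 0.0, 0.0), (2, 1.0, 0.0), (3, 2.0, 0.0)]
frag_b = [(7, 6.0, 0.0), (8, 7.0, 0.0)]
if use_motion_cue(frag_a, frag_b):
    gap = interpolate_gap(frag_a, frag_b)
```

A fragment moving in the opposite direction would fail the angle test, in which case the pair would be compared on appearance alone, as the Method part states.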