FairMOT tracking algorithm optimization with spatio-temporal consistency
Spatio-temporal consistency based FairMOT tracking algorithm optimization
2022, Vol. 27, No. 9, pp. 2749-2760
Print publication date: 2022-09-16
Accepted: 2022-04-29
DOI: 10.11834/jig.220116
Jiaqi Peng, Tao Wang, Kean Chen, Weiyao Lin. Spatio-temporal consistency based FairMOT tracking algorithm optimization[J]. Journal of Image and Graphics, 2022,27(9):2749-2760.
Objective
Video multiple object tracking (MOT) is an important task in computer vision. Existing studies improve either the object detection stage or the object association stage, but both lines of work overlook the inconsistency problems in multi-object tracking. These problems fall into three categories: inconsistency between the center of the object detection box and the center of the identity features, inconsistency of object responses between frames, and inconsistency of the similarity measure between training and testing. To resolve these issues, this paper proposes a spatio-temporal-consistency-based multi-object tracking method that improves tracking accuracy.
Method
The above inconsistencies are corrected along the spatial, temporal, and feature dimensions. For the inconsistency between detection-box centers and identity-feature centers, the spatial offset from each detection-box center to the feature center is predicted, and the object's ReID (re-identification) feature is extracted at the shifted position. For the inter-frame response inconsistency, spatial correlation is used to compute motion-offset information between adjacent frames; based on these offsets, the previous frame's object response is transformed to obtain inter-frame-consistent response information, which is then used to enhance the object response. For the inconsistency of the similarity measure between training and testing, a feature orthogonal loss function is proposed that models the pairwise similarity relations between objects during training.
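The offset-corrected feature extraction described above amounts to reading an embedding from the ReID feature map at a sub-pixel position. A minimal NumPy sketch of that sampling step follows; the function name, shapes, and the source of the offset are my own illustrative assumptions, not the paper's implementation (which operates on learned network outputs):

```python
import numpy as np

def sample_reid_feature(feat_map, center, offset):
    """Bilinearly sample a ReID embedding at an offset-corrected position.

    feat_map: (C, H, W) feature map; center: (x, y) detection-box center on
    the feature grid; offset: (dx, dy) predicted shift toward the identity
    feature center. All names and shapes are illustrative.
    """
    C, H, W = feat_map.shape
    x = np.clip(center[0] + offset[0], 0, W - 1)
    y = np.clip(center[1] + offset[1], 0, H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    # Standard bilinear interpolation over the four neighbouring cells.
    return ((1 - wx) * (1 - wy) * feat_map[:, y0, x0]
            + wx * (1 - wy) * feat_map[:, y0, x1]
            + (1 - wx) * wy * feat_map[:, y1, x0]
            + wx * wy * feat_map[:, y1, x1])
```

Sampling at the shifted position rather than the geometric box center is what lets the association step use appearance features taken from the visible part of a partially occluded object.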
Result
The proposed method is compared with existing methods on three datasets. On MOT17, MOT20, and Hieve, the MOTA (multiple object tracking accuracy) values are 71.2%, 60.2%, and 36.1%, improvements of 1.6%, 3.2%, and 1.1% over the baseline FairMOT algorithm, respectively. Compared with most existing methods, the proposed method achieves a higher MT (mostly tracked) ratio and a lower ML (mostly lost) ratio, indicating better overall tracking performance. Ablation experiments on MOT17 further verify the effectiveness of the integrated algorithm and show that the proposed method markedly alleviates the inconsistency problems in multi-object tracking.
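MOTA, the headline metric above, aggregates three error types relative to the total number of ground-truth boxes. A direct transcription of the standard definition:

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames.

    FN: missed ground-truth boxes, FP: false detections, IDSW: identity
    switches, GT: total ground-truth boxes. MOTA can be negative when the
    accumulated errors exceed the number of ground-truth objects.
    """
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt
```

Because identity switches enter the numerator, the offset-corrected ReID features and the feature orthogonal loss, which both reduce IDSW, contribute directly to the reported MOTA gains.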
Conclusion
The proposed consistency-based tracking method achieves better feature consistency across time, space, and the training/testing procedure, yielding more accurate multi-object tracking results.
Objective
Video-based multiple object tracking (MOT) is an essential task in computer vision, underpinning applications such as autonomous driving and intelligent video surveillance. Most MOT methods first obtain object detection results and then link the detected bounding boxes into trajectories with an association strategy. Although object detection has advanced rapidly, several challenging inconsistency issues remain in multiple object tracking and degrade its accuracy. These inconsistencies fall into three types. 1) Inconsistency between the centers of the object bounding boxes and the centers of the object identity features. Many MOT methods extract object re-identification (ReID) features at the bounding-box centers and use these features for association. Under occlusion, however, such features cannot reflect object appearance accurately: offsets arise between the best ReID feature extraction positions and the bounding-box centers, so the conventional extraction strategy causes a spatial consistency problem. 2) Inconsistency of the object center response between consecutive frames. Because of occlusion in videos, some objects are detected and tracked only intermittently, which causes missed detections and makes the object-center response heatmaps of two consecutive frames inconsistent. 3) Inconsistency of the similarity measure between the training and testing processes. The association step is usually trained as a classification problem with a cross-entropy loss, ignoring inter-object relations, whereas at test time objects are associated by the cosine similarity of their feature pairs. To improve tracking accuracy, we propose a multiple object tracking method based on consistency optimization.
Method
We correct these inconsistencies along the spatial, temporal, and feature dimensions. For the inconsistency between the centers of the detection bounding boxes and the identity features, we predict, for each object, the offset from the detection-box center to the feature center. The best ReID feature extraction position is obtained from the object center plus the predicted offset, and the ReID features extracted at these positions are used to represent the objects. For the inter-frame response inconsistency, a spatial correlation module computes the motion-offset information between adjacent frames. Based on these offsets, the object-center response of the previous frame is transformed by deformable convolution to obtain inter-frame-consistent response information, which is then used to enhance the response of the current frame. For the inconsistency of similarity measures between training and testing, we develop a feature orthogonal loss function that models the pairwise similarity relations between objects during training. Finally, we integrate these three consistency improvements into the FairMOT method for detection and tracking.
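The inter-frame enhancement step can be sketched in simplified form. The paper uses a spatial correlation module and a learned deformable transform; the NumPy sketch below replaces both with a precomputed per-pixel offset field, a nearest-neighbour warp, and max-fusion, so it illustrates only the data flow, not the learned components:

```python
import numpy as np

def enhance_response(prev_heatmap, cur_heatmap, flow, alpha=0.5):
    """Warp the previous frame's center-response heatmap by a motion-offset
    field and blend it into the current one.

    flow[y, x] = (dy, dx) maps a location in the previous frame to the
    current frame (assumed estimated elsewhere, e.g. by spatial correlation).
    `alpha` weights the warped prior response; all names are illustrative.
    """
    H, W = prev_heatmap.shape
    warped = np.zeros_like(prev_heatmap)
    for y in range(H):
        for x in range(W):
            dy, dx = flow[y, x]
            ny, nx = y + int(round(dy)), x + int(round(dx))
            if 0 <= ny < H and 0 <= nx < W:
                warped[ny, nx] = max(warped[ny, nx], prev_heatmap[y, x])
    # Enhance: keep the current response, boosted by the motion-aligned prior.
    return np.maximum(cur_heatmap, alpha * warped)
```

The motion-aligned prior lets an object whose response weakens under occlusion in the current frame inherit evidence from its strong response in the previous frame.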
Result
We compare the performance of our method with existing methods on three datasets. 1) On the MOT17 dataset, our multiple object tracking accuracy (MOTA) is 71.2%, a 1.6% gain over the FairMOT method without the consistency improvements. 2) On the MOT20 dataset, the MOTA is 60.2%, a 3.2% gain. 3) On the Hieve dataset, the MOTA is 36.1%, a 1.1% gain. We also conduct ablation studies on the MOT17 dataset to verify the effectiveness of the different components of our method; the results show that the proposed method significantly improves consistency in multiple object tracking. In the ablation studies, we find that the number of identity switches decreases when the ReID feature extraction position offsets and the feature orthogonal loss function are added: the learned extraction-position offsets capture object appearance features at the right positions, and the orthogonal loss learns those features in the right way. We also visualize the predicted ReID feature extraction positions together with the bounding-box centers; the visualizations show that the predicted positions lie closer to the discriminative appearance regions of the objects than the geometric centers do, confirming the value of the extraction-position offsets.
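The reduction in identity switches attributed above to the feature orthogonal loss can be illustrated with a minimal NumPy sketch. This is my own simplified formulation (a mean-squared penalty on pairwise cosine similarities), not necessarily the exact loss used in the paper:

```python
import numpy as np

def feature_orthogonal_loss(feats, ids):
    """Pairwise loss pushing features of different identities toward
    orthogonality (cosine similarity 0) and same-identity features toward
    similarity 1.

    feats: (N, D) embeddings; ids: (N,) integer identity labels.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                      # pairwise cosine similarities
    # Target similarity: 1 for same identity, 0 (orthogonal) otherwise.
    same = (ids[:, None] == ids[None, :]).astype(float)
    return float(np.mean((sim - same) ** 2))
```

Unlike a per-object cross-entropy, this loss sees every object pair, so training optimizes the same cosine-similarity quantity that the tracker thresholds at test time, closing the train/test inconsistency.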
Conclusion
The proposed consistency-based multiple object tracking method achieves better spatio-temporal consistency of object features, both in training and in testing, and thus tracks objects more accurately.
Keywords: multiple object tracking (MOT); consistency; feature extraction position offset; feature orthogonal loss; inter-frame enhancement
Bergmann P, Meinhardt T and Leal-Taixé L. 2019. Tracking without bells and whistles//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 941-951 [DOI: 10.1109/iccv.2019.00103]
Bewley A, Ge Z Y, Ott L, Ramos F and Upcroft B. 2016. Simple online and realtime tracking//Proceedings of 2016 IEEE International Conference on Image Processing. Phoenix, USA: IEEE: 3464-3468 [DOI: 10.1109/ICIP.2016.7533003]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13]
Gao B. 2021. Research on Multi-Target Tracking Algorithm Fusing Full Convolutional Twin Network and ReID. Qinhuangdao: Yanshan University (in Chinese)
Li J H, Gao X and Jiang T T. 2020. Graph networks for multiple object tracking//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass, USA: IEEE: 708-717 [DOI: 10.1109/wacv45572.2020.9093347]
Lin W Y, Liu H B, Liu S Z, Li Y X, Qian R, Wang T, Xu N, Xiong H K, Qi G J and Sebe N. 2021. Human in events: a large-scale benchmark for human-centric video analysis in complex events [EB/OL]. [2021-05-14]. https://arxiv.org/pdf/2005.04490.pdf
Meinhardt T, Kirillov A, Leal-Taixé L and Feichtenhofer C. 2022. TrackFormer: multi-object tracking with transformers [EB/OL]. [2022-04-29]. https://arxiv.org/pdf/2101.02702.pdf
Pang B, Li Y Z, Zhang Y F, Li M C and Lu C W. 2020. TubeTK: adopting tubes to track multi-object in a one-step training model//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 6307-6317 [DOI: 10.1109/cvpr42600.2020.00634]
Pang J M, Qiu L L, Li X, Chen H F, Li Q, Darrell T and Yu F. 2021. Quasi-dense similarity learning for multiple object tracking//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 164-173 [DOI: 10.1109/cvpr46437.2021.00023]
Peng J L, Wang C A, Wan F B, Wu Y, Wang Y B, Tai Y, Wang C J, Li J L, Huang F Y and Fu Y W. 2020. Chained-Tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 145-161 [DOI: 10.1007/978-3-030-58548-8_9]
Qin J, Huang C and Xu J. 2021. End-to-end multiple object tracking with Siamese networks//Mantoro T, Lee M, Ayu M A, Wong K W and Hidayanto A N, eds. Neural Information Processing (ICONIP 2021), Communications in Computer and Information Science, vol. 1517. Cham: Springer
Ren S Q, He K M, Girshick R and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Wang Y X, Kitani K and Weng X S. 2021. Joint object detection and multi-object tracking with graph neural networks//Proceedings of 2021 IEEE International Conference on Robotics and Automation. Xi'an, China: IEEE: 13708-13715 [DOI: 10.1109/icra48506.2021.9561110]
Wang Z D, Zheng L, Liu Y X, Li Y L and Wang S J. 2020. Towards real-time multi-object tracking//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 107-122 [DOI: 10.1007/978-3-030-58621-8_7]
Wojke N, Bewley A and Paulus D. 2017. Simple online and realtime tracking with a deep association metric//Proceedings of 2017 IEEE International Conference on Image Processing. Beijing, China: IEEE: 3645-3649 [DOI: 10.1109/ICIP.2017.8296962]
Wu J L, Cao J L, Song L C, Wang Y, Yang M and Yuan J S. 2021. Track to detect and segment: an online multi-object tracker//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 12347-12356 [DOI: 10.1109/cvpr46437.2021.01217]
Xu Y H, Ban Y T, Delorme G, Gan C, Rus D and Alameda-Pineda X. 2022. TransCenter: transformers with dense queries for multiple-object tracking [EB/OL]. [2022-04-28]. https://arxiv.org/pdf/2103.15145.pdf
Yang L, Fan Y and Xu N. 2019. Video instance segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 5187-5196 [DOI: 10.1109/ICCV.2019.00529]
Yu F W, Li W B, Li Q Q, Liu Y, Shi X H and Yan J J. 2016. POI: multiple object tracking with high performance detection and appearance feature//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: #9914 [DOI: 10.1007/978-3-319-48881-3_3]
Zhang Y, Sheng H, Wu Y B, Wang S, Ke W and Xiong Z. 2020. Multiplex labeling graph for near-online tracking in crowded scenes. IEEE Internet of Things Journal, 7(9): 7892-7902 [DOI: 10.1109/jiot.2020.2996609]
Zhang Y F, Wang C Y, Wang X G, Zeng W J and Liu W Y. 2021. FairMOT: on the fairness of detection and re-identification in multiple object tracking [EB/OL]. [2021-10-19]. https://arxiv.org/pdf/2004.01888.pdf
Zhou X Y, Koltun V and Krähenbühl P. 2020. Tracking objects as points//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 474-490 [DOI: 10.1007/978-3-030-58548-8_28]
Zhu J, Yang H, Liu N, Kim M, Zhang W J and Yang M H. 2018. Online multi-object tracking with dual matching attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 379-396 [DOI: 10.1007/978-3-030-01228-1_23]
Zhu S S, Wang H and Yan H. 2022. Multi-object tracking based on intra-frame relationship modeling and self-attention fusion mechanism. Control and Decision: 1-10 [DOI: 10.13195/j.kzyjc.2021.1188] (in Chinese)