Spatiotemporal distance and multiple networks mutual learning-relevant pedestrian re-identification

Li Kuan; Gong Xun; Fan Jianfeng

doi:10.11834/jig.220668

Vol. 28, Issue 5, Pages: 1409-1421 (2023)
Published: 16 May 2023

Li Kuan, Gong Xun, Fan Jianfeng. 2023. Spatiotemporal distance and multiple networks mutual learning-relevant pedestrian re-identification. Journal of Image and Graphics, 28(05): 1409-1421

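Abstract (Chinese version)

Objective

In real pedestrian recognition scenarios, obtaining accurate annotations requires a great deal of manual labor, so unsupervised domain adaptation has become a promising research direction for pedestrian re-identification. Such methods usually generate pseudo labels by clustering, and these labels are often noisy. In addition, a good ranking algorithm is key to better recognition performance during pedestrian search, but ordinary re-ranking optimization is limited in real scenarios by its heavy computational cost. To address these two problems, this paper proposes a framework that combines multiple networks with camera-split training and uses spatiotemporal information to optimize the ranking.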

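Method

The model is first pre-trained on the source domain data with supervision. Unlabeled target domain samples are then used for unsupervised training by deep mutual learning among multiple network models, which improves the generalization ability of the networks. During training, the data are processed per camera to reduce cross-camera effects and improve the quality of the pseudo labels. In the ranking and matching stage, spatiotemporal information is used to optimize the ranking and further improve matching performance.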
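The mutual learning step can be illustrated with a minimal sketch. It assumes that each network is trained against the softmax predictions of its peers through a soft cross-entropy term, which is one common way to realize deep mutual learning; the exact loss used in the paper is not given here, and the function names below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, peer_probs):
    """Cross-entropy of one network's prediction against a peer's soft label."""
    student_probs = softmax(student_logits)
    return -sum(p * math.log(s) for p, s in zip(peer_probs, student_probs))


def mutual_learning_losses(logits_per_network):
    """For each network, average the soft cross-entropy against every peer.

    Assumption: at least two networks, all scoring the same sample.
    """
    probs = [softmax(logits) for logits in logits_per_network]
    losses = []
    for i, logits in enumerate(logits_per_network):
        peers = [p for j, p in enumerate(probs) if j != i]
        total = sum(soft_cross_entropy(logits, p) for p in peers)
        losses.append(total / len(peers))
    return losses


# Two networks scoring one sample over three pseudo-label classes.
losses = mutual_learning_losses([[2.0, 0.5, 0.1], [1.5, 0.7, 0.2]])
```

In this sketch the loss is small when the networks agree and grows when they disagree, which is the mechanism by which one network's prediction can regularize another.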

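Result

Experiments are conducted on two cross-domain settings. With DukeMTMC-ReID (Duke multi-tracking multi-camera re-identification) as the source domain and Market-1501 as the target domain, the proposed method reaches a mean average precision (mAP) of 82.5% and a Rank1 of 95.3%. With Market-1501 as the source domain and DukeMTMC-ReID as the target domain, mAP and Rank1 are 75.3% and 90.2%, respectively.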

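Conclusion

The proposed camera-split network mutual learning model combined with spatiotemporal distance ranking improves the accuracy of pseudo labels and optimizes the matching ranking. Compared with other optimization algorithms, it greatly reduces computation and further improves pedestrian re-identification performance.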

Abstract

Objective

Pedestrian re-identification focuses on detecting and matching target pedestrians in real time. Because accurate annotation is labor-intensive, unsupervised domain adaptation has become a promising solution. Such methods usually generate pseudo labels by clustering, and these labels are often noisy. Our experimental analysis shows that cross-camera variation is one of the key sources of this noise. Current feature-based methods concentrate on reducing the cross-domain gap and have difficulty exploiting camera ID information effectively, so we design a camera module to address the cross-camera problem. In addition, a single network is often used to extract features, and our experiments indicate that the limited feature extraction ability of a single backbone network also affects final performance. Mutual learning is therefore used to compensate for the single network. In pedestrian search, a good ranking algorithm is also key to better recognition performance, but conventional re-ranking is costly, which limits its application in real scenarios. We therefore optimize the traditional re-ranking algorithm with the spatiotemporal information in the dataset, reaching an effect comparable to traditional re-ranking at a time and space cost close to that of the original ranking. To address both problems, we propose a framework that combines multi-network mutual learning with camera-split training and uses spatiotemporal information to optimize the ranking.

Method

First, to improve initial recognition performance, the network is pre-trained on the source domain dataset with two loss functions: label-smoothing cross-entropy loss and triplet loss. Second, because a single backbone network extracts only one kind of feature, a single model cannot maintain good generalization ability in ever-changing real scenarios. We therefore design a mutual learning model to improve robustness. A camera-split strategy is adopted to reduce the recognition interference caused by cross-camera variation. For pseudo-label generation, the dataset is split by camera ID, and the output vectors obtained by feeding each input to the different networks are averaged. Additionally, we use spatiotemporal information to optimize pedestrian re-identification along another dimension, because the distribution of pedestrians provides prior information for recognition. For example, images from the same camera with the same timestamp are relatively close, so we use the timestamp information of each image: the one-hot-encoded timestamps are concatenated to the feature vectors, which are then clustered to obtain pseudo labels. During training, to transfer knowledge from one network model to another, the class predictions of each network model serve as soft labels for training the other network models. A time-averaged model is added to the mutual learning module and updated iteratively during training; it preserves a large amount of prior information and thereby suppresses error amplification. A mutual learning loss function is designed as well: the traditional classification loss and triplet loss are modified so that the loss integrates the pseudo labels with the features of multiple backbone networks.
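The camera split and timestamp concatenation described above can be sketched as follows. This is only an illustration under stated assumptions: the binning of timestamps (`bin_width`, `num_bins`) and the sample layout are hypothetical, since the abstract does not specify how timestamps are discretized, and timestamps are assumed to be non-negative.

```python
from collections import defaultdict


def one_hot(index, size):
    """Return a one-hot list of the given size with a 1.0 at index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def splice_timestamp(feature, timestamp, bin_width, num_bins):
    """Append a one-hot time-bin code to a feature vector.

    Timestamps beyond the last bin are clamped into it (an assumption).
    """
    bin_index = min(int(timestamp // bin_width), num_bins - 1)
    return list(feature) + one_hot(bin_index, num_bins)


def split_by_camera(samples):
    """Group (camera_id, feature, timestamp) samples by camera ID."""
    groups = defaultdict(list)
    for camera_id, feature, timestamp in samples:
        groups[camera_id].append((feature, timestamp))
    return dict(groups)


samples = [(1, [0.1, 0.2], 25), (2, [0.4, 0.3], 8), (1, [0.2, 0.1], 31)]
per_camera = split_by_camera(samples)
# Features of camera 1, extended with their time-bin code, ready for clustering.
camera_1_inputs = [
    splice_timestamp(f, t, bin_width=10, num_bins=4) for f, t in per_camera[1]
]
```

Each per-camera group would then be passed to a clustering algorithm of choice to produce pseudo labels; the clustering step itself is omitted here.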
Under the mutual learning loss, the feature distribution learned by each network model is constrained by multiple network models at the same time. For feature ranking, we optimize the traditional ranking algorithm using the characteristics of pedestrian re-identification and the spatiotemporal information of the dataset. Based on the cameras of samples that share a pseudo label, timestamp statistics are used to estimate the time distribution between different cameras, and a spatiotemporal score between cameras is defined to fine-tune the distance between features. The method thus performs re-ranking with spatiotemporal information, and it achieves an effect similar to traditional re-ranking at a cost close to that of the original ranking.
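A minimal sketch of the spatiotemporal ranking idea is given below. It assumes a histogram of time gaps per camera pair, built from samples that share a pseudo label, and a simple multiplicative fusion with the feature distance. The fusion rule, the `weight` parameter, and all function names are hypothetical; the abstract does not state the actual score definition.

```python
from collections import defaultdict


def time_bin(t_a, t_b, bin_width, num_bins):
    """Discretize the absolute time gap, clamping into the last bin."""
    return min(int(abs(t_a - t_b) // bin_width), num_bins - 1)


def build_gap_histograms(samples, bin_width, num_bins):
    """samples: list of (pseudo_label, camera_id, timestamp).

    Returns {(cam_low, cam_high): normalized histogram of time gaps}
    over pairs that share a pseudo label but come from different cameras.
    """
    counts = defaultdict(lambda: [0] * num_bins)
    for i, (label_i, cam_i, t_i) in enumerate(samples):
        for label_j, cam_j, t_j in samples[i + 1:]:
            if label_i != label_j or cam_i == cam_j:
                continue
            pair = (min(cam_i, cam_j), max(cam_i, cam_j))
            counts[pair][time_bin(t_i, t_j, bin_width, num_bins)] += 1
    return {pair: [c / sum(h) for c in h] for pair, h in counts.items()}


def spatiotemporal_score(hists, cam_a, cam_b, t_a, t_b, bin_width, num_bins):
    """Look up how typical this time gap is for the camera pair (0.0 if unseen)."""
    hist = hists.get((min(cam_a, cam_b), max(cam_a, cam_b)))
    if hist is None:
        return 0.0
    return hist[time_bin(t_a, t_b, bin_width, num_bins)]


def adjusted_distance(feature_distance, score, weight=0.5):
    """Shrink the feature distance for likely camera transitions (assumed rule)."""
    return feature_distance * (1.0 - weight * score)


train = [(0, 1, 0), (0, 2, 12), (1, 1, 100), (1, 2, 115), (2, 1, 50), (2, 2, 95)]
hists = build_gap_histograms(train, bin_width=10, num_bins=5)
score = spatiotemporal_score(hists, 1, 2, 200, 214, bin_width=10, num_bins=5)
distance = adjusted_distance(1.0, score)
```

The histograms are built once, so at query time the adjustment is a table lookup per gallery item, which is consistent with a ranking cost close to that of the original ranking.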

Result

The method is compared with 10 popular methods on two cross-domain experimental settings. With the Duke multi-tracking multi-camera re-identification (DukeMTMC-ReID) dataset as the source domain and the Market-1501 dataset as the target domain, the mean average precision (mAP) reaches 82.5%, and Rank1 increases by 3.1% to 95.3%. With Market-1501 as the source domain and DukeMTMC-ReID as the target domain, mAP and Rank1 reach 75.3% and 90.2%, respectively.
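For readers unfamiliar with the two metrics, the sketch below shows how mAP and Rank1 are commonly computed from ranked retrieval results. It is a simplified illustration: each query is represented by a list of booleans over the gallery sorted by ascending distance, and the usual re-identification filtering of same-camera gallery images is omitted. The function names are hypothetical.

```python
def average_precision(ranked_matches):
    """Average precision for one query.

    ranked_matches: booleans over the gallery, ordered by ascending distance,
    True where the gallery image has the same identity as the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(all_ranked_matches):
    """Return (mAP, Rank1) over a non-empty list of per-query match lists."""
    n = len(all_ranked_matches)
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    rank1 = sum(1 for r in all_ranked_matches if r and r[0]) / n
    return mean_ap, rank1


queries = [[True, False, True], [False, True]]
mean_ap, rank1 = evaluate(queries)
```

Rank1 only checks whether the top-ranked gallery image is correct, whereas mAP rewards placing every correct match early, so the two numbers can diverge.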

Conclusion

The proposed camera-split network mutual learning model combined with spatiotemporal distance ranking improves the accuracy of pseudo labels and optimizes the matching ranking. Compared with other ranking optimization algorithms, it greatly reduces computation, and it further improves pedestrian re-identification performance.

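Keywords (Chinese version)

pedestrian re-identification; mutual learning; camera splitting; cross domain; spatiotemporal distance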

Keywords

pedestrian re-ID; mutual learning; multiple cameras; cross domain; time and space distance
