Multi-network mutual learning for pedestrian re-identification combined with spatiotemporal distance

Li Kuan; Gong Xun; Fan Jianfeng

Published: 2023-05-16
DOI: 10.11834/jig.220668
2023 | Volume 28 | Number 5

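Li Kuan¹, Gong Xun¹,²,³,⁴, Fan Jianfeng¹ (1. Graduate School of Tangshan, Southwest Jiaotong University, Tangshan 063000; 2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756; 3. Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756; 4. Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu 610031)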

Abstract

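Objective In real-world pedestrian recognition scenarios, obtaining accurate annotations takes a great deal of manual effort, so unsupervised domain adaptation has become a promising research direction for pedestrian re-identification. Methods of this kind usually rely on clustering to generate pseudo labels, which are often noisy. In addition, a good ranking algorithm is key to better recognition performance during pedestrian search, but conventional re-ranking optimization is costly in computing resources, which limits its use in real scenarios. To address these two problems, this paper proposes a framework that combines multiple networks with camera-split training and uses spatiotemporal information to optimize the ranking. Method The model is pre-trained with supervision on the source-domain data, and multiple network models are then trained without supervision on unlabeled target-domain samples through deep mutual learning, which improves the generalization ability of the networks. During training, the data are processed camera by camera to reduce cross-camera effects and improve the quality of the pseudo labels. In the ranking and matching stage, spatiotemporal information is used to optimize the ranking and further improve matching performance. Result The method is tested and compared on two cross-domain experimental datasets. With the DukeMTMC-ReID (Duke multi-target multi-camera re-identification) dataset as the source domain and the Market-1501 dataset as the target domain, the mean average precision (mAP) and Rank1 of the proposed method are 82.5% and 95.3%, respectively; with Market-1501 as the source domain and DukeMTMC-ReID as the target domain, mAP and Rank1 are 75.3% and 90.2%, respectively. Conclusion The proposed camera-split network mutual-learning model with spatiotemporal distance ranking improves the accuracy of the pseudo labels and optimizes the matching ranking. Compared with other optimization algorithms, it greatly reduces the amount of computation and further improves pedestrian re-identification performance.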
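The mutual-learning idea in the abstract, where each network is trained toward the class predictions of the others and a temporally averaged copy of the weights is maintained, can be illustrated with a minimal sketch. The abstract does not give the loss formula, the averaging coefficient, or whether the soft labels come from the raw networks or their averaged copies, so the function names, the plain soft cross-entropy, and `alpha` below are assumptions made for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy of the student's prediction against a soft label."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

def ema_update(mean_params, current_params, alpha=0.999):
    """Temporal average of parameters: mean <- alpha * mean + (1 - alpha) * current.
    alpha is an assumed value, not taken from the paper."""
    return [alpha * m + (1.0 - alpha) * c
            for m, c in zip(mean_params, current_params)]

# Two hypothetical networks scoring one image over three pseudo-label classes.
logits_a = [2.0, 0.5, -1.0]
logits_b = [1.5, 1.0, -0.5]

# Each network is trained toward the other's class prediction (soft label).
loss_a = soft_cross_entropy(logits_a, softmax(logits_b))
loss_b = soft_cross_entropy(logits_b, softmax(logits_a))

# After an optimizer step, the averaged copy drifts slowly toward the new weights.
mean_weights = ema_update([0.10, -0.20], [0.12, -0.25])
```

In a real training loop these soft-label terms would be added to the pseudo-label classification and triplet losses; how the terms are weighted is not specified in the abstract.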

Keywords

pedestrian re-identification; mutual learning; camera splitting; cross-domain; spatiotemporal distance

Spatiotemporal distance and multiple networks mutual learning-relevant pedestrian re-identification

Li Kuan¹, Gong Xun¹,²,³,⁴, Fan Jianfeng¹ (1. Graduate School of Tangshan, Southwest Jiaotong University, Tangshan 063000, China; 2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; 3. Engineering Research Center of Sustainable Urban Intelligent Transportation, Chengdu 611756, China; 4. Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu 610031, China)

Abstract

Objective Pedestrian re-identification is concerned with detecting and matching target pedestrians in real time. Because annotating accurate labels is labor-intensive, unsupervised domain adaptation has become a promising solution. Such methods rely on clustering to generate pseudo labels, which are often noisy. Experimental analysis shows that crossing cameras is one of the key sources of this noise. Current feature-vector-based methods aim to reduce the cross-domain gap and struggle to make effective use of camera-ID information. Hence, we design a camera-splitting module to address the cross-camera problem. In addition, a single network is often used to extract features, and experimental analysis shows that the limited feature-extraction ability of a single backbone network also affects the final performance. Mutual learning is therefore used to improve on the single network. In pedestrian search, a good ranking algorithm is also key to better recognition performance, but conventional re-ranking is costly in time and memory, which limits its use in real scenarios. We optimize the ranking using the spatiotemporal information in the dataset, reaching an effect comparable to conventional re-ranking at a time and space cost close to that of the original ranking. To address these problems, we develop a framework that combines multiple networks with camera-split training and uses spatiotemporal information to optimize the ranking. Method First, to improve the initial recognition performance, the network is pre-trained with supervision on the source-domain dataset using two loss functions: label-smoothed cross-entropy loss and triplet loss. Second, because a single backbone network extracts only one kind of feature, a single network model cannot maintain good generalization in ever-changing real scenarios. We therefore design a mutual-learning model to enhance robustness.
A camera-split strategy is adopted to reduce the recognition interference caused by crossing cameras. For pseudo-label generation, the dataset is split by camera ID, each image is fed to the different networks, and their output vectors are averaged. Because the analysis above starts from the distribution of pedestrians, we additionally use spatial and temporal information to improve pedestrian re-identification along another dimension. For example, images from the same camera with nearby timestamps tend to be close to each other, so we use the timestamp information attached to each image: one-hot-encoded timestamps are concatenated to the feature vectors, which are then clustered to obtain pseudo labels. For training, to transfer knowledge from one network model to another, the class predictions of each network model are used as soft labels for training the other network models. A temporally averaged model, updated iteratively during training, is added to the mutual-learning module; it preserves a large amount of prior information and suppresses error amplification. A mutual-learning loss function is also designed: the conventional classification loss and triplet loss are modified so that the loss combines the pseudo labels with features from multiple backbone networks, and the feature distribution learned by each network model is constrained by the other network models at the same time. For ranking, the conventional ranking algorithm is optimized using the characteristics of pedestrian re-identification and the spatiotemporal information of the dataset: for samples that share a pseudo label, timestamp statistics are used to build the time distribution between different cameras, and a spatiotemporal score between cameras is defined to fine-tune the distances between features.
This spatiotemporal re-ranking is efficient and effective: it reaches results similar to conventional re-ranking at a time and space cost close to that of the original ranking. Result The method is compared with 10 popular methods on two cross-domain experimental datasets. With the Duke multi-target multi-camera re-identification (DukeMTMC-ReID) dataset as the source domain and the Market-1501 dataset as the target domain, the mean average precision (mAP) reaches 82.5%, and Rank1 increases by 3.1% to 95.3%. With Market-1501 as the source domain and DukeMTMC-ReID as the target domain, mAP and Rank1 reach 75.3% and 90.2%, respectively. Conclusion The proposed camera-split mutual-learning model with spatiotemporal distance ranking improves the accuracy of the pseudo labels and optimizes the matching ranking. Compared with other ranking optimization algorithms, it substantially reduces the computation and further improves pedestrian re-identification performance.
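The spatiotemporal ranking step summarized above (time-gap statistics per camera pair, then a score that fine-tunes feature distances) might look roughly like the following sketch. The abstract does not give the score definition or the way it is combined with the feature distance, so the histogram binning, the multiplicative adjustment, and the names `build_time_histograms`, `st_score`, `adjusted_distance`, and `weight` are all illustrative assumptions rather than the authors' formulation.

```python
from collections import defaultdict
from itertools import combinations

def build_time_histograms(samples, bin_width):
    """samples: list of (pseudo_label, camera_id, timestamp).
    Returns {(cam_a, cam_b): {bin_index: probability}} with cam_a <= cam_b,
    estimated from pairs of samples that share a pseudo label."""
    by_label = defaultdict(list)
    for label, cam, t in samples:
        by_label[label].append((cam, t))
    counts = defaultdict(lambda: defaultdict(int))
    for items in by_label.values():
        for (cam1, t1), (cam2, t2) in combinations(items, 2):
            pair = (min(cam1, cam2), max(cam1, cam2))
            counts[pair][int(abs(t1 - t2) // bin_width)] += 1
    hists = {}
    for pair, bins in counts.items():
        total = sum(bins.values())
        hists[pair] = {b: n / total for b, n in bins.items()}
    return hists

def st_score(hists, cam1, t1, cam2, t2, bin_width):
    """Estimated probability of this time gap for this camera pair (0 if unseen)."""
    pair = (min(cam1, cam2), max(cam1, cam2))
    return hists.get(pair, {}).get(int(abs(t1 - t2) // bin_width), 0.0)

def adjusted_distance(feat_dist, score, weight=0.5):
    """Assumed rule: shrink the feature distance when the time gap is plausible."""
    return feat_dist * (1.0 - weight * score)

# Toy pseudo-labelled samples: (pseudo_label, camera_id, timestamp in seconds).
samples = [(0, 1, 10), (0, 2, 40), (1, 1, 100), (1, 2, 135), (2, 1, 200), (2, 2, 290)]
hists = build_time_histograms(samples, bin_width=30)

# A query seen by camera 1 at t=500 against two gallery images from camera 2.
near = adjusted_distance(1.0, st_score(hists, 1, 500, 2, 530, 30))
far = adjusted_distance(1.0, st_score(hists, 1, 500, 2, 1000, 30))
```

Because the adjustment is a table lookup per query-gallery pair, its cost stays close to that of the original ranking, which is the property the abstract emphasizes over conventional re-ranking.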

Keywords

pedestrian re-ID; mutual learning; multiple cameras; cross domain; time and space distance