
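Wu Yuhang, Sang Nong (Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074)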

Abstract
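Objective Unsupervised domain adaptive pedestrian re-identification (UDA Re-ID) aims to train a pedestrian re-identification model that generalizes well to the target domain, using labeled data from an existing application scenario (the source domain) and unlabeled data from a new application scenario (the target domain). Existing methods do not consider the instability of instance features during training. They also do not explicitly address the larger intra-class distances and smaller inter-class distances caused by camera variation, or the pseudo-label noise introduced by clustering errors on the unlabeled target-domain data. To address these problems, we propose a method with consistency constraints and label optimization.
Method First, an instance consistency constraint restricts the feature distance between different augmentations of the same instance, improving the stability of pedestrian instance features. Then, a camera consistency constraint restricts the distance between cross-camera positive instance feature pairs, improving robustness to camera variation. Finally, a label-ensemble-based label optimization converts one-hot encoded pseudo labels into more reliable soft labels, improving the robustness of the supervision signal.
Result On the commonly used UDA Re-ID tasks Duke→Market, Market→Duke, Duke→MSMT, and Market→MSMT, our method achieves a mean average precision (mAP) of 85.0%, 73.5%, 41.3%, and 39.3% and a Rank-1 accuracy of 94.0%, 85.6%, 71.6%, and 69.5%, respectively. Ablation experiments verify the effectiveness of the three proposed modules.
Conclusion The proposed instance consistency and camera consistency constraints enable the model to learn more robust pedestrian feature representations and improve the stability of pedestrian features, while the proposed label-ensemble-based label optimization reduces the risk of overfitting to pseudo-label noise.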
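The instance consistency constraint can be illustrated with a minimal plain-Python sketch. Following the extended English abstract, it assumes one augmented view of each image goes through the online model and the other through an exponential moving average (EMA) copy of that model, and that the constraint is the mean cosine distance between the paired features. The function names, the momentum of 0.999, and the flat-list representation of parameters are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def ema_update(teacher: List[float], student: Sequence[float],
               momentum: float = 0.999) -> List[float]:
    """Return EMA parameters: momentum * teacher + (1 - momentum) * student.

    Parameters are flattened into a plain list here (illustrative assumption).
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def cosine_distance(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """1 - cosine similarity; eps guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm_a * norm_b, eps)


def instance_consistency_loss(online_feats: Sequence[Vector],
                              ema_feats: Sequence[Vector]) -> float:
    """Mean cosine distance between paired features of the same images.

    online_feats[i]: feature of augmented view 1 of image i (online model).
    ema_feats[i]:    feature of augmented view 2 of image i (EMA model),
                     assumed to act as a fixed target without gradients.
    """
    if len(online_feats) != len(ema_feats) or not online_feats:
        raise ValueError("need two non-empty, equally sized feature batches")
    total = sum(cosine_distance(o, e) for o, e in zip(online_feats, ema_feats))
    return total / len(online_feats)


if __name__ == "__main__":
    online = [[1.0, 0.0], [0.0, 2.0]]
    ema = [[1.0, 0.0], [0.0, 1.0]]
    # Each pair points in the same direction, so the distance is zero.
    print(instance_consistency_loss(online, ema))  # 0.0
```

In a real training loop, the EMA branch would typically be excluded from back-propagation and this term added to the main Re-ID objective with some weight; both details are assumptions here.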
Unsupervised domain adaptive pedestrian re-identification with consistency constraints and label optimization

Wu Yuhang,Sang Nong(Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China)

Objective Pedestrian re-identification (Re-ID) is one of the key techniques for multi-camera pedestrian retrieval in video surveillance systems. To achieve good performance, current Re-ID models are usually trained on a scene with a large number of annotations (the source domain), but their performance drops significantly when they are applied directly to a new scene (the target domain). Re-labeling data for every new scene would help, but it is time-consuming and labor-intensive. Unsupervised domain adaptive pedestrian re-identification (UDA Re-ID) methods therefore aim to train a model that generalizes well to the target domain using labeled source-domain data and unlabeled target-domain data. These methods still face two challenges: instance features are unstable during training, and image heterogeneity across cameras widens intra-class distances and narrows inter-class distances. In addition, current methods cluster the unlabeled target-domain data into multiple clusters and assign a one-hot encoded pseudo label to each cluster. Because the model's representation ability is limited, the clustering results are unreliable, especially in the early stage of training: images of one pedestrian may be split into different clusters, while images of different pedestrians may be merged into one cluster. Such errors are called pseudo-label noise. Because the pseudo labels serve as the supervision signal for feature learning (e.g., contrastive learning), the model risks overfitting this noise, which suppresses its performance. To resolve these problems, we develop a multi-centroid representation network with consistency constraints (MCRNCC), built on the popular multi-centroid representation network (MCRN).
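The one-hot pseudo labels described above, and the label-ensemble refinement that the Method paragraph applies to them, can be sketched in plain Python. This is a minimal illustration that assumes a hypothetical target-domain classifier producing raw logits over the current clusters, and a single hypothetical weighting coefficient `alpha` for the linear combination; the abstract does not state the weighting value, and all function names are illustrative.

```python
import math
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """One-hot encode a cluster-derived pseudo label."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_label(pseudo_label: int, classifier_logits: Sequence[float],
                        alpha: float = 0.5) -> List[float]:
    """Linearly weight the one-hot pseudo label with the classifier prediction.

    alpha is a hypothetical weight: alpha = 1 keeps the hard pseudo label,
    alpha = 0 trusts the (hypothetical) target-domain classifier entirely.
    """
    hard = one_hot(pseudo_label, len(classifier_logits))
    pred = softmax(classifier_logits)  # would be detached from the graph in practice
    return [alpha * h + (1.0 - alpha) * p for h, p in zip(hard, pred)]


def soft_cross_entropy(soft_label: Sequence[float], logits: Sequence[float]) -> float:
    """Cross entropy -sum_k q_k * log p_k against a soft target q."""
    probs = softmax(logits)
    return -sum(q * math.log(max(p, 1e-12)) for q, p in zip(soft_label, probs))


if __name__ == "__main__":
    # Cluster 0 was assigned, but the classifier is undecided over 3 clusters.
    target = ensemble_soft_label(0, [0.0, 0.0, 0.0], alpha=0.5)
    print([round(q, 4) for q in target])  # [0.6667, 0.1667, 0.1667]
```

Setting `alpha = 1` recovers the original one-hot pseudo label; smaller values move some probability mass toward clusters the classifier also finds plausible, which is one way soft labels can dampen the effect of a wrong cluster assignment.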
Method MCRNCC adds three modules to MCRN to improve the stability of instance features and the robustness of pedestrian features, and to reduce the risk of overfitting to pseudo-label noise. First, to improve the stability and semantic information of instance features, an instance consistency constraint reduces the feature distance of the same instance under different augmentations. Following recent self-supervised learning works, an exponential moving average (EMA) model is used to output additional features: each image in a training batch is randomly augmented twice, features are extracted by the original model and by the EMA model, and the cosine distance between the resulting feature pairs is constrained. Second, to improve robustness to variations across cameras, a camera consistency constraint reduces the distance between feature pairs of cross-camera positive instances. Specifically, two instances with the same pseudo label but different camera labels form a positive pair, two instances with the same camera label but different pseudo labels form a negative pair, and a triplet loss built on these pairs optimizes the network. Finally, a label-ensemble-based optimization converts one-hot encoded pseudo labels into more reliable soft labels, which improves the robustness of the supervision signal. In detail, a target-domain classifier is added to generate additional label predictions, and these predictions are linearly weighted with the one-hot encoded pseudo labels to produce refined soft labels.
Result To verify the effectiveness of the method, experiments are carried out on four popular UDA Re-ID tasks: 1) Duke→Market, 2) Market→Duke, 3) Duke→MSMT, and 4) Market→MSMT. First, ablation studies are carried out on the modules of MCRNCC. On the four tasks, the instance consistency constraint improves mAP by 0.6%, 0.2%, 0.7%, and 0.8%, respectively, which demonstrates its effectiveness. The camera consistency constraint improves all four tasks; for example, mAP/Rank-1 increase by 3.5%/3.2% on Duke→MSMT and by 5.3%/4.7% on Market→MSMT. In addition, we visualize the feature space before and after adding the camera consistency constraint and compare the features of selected pedestrians; the visualization results show that the camera consistency constraint makes the feature space more compact. The label-ensemble optimization improves mAP by 0.6%, 0.6%, 1.4%, and 0.4% on the four tasks, respectively. Second, MCRNCC is compared with existing methods. It achieves mAP/Rank-1 of 85.0%/94.0%, 73.5%/85.6%, 41.3%/71.6%, and 39.3%/69.5% on the four tasks, surpassing MCRN by 1.2%/0.2%, 2.0%/1.1%, 5.6%/4.1%, and 6.5%/5.1%, respectively.
Conclusion We develop MCRNCC to further address the UDA Re-ID problem. The instance consistency constraint and the camera consistency constraint in MCRNCC enable the model to learn more robust pedestrian feature representations, while the proposed label-ensemble-based optimization reduces the risk of overfitting to pseudo-label noise. Experiments demonstrate the effectiveness of the three modules of MCRNCC, which shows potential for future work.
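The camera consistency constraint can be sketched as a triplet loss with camera-aware pair selection. This plain-Python version assumes positives share a pseudo label but come from different cameras, and negatives share a camera but carry different pseudo labels. It also assumes batch-hard mining (farthest positive and closest negative per anchor), Euclidean distance, and a hypothetical margin of 0.3; none of these three choices is stated in the abstract.

```python
import math
from typing import List, Sequence


def camera_consistency_triplet_loss(feats: Sequence[Sequence[float]],
                                    pseudo_labels: Sequence[int],
                                    cam_ids: Sequence[int],
                                    margin: float = 0.3) -> float:
    """Batch-hard triplet loss with camera-aware pair selection (a sketch).

    Positive for anchor i: same pseudo label, different camera.
    Negative for anchor i: same camera, different pseudo label.
    Anchors without a valid positive or negative are skipped.
    """
    n = len(feats)
    if not (n == len(pseudo_labels) == len(cam_ids)):
        raise ValueError("feats, pseudo_labels and cam_ids must align")
    losses: List[float] = []
    for i in range(n):
        pos = [math.dist(feats[i], feats[j]) for j in range(n)
               if pseudo_labels[j] == pseudo_labels[i] and cam_ids[j] != cam_ids[i]]
        neg = [math.dist(feats[i], feats[j]) for j in range(n)
               if pseudo_labels[j] != pseudo_labels[i] and cam_ids[j] == cam_ids[i]]
        if not pos or not neg:
            continue
        # Farthest cross-camera positive vs. closest same-camera negative.
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    cams = [0, 1, 0, 1]
    # Each identity is spread across cameras more than it is separated
    # from the other identity seen by the same camera.
    feats = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
    print(round(camera_consistency_triplet_loss(feats, labels, cams), 4))  # 1.3
```

Restricting negatives to the same camera pushes apart identities that look alike mainly because they share camera conditions, while cross-camera positives pull together views of one identity taken under different cameras.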