Wang Qi, Xue Xinyuan, Min Weidong, Wang Sheng, Gai Di, Han Qing (Nanchang University)
Objective Existing cross-domain re-identification tasks commonly suffer from large domain bias between the source and target domains and from poor clustering quality; meanwhile, a cross-domain model that focuses excessively on generalization to the target domain permanently forgets the knowledge of the source domain. To overcome these challenges, a vehicle re-identification method based on cross-domain joint learning and a shared subspace metric is proposed. Method First, a cross-confidence soft clustering is designed within cross-domain joint learning to establish the inter-domain correlation between the source and target domains, and the supervised information produced by the soft clustering results is used to retain old knowledge and generalize new knowledge. Then, a saliency-aware attention mechanism is proposed to extract salient vehicle features; the original features and the salient features are mapped into a shared subspace, and shared metric factors are obtained from the Jaccard distances between their respective global and local regions. Finally, the shared metric factors are used to smooth the global and local pseudo labels, which enables the model to learn more discriminative features. Result Compared with state-of-the-art methods on three public vehicle re-identification datasets, VeRi-776, VehicleID, and VeRi-Wild, the proposed method achieves Rank-1 accuracies of 42.40%, 41.70%, 56.40%, and 61.90% and mAP accuracies of 22.50%, 23.10%, 41.50%, and 49.10% on the target domains of the cross-domain tasks VeRi-776→VeRi-Wild, VeRi-Wild→VeRi-776, VeRi-776→VehicleID, and VehicleID→VeRi-776, respectively. In accumulating old knowledge on the source domains, it achieves Rank-1 accuracies of 84.60%, 84.00%, 77.10%, and 67.00% and mAP accuracies of 55.80%, 44.80%, 46.50%, and 30.70%. Conclusion The proposed method can effectively alleviate the problem of large domain bias while accumulating cross-domain knowledge, thereby improving the performance of vehicle re-identification tasks.
Cross-domain joint learning and shared subspace metric for vehicle re-identification
Wang Qi, Xue Xinyuan, Min Weidong, Wang Sheng, Gai Di, Han Qing(Nanchang University)
Objective Vehicle re-identification (Re-ID) uses computer vision to determine whether a specific target vehicle appears in an image or video sequence, and it is generally regarded as a subproblem of image retrieval. Vehicle Re-ID can be used to monitor specific abandoned vehicles and to prevent hit-and-run escapes, and it is widely applied in intelligent surveillance and intelligent transportation. Previous methods mainly focus on supervised training within a single domain: when a Re-ID model that performs well in a single domain is transferred to an unlabeled new domain for testing, its retrieval accuracy drops significantly. To reduce the cost of manually annotating massive surveillance data, researchers have gradually proposed many cross-domain Re-ID methods. This task aims to transfer a supervised Re-ID model trained on a labeled source domain to an unlabeled target domain for clustering. The entire transfer process iterates and updates the model parameters in an unsupervised manner, thereby reducing manual annotation costs. However, existing cross-domain Re-ID tasks face two main challenges. On the one hand, existing cross-domain Re-ID methods focus too much on target-domain performance and tend to forget the old knowledge previously learned in the source domain, which causes catastrophic forgetting. On the other hand, the large deviation between the source and target domains, which mainly stems from significant differences in data distribution and domain attributes across domains, directly limits the generalization ability of the Re-ID model. To overcome these challenges, a vehicle re-identification method based on cross-domain joint learning and a shared subspace metric is proposed.
Method First, a cross-confidence soft clustering is designed within cross-domain joint learning to establish the inter-domain correlation between the source and target domains. The cross-confidence soft clustering introduces prior knowledge of the source-domain data into the target domain by calculating the confidence of the cross mean and performing soft clustering jointly, thereby effectively integrating the prior knowledge of the source domain with the new knowledge of the target domain. The training data are re-labeled with pseudo labels according to the cross-mean confidence of each class of source-domain data, and the supervised information generated by the soft clustering results is used to preserve old knowledge and generalize new knowledge. Then, a saliency-aware attention mechanism is proposed to extract the salient features of vehicles. To improve the Re-ID model's ability to identify salient vehicle regions in the channel and spatial dimensions, the saliency-aware attention module is embedded into the backbone network, and the representation of salient vehicle regions is enhanced by computing a channel weight factor and a spatial weight factor. For the channel weight factor, a convolution with a 1×1 kernel compresses the channel dimension of the feature matrix, and the importance of each channel is calculated in a self-aware manner. In addition, to avoid losing channel-wise spatial information when the channel dimension is compressed, global average pooling is applied to the feature matrix, and a more refined channel-style attention is jointly inferred from the channel self-attention and the channel-by-channel spatial information. The original features and the salient features are then mapped into a shared subspace, and shared metric factors are obtained from the Jaccard distances between their respective global and local regions.
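The channel branch of the attention mechanism described above can be illustrated with a minimal NumPy sketch. This is only an illustrative reconstruction under assumed shapes, not the paper's exact formulation: the function name `channel_attention`, the `w_compress` weights standing in for the learned 1×1 convolution, and the sigmoid normalization of the combined scores are all assumptions.

```python
import numpy as np

def channel_attention(x, w_compress):
    # x: feature map of shape (C, H, W); w_compress: (C,) weights of a
    # hypothetical 1x1 convolution that compresses C channels into one map.
    C, H, W = x.shape
    spatial = np.tensordot(w_compress, x, axes=([0], [0]))   # (H, W)
    flat = x.reshape(C, -1)                                  # (C, H*W)
    # channel self-attention: affinity of each channel to the compressed map
    self_attn = flat @ spatial.reshape(-1)                   # (C,)
    # global average pooling preserves per-channel spatial statistics
    gap = flat.mean(axis=1)                                  # (C,)
    # jointly infer channel weights from both cues (sigmoid is an assumption)
    logits = self_attn + gap
    weights = 1.0 / (1.0 + np.exp(-(logits - logits.mean()) / (logits.std() + 1e-6)))
    # re-weight channels; salient channels are emphasized, others suppressed
    return x * weights[:, None, None]
```

The key design point is that the self-attention term and the pooled term are fused before normalization, so channels that are both spatially informative and strongly activated receive the largest weights.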
Finally, to further alleviate the label noise caused by domain bias, the shared metric factors are used to smooth the global and local pseudo labels on top of the cross-confidence soft clustering results, which enables the model to learn more discriminative features. The proposed method is implemented in Python 3.7 with the PyTorch 1.6.0 framework, running on Ubuntu 18.04 with CUDA 11.2. The hardware configuration consists of an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, a Tesla V100 GPU with 32GB of graphics memory, and 64GB of RAM. Training uses ResNet-50 as the baseline model, and input images are uniformly cropped to 224×124. The total number of training epochs is 50, and the batch size is set to 64. A model pre-trained on ImageNet is used for initialization, the initial learning rate is set to 0.00035, and SGD is used to iteratively optimize the model weights. Result Experimental comparisons with the latest existing methods are conducted on three public vehicle re-identification datasets: VeRi-776, VehicleID, and VeRi-Wild. The proposed method achieves Rank-1 accuracies of 42.40%, 41.70%, 56.40%, and 61.90% and mAP accuracies of 22.50%, 23.10%, 41.50%, and 49.10% on the target domains of the cross-domain tasks VeRi-776→VeRi-Wild, VeRi-Wild→VeRi-776, VeRi-776→VehicleID, and VehicleID→VeRi-776, respectively. In accumulating old knowledge on the source domains, it achieves Rank-1 accuracies of 84.60%, 84.00%, 77.10%, and 67.00% and mAP accuracies of 55.80%, 44.80%, 46.50%, and 30.70%.
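The Jaccard-distance-based metric factor and the pseudo-label smoothing it drives can be sketched as follows. This is a simplified illustration under stated assumptions: the set-based Jaccard distance over neighbor lists, the averaging in `shared_metric_factor`, and the linear interpolation toward a uniform distribution in `smooth_pseudo_label` are plausible stand-ins, not the paper's exact equations.

```python
import numpy as np

def jaccard_distance(neigh_a, neigh_b):
    # Jaccard distance between two neighbor sets: 1 - |A ∩ B| / |A ∪ B|
    a, b = set(neigh_a), set(neigh_b)
    return 1.0 - len(a & b) / len(a | b)

def shared_metric_factor(d_global, d_local):
    # combine global and local Jaccard distances into one factor in [0, 1];
    # small distances (consistent neighborhoods) yield a factor near 1
    return 1.0 - 0.5 * (d_global + d_local)

def smooth_pseudo_label(onehot, factor, num_classes):
    # interpolate between the hard cluster label and a uniform distribution,
    # so unreliable samples (low factor) contribute softer supervision
    uniform = np.full(num_classes, 1.0 / num_classes)
    return factor * onehot + (1.0 - factor) * uniform
```

For example, two features whose global neighbor sets overlap heavily but whose local neighbor sets diverge receive an intermediate factor, and their pseudo labels are partially softened rather than trusted outright.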
In addition, a series of experiments further demonstrates the robustness of the proposed method in cross-domain tasks, including ablation comparisons of the different modules, comparisons of different training methods, comparisons of outliers, visualization of attention maps, comparisons of rank lists, and t-SNE visualizations. Conclusion The proposed method can effectively alleviate the problem of large domain bias while accumulating cross-domain knowledge, thereby improving the performance of vehicle Re-ID tasks.