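Zhang Hongying1, Liu Tengfei1, Luo Qian2, Zhang Tao2 (1. Civil Aviation University of China; 2. Civil Aviation Chengdu Electronic Technology Co., Ltd.)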
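Objective In person re-identification, occlusion alters a pedestrian's appearance features and reduces their discriminability, so conventional methods that rely only on the visible parts still produce identification errors. To address this problem, an occluded person re-identification method that fuses pose guidance and multi-scale features is proposed. Method First, a feature restoration module is constructed that recovers the semantic information of occluded regions in the feature space from information adjacent to the occluded parts, thereby repairing the features of the missing parts. Then, to extract effective pose information from the restored image, a pose guidance module is designed that guides feature extraction through pose estimation and achieves more accurate pedestrian matching. Finally, a feature enhancement module is built that incorporates a salient region detection method to strengthen effective body-part features while eliminating interference from background information. Result Comparative and ablation experiments were conducted on three public datasets. On the Market1501, DukeMTMC-reID (Duke multi-tracking multi-camera re-identification), and Occluded-DukeMTMC (occluded Duke multi-tracking multi-camera re-identification) datasets, the mean average precision (mAP) and rank-1 accuracy (Rank-1) were 88.8% and 95.5%, 79.2% and 89.3%, and 51.7% and 60.3%, respectively. The comparative results show that the proposed fusion algorithm improves pedestrian matching accuracy and is competitive. Conclusion The proposed pose-guided, multi-scale fusion method restores part features lost to occlusion and combines pose information to fuse image features of different granularities, which improves the recognition accuracy of the model and effectively alleviates misidentification caused by occlusion, verifying the effectiveness of the method.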
Pose guidance and multi-scale features fusion for occluded person re-identification
Zhang Hongying1, Liu Tengfei1, Luo Qian2, Zhang Tao2 (1. College of Electronic Information and Automation, Civil Aviation University of China; 2. Civil Aviation Chengdu Electronic Technology Co., Ltd.)
Objective Person re-identification (ReID) is an important task in computer vision that aims to accurately identify and associate the same person across multiple surveillance cameras by extracting and matching pedestrian features under different scenarios. Occluded person re-identification is a challenging, specialized case of this problem. In real-world settings, occlusion is common, and it limits the practical application of person re-identification to a certain extent. Recently, occluded person re-identification has attracted growing attention, and many methods have been proposed to address occlusion, achieving impressive results. Currently, these methods primarily focus on the visible regions in images. Concretely, they first locate the visible regions in the image and then use a specially designed model to extract discriminative features from these regions, thereby achieving accurate person matching. These methods typically discard features from the occluded areas and match using only discriminative features from the non-occluded regions. Although these methods achieve impressive results, they ignore the influence of occluded regions and background interference, and therefore fail to address the misidentification caused by similar appearances in the non-occluded regions. Consequently, relying only on visible regions for recognition leads to a sharp drop in model performance, and interference from image backgrounds also limits further improvement in recognition accuracy. To address these issues, some methods have been proposed to recover the occluded regions in images. Specifically, these methods restore the occluded parts at the image level by using information from the unoccluded parts of the image.
However, such restoration approaches may cause image distortion and introduce an excessive number of parameters. Method To alleviate these issues, we propose a person re-identification method based on pose guidance and multi-scale feature fusion, which enhances the feature representation capability of the model and yields more discriminative features. First, a feature restoration module is constructed to restore occluded image features at the feature level while reducing the number of model parameters. The module uses spatial contextual information from the non-occluded regions to predict the features of adjacent occluded regions, thereby restoring the semantic information of the occluded regions in the feature space. The feature restoration module consists mainly of two parts: an adaptive region division unit and a feature restoration unit. The adaptive region division unit adaptively divides the image into six regions according to predicted localization points, which facilitates the clustering of similar feature information across regions. This adaptive division alleviates the misalignment caused by fixed division methods and achieves more accurate position alignment. The feature restoration unit comprises an encoder and a decoder. The encoder groups the features of divided image regions that have similar appearances or close positions into a cluster, while the decoder assigns the cluster information to the occluded body parts in the image, thereby completing the feature restoration of the missing body parts. Second, a pose estimation network is employed to extract pedestrian pose information. The pose estimation network generates keypoint heatmaps from the restored, complete image features and then predicts body keypoints from the heatmaps to obtain pose information.
The pretrained pose estimation guidance model performs fusion learning on the global non-occluded regions and the restored regions to obtain more distinctive pedestrian features for more accurate pedestrian matching. Finally, to eliminate interference from background information while strengthening the learning of effective information, a feature enhancement module is proposed to extract salient features from the image. This module not only makes the network pay more attention to the valid semantic information in the feature maps but also reduces interference from background noise, which alleviates the failure of feature learning caused by occlusion. Result To validate the effectiveness of our method, we conducted a series of comparative and ablation experiments on three publicly available datasets, using mean average precision (mAP) and Rank-1 accuracy as evaluation metrics. Experimental results show that our method achieved an mAP and Rank-1 of 88.8% and 95.5%, respectively, on the Market1501 dataset; 79.2% and 89.3%, respectively, on the Duke multi-tracking multi-camera re-identification (DukeMTMC-reID) dataset; and 51.7% and 60.3%, respectively, on the occluded Duke multi-tracking multi-camera re-identification (Occluded-DukeMTMC) dataset. Moreover, our method outperformed PGMA-Net by 0.4% in mAP on the Market1501 dataset, by 0.8% in mAP and 0.7% in Rank-1 on the DukeMTMC-reID dataset, and by 1.2% in mAP on the Occluded-DukeMTMC dataset. The ablation experiments also confirmed the effectiveness of the three proposed modules. Conclusion Our proposed method, pose-guided and multi-scale feature fusion (PGMF), effectively recovers the features of missing body parts, alleviates background interference, and achieves accurate pedestrian matching.
Therefore, the proposed model effectively alleviates the misidentification caused by occlusion and improves the accuracy of person re-identification, and its robustness has been demonstrated.
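For readers unfamiliar with the two evaluation metrics reported in the Result paragraph, the following is a minimal sketch of how mAP and Rank-1 are commonly computed from a query-by-gallery distance matrix. It is not the authors' evaluation code: the function name `evaluate`, its arguments, and the toy data are illustrative assumptions, and the same-camera filtering used by the standard Market1501 and DukeMTMC-reID protocols is deliberately omitted to keep the example short.

```python
from typing import Sequence, Tuple


def evaluate(dist: Sequence[Sequence[float]],
             query_ids: Sequence[int],
             gallery_ids: Sequence[int]) -> Tuple[float, float]:
    """Return (mAP, Rank-1) for a query-by-gallery distance matrix.

    Hypothetical helper for illustration only. Simplification: no
    same-camera filtering is applied, unlike the standard ReID protocols.
    """
    ap_sum = 0.0
    rank1_hits = 0
    valid = 0
    for q, row in enumerate(dist):
        # Gallery indices sorted from smallest to largest distance.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        n_rel = sum(matches)
        if n_rel == 0:
            continue  # this query has no true match in the gallery; skip it
        valid += 1
        # Rank-1: is the closest gallery item a correct identity?
        rank1_hits += int(matches[0])
        # Average precision: mean of precision values at each correct match.
        hits = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank
        ap_sum += precision_sum / n_rel
    if valid == 0:
        raise ValueError("no query has a match in the gallery")
    return ap_sum / valid, rank1_hits / valid


# Toy data (made up): 2 queries, 3 gallery images.
dist = [[0.1, 0.9, 0.5],
        [0.8, 0.4, 0.3]]
query_ids = [1, 2]
gallery_ids = [1, 2, 1]
mean_ap, rank1 = evaluate(dist, query_ids, gallery_ids)
print(f"mAP={mean_ap:.2f}, Rank-1={rank1:.2f}")
```

In this toy case the first query ranks both of its true matches first (AP = 1.0), while the second query finds its only true match at rank 2 (AP = 0.5), so Rank-1 counts one hit out of two queries. Rank-1 rewards only the top result, whereas mAP reflects the ranking of every true match, which is why the abstract reports both.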