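Zhang Hongying, Wang Xuyong, Peng Xiaowen (College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300)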
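Objective In person re-identification, background differences between images of the same pedestrian reduce recognition accuracy and lead to misidentification. To address this problem, a multi-feature fusion person re-identification method combined with foreground segmentation is proposed. Method First, a foreground segmentation module is built to extract the image foreground, and a foreground segmentation loss keeps the foreground image smooth and complete. Then, an attention-sharing strategy and a multi-scale non-local operation are proposed to combine global features with local features and high-dimensional features with low-dimensional features, so that the strengths of different features complement one another. Finally, the network model is trained and optimized with multiple loss functions. Result Ablation and comparison experiments were conducted on three public datasets, Market-1501, DukeMTMC-reID (Duke multi-tracking multi-camera re-identification) and MSMT17 (multi-scene multi-time person ReID dataset), with rank-1 accuracy (Rank-1) and mean average precision (mAP) as evaluation metrics. The results show that introducing foreground segmentation and the multi-feature fusion method each improves the recognition accuracy of the network to some extent. On Market-1501, DukeMTMC-reID and MSMT17, the proposed method achieves Rank-1 and mAP of 96.8% and 91.5%, 91.5% and 82.3%, and 83.9% and 63.8%, respectively, a clear advantage over the compared algorithms. Conclusion The proposed foreground segmentation-based multi-feature fusion method extracts the foreground while integrating image features of different scales and granularities, and it improves recognition accuracy over existing models. The foreground segmentation module also removes useless background and mitigates misidentification caused by background differences, which makes the person re-identification model more practical and lets it recognize pedestrians well under real-world backgrounds.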
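Rank-1 and mAP, the two metrics reported above, can be illustrated with a minimal sketch. It assumes each query already comes with its gallery ranked by similarity, given as a list of booleans that mark which gallery images share the query's identity; the function names are hypothetical.

```python
from typing import List, Sequence


def rank1_accuracy(ranked_matches: Sequence[Sequence[bool]]) -> float:
    """Fraction of queries whose top-ranked gallery image has the query's identity."""
    hits = sum(1 for matches in ranked_matches if matches and matches[0])
    return hits / len(ranked_matches)


def average_precision(matches: Sequence[bool]) -> float:
    """AP of one ranked list: mean of precision@k taken at every rank k holding a true match."""
    num_relevant = sum(matches)
    if num_relevant == 0:
        return 0.0  # real protocols usually skip such queries instead
    hits = 0
    precision_sum = 0.0
    for k, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / k
    return precision_sum / num_relevant


def mean_average_precision(ranked_matches: Sequence[Sequence[bool]]) -> float:
    """mAP: AP averaged over all queries."""
    return sum(average_precision(m) for m in ranked_matches) / len(ranked_matches)


if __name__ == "__main__":
    # Two toy queries; True marks a gallery image with the same identity.
    queries: List[List[bool]] = [
        [True, False, True, False],   # true matches at ranks 1 and 3 -> AP = 5/6
        [False, True, False, False],  # first true match at rank 2   -> AP = 1/2
    ]
    print(f"Rank-1: {rank1_accuracy(queries):.2f}")       # Rank-1: 0.50
    print(f"mAP: {mean_average_precision(queries):.3f}")  # mAP: 0.667
```

Standard Market-1501-style evaluation additionally discards gallery images of the same identity taken by the same camera as the query; this sketch omits that filtering.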
Foreground segmentation-relevant multi-feature fusion person re-identification
Zhang Hongying, Wang Xuyong, Peng Xiaowen (College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China)
Objective Person re-identification (ReID) is a cross-camera computer vision technology for retrieving a specific pedestrian in images or video sequences. To obtain more discriminative features and higher accuracy, recent deep learning-based ReID methods have concentrated on processing pedestrian features. The whole pedestrian image is usually taken as the input sample of a ReID model, and the features of every pixel in the image serve as the basis for recognition. However, because ReID is a cross-camera task, cameras must be deployed over a wide area and in large numbers, which inevitably introduces background variations into pedestrian images. In other words, background similarity is an unreliable cue to identity, both for images from a single camera and across multiple cameras. It is therefore necessary to reduce the influence of background information on the pedestrian similarity metric of a ReID model. To resolve this problem, we develop a foreground segmentation-based multi-branch joint person re-identification method built on the 50-layer residual network (ResNet50). Method The method integrates foreground segmentation with ReID. First, the foreground segmentation module extracts the foreground region of each pedestrian image, which then serves as the input for feature extraction. Then, a multi-grain feature-guided branch and a multi-scale feature fusion branch combine global features with local features and high-dimensional features with low-dimensional features, so that the different features complement each other. In the foreground segmentation module, an attention mechanism is used to improve the mask region-based convolutional neural network (Mask R-CNN), and a foreground segmentation loss function is adopted to reduce the feature information lost through rough foreground segmentation.
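The abstract states only that the extracted foreground becomes the input for feature extraction. The following minimal sketch assumes the segmentation module outputs a per-pixel foreground probability and that background pixels are simply zeroed after thresholding at 0.5. The helper name, the threshold and the zero fill are assumptions rather than details from the paper, and the attention-improved Mask R-CNN itself is not reproduced.

```python
from typing import List

Pixel = List[float]        # one value per channel, e.g. [R, G, B]
Image = List[List[Pixel]]  # H x W x C as nested lists, to stay dependency-free
Mask = List[List[float]]   # H x W foreground probabilities in [0, 1]


def apply_foreground_mask(image: Image, mask: Mask, threshold: float = 0.5) -> Image:
    """Keep pixels whose foreground probability reaches `threshold`; zero the rest.

    Hypothetical helper: the paper does not say whether a hard or soft mask is
    used or which value fills the removed background.
    """
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same height and width")
    out: Image = []
    for img_row, mask_row in zip(image, mask):
        out_row: List[Pixel] = []
        for pixel, prob in zip(img_row, mask_row):
            keep = prob >= threshold
            # Foreground pixels pass through unchanged; background becomes 0.
            out_row.append([c if keep else 0.0 for c in pixel])
        out.append(out_row)
    return out


if __name__ == "__main__":
    image = [[[0.9, 0.8, 0.7], [0.2, 0.3, 0.4]],
             [[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]]
    mask = [[0.95, 0.10],
            [0.60, 0.40]]
    print(apply_foreground_mask(image, mask))
    # [[[0.9, 0.8, 0.7], [0.0, 0.0, 0.0]], [[0.5, 0.5, 0.5], [0.0, 0.0, 0.0]]]
```

A soft variant would multiply each channel by the probability instead of thresholding; which choice suits the segmentation loss described above is not specified in the abstract.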
For the multi-grain feature branch, the convolutional block attention module (CBAM) is first extended into a three-branch attention structure: a new branch added between the spatial and channel dimensions lets information interact across the two dimensions. Furthermore, an attention-sharing strategy is implemented: attention information from the coarse-grained branch is shared with the fine-grained branches, so that global features guide the extraction of local features. This improves the effectiveness of feature extraction and avoids the information lost when features are split into chunks. For the multi-scale feature fusion branch, features from different stages of the backbone network are used directly as the input of multi-scale fusion, and a pyramid attention structure is applied to extract feature information before fusion. In the fusion module, a multi-scale non-local operation is used for feature fusion, which aggregates global information and alleviates the loss of feature information. Finally, a joint loss combining the foreground segmentation loss, the TriHard loss and the Softmax loss is used to train and optimize the network.
Result Comparative experiments are conducted on three public datasets: Market-1501, Duke multi-tracking multi-camera re-identification (DukeMTMC-reID) and multi-scene multi-time person ReID dataset (MSMT17). The evaluation metrics are rank-n accuracy (Rank-n) and mean average precision (mAP). On Market-1501, Rank-1 and mAP reach 96.8% and 91.5%, respectively, improvements of 0.6% in Rank-1 and 1% in mAP over the attention pyramid network (APNet-C). On DukeMTMC-reID, Rank-1 and mAP reach 91.5% and 82.3%, respectively, improvements of 1.1% in Rank-1 and 0.8% in mAP over APNet-C. On MSMT17, Rank-1 and mAP reach 83.9% and 63.8%, respectively, improvements of 0.2% in Rank-1 and 0.3% in mAP over APNet-C. Conclusion We propose a foreground segmentation-based multi-branch joint model that extracts the pedestrian foreground and integrates image features of multiple scales and granularities. At the same time, the foreground segmentation module removes useless background and further alleviates false recognition caused by background differences.
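The joint objective named in the method combines a foreground segmentation loss, TriHard and Softmax. The sketch below shows the standard batch-hard triplet formulation usually called TriHard, a numerically stable softmax cross-entropy for identity classification, and a weighted sum of the three terms. The 0.3 margin, the Euclidean distance, the equal weights and all function names are assumptions, not settings from the paper. The segmentation loss is passed in as a plain number because its form is not given in the abstract.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trihard_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                 margin: float = 0.3) -> float:
    """Batch-hard triplet (TriHard) loss averaged over anchors.

    For each anchor: farthest same-identity sample (hardest positive), closest
    different-identity sample (hardest negative), hinge max(d_ap - d_an + margin, 0).
    Anchors with no positive or no negative in the batch are skipped.
    """
    losses: List[float] = []
    for i, anchor in enumerate(features):
        pos = [euclidean(anchor, f) for j, f in enumerate(features)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, f) for j, f in enumerate(features)
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue
        losses.append(max(max(pos) - min(neg) + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Identity (Softmax) loss for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def joint_loss(seg_loss: float, tri_loss: float, id_loss: float,
               w_seg: float = 1.0, w_tri: float = 1.0, w_id: float = 1.0) -> float:
    """Weighted sum of the three terms; the weights are placeholders."""
    return w_seg * seg_loss + w_tri * tri_loss + w_id * id_loss


if __name__ == "__main__":
    feats = [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
    ids = [0, 0, 1, 1]
    tri = trihard_loss(feats, ids)             # every anchor: 3 - 1 + 0.3 = 2.3
    ce = softmax_cross_entropy([0.0, 0.0], 0)  # log(2) for uniform logits
    seg = 0.1                                  # stand-in segmentation loss value
    print(f"{joint_loss(seg, tri, ce):.3f}")   # 3.093
```

Mining only the hardest positive and negative per anchor is what distinguishes TriHard from a plain triplet loss over randomly sampled triplets.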
foreground segmentation; semantic segmentation; person re-identification (ReID); feature fusion; attention mechanism