Person re-identification combining an attention mechanism and multi-attribute classification

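Zheng Xin1, Lin Lan2, Ye Mao2, Wang Li1, He Chunlin1 (1. School of Computer Science, China West Normal University, Nanchong 637002; 2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731)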

Abstract
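Objective In person re-identification, misjudgments often occur because pedestrians' body parts are occluded or because pedestrian image pairs are misaligned. Exploiting the inherent structure of the human body by focusing on pedestrian parts with salient features and ignoring parts that carry interfering information helps determine whether pedestrian pairs captured by different cameras are the same person. We therefore propose a person re-identification method based on an attention mechanism and multi-attribute classification. Method In the training stage, an improved ResNet50 network serves as the backbone for feature extraction, and the extracted features are passed to a global branch and a local branch. In the global branch, the feature is used as the global feature for identity and global-attribute classification. In the local branch, the feature is unfolded channel by channel, the point with the highest response in each channel is located, and these points are aggregated and divided into four pedestrian parts; the weight of the salient features on each part is computed and multiplied by the initial feature to obtain the total feature of that part. Finally, the total features of all four parts are classified by identity and by their corresponding attributes. In the test stage, the part features and the global feature extracted by the network are concatenated, and the similarity between pedestrians is computed to determine whether they are the same person. Result The method uses the attribute information in the Market-1501_attribute and DukeMTMC-attribute datasets and is tested on the Market-1501 and DukeMTMC-reid datasets, achieving rank-1 accuracies of 90.67% and 80.2% and mAP of 76.65% and 62.14%, respectively. With the re-ranking algorithm, rank-1 reaches 92.4% and 84.15% and mAP reaches 87.5% and 78.41%, respectively, a substantial improvement in recognition rate over other representative recent methods. Conclusion By learning pedestrian attributes, the method focuses attention on pedestrian parts more quickly, and the attention mechanism in turn learns the salient features of pedestrian parts more effectively. This effectively addresses occlusion and misalignment and improves the accuracy of person re-identification.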
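The test-stage matching (concatenate the global and part features, then compare pedestrians by similarity) can be illustrated with a minimal Python sketch. It assumes the global feature and the four part features have already been extracted as plain float vectors and uses Euclidean distance as the dissimilarity measure; the function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def joint_feature(global_feat: Sequence[float],
                  part_feats: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the global feature with the part features (global first)."""
    joint = list(global_feat)
    for part in part_feats:
        joint.extend(part)
    return joint


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; a smaller value means the two pedestrians look more alike."""
    if len(a) != len(b):
        raise ValueError("joint features must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query: Sequence[float],
                 gallery: Sequence[Sequence[float]]) -> List[int]:
    """Gallery indices ordered from the closest (most likely the same person) to the farthest."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))


# Toy example (hypothetical values): a 2-D global feature plus four 1-D part features per image.
q = joint_feature([1.0, 0.0], [[0.5], [0.5], [0.0], [1.0]])
g = [
    joint_feature([0.0, 1.0], [[0.0], [0.0], [1.0], [0.0]]),
    joint_feature([1.0, 0.1], [[0.5], [0.4], [0.0], [1.0]]),
]
print(rank_gallery(q, g))  # [1, 0]
```

Deciding "same person or not" from the ranking would additionally need a threshold or a top-k rule, which the abstract does not specify.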
Keywords
Improving person re-identification by attention and multi-attributes

Zheng Xin1, Lin Lan2, Ye Mao2, Wang Li1, He Chunlin1 (1. School of Computer Science, China West Normal University, Nanchong 637002, China; 2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)

Abstract
Objective Person re-identification (ReID) refers to retrieving a target pedestrian across multiple cameras with non-overlapping views. The technology has applications in many fields. In security, police can use person ReID to quickly track and locate suspects and to find missing persons. In album management, users can organize electronic photo albums by person. In e-commerce, operators of unmanned supermarkets can use person ReID to track customer behavior. However, person ReID faces many challenges, such as low resolution, illumination changes, background clutter, varying pedestrian poses, and occlusion of pedestrian body parts. Numerous methods have been proposed to address these problems. Traditional methods fall into two categories: feature design and distance metric learning. Feature design aims to build a discriminative and robust feature representation so that expressive features can be extracted from pedestrian images. Distance metric learning aims to reduce the distance between images of the same pedestrian while increasing the distance between images of different pedestrians. In recent years, deep learning has been widely adopted in computer vision, and deep learning-based person ReID methods have achieved higher recognition rates than traditional methods. These methods mainly train convolutional neural networks (CNNs) to classify pedestrian images, extract pedestrian image representations from the network, and decide whether a pair of images shows the same pedestrian by computing their similarity. The method proposed in this work is also CNN-based. We observe that misjudgments caused by occluded body parts and by misalignment between pedestrian image pairs are frequent.
We therefore focus on pedestrian parts with distinctive features and ignore parts that carry interfering information when determining whether pedestrian pairs captured by different cameras are the same person, and we propose a person ReID method based on attention and attributes to overcome the occlusion and misalignment problems. Method Our method introduces pedestrian attribute information, divided into local and global attributes. The local attributes consist of head, upper-body, hand, and lower-body attributes; the global attributes include gender and age. The method consists of two stages: training and testing. In the training stage, an improved ResNet50 network is used as the backbone to extract features, which are then fed into two branches: a global branch and a local branch. In the global branch, the features serve as the global feature for classifying identity and global attributes. The local branch proceeds in several steps. First, the feature map is unfolded channel by channel. Second, the point with the highest response value in each channel is located. Third, these points are clustered and divided into four pedestrian parts. Fourth, the weight of the distinctive features on each pedestrian part is computed. Fifth, the weights of the four parts are multiplied by the initial features to obtain the total feature of each part. Finally, the total features of the four parts are classified by identity and by their corresponding attributes. In the test stage, the part and global features are extracted by the trained network and concatenated into joint features, and the similarity between pedestrian pairs is computed to determine whether they are the same person. In this way, the network uses attribute information to locate pedestrian parts.
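The five local-branch steps can be sketched in plain Python on a toy feature map indexed as fmap[channel][row][col]. The abstract does not say how the peak points are clustered or how the part weights are normalized, so this sketch assumes (1) each channel is assigned to one of four horizontal bands by the row of its peak, and (2) a part's spatial weight map is the mean of its channels, clipped at zero and normalized to sum to 1. All function names are hypothetical; this illustrates the idea and is not the authors' implementation.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as fmap[channel][row][col]


def peak_positions(fmap: FeatureMap) -> List[Tuple[int, int]]:
    """Steps 1-2: scan each channel separately and return the (row, col) of its highest response."""
    peaks = []
    for channel in fmap:
        best = (0, 0)
        for r, row in enumerate(channel):
            for c, value in enumerate(row):
                if value > channel[best[0]][best[1]]:  # ties keep the first position found
                    best = (r, c)
        peaks.append(best)
    return peaks


def group_channels(peaks: List[Tuple[int, int]], height: int,
                   num_parts: int = 4) -> List[List[int]]:
    """Step 3 (assumed rule): put each channel into one of num_parts horizontal bands by its peak row."""
    groups: List[List[int]] = [[] for _ in range(num_parts)]
    for ch, (row, _col) in enumerate(peaks):
        groups[row * num_parts // height].append(ch)
    return groups


def part_weights(fmap: FeatureMap, channels: List[int]) -> List[List[float]]:
    """Step 4 (assumed form): mean of the part's channels, clipped at 0, normalized to sum to 1."""
    height, width = len(fmap[0]), len(fmap[0][0])
    uniform = [[1.0 / (height * width)] * width for _ in range(height)]
    if not channels:
        return uniform  # no channel peaked in this band: fall back to uniform weights
    raw = [[max(0.0, sum(fmap[ch][r][c] for ch in channels) / len(channels))
            for c in range(width)] for r in range(height)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw] if total > 0 else uniform


def part_feature(fmap: FeatureMap, weights: List[List[float]]) -> List[float]:
    """Step 5: multiply every channel of the initial map by the weights and sum over positions."""
    return [sum(w * v for w_row, v_row in zip(weights, channel) for w, v in zip(w_row, v_row))
            for channel in fmap]


def local_branch(fmap: FeatureMap, num_parts: int = 4) -> List[List[float]]:
    """Return one C-dimensional feature per pedestrian part."""
    groups = group_channels(peak_positions(fmap), len(fmap[0]), num_parts)
    return [part_feature(fmap, part_weights(fmap, g)) for g in groups]


# Toy 4-channel, 4x2 map: channel k responds only on row k.
toy = [[[1.0 if r == k else 0.0 for _ in range(2)] for r in range(4)] for k in range(4)]
print(group_channels(peak_positions(toy), 4))  # [[0], [1], [2], [3]]
print(local_branch(toy))  # [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

In the actual network the grouping and weighting would operate on ResNet50 feature maps and be trained jointly with the identity and attribute classifiers; the fixed band rule above only stands in for the unspecified clustering step.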
In addition, the network's attention mechanism is used to obtain discriminative features of pedestrian parts. Result The proposed method uses the attribute information in the Market-1501_attribute and DukeMTMC-attribute datasets and is evaluated on the Market-1501 and DukeMTMC-reid datasets. Person ReID models are commonly evaluated with two protocols: the cumulative matching characteristics (CMC) curve and mean average precision (mAP). The CMC curve is drawn as follows. First, the Euclidean distance between the probe image and each gallery image is calculated. Second, for each probe image, all gallery images are sorted in ascending order of distance. Finally, the percentage of probe images whose true match appears among the top m ranked gallery images is reported as the rank-m accuracy; rank-1 is an especially important indicator in person ReID. mAP is computed as follows. First, the average precision (AP) of each probe image, i.e., the area under its precision-recall curve, is computed. Next, the APs of all probe images are averaged. On the Market-1501 and DukeMTMC-reid datasets, the proposed method achieves rank-1 accuracies of 90.67% and 80.2% and mAP scores of 76.65% and 62.14%, respectively. Re-ranking has recently been widely used in person ReID; with re-ranking, the rank-1 accuracies reach 92.4% and 84.15%, and the mAP scores reach 87.5% and 78.41%, respectively. Compared with other state-of-the-art methods, these accuracies represent a substantial improvement. Conclusion In this study, we propose a person ReID method based on attention and attributes. By learning pedestrian attributes, the method directs attention toward pedestrian parts, and the attention mechanism then learns the distinctive features of these parts.
Even when pedestrians are occluded or misaligned, the proposed method still achieves high accuracy.
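The CMC and mAP protocols described in the Result section can be sketched in Python, given a probe-by-gallery distance matrix and identity labels. AP is computed in its common non-interpolated form (the mean of precision@k over the ranks k that hold a true match), a standard discrete form of the area under the precision-recall curve; probes with no true match in the gallery are skipped. The standard Market-1501 protocol additionally discards gallery images of the probe's identity taken by the same camera; that filter is omitted here for brevity. Function names and toy data are hypothetical.

```python
from typing import List, Sequence


def ranked_matches(dist_row: Sequence[float], gallery_ids: Sequence[int],
                   query_id: int) -> List[bool]:
    """Sort the gallery by ascending distance; True marks a gallery image of the probe's identity."""
    order = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    return [gallery_ids[j] == query_id for j in order]


def cmc(dist: Sequence[Sequence[float]], query_ids: Sequence[int],
        gallery_ids: Sequence[int], max_rank: int) -> List[float]:
    """Entry m-1 is the rank-m accuracy: fraction of probes whose first true match is in the top m."""
    hits = [0] * max_rank
    valid = 0
    for row, qid in zip(dist, query_ids):
        matches = ranked_matches(row, gallery_ids, qid)
        if not any(matches):
            continue  # probe has no true match in the gallery: skip it
        valid += 1
        first = matches.index(True)
        for m in range(first, max_rank):
            hits[m] += 1
    return [h / valid for h in hits] if valid else [0.0] * max_rank


def average_precision(matches: Sequence[bool]) -> float:
    """Non-interpolated AP: mean of precision@k over every rank k that holds a true match."""
    found, precisions = 0, []
    for k, is_match in enumerate(matches, start=1):
        if is_match:
            found += 1
            precisions.append(found / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_ap(dist: Sequence[Sequence[float]], query_ids: Sequence[int],
            gallery_ids: Sequence[int]) -> float:
    """mAP: mean of the per-probe APs over probes that have at least one true match."""
    aps = [average_precision(ranked_matches(row, gallery_ids, qid))
           for row, qid in zip(dist, query_ids) if qid in gallery_ids]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data (hypothetical): 2 probes, 3 gallery images.
dist = [[0.2, 0.9, 0.5],
        [0.1, 0.3, 0.8]]
query_ids, gallery_ids = [1, 2], [1, 2, 1]
print(cmc(dist, query_ids, gallery_ids, max_rank=3))  # [0.5, 1.0, 1.0]
print(mean_ap(dist, query_ids, gallery_ids))          # 0.75
```

In the toy data the second probe's true match sits at rank 2, so rank-1 is 0.5 while rank-2 is already 1.0, and its AP of 0.5 pulls the mAP down to 0.75. This is why rank-1 and mAP can diverge, as the reported results also show.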
Keywords
