Improving person re-identification by attention and multi-attributes
- 2020, Vol. 25, No. 5, pp. 936-945
Received: 2019-05-15
Revised: 2019-10-27
Accepted: 2019-11-03
Published in print: 2020-05-16
DOI: 10.11834/jig.190185

Objective
In person re-identification, misjudgments frequently arise because pedestrian body parts are occluded or pedestrian image pairs are misaligned. By exploiting the inherent structure of the human body, attending to pedestrian parts with distinctive features while ignoring parts that carry interference helps determine whether pedestrians captured by different cameras are the same person. We therefore propose a person re-identification method based on an attention mechanism and multi-attribute classification.
Method
In the training stage, an improved ResNet50 network serves as the backbone for feature extraction, and the extracted features are passed to a global branch and a local branch. In the global branch, the features are used as a global feature for identity and global-attribute classification. In the local branch, the features are unfolded by channel, the point with the highest response in each channel is located, these points are clustered and divided into four pedestrian parts, the weight of the distinctive features on each part is computed, and the weight is multiplied by the initial features to obtain each part's total feature. Finally, the total features of the four parts are classified by identity and by the corresponding attributes. In the test stage, the part features and the global feature extracted by the network are concatenated, and the similarity between pedestrians is computed to decide whether they are the same person.
Result
The proposed method introduces the attribute annotations of the Market-1501_attribute and DukeMTMC-attribute datasets and is evaluated on Market-1501 and DukeMTMC-reid, reaching rank-1 accuracies of 90.67% and 80.2% and mAP values of 76.65% and 62.14%, respectively. With the re-ranking algorithm, rank-1 rises to 92.4% and 84.15% and mAP to 87.5% and 78.41%, a substantial improvement over representative recent methods.
Conclusion
By learning pedestrian attributes, the proposed method focuses attention on pedestrian parts more quickly, and the attention mechanism in turn learns the distinctive features of those parts better. This effectively addresses occlusion and misalignment and improves the accuracy of person re-identification.
Objective
Person re-identification (ReID) refers to retrieving target pedestrians from multiple non-overlapping cameras. The technology has broad applications. In the security field, police can quickly track and locate suspects and find missing individuals. In album management, users can organize electronic albums by identity. In e-commerce, managers of unmanned supermarkets can track user behavior. However, the task faces a variety of challenges, such as low resolution, illumination change, background clutter, variable pedestrian pose, and occlusion of pedestrian body parts. Numerous methods have been proposed to address these problems. Traditional methods follow two main directions, namely feature design and distance metric learning. Feature design aims to build a representation with strong discriminability and robustness so that an expressive feature can be extracted from a pedestrian image. Distance metric learning reduces the distance among images of the same pedestrian while increasing the distance among images of different pedestrians. In recent years, deep learning has been widely used in computer vision, and person ReID based on deep learning has achieved higher recognition rates than traditional methods. Deep learning methods mainly use convolutional neural networks (CNNs) to classify pedestrian images, extract pedestrian representations from the network, and determine whether pedestrian pairs match by computing the similarity between them. The proposed method is likewise based on a CNN. We observe that misjudgments caused by the occlusion of pedestrian body parts and by misalignment between pedestrian image pairs are frequent. We therefore attempt to focus on pedestrian parts with distinctive features and ignore parts carrying interference when deciding whether pedestrians captured by different cameras match, and we propose a person ReID method that uses attention and attributes to overcome the occlusion and misalignment problems.
Method
Our method introduces pedestrian attribute information, which is divided into local and global attributes. The local attributes cover the head, upper body, hands, and lower body; the global attributes include gender and age. The method comprises a training stage and a test stage. In the training stage, an improved ResNet50 network serves as the backbone for feature extraction, and the network splits into a global branch and a local branch. In the global branch, the features are used as a global feature to classify identity and global attributes. The local branch involves several steps. First, the features are unfolded by channel. Second, we locate the point with the highest response in each channel. Third, we cluster these points and divide them into four pedestrian parts. Fourth, we compute the weight of the distinctive features on each part. Fifth, the weights of the four parts are multiplied by the initial features to obtain each part's total feature. Finally, the total features of the four parts are classified by identity and by the corresponding attributes. In the test stage, the part features and the global feature extracted by the trained network are concatenated into a joint feature, and the similarity between pedestrian pairs is computed to decide whether they match. In this way, the network locates pedestrian parts by using attribute information, while its attention mechanism learns discriminative features of those parts.
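The five local-branch steps can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name is ours, a simple vertical binning of peak positions stands in for the clustering step, and global average pooling stands in for the "initial features" being re-weighted.

```python
import numpy as np

def part_attention(feat, n_parts=4):
    """Group channels into body parts via their peak responses and
    re-weight each part's features by a softmax over those responses.

    feat: (C, H, W) backbone feature map. Returns n_parts vectors of
    shape (C,), one total feature per pedestrian part.
    """
    C, H, W = feat.shape
    flat = feat.reshape(C, -1)
    # 1) For each channel, find the spatial point with the highest response.
    peak_idx = flat.argmax(axis=1)
    peak_rows = peak_idx // W                  # vertical position of each peak
    # 2-3) Cluster the peaks into n_parts groups; here a simple vertical
    #      binning stands in for the clustering described in the paper.
    bins = np.minimum(peak_rows * n_parts // H, n_parts - 1)
    pooled = flat.mean(axis=1)                 # stand-in "initial" feature
    part_feats = []
    for p in range(n_parts):
        chans = np.where(bins == p)[0]
        part = np.zeros(C)
        if chans.size > 0:
            # 4) Attention weight of each channel in this part: softmax
            #    over the channels' peak responses.
            resp = flat[chans, peak_idx[chans]]
            w = np.exp(resp - resp.max())
            w /= w.sum()
            # 5) Weight times the initial features gives the part's
            #    total feature on its channels.
            part[chans] = w * pooled[chans]
        part_feats.append(part)
    return part_feats
```

Each returned vector would then feed an identity classifier and the part's attribute classifier during training.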
Result
The proposed method uses the attribute information of the Market-1501_attribute and DukeMTMC-attribute datasets and is evaluated on the Market-1501 and DukeMTMC-reid datasets. In person ReID, two protocols are used to evaluate a model: the cumulative matching characteristics (CMC) curve and the mean average precision (mAP). The CMC curve is drawn as follows. First, we compute the Euclidean distance between the probe image and each gallery image. Second, for each probe image, we sort all gallery images from the shortest distance to the longest. Finally, the percentage of probes whose true match appears among the top m sorted images is computed and called rank-m; in particular, rank-1 is an important indicator in person ReID. The mAP is computed as follows. First, we compute the average precision of each probe image, which is the area under its precision-recall curve. Next, the mean of the average precision over all probe images is taken. On the Market-1501 and DukeMTMC-reid datasets, the rank-1 accuracies are 90.67% and 80.2%, respectively, and the mAP values are 76.65% and 62.14%, respectively. Recently, re-ranking has been widely used in person ReID. With re-ranking, the rank-1 accuracies reach 92.4% and 84.15%, and the mAP values reach 87.5% and 78.41%. Compared with other state-of-the-art methods, our accuracies are greatly improved.
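The rank-m and mAP computation described above can be sketched in NumPy. The function name is illustrative, and the camera-ID filtering used by the standard Market-1501 protocol is omitted for brevity:

```python
import numpy as np

def evaluate(probe_feats, gallery_feats, probe_ids, gallery_ids, m=1):
    """Rank-m accuracy and mAP from concatenated global + part features.

    probe_ids / gallery_ids are integer identity labels.
    """
    # Euclidean distance between every probe and every gallery image.
    dists = np.linalg.norm(probe_feats[:, None] - gallery_feats[None], axis=2)
    hits, aps = [], []
    for i in range(len(probe_feats)):
        order = np.argsort(dists[i])                  # nearest gallery first
        matches = gallery_ids[order] == probe_ids[i]  # relevance at each rank
        hits.append(matches[:m].any())                # rank-m: hit in top m?
        rel = np.where(matches)[0]                    # ranks of true matches
        if rel.size == 0:
            aps.append(0.0)
            continue
        # Precision at each true match; their mean is the average precision,
        # i.e. the area under this probe's precision-recall curve.
        precisions = (np.arange(rel.size) + 1) / (rel + 1)
        aps.append(precisions.mean())
    return float(np.mean(hits)), float(np.mean(aps))  # rank-m, mAP
```

For example, a probe whose two true matches land at ranks 1 and 3 contributes an average precision of (1/1 + 2/3) / 2 = 5/6.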
Conclusion
In this study, we propose a ReID method based on attention and attributes. The method directs attention toward pedestrian parts by learning pedestrian attributes, and the attention mechanism then learns the distinctive features of those parts. Even when pedestrians are occluded or misaligned, the method still achieves high accuracy.