Published online: 2019-04-16. DOI: 10.11834/jig.180370. 2019, Volume 24, Number 4. Image Analysis and Recognition

Received: 2018-06-07; Revised: 2018-11-07
First author: Jiang Jianguo, born in 1955, male, professor, M.S.; his research interests include digital image analysis and processing, distributed intelligent systems, and DSP applications. E-mail: jgjiang@hfut.edu.cn. Qi Meibin, male, professor; his research interests include video object detection and tracking, machine vision, and DSP technology. E-mail: qimeibin@163.com. Chen Cuiqun, female, Ph.D. candidate; her research interest is person re-identification. E-mail: chencuiqun_hfut@163.com.
CLC number: TP391; Document code: A; Article ID: 1006-8961(2019)04-0513-10


Person re-identification with region block segmentation and fusion
Jiang Jianguo1,2, Yang Ning1, Qi Meibin1,2, Chen Cuiqun1
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China;
2. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230009, China
Supported by: National Natural Science Foundation of China (61876056, 61771180, 61174170); Key Research and Development Project of Anhui Province, China (1704d0802183)

Abstract

Objective The person re-identification task is of great value in multi-target tracking and cross-camera target retrieval, and has therefore received increasing attention in computer vision and widespread interest among researchers in recent years. Differences in camera viewing angle and imaging quality lead to variations in pedestrian posture, image resolution, and illumination. These variations can make the appearance of the same pedestrian differ considerably across surveillance videos, which in turn severely interferes with person re-identification. To improve the recognition rate and address the problem of posture change, this study proposes a person re-identification algorithm with region block segmentation and fusion based on human body structure information. Method First, according to the distribution of the human body structure, a pedestrian image is divided into three local regions: the head part (the H region), the shoulder-knee part (the SK region), and the leg part (the L region). These local regions are enlarged to the original image size by bilinear interpolation, which enhances the expression of the regions and makes full use of the region information. Second, according to the different roles of the local regions in the recognition process, the Gaussian of Gaussian (GOG) feature is extracted from the H and L regions, whereas the GOG feature, the local maximal occurrence (LOMO) feature, and the kernel canonical correlation analysis (KCCA) feature are all extracted from the SK region, because the SK region contains the most abundant information in a pedestrian image. Extracting multiple features from the SK region increases the diversity of the region information and strengthens the role of this region in the re-identification process.
Third, the interference block removal (IBR) algorithm is used to eliminate the invalid blocks in the image and fuse the similarities of the effective blocks. Given the differences in posture and viewpoint, some objects might appear in one image and be absent in another image of the same person captured by another camera. Such objects may cause large changes in the color and texture information of the pedestrian's corresponding body regions. These changes result in disturbances to the recognition process. The regions in which such objects are located are called interference blocks in this study. By observing the location of the interference blocks, we find that the interference blocks are distributed from the shoulder to the knee of pedestrians. Therefore, the IBR algorithm uses the image of the SK region. According to the human body structure distribution, the IBR algorithm horizontally divides the SK region into the chest part (h1 block), the lumbar part (h2 block), and the leg part (h3 block); and vertically divides the region into the left-arm part (v1 block), the torso part (v2 block), and the right-arm part (v3 block). Then, the GOG feature, LOMO feature, and KCCA feature are extracted from each block. The three features of each block are fed to the similarity measure function to obtain the three similarities between the corresponding blocks. The three similarities of the same block are merged to form the final similarity of the block. When the final similarities of the six block (h1, h2, h3, v1, v2, v3) pairs are calculated, the similarities of the three horizontal block (h1, h2, h3) pairs are compared to find the block with the smallest similarity, which is the interference block in the horizontal direction. The interference block in the vertical direction is found in the same manner. When the two interference blocks are removed, the influence of the interference block on the overall pedestrian similarity can be eliminated. 
After the interference blocks are removed, the similarities of the remaining four blocks are fused into the similarity of the SK region. Finally, the global similarity of the pedestrian image pair and the similarities of the three local regions (H, SK, and L) are combined to realize person re-identification. Result Extensive experiments are conducted on four benchmark datasets, namely, VIPeR, GRID, PRID450S, and CUHK01. The Rank-1 results (the proportion of queries whose correct match is ranked first) on the four datasets are 62.85%, 30.56%, 71.82%, and 79.03%; the Rank-5 results are 86.17%, 51.20%, 91.16%, and 93.60%. These results show a considerable improvement in recognition rates on both small and large datasets; thus, the proposed algorithm offers practical application value. Conclusion The experimental results show that the proposed method can effectively express the image information of pedestrians. Furthermore, the proposed region block segmentation and fusion algorithm, guided by human body structure information, removes useless and interfering information from the images as much as possible while preserving and effectively using the valid information of pedestrians. The method alleviates, to a certain extent, the differences in pedestrian appearance caused by posture changes and greatly improves recognition rates.

Key words

person re-identification; human structure information; region block segmentation; interference block removal; region block fusion

2 Interference block removal and region block fusion

$f_{\mathrm{SK}}(\boldsymbol{x}_a^{\mathrm{SK}}, \boldsymbol{x}_b^{\mathrm{SK}}) = \sum\limits_{i \in \{\mathrm{h}1, \mathrm{h}2, \mathrm{h}3, \mathrm{v}1, \mathrm{v}2, \mathrm{v}3\}} f(\boldsymbol{x}_a^{\mathrm{SK}_i}, \boldsymbol{x}_b^{\mathrm{SK}_i}) - \min\limits_{j \in \{\mathrm{h}1, \mathrm{h}2, \mathrm{h}3\}} f(\boldsymbol{x}_a^{\mathrm{SK}_j}, \boldsymbol{x}_b^{\mathrm{SK}_j}) - \min\limits_{k \in \{\mathrm{v}1, \mathrm{v}2, \mathrm{v}3\}} f(\boldsymbol{x}_a^{\mathrm{SK}_k}, \boldsymbol{x}_b^{\mathrm{SK}_k})$ (1)
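Read directly, Eq. (1) sums the similarities of all six SK-region blocks and then subtracts the smallest horizontal-block similarity and the smallest vertical-block similarity, discarding one interference block in each direction. A minimal sketch of this fusion step (the per-block similarity function f is assumed to be computed elsewhere; the block labels are the h1-h3 and v1-v3 names from the text):

```python
def sk_similarity(block_sims):
    """Fuse SK-region block similarities per Eq. (1): sum all six
    blocks, then drop the weakest horizontal block and the weakest
    vertical block (the interference blocks).

    block_sims: dict mapping 'h1','h2','h3','v1','v2','v3' to the
    fused similarity f(x_a^SK_i, x_b^SK_i) of that block pair.
    """
    horizontal = ['h1', 'h2', 'h3']
    vertical = ['v1', 'v2', 'v3']
    total = sum(block_sims[b] for b in horizontal + vertical)
    total -= min(block_sims[b] for b in horizontal)  # horizontal interference block
    total -= min(block_sims[b] for b in vertical)    # vertical interference block
    return total
```

For example, with block similarities {h1: 0.9, h2: 0.1, h3: 0.8, v1: 0.7, v2: 0.95, v3: 0.2}, the h2 and v3 blocks are treated as interference and excluded from the fused SK similarity.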

3 Feature extraction

The KCCA feature maps the original image with an anisotropic Gaussian kernel and extracts weighted color histogram features in the HSV, RGB, and Lab color spaces. The LOMO feature efficiently fuses HSV color features with SILTP texture features, and its max-pooling operation over local patches makes it robust to viewpoint changes. Unlike these two statistics-based features, the GOG feature directly uses the information of each pixel: it first extracts an 8-dimensional feature vector containing position, gradient, and color information for every pixel, then fuses the pixel features within each local patch through a patch-level Gaussian, and finally fuses the patch features through a region-level Gaussian, yielding a discriminative and robust descriptor of the whole image.
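The two-level "Gaussian of Gaussian" pipeline described above can be sketched as follows. This is a much-simplified illustration, not the published descriptor: a 6-D per-pixel feature stands in for the paper's 8-D one (which uses four gradient orientations), and a plain mean-plus-covariance embedding replaces GOG's SPD-matrix embedding and normalization steps.

```python
import numpy as np

def gaussian_embed(features):
    """Summarize a set of feature vectors by their Gaussian:
    concatenate the mean with the flattened covariance."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return np.concatenate([mu, cov.ravel()])

def gog_like_descriptor(img, patch=8):
    """Simplified GOG-style descriptor for an RGB image (H, W, 3).

    Per-pixel feature: normalized vertical position, horizontal and
    vertical gradient magnitudes, and the three color channels.
    """
    img = img.astype(float)
    h, w, _ = img.shape
    gray = img.mean(axis=2)
    dy, dx = np.gradient(gray)
    ys = np.repeat(np.arange(h)[:, None], w, axis=1) / h
    pix = np.dstack([ys, np.abs(dx), np.abs(dy), img / 255.0])  # (H, W, 6)

    # Patch-level Gaussians: one Gaussian per non-overlapping patch
    patch_descs = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            block = pix[i:i + patch, j:j + patch].reshape(-1, 6)
            patch_descs.append(gaussian_embed(block))

    # Region-level Gaussian over the patch descriptors
    return gog_like if False else gaussian_embed(np.array(patch_descs))
```

Each patch yields a 42-D vector (6-D mean plus a 6x6 covariance), and the region-level Gaussian over those vectors gives the final descriptor.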

4.1 Test results on the VIPeR dataset

The VIPeR dataset contains 632 pedestrians, each with two images captured by two different cameras. In each experiment, 316 pedestrians are randomly selected to form the training set, and the remaining 316 form the test set. The results are listed in Table 1.
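The evaluation protocol used here, a random half split of identities with results reported as cumulative match characteristic (CMC) rank-k rates, can be sketched as below. The identity labels and the similarity matrix are placeholders; in the paper the similarities come from the fused region features.

```python
import random

def split_identities(ids, n_train, seed=0):
    """Randomly split person identities into train/test sets."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]

def cmc(sim_matrix, ks=(1, 5, 10, 20)):
    """Cumulative match characteristic. sim_matrix[i][j] is the
    similarity of probe i to gallery entry j; the true match of
    probe i is gallery entry i. Returns rank-k matching rates."""
    n = len(sim_matrix)
    ranks = []
    for i, row in enumerate(sim_matrix):
        # rank of the true match = number of gallery entries scored
        # at least as high as it (1 means it is ranked first)
        ranks.append(sum(1 for s in row if s >= row[i]))
    return {k: sum(r <= k for r in ranks) / n for k in ks}
```

For VIPeR this is a 316/316 split; Rank-1 is then the fraction of the 316 test probes whose correct gallery image receives the highest similarity.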

Table 1 Recognition rates of different methods on VIPeR dataset

(unit: %)
Method        Rank1   Rank5   Rank10   Rank20
NFST[17]      42.28   71.46   82.94    92.06
Ref. [22]     42.47   -       83.45    93.29
Ref. [1]      49.05   74.08   84.43    93.10
GOG[12]       49.7    79.7    88.7     94.5
SCSP[16]      53.54   82.59   91.49    96.65
Ours (NFST)   51.80   79.40   89.46    95.76
Ours (SCSP)   62.85   86.17   93.07    97.41
Note: Bold font in the original table marks the best result at each rank; "-" means the cited work did not report that result.

4.2 Test results on the GRID dataset

The GRID dataset contains 1,275 pedestrian images covering 250 pedestrians; each pedestrian has two images from two different cameras. The dataset also includes 775 additional pedestrian images that do not belong to any of the 250 pedestrians. In each experiment, 125 pedestrian pairs are randomly selected to form the training set; the remaining 125 pairs plus the 775 unrelated images form the test set. The results are listed in Table 2.
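The GRID protocol differs from the VIPeR one in that the gallery is padded with the 775 distractor images, which makes ranking harder. A minimal sketch of building such a probe/gallery split (the tuple labels are illustrative, not from the paper):

```python
import random

def build_grid_split(n_pairs=250, n_train=125, n_distractors=775, seed=0):
    """GRID-style split: each identity has one probe and one gallery
    image; distractor images enlarge the gallery only."""
    ids = list(range(n_pairs))
    random.Random(seed).shuffle(ids)
    train_ids, test_ids = ids[:n_train], ids[n_train:]
    probes = [('probe', i) for i in test_ids]
    gallery = [('gallery', i) for i in test_ids]
    gallery += [('distractor', d) for d in range(n_distractors)]
    return train_ids, probes, gallery
```

With the defaults this gives 125 training identities, 125 probes, and a gallery of 900 images (125 true matches plus 775 distractors), matching the protocol described above.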

Table 2 Recognition rates of different methods on GRID

(unit: %)
Method        Rank1   Rank5   Rank10   Rank20
LOMO[11]      16.56   33.8    41.84    52.40
BRSF[23]      23.96   -       54.56    63.24
SCSP[16]      24.24   44.56   54.08    65.20
GOG[12]       24.7    47.0    58.4     69.0
Ours (NFST)   27.92   46.16   56.56    67.44
Ours (SCSP)   30.56   51.20   60.80    71.44
Note: Bold font in the original table marks the best result at each rank; "-" means the cited work did not report that result.

4.3 Test results on the PRID450S dataset

The PRID450S dataset is another classic person re-identification dataset, built as an extension of the PRID2011 dataset. It captures 450 pedestrians with 900 images in total; each pedestrian has two images from two different cameras. In each experiment, 225 pedestrians are randomly selected for training, and the remaining 225 form the test set. The results are listed in Table 3.

Table 3 Recognition rates of different methods on PRID450S

(unit: %)
Method        Rank1   Rank5   Rank10   Rank20
Ref. [24]     48.0    76.2    86.2     92.9
Ref. [22]     60.62   -       89.82    94.62
FNN[25]       66.62   86.84   92.84    96.89
GOG[12]       68.4    88.8    94.5     97.8
Ours (NFST)   70.0    89.20   92.93    96.04
Ours (SCSP)   71.82   91.16   95.73    97.56
Note: Bold font in the original table marks the best result at each rank; "-" means the cited work did not report that result.

4.4 Test results on the CUHK01 dataset

CUHK01 is a relatively large dataset with 3,884 images of 971 pedestrians. It contains images from only two cameras, and each camera captures two images of every pedestrian, so each pedestrian has four images in total. In each experiment, 485 pedestrians are randomly selected for training, and the remaining 486 pedestrians form the test set. The results are listed in Table 4.

Table 4 Recognition rates of different methods on CUHK01

(unit: %)
Method        Rank1   Rank5   Rank10   Rank20
FNN[25]       55.51   78.40   83.68    92.59
LOMO[11]      63.21   83.89   90.04    94.16
NFST[17]      64.98   84.96   89.92    94.36
GOG[12]       67.3    86.9    91.8     95.9
Ref. [1]      70.45   87.92   92.67    96.34
Ours (NFST)   76.83   91.34   95.02    97.47
Ours (SCSP)   79.03   93.60   96.89    98.81
Note: Bold font in the original table marks the best result at each rank.

4.5 Validation of the effectiveness of BSS segmentation and the IBR algorithm

Table 5 Recognition rates with and without BSS segmentation

(unit: %)
Image state                Rank1   Rank5   Rank10   Rank20
No segmentation            40.73   72.66   83.67    90.89
Segmented, not enlarged    41.42   71.90   83.54    91.49
Segmented and enlarged     43.73   74.87   84.56    92.82
Note: Bold font in the original table marks the best result at each rank.
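The preprocessing compared in Table 5 can be sketched as follows: cut the pedestrian image into the H, SK, and L regions and enlarge each region back to the original size with bilinear interpolation. The relative row boundaries used here are illustrative placeholders, not the paper's exact body proportions.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize an image (H, W, C) with bilinear interpolation."""
    h, w, _ = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :, None]   # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def split_and_enlarge(img, bounds=(0.15, 0.75)):
    """Cut a pedestrian image into H / SK / L regions at the given
    (illustrative) relative row boundaries, then enlarge each region
    back to the full image size."""
    h, w, _ = img.shape
    r1, r2 = int(h * bounds[0]), int(h * bounds[1])
    regions = {'H': img[:r1], 'SK': img[r1:r2], 'L': img[r2:]}
    return {k: bilinear_resize(v.astype(float), h, w) for k, v in regions.items()}
```

The "segmented, not enlarged" row of Table 5 corresponds to using the raw region crops directly, while the best-performing row additionally applies the enlargement step.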

The IBR algorithm operates on the interference blocks of the SK-region image, so the segmented and enlarged images are used in this experiment. To better demonstrate the effectiveness of the IBR algorithm, the features of the global pedestrian image are also included. Table 6 compares the results with and without the IBR algorithm.

Table 6 Recognition rates with and without the IBR algorithm

(unit: %)
IBR algorithm   Rank1   Rank5   Rank10   Rank20
Without         49.08   77.66   87.97    94.21
With            50.13   78.86   88.26    94.68
Note: Bold font in the original table marks the best result at each rank.

Table 7 Recognition rates of different segmentation methods in the SK region

(unit: %)
Segmentation method          Rank1   Rank5   Rank10   Rank20
Evenly split into 3 blocks   50.13   78.86   88.26    94.68
Evenly split into 4 blocks   50.64   78.93   88.99    94.80
Evenly split into 5 blocks   48.23   76.52   88.07    94.23
Note: Bold font in the original table marks the best result at each rank.

References

• [1] Chu H F, Qi M B, Liu H, et al. Local region partition for person re-identification[J]. Multimedia Tools and Applications, 2017. [DOI:10.1007/s11042-017-4817-4]
• [2] Qi M B, Hu L F, Jiang J G, et al. Person re-identification based on multi-features fusion and independent metric learning[J]. Journal of Image and Graphics, 2016, 21(11): 1464–1472. [DOI:10.11834/jig.20161106]
• [3] Qi M B, Wang C C, Jiang J G, et al. Person re-identification based on multi-feature fusion and alternating direction method of multipliers[J]. Journal of Image and Graphics, 2018, 23(6): 827–836. [DOI:10.11834/jig.170507]
• [4] You J J, Wu A C, Li X, et al. Top-push video-based person re-identification[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1345-1353.[DOI:10.1109/CVPR.2016.150]
• [5] Zheng W S, Li X, Xiang T, et al. Partial person re-identification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 4678-4686.[DOI:10.1109/ICCV.2015.531]
• [6] Wei L H, Zhang S L, Yao H T, et al. GLAD: global-local-alignment descriptor for pedestrian retrieval[C]//Proceedings of the 25th ACM International Conference on Multimedia. Mountain View, CA, USA: ACM, 2017: 420-428.[DOI:10.1145/3123266.3123279]
• [7] Liu H, Jie Z Q, Jayashree K, et al. Video-based person re-identification with accumulative motion context[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(10): 2788–2802. [DOI:10.1109/TCSVT.2017.2715499]
• [8] Zhao H Y, Tian M Q, Sun S Y, et al. Spindle net: person re-identification with human body region guided feature decomposition and fusion[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 907-915.[DOI:10.1109/CVPR.2017.103]
• [9] Li D W, Chen X T, Zhang Z, et al. Learning deep context-aware features over body and latent parts for person re-identification[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 7398-7404.[DOI:10.1109/CVPR.2017.782]
• [10] Liao S C, Zhao G Y, Kellokumpu V, et al. Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 1301-1306.[DOI:10.1109/CVPR.2010.5539817]
• [11] Liao S C, Hu Y, Zhu X Y, et al. Person re-identification by local maximal occurrence representation and metric learning[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 2197-2206.[DOI:10.1109/CVPR.2015.7298832]
• [12] Matsukawa T, Okabe T, Suzuki E, et al. Hierarchical Gaussian descriptor for person re-identification[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1363-1372.[DOI:10.1109/CVPR.2016.152]
• [13] Lisanti G, Masi I, del Bimbo A. Matching people across camera views using kernel canonical correlation analysis[C]//Proceedings of 2014 International Conference on Distributed Smart Cameras. Venezia Mestre, Italy: ACM, 2014: #10.[DOI:10.1145/2659021.2659036]
• [14] Farenzena M, Bazzani L, Perina A, et al. Person re-identification by symmetry-driven accumulation of local features[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 2360-2367.[DOI:10.1109/CVPR.2010.5539926]
• [15] Bąk S, Corvee E, Bremond F, et al. Person re-identification using spatial covariance regions of human body parts[C]//Proceedings of the 7th IEEE International Conference on Advanced Video and Signal Based Surveillance. Boston, MA, USA: IEEE, 2010: 435-440.[DOI:10.1109/AVSS.2010.34]
• [16] Chen D P, Yuan Z J, Chen B D, et al. Similarity learning with spatial constraints for person re-identification[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1268-1277.[DOI:10.1109/CVPR.2016.142]
• [17] Zhang L, Xiang T, Gong S G. Learning a discriminative null space for person re-identification[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1239-1248.[DOI:10.1109/CVPR.2016.139]
• [18] Gray D, Tao H. Viewpoint invariant pedestrian recognition with an ensemble of localized features[C]//Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer, 2008: 262-275.[DOI:10.1007/978-3-540-88682-2_21]
• [19] Loy C C, Xiang T, Gong S G. Time-delayed correlation analysis for multi-camera activity understanding[J]. International Journal of Computer Vision, 2010, 90(1): 106–129. [DOI:10.1007/s11263-010-0347-5]
• [20] Roth P M, Hirzer M, Köstinger M, et al. Mahalanobis distance learning for person re-identification[M]//Gong S G, Cristani M, Yan S C, et al. Person Re-Identification. London: Springer, 2014: 247-267.[DOI:10.1007/978-1-4471-6296-4_12]
• [21] Li W, Zhao R, Wang X G. Human reidentification with transferred metric learning[C]//Proceedings of the 11th Asian Conference on Computer Vision. Daejeon, Korea: Springer, 2012: 31-44.[DOI:10.1007/978-3-642-37331-2_3]
• [22] Zhang J, Zhao X. Global-local metric learning for person re-identification[J]. Journal of Image and Graphics, 2017, 22(4): 472–481. [DOI:10.11834/jig.20170407]
• [23] Zhang N, Zhang F X, Wang Q, et al. Learning bidirectional relationship similarity function for person re-identification[J]. Computer Systems & Applications, 2018, 27(5): 33–40. [DOI:10.15888/j.cnki.csa.006354]
• [24] Liu Q, Hou L, Peng Z Y. Invariant feature and kernel distance metric learning based person re-identification[J]. Journal of Image and Signal Processing, 2018, 7(2): 65–73. [DOI:10.12677/JISP.2018.72008]
• [25] Wu S X, Chen Y C, Li X, et al. An enhanced deep feature representation for person re-identification[C]//Proceedings of 2016 IEEE Winter Conference on Applications of Computer Vision. Lake Placid, NY, USA: IEEE, 2016.[DOI:10.1109/WACV.2016.7477681]