Published: 2019-04-16; DOI: 10.11834/jig.180370; 2019 | Volume 24 | Number 4; Image Analysis and Recognition

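Received: 2018-06-07; Revised: 2018-11-07. About the authors: Jiang Jianguo (first author), born in 1955, male, professor, master's degree; research interests: digital image analysis and processing, distributed intelligent systems, and DSP technology applications. E-mail: jgjiang@hfut.edu.cn. Qi Meibin, male, professor; research interests: video object detection and tracking, machine vision, and DSP technology. E-mail: qimeibin@163.com. Chen Cuiqun, female, Ph.D. candidate; research interest: person re-identification. E-mail: chencuiqun_hfut@163.com. CLC number: TP391; Document code: A; Article ID: 1006-8961(2019)04-0513-10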

Person re-identification with region block segmentation and fusion
Jiang Jianguo¹,², Yang Ning¹, Qi Meibin¹,², Chen Cuiqun¹
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China;
2. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230009, China
Supported by: National Natural Science Foundation of China (61876056, 61771180, 61174170); Key Research and Development Project of Anhui Province, China (1704d0802183)

Abstract

Objective Person re-identification is of great value for multi-target tracking and for target retrieval across multiple cameras, and it has attracted growing attention in computer vision in recent years. Differences in camera viewpoint and imaging quality cause variations in pedestrian pose, image resolution, and illumination, so the same pedestrian can look considerably different in different surveillance videos, which severely interferes with re-identification. To improve the recognition rate and address pose variation, this study proposes a person re-identification algorithm based on region block segmentation and fusion guided by human body structure information. Method First, according to the structure of the human body, a pedestrian image is divided into three local regions: the head (H region), the shoulder-to-knee part (SK region), and the leg (L region). Each local region is enlarged to the original image size by bilinear interpolation, which strengthens the representation of the region and makes full use of its information. Second, because each local region plays a different role in recognition, the Gaussian of Gaussian (GOG) feature is extracted from the H and L regions, whereas the GOG feature, the local maximal occurrence (LOMO) feature, and the kernel canonical correlation analysis (KCCA) feature are all extracted from the SK region, which contains the richest information in a pedestrian image. Extracting several features from the SK region increases the diversity of its information and strengthens its role in re-identification.
Third, an interference block removal (IBR) algorithm removes invalid blocks from the image and fuses the similarities of the remaining effective blocks. Because of differences in pose and viewpoint, an object may appear in one image of a person and be absent from another image of the same person captured by a different camera. Such objects can substantially change the color and texture of the corresponding body region and thus disturb recognition; the regions containing them are called interference blocks in this study. We observe that interference blocks lie between the shoulders and the knees, so the IBR algorithm operates on the SK region image. Following the body structure, IBR divides the SK region horizontally into the chest (h1), lumbar (h2), and upper-leg (h3) blocks, and vertically into the left-arm (v1), torso (v2), and right-arm (v3) blocks. The GOG, LOMO, and KCCA features are extracted from each block and fed to the similarity measure function, which yields three similarities for each pair of corresponding blocks; these three similarities are merged into the final similarity of that block pair. Once the final similarities of the six block pairs (h1, h2, h3, v1, v2, v3) have been computed, the pair with the smallest similarity among the three horizontal pairs (h1, h2, h3) is taken as the horizontal interference block, and the vertical interference block is found in the same way. Removing these two blocks eliminates their influence on the overall pedestrian similarity.
After the interference blocks are removed, the similarities of the remaining four blocks are fused as the similarity of the SK region. Finally, the global similarity of the pedestrian image pair is combined with the similarities of the three local regions (H, SK, and L) to perform person re-identification. Result Extensive experiments were conducted on four benchmark datasets: VIPeR, GRID, PRID450S, and CUHK01. The rank-1 recognition rates (the proportion of queries whose correct match is ranked first) on the four datasets are 62.85%, 30.56%, 71.82%, and 79.03%, respectively, and the rank-5 rates are 86.17%, 51.20%, 91.16%, and 93.60%. Recognition rates improve considerably on both small and large datasets, which indicates the practical value of the proposed algorithm. Conclusion The experimental results show that the proposed method effectively represents pedestrian image information. Guided by human body structure information, the proposed region block segmentation and fusion algorithm removes as much useless and interfering information from the images as possible, while preserving the valid pedestrian information and using it effectively. The method alleviates, to a certain extent, the appearance differences caused by pedestrian pose changes and substantially improves recognition rates.

Key words

person re-identification; human structure information; region block segmentation; interference block removal; region block fusion

2 Interference block removal and region block fusion

$\begin{aligned} f_{\mathrm{SK}}\left(\boldsymbol{x}_a^{\mathrm{SK}}, \boldsymbol{x}_b^{\mathrm{SK}}\right) = {} & \sum_{i \in \{\mathrm{h}1, \mathrm{h}2, \mathrm{h}3, \mathrm{v}1, \mathrm{v}2, \mathrm{v}3\}} f\left(\boldsymbol{x}_a^{\mathrm{SK}_i}, \boldsymbol{x}_b^{\mathrm{SK}_i}\right) \\ & - \min_{j \in \{\mathrm{h}1, \mathrm{h}2, \mathrm{h}3\}} f\left(\boldsymbol{x}_a^{\mathrm{SK}_j}, \boldsymbol{x}_b^{\mathrm{SK}_j}\right) - \min_{k \in \{\mathrm{v}1, \mathrm{v}2, \mathrm{v}3\}} f\left(\boldsymbol{x}_a^{\mathrm{SK}_k}, \boldsymbol{x}_b^{\mathrm{SK}_k}\right) \end{aligned}$ (1)
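Eq. (1) adds up the similarities of all six SK block pairs and subtracts the smallest horizontal and the smallest vertical one, which is the same as summing the four block pairs that survive interference block removal. The following is a minimal Python sketch, assuming each block pair's similarity has already been computed and stored in a dict keyed by block name. The names `fuse_block`, `interference_blocks`, and `sk_similarity` are hypothetical. The equal-weight sum in `fuse_block` is also an assumption, because this section does not state how the GOG, LOMO, and KCCA similarities of a block are merged.

```python
def fuse_block(sim_gog, sim_lomo, sim_kcca, weights=(1.0, 1.0, 1.0)):
    """Merge the three feature similarities of one block pair.

    Assumption: a weighted sum with equal default weights; the exact
    fusion rule is not specified in this section.
    """
    w_gog, w_lomo, w_kcca = weights
    return w_gog * sim_gog + w_lomo * sim_lomo + w_kcca * sim_kcca


HORIZONTAL = ("h1", "h2", "h3")
VERTICAL = ("v1", "v2", "v3")


def interference_blocks(block_sim):
    """Return the (horizontal, vertical) block pairs with the smallest similarity."""
    return (min(HORIZONTAL, key=block_sim.__getitem__),
            min(VERTICAL, key=block_sim.__getitem__))


def sk_similarity(block_sim):
    """Eq. (1): sum of all six block similarities minus the smallest
    horizontal one and the smallest vertical one."""
    total = sum(block_sim[b] for b in HORIZONTAL + VERTICAL)
    return (total
            - min(block_sim[b] for b in HORIZONTAL)
            - min(block_sim[b] for b in VERTICAL))


sims = {"h1": 0.8, "h2": 0.3, "h3": 0.7, "v1": 0.6, "v2": 0.9, "v3": 0.5}
print(interference_blocks(sims))      # ('h2', 'v3')
print(round(sk_similarity(sims), 6))  # 3.0, i.e. h1 + h3 + v1 + v2
```

Writing the removal as "sum minus two minima" keeps it a single pass over the six blocks. If two blocks tie for the minimum, `min` removes the first one in h1 to h3 (or v1 to v3) order.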

3 Feature extraction

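The KCCA feature applies an anisotropic Gaussian kernel to the original image and extracts three weighted color histograms in the HSV, RGB, and Lab color spaces. The LOMO feature efficiently combines HSV color features with SILTP texture features and uses a maximization operation that makes it robust to viewpoint changes. Unlike these two statistics-based features, the GOG feature works directly on the information of each pixel. It first extracts for every pixel an 8-dimensional feature vector containing position, gradient, and color information. It then fuses the pixel features within each local patch through a patch-level Gaussian, and fuses the patch features through a region-level Gaussian. The result is a discriminative and robust descriptor of the whole image.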
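The first two GOG stages described above can be sketched in NumPy. This is a simplified illustration that follows the GOG descriptor cited as [12]; it is not the authors' implementation. It builds the 8-dimensional pixel feature [y, M0°, M90°, M180°, M270°, R, G, B], which consists of the normalized row position, the gradient magnitude split into four orientation bins, and the RGB color. It then summarizes one patch as a Gaussian embedded in a symmetric positive-definite (SPD) matrix, which is flattened with a matrix logarithm. Several choices are assumptions: hard orientation binning, the regularizer `eps`, the hypothetical function names, and leaving out the region-level Gaussian step and any extra color spaces.

```python
import numpy as np


def pixel_features(img):
    """8-D per-pixel feature [y, M0, M90, M180, M270, R, G, B].

    img: float array of shape (H, W, 3) with values in [0, 1].
    """
    h, w, _ = img.shape
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)                    # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientation in [0, 2*pi)
    # Simplification: hard-assign each gradient magnitude to one of four
    # orientation bins centred at 0, 90, 180 and 270 degrees.
    bin_idx = np.round(ang / (np.pi / 2)).astype(int) % 4
    orient = np.zeros((h, w, 4))
    for k in range(4):
        orient[..., k] = mag * (bin_idx == k)
    # Row position normalized to [0, 1], repeated across columns.
    y = np.repeat((np.arange(h) / max(h - 1, 1))[:, None], w, axis=1)
    return np.concatenate([y[..., None], orient, img], axis=2)


def patch_gaussian_embedding(feats, eps=1e-3):
    """Fit a Gaussian N(mu, S) to a patch's pixel features and map it to a vector.

    The Gaussian is embedded in the (d+1)x(d+1) SPD matrix
    |S|^(-1/(d+1)) * [[S + mu mu^T, mu], [mu^T, 1]], flattened with the
    matrix logarithm, and half-vectorized (upper triangle).
    """
    x = feats.reshape(-1, feats.shape[-1])
    d = x.shape[1]
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + eps * np.eye(d)   # regularize so S is invertible
    p = np.empty((d + 1, d + 1))
    p[:d, :d] = cov + np.outer(mu, mu)
    p[:d, d] = mu
    p[d, :d] = mu
    p[d, d] = 1.0
    p *= np.linalg.det(cov) ** (-1.0 / (d + 1))
    vals, vecs = np.linalg.eigh(p)                # p is SPD, so all vals > 0
    log_p = (vecs * np.log(vals)) @ vecs.T        # matrix logarithm
    return log_p[np.triu_indices(d + 1)]


rng = np.random.default_rng(0)
patch = rng.random((5, 5, 3))
f = pixel_features(patch)
print(f.shape)                            # (5, 5, 8)
print(patch_gaussian_embedding(f).shape)  # (45,): upper triangle of a 9x9 matrix
```

The matrix logarithm maps the SPD matrix into a flat vector space. After that step, ordinary Euclidean operations such as averaging patch vectors in the region-level step, or metric learning, become meaningful.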

4.1 Results on the VIPeR dataset

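The VIPeR dataset contains 632 pedestrians, each with two images captured by two different cameras. In the experiments, 316 pedestrians are randomly selected to form the training set, and the remaining 316 pedestrians form the test set. The results are shown in Table 1.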
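The Rank-k columns in Tables 1-4 are points on the cumulative matching characteristic (CMC) curve. Rank-k is the percentage of probe images whose correct gallery match appears among the k most similar gallery entries. The following is a minimal sketch of the single-shot VIPeR setting. It assumes a square similarity matrix `sim` in which row i is the probe image of test identity i, column i is that identity's gallery image, and higher values mean more similar. The helper names `cmc` and `random_split` are hypothetical. With random splits, the CMC is typically averaged over several repetitions.

```python
import numpy as np


def cmc(sim, ranks=(1, 5, 10, 20)):
    """Return {k: Rank-k accuracy in %} for a square probe-by-gallery
    similarity matrix whose diagonal holds the true matches."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    true_scores = sim[np.arange(n), np.arange(n)]
    # Rank of the true match = 1 + number of gallery entries scoring strictly higher.
    match_rank = 1 + (sim > true_scores[:, None]).sum(axis=1)
    return {k: float(100.0 * np.mean(match_rank <= k)) for k in ranks}


def random_split(num_ids, num_train, seed=None):
    """Randomly split identity indices into disjoint train and test sets."""
    perm = np.random.default_rng(seed).permutation(num_ids)
    return perm[:num_train], perm[num_train:]


train_ids, test_ids = random_split(632, 316, seed=0)   # VIPeR: 316 / 316
print(len(train_ids), len(test_ids))                   # 316 316

sim = np.array([[0.9, 0.2, 0.1],
                [0.8, 0.7, 0.3],
                [0.4, 0.6, 0.5]])
print(cmc(sim, ranks=(1, 2, 3)))  # Rank-1 is about 33.3 %, Rank-2 and Rank-3 are 100 %
```

Counting only strictly higher scores means ties are resolved in favour of the true match. A stricter evaluation could count ties against it instead.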

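Table 1 Recognition rates of different methods on the VIPeR dataset

| Method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
|---|---|---|---|---|
| NFST[17] | 42.28 | 71.46 | 82.94 | 92.06 |
| Ref. [22] | 42.47 | - | 83.45 | 93.29 |
| Ref. [1] | 49.05 | 74.08 | 84.43 | 93.10 |
| GOG[12] | 49.7 | 79.7 | 88.7 | 94.5 |
| SCSP[16] | 53.54 | 82.59 | 91.49 | 96.65 |
| Ours (NFST) | 51.80 | 79.40 | 89.46 | 95.76 |
| Ours (SCSP) | **62.85** | **86.17** | **93.07** | **97.41** |

Note: Bold marks the best result at each rank; "-" means the original paper did not report a result at that rank.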

4.2 Results on the GRID dataset

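The GRID dataset contains 1,275 pedestrian images. Of these, 500 images show 250 pedestrians, with each pedestrian's two images taken by two different cameras. The remaining 775 images are extra pedestrian images that do not belong to any of the 250 pedestrians. In each experiment, 125 pedestrian pairs are randomly selected to form the training set, and the remaining 125 pairs together with the 775 unrelated images form the test set. The results are shown in Table 2.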
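On GRID, the 775 unrelated images enlarge the gallery, so each of the 125 test probes is ranked against 900 candidates instead of 125. The following is a minimal sketch of how distractors enter the ranking. It assumes the gallery columns are ordered with the 125 true matches first (column i belongs to probe i), followed by the distractor columns. The function name `rank_k` and the synthetic random scores are illustrative only and are not GRID data.

```python
import numpy as np


def rank_k(sim, ranks=(1, 5, 10, 20)):
    """Rank-k accuracy (%) for a probe-by-gallery similarity matrix.

    Column i is the true match of probe i; any columns beyond the number
    of probes are distractors that match no probe but still compete for
    the top ranks.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    true_scores = sim[np.arange(n), np.arange(n)]
    match_rank = 1 + (sim > true_scores[:, None]).sum(axis=1)
    return {k: float(100.0 * np.mean(match_rank <= k)) for k in ranks}


rng = np.random.default_rng(0)
num_ids, num_distractors = 125, 775                    # GRID test split
sim = rng.random((num_ids, num_ids + num_distractors))
sim[np.arange(num_ids), np.arange(num_ids)] += 0.5     # synthetic: true matches score higher on average

without = rank_k(sim[:, :num_ids])   # 125 candidates per probe
with_d = rank_k(sim)                 # 900 candidates per probe
for k in without:
    print(f"Rank-{k}: {without[k]:.2f}% without vs {with_d[k]:.2f}% with distractors")
```

Adding gallery columns can only increase the number of entries that outscore a probe's true match, so with the same scores the Rank-k accuracy with distractors is never higher than without them.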

Table 2 Recognition rates of different methods on GRID

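| Method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
|---|---|---|---|---|
| LOMO[11] | 16.56 | 33.8 | 41.84 | 52.40 |
| BRSF[23] | 23.96 | - | 54.56 | 63.24 |
| SCSP[16] | 24.24 | 44.56 | 54.08 | 65.20 |
| GOG[12] | 24.7 | 47.0 | 58.4 | 69.0 |
| Ours (NFST) | 27.92 | 46.16 | 56.56 | 67.44 |
| Ours (SCSP) | **30.56** | **51.20** | **60.80** | **71.44** |

Note: Bold marks the best result at each rank; "-" means the original paper did not report a result at that rank.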

4.3 Results on the PRID450S dataset

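PRID450S is another classic person re-identification dataset and an extension of PRID2011. It contains 900 images of 450 pedestrians, and each pedestrian's two images come from two different cameras. In each experiment, 225 pedestrians are randomly selected to form the training set, and the remaining 225 pedestrians form the test set. The results are shown in Table 3.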

Table 3 Recognition rates of different methods on PRID450S

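| Method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
|---|---|---|---|---|
| Ref. [24] | 48.0 | 76.2 | 86.2 | 92.9 |
| Ref. [22] | 60.62 | - | 89.82 | 94.62 |
| FNN[25] | 66.62 | 86.84 | 92.84 | 96.89 |
| GOG[12] | 68.4 | 88.8 | 94.5 | **97.8** |
| Ours (NFST) | 70.0 | 89.20 | 92.93 | 96.04 |
| Ours (SCSP) | **71.82** | **91.16** | **95.73** | 97.56 |

Note: Bold marks the best result at each rank; "-" means the original paper did not report a result at that rank.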

4.4 Results on the CUHK01 dataset

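CUHK01 is a relatively large dataset with 3,884 images of 971 pedestrians. All images come from two cameras, and each camera captures two images of each pedestrian, so every pedestrian has four images. In each experiment, 485 pedestrians are randomly selected to form the training set, and the remaining 486 pedestrians form the test set. The results are shown in Table 4.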

Table 4 Recognition rates of different methods on CUHK01

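| Method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
|---|---|---|---|---|
| FNN[25] | 55.51 | 78.40 | 83.68 | 92.59 |
| LOMO[11] | 63.21 | 83.89 | 90.04 | 94.16 |
| NFST[17] | 64.98 | 84.96 | 89.92 | 94.36 |
| GOG[12] | 67.3 | 86.9 | 91.8 | 95.9 |
| Ref. [1] | 70.45 | 87.92 | 92.67 | 96.34 |
| Ours (NFST) | 76.83 | 91.34 | 95.02 | 97.47 |
| Ours (SCSP) | **79.03** | **93.60** | **96.89** | **98.81** |

Note: Bold marks the best result at each rank.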

4.5 Validation of BSS segmentation and the IBR algorithm

Table 5 Recognition rates with and without BSS segmentation

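| Image state | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
|---|---|---|---|---|
| No segmentation | 40.73 | 72.66 | 83.67 | 90.89 |
| Segmented, not enlarged | 41.42 | 71.90 | 83.54 | 91.49 |
| Segmented and enlarged | **43.73** | **74.87** | **84.56** | **92.82** |

Note: Bold marks the best result at each rank.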
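Table 5 compares the whole image, the segmented regions at their cropped size, and the segmented regions enlarged back to the full image size. The following is a minimal NumPy sketch of the "segmented and enlarged" variant. It assumes that the BSS segmentation is the head (H), shoulder-knee (SK), and leg (L) split described in the abstract. The image is cut into three horizontal bands, and each band is resized to the original height and width by bilinear interpolation. The cut fractions `HEAD_END` and `KNEE_END` are hypothetical placeholders, because the cut positions used by the authors are not given in this section.

```python
import numpy as np

# Hypothetical cut positions as fractions of image height (not from the paper).
HEAD_END = 0.15   # bottom edge of the head region H
KNEE_END = 0.70   # bottom edge of the shoulder-knee region SK


def bilinear_resize(img, out_h, out_w):
    """Resize an (H, W, C) float image by bilinear interpolation,
    with the corner pixels of input and output aligned."""
    in_h, in_w = img.shape[:2]
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]   # vertical weights, broadcast over W and C
    wx = (xs - x0)[None, :, None]   # horizontal weights, broadcast over H and C
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def split_and_enlarge(img, head_end=HEAD_END, knee_end=KNEE_END):
    """Cut a pedestrian image into H, SK, and L bands and enlarge each
    band back to the full image size."""
    h, w = img.shape[:2]
    r1, r2 = int(round(head_end * h)), int(round(knee_end * h))
    regions = {"H": img[:r1], "SK": img[r1:r2], "L": img[r2:]}
    return {name: bilinear_resize(band, h, w) for name, band in regions.items()}


img = np.random.default_rng(0).random((128, 48, 3))   # VIPeR-sized RGB image
parts = split_and_enlarge(img)
print({k: v.shape for k, v in parts.items()})          # every region becomes (128, 48, 3)
```

Enlarging every band to the same size means the same feature extractors and patch grids can be applied to each region unchanged. This is one plausible reason the enlarged variant in Table 5 outperforms the cropped one.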

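The IBR algorithm handles interference blocks in the SK region image, so the segmented and enlarged images are used in this experiment. In addition, to better demonstrate the effectiveness of the IBR algorithm, features of the global pedestrian image are also included. Table 6 compares the results with and without the IBR algorithm.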

Table 6 Recognition rates of the proposed algorithm with and without IBR

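| IBR algorithm | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
|---|---|---|---|---|
| Not used | 49.08 | 77.66 | 87.97 | 94.21 |
| Used | **50.13** | **78.86** | **88.26** | **94.68** |

Note: Bold marks the best result at each rank.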

Table 7 Recognition rates of the different segmentation methods in the SK region

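| Segmentation method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
|---|---|---|---|---|
| 3 equal blocks | 50.13 | 78.86 | 88.26 | 94.68 |
| 4 equal blocks | **50.64** | **78.93** | **88.99** | **94.80** |
| 5 equal blocks | 48.23 | 76.52 | 88.07 | 94.23 |

Note: Bold marks the best result at each rank.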
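Table 7 varies the number of equal blocks into which the SK region is cut. The following is a minimal sketch of that partition. It assumes that the same block count n is applied in both directions, as in the three-block case (horizontal bands h1 to h3 and vertical bands v1 to v3). When n does not divide the image size evenly, `np.array_split` lets the strip sizes differ by one pixel. The function name `sk_blocks` is hypothetical.

```python
import numpy as np


def sk_blocks(sk_img, n=3):
    """Split an SK-region image into n horizontal bands (h1..hn) and
    n vertical bands (v1..vn) of (near-)equal size.

    Assumption: the same n is used in both directions.
    """
    h_bands = np.array_split(sk_img, n, axis=0)   # top to bottom (chest, lumbar, leg for n = 3)
    v_bands = np.array_split(sk_img, n, axis=1)   # left to right (arm, torso, arm for n = 3)
    blocks = {f"h{i + 1}": b for i, b in enumerate(h_bands)}
    blocks.update({f"v{i + 1}": b for i, b in enumerate(v_bands)})
    return blocks


sk = np.zeros((128, 48, 3))   # an SK region enlarged to 128 x 48
for n in (3, 4, 5):
    blocks = sk_blocks(sk, n)
    print(n, [blocks[f"h{i + 1}"].shape[0] for i in range(n)],
          [blocks[f"v{i + 1}"].shape[1] for i in range(n)])
# n = 3 prints: 3 [43, 43, 42] [16, 16, 16]
```

With n blocks per direction there are 2n block pairs. Applying the Eq. (1) rule then still removes one horizontal and one vertical interference block, so 2n - 2 blocks are fused.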

