Person re-identification based on top-view depth head and shoulder sequence
Vol. 25, Issue 7, Pages 1393-1407 (2020)
Received: 02 December 2019; Revised: 20 January 2020; Accepted: 27 January 2020; Published: 16 July 2020
DOI: 10.11834/jig.190608
目的 (Objective)
Person re-identification refers to matching pedestrians across images or videos captured by one or more cameras, and is widely used in image retrieval, intelligent security, and related fields. Depending on camera type and shooting angle, person re-identification algorithms fall mainly into two categories: those based on side-view RGB cameras and those based on top-view depth cameras. In side-view RGB scenarios, most of a pedestrian's body appearance is visible, whereas in top-view depth scenarios only the structural information of the head and shoulders is visible. Most existing algorithms target side-view RGB scenarios, and only a few can be applied directly to top-view depth scenarios, especially low-resolution ones such as videos captured by a bus-mounted time-of-flight (TOF) camera. For top-view depth scenarios, this paper therefore proposes a person re-identification algorithm based on top-view depth head and shoulder sequences, aiming to improve re-identification accuracy in low-resolution scenarios.
方法 (Method)
Head regions are detected in the top-view depth head and shoulder sequences and tracked with a Kalman filter to obtain each pedestrian's head image sequence, from which a head depth energy map group (HeDEMaG) is constructed. Depth, area, projection, Fourier descriptor, and histogram of oriented gradients (HOG) features are then extracted from the HeDEMaG. The similarity of each feature between the HeDEMaGs of two pedestrians is computed, and the feature similarities are fused with weight coefficients obtained by model learning to yield an overall similarity score; the label of the pedestrian with the maximum similarity is taken as the recognition result.
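The tracking step above associates head detections across frames before features are extracted. A minimal sketch of the data-association idea, with hypothetical centroid coordinates: the paper pairs a Kalman filter with Hungarian matching, while this sketch uses constant-velocity prediction and a brute-force optimal assignment that is only practical for a handful of targets per frame:

```python
from itertools import permutations
import math

def predict(tracks):
    # Constant-velocity prediction: each track is (x, y, vx, vy).
    return [(x + vx, y + vy) for x, y, vx, vy in tracks]

def associate(predicted, detections):
    """Match predicted head positions to detections by minimizing the
    total Euclidean distance. Brute force over permutations stands in
    for the Hungarian algorithm, which scales polynomially instead.
    Assumes equally many tracks and detections."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(detections))):
        cost = sum(math.dist(predicted[i], detections[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)  # best[i] = detection index assigned to track i

tracks = [(10.0, 10.0, 1.0, 0.0), (40.0, 20.0, 0.0, 1.0)]
detections = [(40.2, 21.1), (11.2, 9.8)]
print(associate(predict(tracks), detections))  # → [1, 0]
```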
结果 (Result)
The proposed algorithm is tested on the public indoor single-person TVPR (top view person re-identification) dataset, the self-built indoor multi-person TDPI-L (top-view depth based person identification for laboratory scenarios) dataset, and the real bus-scenario TDPI-B (top-view depth based person identification for bus scenarios) dataset. Five measures are used to evaluate performance: first-rank matching rate (rank-1), top-five matching rate (rank-5), macro-F1, the cumulative match characteristic (CMC) curve, and average running time. Rank-1, rank-5, and macro-F1 reach at least 61%, 68%, and 67%, respectively, an improvement of at least 11% over typical algorithms.
结论 (Conclusion)
This paper constructs a head depth energy map group that expresses pedestrians' structural and behavioral characteristics, enabling a multi-feature representation suited to low-resolution pedestrians, and proposes a weight-learning-based similarity fusion that improves recognition accuracy. Good results are obtained on indoor single-person, indoor multi-person, and real bus-scenario datasets.
Objective
Person re-identification is an important task in video surveillance systems whose goal is to establish the correspondence among images or videos of a person taken by different cameras at different times. According to camera type, person re-identification algorithms can be divided into RGB camera-based and depth camera-based ones. RGB camera-based algorithms generally rely on the appearance characteristics of clothes, such as color and texture, so their performance is greatly affected by external conditions such as illumination variations. By contrast, depth camera-based algorithms are minimally affected by lighting conditions. Person re-identification algorithms can also be divided into side-view-oriented and vertical-view-oriented algorithms according to the camera shooting angle. Most body parts can be seen in side-view scenarios, whereas only the plan view of the head and shoulders can be seen in vertical-view scenarios. Most existing algorithms are designed for side-view RGB scenarios, and only a few of them can be directly applied to top-view depth scenarios; for example, they perform poorly on videos from bus-mounted low-resolution depth cameras. Our focus is person re-identification on top-view depth head and shoulder sequences.
Method
The proposed person re-identification algorithm consists of four modules, namely, head region detection, head depth energy map group (HeDEMaG) construction, HeDEMaG-based multi-feature representation and similarity computation, and learning-based score-level fusion and person re-identification. First, the head region detection module detects each head region in every frame. The pixel value in a depth image represents the distance between an object and the camera plane, so the range over which a person's height is distributed is used to roughly segment the candidate head regions. A frame-averaging model is proposed to compute the distance between the floor and the camera plane; each person's height above the floor is then obtained by subtracting the raw frame from the floor depth. The circularity ratio of a head region is used to remove non-head regions from the candidates because a real head region is approximately circular.
Second, the HeDEMaG construction module describes the structural and behavioral characteristics of a walking person's head. A Kalman filter and the Hungarian matching method are used to track multiple persons' heads across frames. Because the head direction may change over time during walking, a principal component analysis (PCA)-based method is used to normalize the orientation of each person's head regions. Each person's normalized head image sequence is uniformly divided into R_t groups in time order to capture the structural and behavioral characteristics of the head over local and overall time periods. The average map of each group is called the head depth energy map, and the set of head depth energy maps is named the HeDEMaG.
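The grouping-and-averaging construction, which is analogous to a gait energy image, can be sketched in a few lines; the toy sequence and the group count R_t = 2 are illustrative only:

```python
import numpy as np

def hedemag(head_seq, r_t):
    """Build the head depth energy map group: split the normalized head
    image sequence into r_t equal groups in time order and average each
    group into one head depth energy map."""
    groups = np.array_split(np.asarray(head_seq, dtype=float), r_t)
    return [g.mean(axis=0) for g in groups]

# Toy sequence: 8 frames of 4x4 "head depth images", r_t = 2 groups.
seq = [np.full((4, 4), t, dtype=float) for t in range(8)]
maps = hedemag(seq, 2)
print(len(maps), maps[0][0, 0], maps[1][0, 0])  # → 2 1.5 5.5
```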
Third, the HeDEMaG-based multi-feature representation and similarity computation module extracts features and computes the similarity between the probe and the gallery set. The depth, area, projection maps in two directions, Fourier descriptor, and histogram of oriented gradients (HOG) feature of each head depth energy map in the HeDEMaG are used to represent a person. The similarity on depth is defined through the ratio of the depth difference to the maximum difference between the probe and the gallery set, and the similarity on area is defined analogously; the similarities on projections, Fourier descriptors, and HOG features are computed from their correlation coefficients.
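The two kinds of similarity can be sketched as follows. Writing the scalar (depth/area) score as one minus the normalized difference, so that higher means more similar, is our reading of the ratio defined above rather than a formula stated in the text; vector features use the Pearson correlation coefficient:

```python
import numpy as np

def ratio_similarity(probe_val, gallery_vals):
    """Scalar-feature similarity (depth or area): the absolute
    difference to each gallery value, normalized by the maximum
    difference. The one-minus is an assumption so that larger scores
    mean more similar."""
    diffs = np.abs(np.asarray(gallery_vals, dtype=float) - probe_val)
    max_diff = diffs.max()
    if max_diff == 0:
        return np.ones_like(diffs)
    return 1.0 - diffs / max_diff

def corr_similarity(probe_vec, gallery_vecs):
    # Vector-feature similarity (projection, Fourier descriptor, HOG):
    # Pearson correlation coefficient with each gallery vector.
    return np.array([np.corrcoef(probe_vec, g)[0, 1] for g in gallery_vecs])
```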
Fourth, the learning-based similarity score-level fusion and person re-identification module identifies persons according to an overall similarity score defined as a weighted combination of the five similarity values above. The fusion weights are learned from the training set by minimizing a cost function that measures the recognition error rate. In the experiments, the label of the top-ranked gallery image is taken as the predicted label of the probe.
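The fusion-and-ranking step can be sketched as below; the weights and similarity values here are fixed by hand for illustration, whereas the paper learns the weights from the training set:

```python
import numpy as np

def fuse_and_rank(similarities, weights):
    """similarities: (n_features, n_gallery) matrix of per-feature
    similarity scores; weights: fusion weights. Returns the index of
    the best-matching gallery identity and the fused scores."""
    scores = np.asarray(weights, dtype=float) @ np.asarray(similarities, dtype=float)
    return int(np.argmax(scores)), scores

sims = np.array([[0.9, 0.2, 0.4],   # e.g. depth similarity
                 [0.8, 0.3, 0.9],   # e.g. area similarity
                 [0.7, 0.1, 0.6]])  # e.g. HOG correlation
weights = [0.5, 0.2, 0.3]           # hand-picked, not learned
best, scores = fuse_and_rank(sims, weights)
print(best)  # → 0
```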
Result
Experiments are conducted on the public top view person re-identification (TVPR) dataset and two self-built datasets to verify the effectiveness of the proposed algorithm. TVPR consists of videos recorded indoors with a vertically mounted RGB-D camera, with only one person's walking recorded at a time. We built two datasets, namely, top-view depth based person identification for laboratory scenarios (TDPI-L) and top-view depth based person identification for bus scenarios (TDPI-B), to verify the performance in multi-person and real-world scenarios. TDPI-L is composed of videos captured indoors by depth cameras, with more than two persons walking in each frame. TDPI-B consists of sequences recorded by bus-mounted low-resolution time-of-flight (TOF) cameras. Five measures, namely, rank-1, rank-5, macro-F1, the cumulative match characteristic (CMC) curve, and average running time, are used to evaluate the proposed algorithm. The rank-1, rank-5, and macro-F1 of the proposed algorithm are above 61%, 68%, and 67%, respectively, at least 11% higher than those of the state-of-the-art algorithms. Ablation studies and the effects of tracking algorithms and parameters on performance are also discussed.
Conclusion
The proposed algorithm identifies persons in head and shoulder sequences captured by top-view depth cameras. The HeDEMaG is proposed to represent the structural and behavioral characteristics of persons, and a learning-based fusion-weight computation method is proposed to avoid manual parameter tuning and improve recognition accuracy. Experimental results show that the proposed algorithm outperforms state-of-the-art algorithms on publicly available indoor videos and on real-world low-resolution bus-mounted videos.