Semantic segmentation of gait body with multilayer semantic fusion convolutional neural network
2019, Vol. 24, No. 8, pp. 1302-1314
Received: 2018-10-18; Revised: 2019-03-06; Published in print: 2019-08-16
DOI: 10.11834/jig.180597
Objective
To address contour loss, human shadows, and long computing times caused by multiple covariates such as lighting, camera angle, and occlusion when gait recognition is performed on surveillance video in the anti-terrorism and security fields, this paper proposes a gait human semantic segmentation method based on the RPGNet (Region of Interest + Parts of Body Semantics + GaitNet) network.
Method
The method is divided by function into an R (region of interest) module, a P (parts of body semantics) module, and a GNet (GaitNet) module. The R module extracts the region of interest of the gait body, which improves algorithm efficiency and denoises the image. The P module semantically annotates gait body parts with the open-source LabelMe image annotation tool. The GNet module trains on and segments the gait body part semantics; drawing on the ResNet and RefineNet network models, we design a fine-grained gait semantic segmentation network model.
Result
In tests on 1 380 images from the gait database, the RPGNet method was compared with six human contour segmentation methods. The results show that RPGNet handles both detail and global information accurately and achieves high segmentation accuracy at the 0°, 45°, and 90° viewing angles. Under multi-person, hat-wearing, and occlusion conditions, RPGNet segments the human body well and meets the real-time requirements of gait recognition.
Conclusion
The experimental results show that the RPGNet gait human semantic segmentation method performs effective gait human semantic segmentation under multi-covariate conditions and also effectively improves the gait recognition rate.
Objective
Gait recognition has many advantages over DNA, fingerprint, iris, and 2D and 3D face recognition methods. For example, the subject does not need to cooperate, recognition can be performed at a relatively long distance and at relatively low image quality, and a person's gait is difficult to camouflage or hide. Gait recognition has therefore become a research hotspot in recent years and is widely used in security, anti-terrorism, and medical applications, such as personal identification and the treatment and rehabilitation of abnormal leg and foot diseases. This paper proposes a novel gait human semantic segmentation method based on the RPGNet (Region of Interest + Parts of Body Semantics + GaitNet) network to solve the problems of contour loss, human shadows, and long computing times caused by lighting, camera angles, and obstructions when gait recognition is performed on surveillance video in the anti-terrorism and security fields.
Method
This method is divided into the R (region of interest), P (parts of body semantics), and GNet (GaitNet) modules according to function. The R module extracts the region of interest of the gait body, which improves computing efficiency and reduces image noise. First, the original image is processed with the background subtraction method and converted into a binary image, which is then cleaned with morphological operations such as dilation, erosion, and filtering. Second, we search for the connected region of the human body in the image and enclose it with a rectangular bounding box. Finally, we enlarge the length and width of the bounding box by a quarter and crop the image, which yields the connected region of interest; a minimal sketch of this pipeline is given below.
The main function of the P module is to annotate gait body parts semantically by using LabelMe, an open-source image annotation tool, with the human body annotated part by part according to position. The semantics of the human body are defined as six parts: head, trunk, upper arm, lower arm, thigh, and lower leg, and the six part semantics are mapped one to one onto six RGB values. We then use LabelMe to annotate the semantics of the images captured by the camera, which generates a structure file of the semantic annotation in XML format. Finally, the XML file and the original RGB image are imported into MATLAB to generate a semantic annotation map of the human body parts; a sketch of this label-to-color mapping is given below.
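The following Python sketch illustrates the one-to-one mapping from the six part semantics to RGB values and the rasterization of annotated polygons into a color map. The specific colors, the polygon input format, and the function name are illustrative assumptions, not the paper's choices (the paper itself performs this step in MATLAB).

```python
import numpy as np
import cv2

# Hypothetical one-to-one mapping from the six body-part semantics
# to RGB colors; the paper does not specify the exact values here.
PART_COLORS = {
    "head":      (255, 0, 0),
    "trunk":     (0, 255, 0),
    "upper_arm": (0, 0, 255),
    "lower_arm": (255, 255, 0),
    "thigh":     (255, 0, 255),
    "lower_leg": (0, 255, 255),
}

def render_annotation(height, width, polygons):
    """Rasterize {part_name: [(x, y), ...]} polygons parsed from a
    LabelMe annotation file into an RGB semantic annotation map."""
    annot = np.zeros((height, width, 3), dtype=np.uint8)
    for part, points in polygons.items():
        pts = np.array(points, dtype=np.int32).reshape(-1, 1, 2)
        cv2.fillPoly(annot, [pts], PART_COLORS[part])
    return annot
```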
The GNet module is a detailed semantic segmentation network model of the gait body designed in the light of the existing ResNet and RefineNet network models: ResNet is used to extract the high-level and low-level semantics of the gait human body, and the RefineNet model is used to fuse the low-level semantics with the high-level semantics. Multi-resolution images pass through residual convolution units to generate fine low-level semantic feature maps and coarse high-level semantic feature maps. These feature maps are fed into a multi-resolution feature map fusion unit to generate fused feature maps, and chained residual pooling then produces fused, pooled feature maps. The multi-resolution fused and pooled feature maps are processed by an output convolution, which yields the semantically segmented feature maps. Finally, we use a softmax classifier with bilinear interpolation to output the final gait semantic segmentation image. Through many experiments, we find that when the resolutions are 1/8, 1/16, 1/32, and 1/64 of the original image, the semantic segmentation of the gait human body is better than in other settings. A sketch of such a fusion unit follows.
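For concreteness, here is a minimal PyTorch sketch of a RefineNet-style fusion unit with chained residual pooling, in the spirit of the module described above. The channel count, kernel sizes, and number of pooling stages are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBlock(nn.Module):
    """RefineNet-style unit: fuse a coarse high-level feature map with
    a finer low-level one, then apply chained residual pooling."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv_low = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_high = nn.Conv2d(channels, channels, 3, padding=1)
        # Two stages of chained residual pooling
        self.crp = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(2)]
        )
        self.out_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, low, high):
        # Upsample the coarse map to the fine resolution and sum
        high = F.interpolate(self.conv_high(high), size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        x = F.relu(self.conv_low(low) + high)

        # Chained residual pooling: pool, convolve, add back
        path = x
        for conv in self.crp:
            path = conv(F.max_pool2d(path, 5, stride=1, padding=2))
            x = x + path
        return self.out_conv(x)
```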
Result
A test conducted on 1 380 images from the gait database shows that the proposed RPGNet method achieves higher segmentation accuracy in both local and global information processing than six human contour segmentation methods, especially at viewing angles of 0°, 45°, and 90°. Across a series of experiments, the RPGNet image semantic segmentation algorithm shows high segmentation accuracy whether at a viewing angle of 0°, 45°, or 90°. In this study, we define a formula for the segmentation accuracy $\rho$, and experiments show that the accuracy of human gait segmentation is positively correlated with the gait recognition rate.
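The exact definition of $\rho$ appears in the full paper and is not reproduced in this abstract; purely as an illustrative assumption, a standard pixel-wise accuracy would take the form

$$\rho = \frac{N_{\mathrm{correct}}}{N_{\mathrm{total}}}$$

where $N_{\mathrm{correct}}$ is the number of correctly labeled pixels and $N_{\mathrm{total}}$ is the total number of pixels in the segmentation map.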
Experiments on human segmentation under multi-person, hat-wearing, and occlusion conditions show that the RPGNet-based segmentation algorithm handles both global and local segmentation well, with high segmentation precision and high contour integrity. The RPGNet algorithm can process eight frames per second, which meets the real-time performance requirements of gait recognition.
Conclusion
The proposed gait semantic segmentation method not only solves the problems of missing contours and human shadows caused by multiple covariates in outdoor conditions but also handles contours that are difficult to segment under outdoor multi-person, hat-wearing, and occlusion conditions. An experiment on the relationship between recognition rate and segmentation accuracy indicates that the RPGNet-based human semantic segmentation method can improve the recognition rate of a gait recognition system. Simulations and analyses show that the proposed RPGNet method yields improved human segmentation and a high gait recognition rate in multi-person scenes and under hat-wearing and occlusion conditions. Because the training model of the image semantic segmentation algorithm is based on a deep learning model with GPU-accelerated training, the training cost is higher than that of traditional machine learning methods, and segmentation is slower than traditional methods such as background subtraction. Further work will reduce the depth and complexity of the network model to improve training and testing speed.
Iwama H, Muramatsu D, Makihara Y, et al. Gait-based person-verification system for forensics[C]//Proceedings of 2012 IEEE Fifth International Conference on Biometrics: Theory, Applications and Systems. Arlington, VA, USA: IEEE, 2012: 113-120.[DOI:10.1109/BTAS.2012.6374565]
Tang J, Luo J, Tjahjadi T, et al. Robust arbitrary-view gait recognition based on 3D partial similarity matching[J]. IEEE Transactions on Image Processing, 2017, 26(1):7-22.[DOI:10.1109/TIP.2016.2612823]
Luo J, Tang J, Tjahjadi T, et al. Robust arbitrary view gait recognition based on parametric 3D human body reconstruction and virtual posture synthesis[J]. Pattern Recognition, 2016, 60:361-377.[DOI:10.1016/j.patcog.2016.05.030]
Bouchrika I, Goffredo M, Carter J, et al. On using gait in forensic biometrics[J]. Journal of Forensic Sciences, 2011, 56(4):882-889.[DOI:10.1111/j.1556-4029.2011.01793.x]
Han J, Bhanu B. Individual recognition using gait energy image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(2):316-322.[DOI:10.1109/TPAMI.2006.38]
Lam T H W, Lee R S T. A new representation for human gait recognition: motion silhouettes image (MSI)[C]//Proceedings of 2006 International Conference on Biometrics. Hong Kong, China: Springer, 2006: 612-618.[DOI:10.1007/11608288_81]
Lam T H W, Cheung K H, Liu J N K. Gait flow image: a silhouette-based gait representation for human identification[J]. Pattern Recognition, 2011, 44(4):973-987.[DOI:10.1016/j.patcog.2010.10.011]
Wang C, Zhang J P, Wang L, et al. Human identification using temporal information preserving gait template[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(11):2164-2176.[DOI:10.1109/TPAMI.2011.260]
Chen C H, Liang J M, Zhao H, et al. Frame difference energy image for gait recognition with incomplete silhouettes[J]. Pattern Recognition Letters, 2009, 30(11):977-984.[DOI:10.1016/j.patrec.2009.04.012]
Sivapalan S, Chen D, Denman S, et al. Gait energy volumes and frontal gait recognition using depth images[C]//Proceedings of 2011 International Joint Conference on Biometrics. Washington, DC, USA: IEEE, 2011: 1-6.[DOI:10.1109/IJCB.2011.6117504]
Hofmann M, Bachmann S, Rigoll G. 2.5D gait biometrics using the depth gradient histogram energy image[C]//Proceedings of the 5th IEEE International Conference on Biometrics: Theory, Applications and Systems. Arlington, VA, USA: IEEE, 2012: 399-403.[DOI:10.1109/BTAS.2012.6374606]
Tang J, Luo J, Tjahjadi T, et al. 2.5D multi-view gait recognition based on point cloud registration[J]. Sensors, 2014, 14(4):6124-6143.[DOI:10.3390/s140406124]
Du Y, Wang Y. Generating virtual training samples for sparse representation of face images and face recognition[J]. Journal of Modern Optics, 2016, 63(6):536-544.[DOI:10.1080/09500340.2015.1083131]
Li D C, Wu C S, Tsai T I, et al. Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge[J]. Computers & Operations Research, 2007, 34(4):966-982.[DOI:10.1016/j.cor.2005.05.019]
Wang K J, Yan T, Lv Z W, et al. Kernel sparsity preserving projections and its application to gait recognition[J]. Journal of Image and Graphics, 2013, 18(3):257-263.[DOI:10.11834/jig.20130302]
Huang L, Yang Y, Wang Q J, et al. Indoor scene segmentation based on fully convolutional neural networks[J]. Journal of Image and Graphics, 2019, 24(1):64-72.[DOI:10.11834/jig.180364]
Peng S, Jiang R X. Real-time motion detection algorithm for high definition video surveillance system[J]. Computer Engineering, 2014, 40(11):288-291, 296.[DOI:10.3969/j.issn.1000-3428.2014.11.057]
Huo D H, Yang D, Zhang X H, et al. Principal component analysis based Codebook background modeling algorithm[J]. Acta Automatica Sinica, 2012, 38(4):591-600.[DOI:10.3724/SP.J.1004.2012.00591]
Yang W H, Li X M. Single Gaussian model for background using block-based gradient and linear prediction[J]. Journal of Computer Applications, 2016, 36(5):1383-1386.[DOI:10.11772/j.issn.1001-9081.2016.05.1383]
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3431-3440.[DOI:10.1109/CVPR.2015.7298965]
Maninis K K, Pont-Tuset J, Arbeláez P, et al. Convolutional oriented boundaries[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 580-596.[DOI:10.1007/978-3-319-46448-0_35]
Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2018-10-03]. https://arxiv.org/pdf/1511.07122.pdf
Lin G S, Milan A, Shen C H, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation[EB/OL]. [2018-10-03]. https://arxiv.org/pdf/1611.06612.pdf