Dual-branch feature fusion network based gait recognition algorithm
2022, Vol. 27, No. 7, Pages: 2263-2273
Received: 2020-12-14; Revised: 2021-04-13; Accepted: 2021-04-20; Published in print: 2022-07-16
DOI: 10.11834/jig.200730
Objective
In gait recognition, appearance-based methods achieve high accuracy and are easy to implement, but they are sensitive to appearance changes; model-based methods are more robust to appearance changes, but modeling is difficult and their accuracy is lower. To obtain high accuracy together with better robustness to appearance changes, we propose a dual-branch network that fuses appearance features and pose features, combining the advantages of the two kinds of methods.
Method
The dual-branch network contains an appearance branch and a pose branch. The appearance branch adopts the GaitSet network to extract appearance features from silhouette images; the pose branch uses a five-layer convolutional network to extract pose features from pose skeletons, as sketched below. On this basis, a feature fusion module is constructed to fuse the appearance features and pose features; a channel attention mechanism is introduced so that features of arbitrary sizes can be fused, and the module is designed to suppress the noise in the features during fusion. The fused gait features are finally used to identify pedestrians.
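To make the pose branch concrete, here is a minimal PyTorch sketch of a five-layer, two-stream convolutional network over skeleton sequences. The layer widths, the point at which the two streams merge, and the input layout are assumptions; the text only specifies a five-layer convolutional network operating on position and motion information.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseBranch(nn.Module):
    """Two-stream five-layer CNN over skeleton sequences (hypothetical sizes)."""
    def __init__(self, out_dim=128):
        super().__init__()
        def stream():
            # Three conv layers per stream over a (2, T, K) coordinate tensor.
            return nn.Sequential(
                nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            )
        self.pos_stream = stream()   # raw joint positions
        self.mot_stream = stream()   # frame-to-frame motion
        # Two more conv layers after merging, i.e., five conv layers per path.
        self.merge = nn.Sequential(
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, out_dim, 3, padding=1), nn.ReLU(),
        )

    def forward(self, joints):                       # joints: (B, 2, T, K)
        motion = joints[:, :, 1:] - joints[:, :, :-1]
        motion = F.pad(motion, (0, 0, 1, 0))         # pad time axis back to T
        feats = torch.cat([self.pos_stream(joints), self.mot_stream(motion)], dim=1)
        return self.merge(feats)                     # pose feature maps for fusion
```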
Result
Experiments are conducted on the CASIA-B (Institute of Automation, Chinese Academy of Sciences, Gait Dataset B) dataset under two settings, cross-view and different walking conditions, against current mainstream gait recognition algorithms, with Rank-1 accuracy as the evaluation metric. Under the MT (medium-sample training) partition of the cross-view setting, the proposed algorithm achieves 93.4%, 84.8% and 70.9% accuracy for the three walking conditions, improving on the second-best algorithm by 1.4%, 0.5% and 8.4%, respectively. Under the different-walking-condition setting, it achieves 94.9% and 90.0% for the two walking conditions, the best performance among the compared methods.
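For reference, Rank-1 accuracy here means the fraction of probe sequences whose nearest gallery sequence in feature space belongs to the same subject. A generic sketch of that computation (not the authors' evaluation code), assuming Euclidean-distance matching:

```python
import torch

def rank1_accuracy(gallery_feat, gallery_ids, probe_feat, probe_ids):
    """gallery_feat: (G, D); probe_feat: (P, D); ids: integer tensors."""
    dists = torch.cdist(probe_feat, gallery_feat)  # (P, G) Euclidean distances
    nearest = dists.argmin(dim=1)                  # closest gallery sample per probe
    return (gallery_ids[nearest] == probe_ids).float().mean().item()
```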
Conclusion
In scenarios where both appearance data and pose data are available, the proposed algorithm effectively fuses appearance information and pose information, obtaining richer gait features while reducing the influence of appearance changes on them, and thereby improves gait recognition performance.
Objective
Gait is a human walking pattern and one of the key biometric features for person identification. As a non-contact way to capture identity information at a distance, gait recognition has been applied in video surveillance and public security. Gait recognition algorithms fall into two mainstreams: appearance-based methods and model-based methods. Appearance-based methods commonly extract gait features from a sequence of silhouette images. However, they are easily affected by appearance changes such as non-rigid clothing deformation and background clutter. In contrast, model-based methods leverage body structure or motion priors to model the gait pattern and are more robust to appearance variations. In practice, however, it is challenging to identify a universal model for gait description, and pre-defined models are constrained to certain scenarios. Recent model-based methods adopt deep learning-based pose estimation to model the key-points of the human body, but the estimated poses contain redundant noise introduced by imperfect pose estimators and occlusion. In summary, appearance-based methods rely on visual feature description, while model-based methods describe motion and structure at a semantic level. We aim to design a novel approach that goes beyond these two categories and improves gait recognition by combining appearance features and pose features.
Method
We design a dual-branch network for gait recognition. The input data are fed into the two branches to extract appearance features and pose features, respectively; the two kinds of features are then merged into the final gait features by a feature fusion module. In detail, we adopt the state-of-the-art GaitSet network as the appearance branch to extract appearance features from silhouette images, and design a two-stream convolutional neural network (CNN) to extract pose features from pose key-points based on position information and motion information. Meanwhile, a squeeze-and-excitation feature fusion module (SEFM) is designed to merge the two kinds of features by learning their weights. In the squeeze step, appearance feature maps and pose feature maps are integrated via pooling, concatenation, and projection. In the excitation step, weighted appearance and pose feature maps are obtained via projection and the Hadamard product. The two kinds of feature maps are then down-sampled and concatenated into the final gait feature with adaptive weighting. To further examine the roles of appearance features and pose features, we design two variants, SEFM-A and SEFM-P. SEFM recalibrates appearance features and pose features mutually; SEFM-A merges pose information into the appearance features while the pose features remain unchanged; SEFM-P merges appearance information into the pose features while the appearance features remain unchanged.
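A hedged PyTorch sketch of the SEFM recalibration follows, modeled after SENet-style squeeze-and-excitation. The channel counts and reduction ratio are assumptions; the paper specifies the squeeze (pooling, concatenation, projection) and excitation (projection, Hadamard product) steps but not these hyper-parameters.

```python
import torch
import torch.nn as nn

class SEFM(nn.Module):
    def __init__(self, app_ch, pose_ch, reduction=4):
        super().__init__()
        squeezed = (app_ch + pose_ch) // reduction
        # Squeeze: joint projection of the pooled, concatenated descriptors.
        self.squeeze = nn.Sequential(nn.Linear(app_ch + pose_ch, squeezed), nn.ReLU())
        # Excitation: per-modality projections producing channel weights.
        self.excite_app = nn.Sequential(nn.Linear(squeezed, app_ch), nn.Sigmoid())
        self.excite_pose = nn.Sequential(nn.Linear(squeezed, pose_ch), nn.Sigmoid())

    def forward(self, app, pose):
        # app: (B, Ca, Ha, Wa), pose: (B, Cp, Hp, Wp); spatial sizes may differ,
        # and global average pooling makes the fusion size-agnostic.
        z = torch.cat([app.mean(dim=(2, 3)), pose.mean(dim=(2, 3))], dim=1)
        z = self.squeeze(z)
        w_app = self.excite_app(z)[..., None, None]
        w_pose = self.excite_pose(z)[..., None, None]
        # Hadamard-product recalibration; the SEFM-A variant would keep `pose`
        # unchanged, and SEFM-P would keep `app` unchanged instead.
        return app * w_app, pose * w_pose
```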
Our algorithm is implemented in PyTorch and evaluated on CASIA Gait Dataset B (CASIA-B, Institute of Automation, Chinese Academy of Sciences). We adopt the AlphaPose algorithm to extract pose key-points from the original RGB videos and use the silhouette images provided with the dataset. In each training iteration, we randomly select 16 subjects and 8 samples per subject, and each sample contains a sub-sequence of 30 frames, so each batch has 3 840 image-skeleton pairs. We adopt the Adam optimizer and train the network for 60 000 iterations. The initial learning rate is set to 0.000 2 for the pose branch and 0.000 1 for the appearance branch and the SEFM, and the learning rates are divided by 10 at the 45 000th iteration.
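This training schedule could be wired up roughly as follows. The `model` object, its branch attributes, the `compute_loss` helper, and the `batch_iterator` yielding 16 subjects × 8 sequences × 30 frames are hypothetical names; only the optimizer and schedule follow the text above.

```python
import torch

# Per-branch learning rates as described above.
optimizer = torch.optim.Adam([
    {"params": model.pose_branch.parameters(), "lr": 2e-4},
    {"params": model.appearance_branch.parameters(), "lr": 1e-4},
    {"params": model.sefm.parameters(), "lr": 1e-4},
])
# Divide every learning rate by 10 at iteration 45 000.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[45_000], gamma=0.1)

for iteration in range(60_000):
    silhouettes, joints, labels = next(batch_iterator)        # 3 840 image-skeleton pairs
    loss = compute_loss(model(silhouettes, joints), labels)   # hypothetical loss helper
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```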
Result
We first verify the effectiveness of the dual-branch network and the feature fusion modules. The results show that the dual-branch network improves performance and that appearance features and pose features are clearly complementary. The Rank-1 accuracies of five feature fusion modules, SEFM, SEFM-A, SEFM-P, concatenation, and the multi-modal transfer module (MMTM), are 83.5%, 81.9%, 93.4%, 92.6% and 79.5%, respectively. These results indicate that appearance features are more discriminative, since the pose features contain noise; SEFM-P is able to suppress this noise while merging the two kinds of features. We then compare our method with state-of-the-art gait recognition methods, including CNNs, event-based gait recognition (EV-Gait), GaitSet, and PoseGait. We conduct experiments under two protocols and report the Rank-1 accuracy for three walking scenarios: normal walking, bag-carrying, and coat-wearing. Our method achieves the best performance under all experimental protocols. The Rank-1 accuracies for the three scenarios reach 93.4%, 84.8%, and 70.9% under protocol 1, and 95.7%, 87.8%, and 77.0% under protocol 2. Compared with the second-best method, GaitSet, the Rank-1 accuracies in the coat-wearing scenario are improved by 8.4% and 6.6%.
Conclusion
We propose a novel gait recognition network based on the fusion of appearance features and pose features. Our results demonstrate that the method exploits both kinds of features and is more robust to appearance variations, especially in the clothing-change scenario.
Ben X, Gong C, Zhang P, Jia X T, Wu Q and Meng W X. 2019a. Coupled patch alignment for matching cross-view gaits. IEEE Transactions on Image Processing, 28(6): 3142-3157[DOI: 10.1109/TIP.2019.2894362]
Ben X, Gong C, Zhang P, Yan R, Wu Q and Meng W X. 2020. Coupled bilinear discriminant projection for cross-view gait recognition. IEEE Transactions on Circuits and Systems for Video Technology, 30(3): 734-747[DOI: 10.1109/TCSVT.2019.2893736]
Ben X, Zhang P, Lai Z H, Yan R, Zhai X L and Meng W X. 2019b. A general tensor representation framework for cross-view gait recognition. Pattern Recognition, 90: 87-98[DOI: 10.1016/j.patcog.2019.01.017]
Ben X Y, Xu S and Wang K J. 2012. Review on pedestrian gait feature expression and recognition. Pattern Recognition and Artificial Intelligence, 25(1): 71-81[DOI: 10.16451/j.cnki.issn1003-6059.2012.01.010]
Chao H Q, He Y W, Zhang J P and Feng J F. 2019. GaitSet: regarding gait as a set for cross-view gait recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1): 8126-8133[DOI: 10.1609/aaai.v33i01.33018126]
Chao H Q, Wang K, He Y W, Zhang J P and Feng J F. 2021. GaitSet: cross-view gait recognition through utilizing gait as a deep set. IEEE Transactions on Pattern Analysis and Machine Intelligence: #3057879[DOI: 10.1109/TPAMI.2021.3057879]
Fang H S, Xie S Q, Tai Y W and Lu C W. 2017. RMPE: regional multi-person pose estimation//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2353-2362[DOI: 10.1109/ICCV.2017.256]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/CVPR.2018.00745]
Kastaniotis D, Theodorakopoulos I, Theoharatos C, Economou G and Fotopoulos S. 2015. A framework for gait-based recognition using Kinect. Pattern Recognition Letters, 68: 327-335[DOI: 10.1016/j.patrec.2015.06.020]
Li C, Zhong Q Y, Xie D and Pu S L. 2018. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: International Joint Conferences on Artificial Intelligence Organization: 786-792[DOI: 10.24963/ijcai.2018/109]
Li N, Zhao X B and Ma C. 2020. JointsGait: a model-based gait recognition method based on gait graph convolutional networks and joints relationship pyramid mapping[EB/OL]. [2020-12-09]. https://arxiv.org/pdf/2005.08625.pdf
Liao R J, Cao C S, Garcia E B, Yu S Q and Huang Y Z. 2017. Pose-based temporal-spatial network (PTSN) for gait recognition with carrying and clothing variations//Proceedings of the 12th Chinese Conference on Biometric Recognition. Shenzhen, China: Springer: 474-483[DOI: 10.1007/978-3-319-69923-3_51]
Liao R J, Yu S Q, An W Z and Huang Y Z. 2020. A model-based gait recognition method with body pose and human prior knowledge. Pattern Recognition, 98: #107069[DOI: 10.1016/j.patcog.2019.107069]
Sadeghzadehyazdi N, Batabyal T, Glandon A, Dhar N K, Familoni B O, Iftekharuddin K M and Acton S T. 2019. Glidar3DJ: a view-invariant gait identification via flash lidar data correction//Proceedings of 2019 IEEE International Conference on Image Processing (ICIP). Taipei, China: IEEE: 2606-2610[DOI: 10.1109/ICIP.2019.8803237]
Vaezi Joze H R, Shaban A, Iuzzolino M L and Koishida K. 2020. MMTM: multimodal transfer module for CNN fusion//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 13286-13296[DOI: 10.1109/CVPR42600.2020.01330]
Wang Y X, Du B W, Shen Y R, Wu K, Zhao G R, Sun J G and Wen H K. 2019. EV-Gait: event-based robust gait recognition using dynamic vision sensors//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 6351-6360[DOI: 10.1109/CVPR.2019.00652]
Wolf T, Babaee M and Rigoll G. 2016. Multi-view gait recognition using 3D convolutional neural networks//Proceedings of 2016 IEEE International Conference on Image Processing (ICIP). Phoenix, USA: IEEE: 4165-4169[DOI: 10.1109/ICIP.2016.7533144]
Wu Z F, Huang Y Z, Wang L, Wang X G and Tan T N. 2017. A comprehensive study on cross-view gait based human identification with deep CNNs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(2): 209-226[DOI: 10.1109/TPAMI.2016.2545669]
Yu S Q, Liao R J, An W Z, Chen H F, García E B, Huang Y Z and Poh N. 2019. GaitGANv2: invariant gait feature extraction using generative adversarial networks. Pattern Recognition, 87: 179-189[DOI: 10.1016/j.patcog.2018.10.019]
Yu S Q, Tan D L and Tan T N. 2006. A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition//Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06). Hong Kong, China: IEEE: 441-444[DOI: 10.1109/ICPR.2006.67]
Zhang K H, Luo W H, Ma L, Liu W and Li H D. 2019a. Learning joint gait representation via quintuplet loss minimization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 4695-4704[DOI: 10.1109/CVPR.2019.00483]
Zhang Z Y, Tran L, Yin X, Atoum Y, Liu X M, Wan J and Wang N X. 2019b. Gait recognition via disentangled representation learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 4705-4714[DOI: 10.1109/CVPR.2019.00484]