Yang Qiang, Luo Jian, Huang Yuchen (School of Information Science and Engineering, Hunan Normal University)
Objective Most current vision-based gait recognition methods rely on complete gait sequence images. However, pedestrians captured in real-world scenes are inevitably occluded, so the acquired gait images are incomplete, which severely affects recognition results. Handling large-area occlusion is therefore a challenging and important problem in gait recognition. To address it, a gait spatio-temporal reconstruction network (GSTRNet) is proposed to restore occluded gait sequence images. Method GSTRNet, built on 3D convolutional neural networks and Transformers, repairs the spatial information of each gait frame while preserving the spatio-temporal coherence between frames. It introduces a YOLOv5 (you only look once) network to detect locally occluded regions in the gait images and uses them as prior knowledge to assign higher repair weights to the occluded regions, achieving local restoration; the locally repaired silhouettes are then fused with the original occluded images to generate complete restored gait images. A joint loss function combining a triplet feature loss and a reconstruction loss is also introduced to optimize the restoration network and improve its effect. Finally, identity recognition is performed using the fully restored gait sequence images as features. Result Occluded gait sequences were synthesized on the large-scale gait dataset OU_MVLP (the OU-ISIR gait database, multi-view large population dataset) for restoration experiments. The results show that, under large-area occlusion of the gait silhouette, the recognition accuracy of the proposed method improves over existing gait restoration and occlusion-recognition methods: up to 6.7% over sVideoWGAN-hinge (a triplet video generative adversarial network) when the occlusion mode is unknown, and about 40% over methods such as Gaitset under non-single occlusion modes. Conclusion The proposed GSTRNet restores gait image sequences well under various occlusion modes, and performing gait recognition on the restored images effectively improves the recognition rate.
Gait image spatio-temporal restoration network and its application under occlusion conditions
Yang Qiang, Luo Jian, Huang Yuchen(School of Information Science and Engineering, Hunan Normal University)
Objective Gait recognition is an identity recognition method based on human walking patterns and has been widely used in video surveillance and public security. Compared with the face, fingerprint, and other biometric features, it offers long-distance recognition without participant cooperation and is difficult to camouflage or hide. Most current vision- and deep-learning-based gait recognition algorithms build gait features from occlusion-free sequences. In reality, however, people under surveillance in public places are inevitably blocked, so the acquired gait sequences are usually occluded. Occlusion has a great impact on gait recognition: an accurate gait period cannot be obtained from the sequence, and the loss of gait spatio-temporal information is severe, which significantly reduces recognition performance. Existing occluded-gait algorithms fall into two kinds. One directly extracts occlusion-robust features from the occluded sequence for recognition; it usually requires knowing the gait period in advance, which is difficult in an occluded gait sequence. The other reconstructs the gait silhouette or repairs the gait features before recognition, but existing algorithms of this kind often perform poorly when the occluded area is large or the entire sequence is occluded. Method A gait spatio-temporal reconstruction network (GSTRNet), consisting of an occlusion detection network (YOLO, you only look once), a spatio-temporal encoder-decoder network, and a feature extraction network (Gaitset), is proposed to repair occluded gait sequences based on prior knowledge.
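The three-stage pipeline described above (occlusion detection, sequence repair, feature extraction) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every function here is a hypothetical placeholder for the real YOLOv5, 3DCNN/Transformer, and Gaitset modules.

```python
import numpy as np

def detect_occlusion_mask(frames):
    """Hypothetical stand-in for the YOLOv5 detector: returns a binary
    mask per frame (1 = occluded pixel, 0 = visible pixel)."""
    # For the sketch, treat zeroed-out pixels as the occluded region.
    return (frames == 0).astype(np.float32)

def repair_sequence(frames, mask):
    """Hypothetical stand-in for the 3DCNN/Transformer encoder-decoder.
    A real network would inpaint; the sketch fills occluded pixels with
    the mean of the visible pixels just to keep the pipeline runnable."""
    visible_mean = frames[mask == 0].mean()
    return np.where(mask == 1, visible_mean, frames)

def extract_features(frames):
    """Hypothetical stand-in for Gaitset feature extraction."""
    return frames.mean(axis=(1, 2))  # one scalar feature per frame

# Occluded sequence of 4 frames of 8x8 silhouettes.
seq = np.ones((4, 8, 8), dtype=np.float32)
seq[:, 2:5, 2:5] = 0.0                 # simulated occlusion block
mask = detect_occlusion_mask(seq)
repaired = repair_sequence(seq, mask)
features = extract_features(repaired)  # features used for identification
```

The key design point the sketch preserves is that detection output serves as prior knowledge for the repair stage, and recognition operates only on the repaired sequence.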
GSTRNet uses YOLOv5 to detect the occluded regions in the sequence (assigning 1 to occluded areas and 0 to non-occluded areas) as prior knowledge, so that a higher weight is assigned to the loss over the occluded regions. The spatio-temporal encoder-decoder network consists of 3D convolutional neural networks (3DCNN) and Transformers; the 3DCNN repairs the spatial information of each gait image while maintaining temporal coherence between frames. The encoder uses 3DCNN with a stride of 2 to reduce the data dimension so that every element participates in each sampling step and more detailed information is retained. The decoder uses skip connections to concatenate encoder features, further reducing the loss of detail caused by encoder downsampling. To ensure the temporal and spatial consistency of the entire repaired sequence, multiple Transformers composed of multi-scale self-attention modules are inserted between the encoder and decoder, extracting useful information at global and local scales to repair the gait sequence. Because the 3DCNN performs a global repair, data in the non-occluded regions of the repaired sequence also change; GSTRNet therefore uses the prior knowledge to take only the occluded-region result from the decoder output and adds it to the original sequence as the network output. The Gaitset network is also introduced to extract features from three kinds of sequences for a triplet loss that keeps the features of the repaired sequence consistent with those of the original: occluded sequences, genuine sequences (other non-occluded sequences with the same identity as the occluded sequence), and imposter sequences (sequences whose identities differ from the occluded sequence). On OU_MVLP (the OU-ISIR gait database, multi-view large population dataset), 24 kinds of occluded gait sequences were synthesized as experimental data by simulating occlusion types common in real life, and the algorithm was evaluated with three sets of experiments.
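The masked fusion of decoder output with the original sequence, and the joint loss built from a weighted reconstruction term plus a triplet term, can be illustrated with a minimal numpy sketch. The weight values and margin below are illustrative assumptions, not the paper's hyperparameters, and the feature vectors stand in for Gaitset outputs.

```python
import numpy as np

def fuse(original, decoded, mask):
    """Keep visible pixels from the original sequence and take only the
    occluded-region pixels from the decoder output (mask: 1 = occluded)."""
    return mask * decoded + (1.0 - mask) * original

def weighted_recon_loss(output, target, mask, w_occ=6.0, w_vis=1.0):
    """L1 reconstruction loss with a higher weight on occluded pixels
    (w_occ and w_vis are assumed values for illustration)."""
    weights = np.where(mask == 1, w_occ, w_vis)
    return float((weights * np.abs(output - target)).mean())

def triplet_loss(anchor, genuine, imposter, margin=0.2):
    """Pull repaired-sequence features toward same-identity (genuine)
    features and away from different-identity (imposter) features."""
    d_pos = np.linalg.norm(anchor - genuine)
    d_neg = np.linalg.norm(anchor - imposter)
    return float(max(d_pos - d_neg + margin, 0.0))

mask = np.zeros((4, 8, 8)); mask[:, 2:5, 2:5] = 1.0
original = np.ones((4, 8, 8)); original[mask == 1] = 0.0  # occluded input
decoded = np.full((4, 8, 8), 0.9)                         # decoder output
ground_truth = np.ones((4, 8, 8))

fused = fuse(original, decoded, mask)     # occluded region from decoder only
loss_r = weighted_recon_loss(fused, ground_truth, mask)
loss_t = triplet_loss(np.array([1.0, 0.0]),   # repaired-sequence features
                      np.array([0.9, 0.1]),   # genuine features
                      np.array([0.0, 1.0]))   # imposter features
joint = loss_r + loss_t                   # joint optimization objective
```

The fusion step reflects why only occluded pixels are replaced: the 3DCNN repairs globally, so copying visible pixels from the input keeps the non-occluded regions unchanged.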
In the three sets of experiments, respectively, the occlusion mode was known with consistent gallery and probe occlusion modes; the occlusion mode was known but the gallery and probe occlusion modes were inconsistent; and the occlusion mode was unknown. Result The results show that the proposed algorithm outperforms existing occlusion-sequence repair algorithms. Compared with other repair algorithms, its rank-1 recognition rate is 4.1% higher in both single and non-single occlusion modes when the occlusion mode is known, and it achieves a maximum recognition accuracy improvement of 6.7% under large-area occlusion of the gait silhouette when the occlusion mode is unknown. Compared with gait recognition algorithms such as 3DLocal (3D local convolutional neural networks), its recognition rate in non-single occlusion mode increases by up to about 50%. Conclusion The proposed GSTRNet model repairs gait sequences well, to varying degrees, under various occlusion modes, and is highly feasible in real-world scenarios.