许文正1, 黄天欢1, 贲晛烨1, 曾翌2, 张军平2 (1. School of Information Science and Engineering, Shandong University, Qingdao 266237; 2. School of Computer Science, Fudan University, Shanghai 200437)
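Gait recognition requires only low image resolution, works at a distance, needs no cooperation from the subject, and is difficult to conceal or disguise, which gives it broad application prospects in security surveillance and forensic investigation. In practice, however, its performance is often affected by covariates such as viewpoint, clothing, carried objects, and occlusion; viewpoint variation is the most common of these and significantly alters a pedestrian's appearance. Improving robustness to viewpoint has therefore long been a research focus in the field. To give a comprehensive picture of existing cross-view gait recognition methods, this paper surveys the related work. First, the research background is briefly introduced in terms of basic concepts, data acquisition methods, and development history, and on this basis the mainstream video-based cross-view gait databases are summarized and analyzed. Next, cross-view gait recognition methods are described in detail in four groups: methods based on 3D gait information, methods based on view transformation models, methods based on view-invariant features, and methods based on deep learning. Finally, the performance of representative methods is compared on three databases, CASIA-B (CASIA gait database, dataset B), OU-ISIR LP (OU-ISIR gait database, large population dataset), and OU-MVLP (OU-ISIR gait database, multi-view large population dataset), and future research directions for cross-view gait recognition are pointed out.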
Cross-view gait recognition: a review
Xu Wenzheng1, Huang Tianhuan1, Ben Xianye1, Zeng Yi2, Zhang Junping2 (1. School of Information Science and Engineering, Shandong University, Qingdao 266237, China; 2. School of Computer Science, Fudan University, Shanghai 200437, China)
Gait is closely related to a pedestrian's identity. Compared with recognition based on face, fingerprint, iris, and other biometrics, gait recognition can be performed at a distance and does not require special acquisition equipment, high image resolution, or explicit cooperation from the subject. Moreover, a person's gait is difficult to hide or disguise. Gait recognition has a wide range of applications in public surveillance, forensic collection, and daily attendance. In these practical applications, its performance is easily affected by covariates such as viewpoint variations, occlusions, and segmentation errors, among which viewpoint variation is one of the main factors. Intra-class differences across viewpoints are often greater than inter-class differences within the same viewpoint. Improving the robustness of cross-view gait recognition has therefore become a hot topic. This paper critically reviews existing cross-view gait recognition methods. First, the research background is introduced in terms of basic concepts, data acquisition methods, application scenarios, and development history. Then, video-based cross-view gait recognition is reviewed in more depth. Cross-view gait databases are analyzed in detail with respect to 1) data type, 2) sample size, 3) number of viewpoints, 4) acquisition environment, 5) other related covariates, and 6) the characteristics of each database. Next, cross-view gait recognition methods are presented in detail. Unlike most existing reviews, which classify gait recognition methods by basic steps such as data acquisition, feature representation, and classification, we focus on the cross-view recognition problem.
Specifically, four categories of cross-view gait recognition methods are analyzed on the basis of feature representation and classification (i.e., 3D gait information construction, view transformation model (VTM), view-invariant feature extraction, and deep learning-based methods). For 3D gait information methods, gait information is extracted from multi-view gait videos and used to construct 3D gait models. These methods are robust to large view changes, but they often require complex configurations, expensive high-resolution multi-camera systems, and frame synchronization, all of which limit their application in real surveillance scenarios. For VTM methods, view transformation models based on singular value decomposition (SVD) and on regression are introduced, as applied to local and global features. Although a VTM can minimize the error between the transformed gait features and the original gait features, it tends to neglect discriminative analysis. For view-invariant feature extraction methods, 1) hand-crafted feature extraction, 2) discriminative subspace learning, and 3) metric learning are compared. Among the discriminative subspace learning methods, those based on canonical correlation analysis (CCA) are highlighted. Despite the advantages of these methods, finding a robust view-invariant subspace or metric for gait features remains challenging. Deep learning-based methods for cross-view recognition are mainly built on convolutional neural networks (CNNs), recurrent neural networks (RNNs), auto-encoders (AEs), generative adversarial networks (GANs), 3D convolutional neural networks (3D CNNs), and graph convolutional networks (GCNs). To summarize the capabilities of the various cross-view gait recognition methods, representative state-of-the-art methods are further compared and analyzed on the CASIA-B (CASIA gait database, dataset B), OU-ISIR LP (OU-ISIR gait database, large population dataset), and OU-MVLP (OU-ISIR gait database, multi-view large population dataset) databases. The comparison shows that methods using 3D CNNs or multiple neural network architectures, which represent gait features with a sequence of silhouettes, achieve good performance.
Additionally, deep neural network methods based on body model representation also show excellent performance when view variation is the only covariate. Finally, future research directions for cross-view gait recognition are outlined, including 1) the establishment of large-scale gait databases containing complex covariates, 2) cross-database gait recognition, 3) self-supervised learning methods for gait features, 4) disentangled representation learning methods for gait features, 5) further development of model-based gait representation methods, 6) exploration of new methods for temporal feature extraction, 7) multimodal fusion gait recognition, and 8) improvement of the security of gait recognition systems.
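As an illustration of the SVD-based view transformation model mentioned in the abstract, the following is a minimal sketch rather than the implementation of any surveyed method. It assumes gait features are already available as fixed-length vectors for every training subject under every view; the function names `fit_vtm` and `transform_view`, the `rank` parameter, and the `(views, subjects, dims)` array layout are all hypothetical choices made for this example.

```python
import numpy as np


def fit_vtm(features, rank):
    """Fit an SVD-based view transformation model (illustrative sketch).

    features: array of shape (K, M, D) holding D-dimensional gait features
              for M training subjects under K views (assumed layout).
    rank:     number of singular components to keep (assumed parameter).
    Returns an array P of shape (K, D, rank), one projection block per view.
    """
    K, M, D = features.shape
    # Stack the views along the rows: G is (K*D, M), one column per subject.
    G = features.transpose(0, 2, 1).reshape(K * D, M)
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    # P = U * S, split back into one (D, rank) block per view.
    return (U[:, :rank] * s[:rank]).reshape(K, D, rank)


def transform_view(P, g, src, dst):
    """Map a feature vector g observed under view `src` to view `dst`."""
    # Least-squares estimate of the view-independent subject vector.
    v, *_ = np.linalg.lstsq(P[src], g, rcond=None)
    # Re-project that vector through the destination view's block.
    return P[dst] @ v
```

The sketch only minimizes reconstruction error between views; it contains no discriminative term, which is the limitation of VTM methods noted in the abstract.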