3D human body modeling and variable-view recognition of abnormal gait

Luo Jian; Li Mengxia; Luo Shiguang

Published: 2020-08-07
DOI: 10.11834/jig.190497
2020 | Volume 25 | Number 8

Luo Jian, Li Mengxia, Luo Shiguang (College of Information Science and Engineering, Hunan Normal University, Changsha 410000, China)

Abstract

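Objective Studying gait with vision and machine learning methods has become a research hotspot, but most work concentrates on identity recognition. This paper studies gait from a different perspective and explores a method for 3D human body modeling and variable-view recognition of abnormal gait based on point cloud data and a semantic human body feature model. Method Non-rigid deformation and skinning are used to build a parametric 3D human body model based on semantic shape and pose features. Taking the abnormal gait point cloud data acquired by an infrared structured light sensor as the observation target, a 3D human body model with the corresponding shape and pose features is constructed. A ConvGRU (convolutional gated recurrent unit) convolutional recurrent neural network extracts the spatiotemporal features of the projected depth images, and the samples are organized into triplets of positive, negative, and self (anchor) samples to train the abnormal gait classifier, which improves its ability to discriminate small differences. To address the difficulty of acquiring abnormal gait data and the small number of training views, a training sample augmentation method based on shape, pose, and view transformation is proposed to improve the generalization ability of the model under view changes. Result Experiments were conducted on the CSU (Central South University) 3D abnormal gait database and the DHA (depth-included human action video) depth human action database, and different abnormal gait or action recognition methods were compared. On the CSU abnormal gait database, the overall detection and recognition rate for abnormal gait at the 0°, 45°, and 90° views reached 96.6%. In particular, in the 90° to 0° cross-view and view-change experiments, the rate was more than 25% higher than that of methods using gait and action features such as DMHI (difference motion history image) and DMM-CNN (depth motion map-convolutional neural network). On the DHA depth human action database, the recognition rate of the proposed method was close to 98%, which is 2% to 3% higher than that of DMM and related algorithms. Conclusion The proposed 3D abnormal gait recognition method combines the advantages of 3D human body prior knowledge, the spatiotemporal properties of convolutional recurrent networks, and virtual-view sample synthesis. It not only improves the recognition accuracy of abnormal gait under view changes but also offers a new approach to 3D abnormal gait detection and recognition.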

Keywords

machine vision; human recognition; abnormal gait 3D modeling; virtual sample synthesis; convolutional recurrent neural network

Parametric 3D body modeling and view-invariant abnormal gait recognition

Luo Jian, Li Mengxia, Luo Shiguang(College of Information Science and Engineering, Hunan Normal University, Changsha 410000, China)

Abstract

Objective Gait analysis with vision and machine learning methods has become a popular research topic. However, most studies concentrate on human identification and use 2D RGB images. In contrast, this paper investigates abnormal gait recognition using 3D data and proposes a method for view-invariant abnormal gait recognition based on 3D point cloud data and a semantic body model. Compared with traditional 2D abnormal gait recognition approaches, the proposed 3D-based method more readily handles several obstacles in abnormal gait modeling and recognition, including view variation and interference from external items. Method The point cloud data of human gait are obtained with an infrared structured light sensor, a 3D depth camera that uses a structured-light projector and a reflected-light receiver to measure the depth of an object and compute its point cloud. Although the point cloud of the human body is 3D, it is generally unstructured, which hinders the 3D representation of body shape and posture. To deal with this problem, this paper introduces a 3D parametric human body model learned from a 3D body dataset by statistical methods. A parameterized human body model describes and constructs the corresponding visual body mesh through abstract high-order semantic features, such as height, weight, age, gender, and skeletal joints; its parameters are determined by statistical learning. The human body is embedded into the model, so the 3D parametric model can be deformed in both shape and pose. Unlike traditional methods that model the 3D body directly from point cloud data via point cloud reduction and triangle mesh generation, the proposed method deforms the 3D parameterized body model to fit the point cloud in both shape and posture.
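The idea of deforming a parametric model until it fits a point cloud can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the model is a toy linear shape model (mean shape plus one weighted basis vector), the fitting cost is the mean squared distance from each cloud point to its nearest model vertex, and a coarse grid search stands in for the iterative minimization. The names `deform`, `fit_cost`, and `fit_shape` are hypothetical.

```python
def deform(mean_shape, basis, betas):
    """Vertices of a linear shape model: mean + sum_k betas[k] * basis[k]."""
    return [
        tuple(m + sum(b * comp[i][d] for b, comp in zip(betas, basis))
              for d, m in enumerate(vertex))
        for i, vertex in enumerate(mean_shape)
    ]


def fit_cost(vertices, cloud):
    """Mean squared distance from each cloud point to its nearest model vertex."""
    total = 0.0
    for p in cloud:
        total += min(sum((a - b) ** 2 for a, b in zip(p, v)) for v in vertices)
    return total / len(cloud)


def fit_shape(mean_shape, basis, cloud, candidates):
    """Coarse grid search over a single shape coefficient (illustrative
    stand-in for an iterative optimizer)."""
    return min(candidates,
               key=lambda b: fit_cost(deform(mean_shape, basis, [b]), cloud))


# Toy data: a 3-vertex "body" whose only basis vector stretches it along y.
mean_shape = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
basis = [[(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 1.0, 0.0)]]
cloud = [(0.0, 0.0, 0.0), (0.0, 1.1, 0.0), (0.0, 2.2, 0.0)]  # a taller subject

candidates = [i / 10 for i in range(-5, 6)]  # -0.5, -0.4, ..., 0.5
best_beta = fit_shape(mean_shape, basis, cloud, candidates)
```

In this toy setup the search recovers the single coefficient that stretches the mean shape onto the cloud. A realistic model would have many shape and pose parameters and would need a gradient-based or similar optimizer rather than a grid.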
The standard 3D human model proposed in this paper is constructed from a principal component analysis (PCA) of body shape and a skinning method. An observation function is also introduced to measure the similarity between the deformed 3D model and the raw point cloud data of the human body, and an accurate deformation of the 3D body is ensured by iteratively minimizing this function. After the 3D model is estimated, the features of the raw point cloud data are converted into a high-level structured representation of the human body. This process not only abstracts the unstructured data into a high-order semantic description but also effectively reduces the dimensionality of the original data. After 3D modeling and structured feature representation, a convolutional gated recurrent unit (ConvGRU) network is applied to extract the spatiotemporal features of the projected depth gait images. ConvGRU combines the advantages of convolutional and recurrent neural networks, the latter being based on a gate structure. Its two gates (the reset and update gates) help the model memorize useful information and forget useless data. In the final classification process, the samples are divided into positive, negative, and anchor samples. The anchor sample is the sample itself, the positive samples are same-category samples from different subjects, and the negative samples are samples that belong to other categories. Training the classifier with this triplet strategy improves its ability to discriminate small feature differences between categories. In addition, a virtual 3D sample synthesis method based on body, pose, and view deformation is proposed to deal with the shortage of abnormal gait data. Compared with normal gait datasets, abnormal gait data, especially 3D abnormal gait datasets, are rare and difficult to obtain.
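For reference, the reset and update gates and the triplet objective can be written out as follows. This uses the commonly cited ConvGRU formulation and the standard margin-based triplet loss; the abstract does not state the exact variants used in the paper, so the symbols below ($W$, $U$, $b$, the embedding $f$, and the margin $\alpha$) should be read as assumptions. Here $*$ denotes convolution, $\odot$ element-wise multiplication, $x_t$ the input depth frame, and $h_t$ the hidden state.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z * x_t + U_z * h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r * x_t + U_r * h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h * x_t + U_h * (r_t \odot h_{t-1}) + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \\[4pt]
\mathcal{L}_{\text{triplet}} &= \max\left(0,\; \lVert f(a) - f(p) \rVert_2^2 - \lVert f(a) - f(n) \rVert_2^2 + \alpha\right)
\end{aligned}
```

The loss is zero once the anchor $a$ is closer to the positive $p$ than to the negative $n$ by at least the margin $\alpha$, which is what pushes the classifier to separate categories that differ only slightly.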
Moreover, given the limited amount of ground truth data, most of the abnormal data are imitated by the experiment participants. The virtual synthesis method can therefore help extend the training data and improve the generalization ability of the abnormal gait classification model. Result Experiments were performed on the CSU (Central South University) abnormal 3D gait database and the depth-included human action video (DHA) dataset, and different abnormal gait or action recognition methods were compared with the proposed approach. On the CSU abnormal gait database, the rank-1 mean detection and recognition rate of abnormal gait is 96.6% at the 0°, 45°, and 90° views. In the 90°-0° cross-view recognition experiment, the proposed method outperforms approaches that use DMHI (difference motion history image) or DMM-CNN (depth motion map-convolutional neural network) as the feature representation by at least 25%. On the DHA dataset, the proposed method achieves a rank-1 mean detection and recognition rate of nearly 98%, which is 2% to 3% higher than that of related approaches, including DMM-based methods. Conclusion With feature extraction based on the 3D parameterized human body model, abnormal gait image data can be abstracted into high-order descriptions, which effectively accomplishes feature extraction and dimensionality reduction of the original data. ConvGRU extracts the spatial and temporal features of the abnormal gait data well. The virtual sample synthesis and triplet classification methods can be combined to classify and recognize abnormal gait data from different views. The proposed method not only improves the recognition accuracy of abnormal gait under various view angles but also provides a new approach for the detection and recognition of abnormal gait.
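The view-transformation part of virtual sample synthesis can be sketched as rotating a 3D body about its vertical axis and re-projecting it to a depth image. The sketch below is an illustration under stated assumptions, not the paper's pipeline: it assumes an orthographic camera looking along +z, keeps the smallest z per pixel as the depth, and uses the hypothetical helper names `rotate_y`, `project_depth`, and `synthesize_views`.

```python
import math


def rotate_y(points, degrees):
    """Rotate 3D points about the vertical (y) axis by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def project_depth(points, width, height, x_range, y_range):
    """Orthographic projection onto the x-y plane. Each pixel keeps the
    smallest z seen so far; pixels that no point hits stay None."""
    image = [[None] * width for _ in range(height)]
    (x0, x1), (y0, y1) = x_range, y_range
    for x, y, z in points:
        col = math.floor((x - x0) / (x1 - x0) * (width - 1) + 0.5)
        row = math.floor((y1 - y) / (y1 - y0) * (height - 1) + 0.5)
        if 0 <= col < width and 0 <= row < height:
            if image[row][col] is None or z < image[row][col]:
                image[row][col] = z
    return image


def synthesize_views(points, angles, width, height, x_range, y_range):
    """Map each view angle to the depth image of the rotated point set."""
    return {a: project_depth(rotate_y(points, a), width, height,
                             x_range, y_range)
            for a in angles}


# Toy example: one surface point rendered at the three views named in the text.
views = synthesize_views([(0.5, 0.0, 0.0)], [0, 45, 90],
                         width=5, height=5,
                         x_range=(-1.0, 1.0), y_range=(-1.0, 1.0))
```

Applying the same rotation to every frame of a fitted 3D gait sequence would yield a depth sequence for a view that was never recorded, which is the kind of extra training sample the text describes.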

Keywords

machine vision; human recognition; 3D abnormal gait modeling; virtual sample generation; convolutional recurrent neural network
