Published: 2018-04-16
Image Understanding and Computer Vision
Received: 2017-07-07; Revised: 2017-09-14
Foundation item: National Natural Science Foundation of China (61571345, 91538101, 61550110247)
First author: RAN Xianyu (1990-), female, master's student in computer application technology, School of Computer Science and Technology, Xidian University; her main research interest is action recognition. E-mail: ranxianyu@foxmail.com
CLC number: TP37
Document code: A
Article ID: 1006-8961(2018)04-0519-07
Abstract
Objective Human action recognition based on 3D skeletons has been a popular topic in computer vision, the goal of which is to automatically segment, capture, and recognize human actions. It has been widely applied in real-world settings: over the past several decades it has served surveillance, video games, robotics, human-human and human-computer interaction, and health care, and researchers have explored it since the 1960s. Three-dimensional data can be acquired in four ways: marker-based motion capture systems; reconstruction of 3D information from multi-view 2D image sequences; range sensors; and RGB videos. However, extracting data with motion capture or reconstruction is inconvenient. Range sensors are expensive and difficult to use in human environments, acquire data slowly, and estimate distance poorly. RGB images usually provide only the appearance of the objects in the scene; with such limited information, certain problems, such as separating a foreground and background of similar color and texture, are difficult, if not impossible, to solve. RGB data are also highly sensitive to illumination, viewpoint, occlusion, clutter, and dataset diversity, and RGB video sensors cannot capture all the information that humans need. The rapid development of depth sensors such as the Microsoft Kinect in recent years has provided not only color image data but also 3D depth image information. Depth images record the distance between the sensor and the body, yielding considerable information, and real-time skeleton tracking combined with a support vector machine can recognize various postures and extract key information.
Research on computer vision algorithms based on 3D skeletons has thus attracted significant attention in the last few years; the many skeleton-based algorithms studied have produced numerous achievements and contributions. However, existing action recognition algorithms select a fixed joint as the coordinate center, which leads to a low recognition rate. An adaptive skeleton-center algorithm for human action recognition is proposed to solve this low-accuracy problem. Method The algorithm loads the frames of skeleton action sequences from a human action dataset, removes redundant frames, and obtains the original coordinate matrix by preprocessing the sequences. Rigid-vector and joint-angle features are generated from the original coordinate matrix. An adaptive value is determined from the changes in the rigid-vector and joint-angle values; the coordinate center is adaptively selected according to this value and used to renormalize the original matrix. The action coordinate matrix is then denoised with dynamic time warping, the Fourier temporal pyramid method reduces its temporal displacement and noise, and a support vector machine classifies the result. Result Compared with existing algorithms, such as histograms of 3D joints (HOJ3D), conditional random field (CRF), EigenJoints, profile hidden Markov model (HMM), relation matrix of 3D rigid bodies + principal geodesic distance, and actionlet algorithms, the proposed algorithm performs better on different datasets. On the UTKinect dataset, its action recognition rate is 4.28% higher than that of the HOJ3D algorithm and 3.48% higher than that of the CRF algorithm.
On the MSRAction3D dataset, the action recognition rate of the proposed algorithm is 9.57% higher than that of the HOJ3D algorithm, 2.07% higher than that of the profile HMM algorithm, and 6.17% higher than that of the EigenJoints algorithm. Action Sets (AS)1, AS2, and AS3 are subsets of MSRAction3D. On AS2 the proposed algorithm does not match the best competing algorithms, but on AS1 and AS3 its recognition rates are high. Conclusion The proposed algorithm addresses the low accuracy of existing action recognition algorithms, which stems from adopting a fixed joint as the coordinate center. Simulation results show that it effectively improves the accuracy of human action recognition, with a recognition rate higher than those of existing algorithms. On the UTKinect dataset, its recognition rate is at least 3% higher than those of other algorithms, and the single-action recognition rate reaches 90%. On the MSRAction3D dataset, the algorithm shows advantages on the AS1 and AS3 subsets, but its rate on AS2 is not ideal, particularly for upper-limb actions, so the algorithm still needs improvement there. It is generally efficient for single-action recognition; complex action recognition is the next research direction.
Keywords
human action recognition; skeleton sequence; feature extraction; adaptive; renormalization
0 Introduction
Action recognition has long been an active research topic in computer vision and has been widely applied in surveillance, video games, robotics, human-computer interaction, health care, and other fields. Occlusion, illumination changes, viewpoint changes, and background clutter make its development difficult. The advance of depth sensors has opened new opportunities, and many researchers have been studying 3D-skeleton-based algorithms. Rahmani et al. [1] proposed a novel robust nonlinear knowledge-transfer model for human action recognition. Lu et al. [2] obtained complete skeleton information from depth images captured by a depth sensor. Xia et al. [3] recognized human actions with histograms of 3D joint positions. Han et al. [4] proposed a hierarchical-learning method for human action recognition, comprising feature extraction based on hierarchical nonlinear dimensionality reduction and action modeling based on a cascaded discriminative model. Yang et al. [5] proposed a joint-position-difference method, a new functional feature of joints that combines static posture, motion information, and offset. Vemulapalli et al. [6] represented a person's skeleton joints as points in a Lie group, explicitly modeling the 3D geometric relations between body parts with rotations and translations. Ding et al. [7] computed strongly similar postures with the relation matrix of 3D rigid bodies, built representative postures by spectral clustering of the sample data, and recognized actions from discrete symbol sequences generated by global linear feature functions constructed from the spectral embedding.
All of the above methods share one trait: when normalizing all joints to model a human action, they use a single, predefined fixed coordinate center. However, a fixed coordinate center does not suit every moving joint; when a joint moves little relative to the coordinate center, the recognized information carries errors. To address this shortcoming, we propose an algorithm that adaptively selects the skeleton coordinate center. According to the motion state of each action, the algorithm parameterizes the joint motion features during feature extraction and adaptively selects the coordinate center, effectively improving the recognition rate of human actions.
The overall framework is shown in Fig. 1. First, the adaptive-skeleton-center algorithm extracts features: the angular velocity and angular acceleration of each rigid-body joint are computed with the hip and the neck as centers and compared to obtain the feature values, and the coordinate center of each action is re-determined. Dynamic time warping (DTW) [8] then handles variations in execution rate, the Fourier temporal pyramid (FTP) representation [9] removes high-frequency coefficients, and an SVM performs the final classification. The shaded part of Fig. 1 is the core of our algorithm; without it, the figure reduces to the original framework [6].
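To make the high-frequency removal step concrete, here is a minimal stand-in sketch: a plain DFT of a 1D joint trajectory that keeps only the lowest-frequency coefficients, which is the effect the FTP representation exploits (the function name and the coefficient count are our assumptions, not the paper's implementation):

```python
import cmath

def low_freq_coeffs(signal, k):
    """Discrete Fourier transform of a 1D trajectory, keeping only the
    first k (lowest-frequency) complex coefficients; the discarded
    remainder is the high-frequency, noise-dominated part."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n))
            for f in range(k)]

# A constant trajectory has all its energy in the DC coefficient.
coeffs = low_freq_coeffs([1.0, 1.0, 1.0, 1.0], 2)
```

In the full method this truncation is applied recursively over a pyramid of temporal segments rather than over the whole sequence at once.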
1 Representation of human skeleton joints
As shown in Fig. 2(a), a rigid body is defined by any two adjacent skeleton joints, so a human action can be viewed as the trajectories of a set of rigid bodies formed by skeleton joints, the skeleton being represented by 3D joint positions. If a skeleton has M joints, there are M-1 rigid bodies, and a human pose can be represented by the relation matrix
$ \boldsymbol{R}_F = (\boldsymbol{R}_{i,j})_{(M-1) \times (M-1)} $ | (1) |
where $\boldsymbol{R}_{i,j}$ describes the relative geometric relation between rigid bodies $i$ and $j$ in frame $F$.
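As a toy illustration of this representation, the sketch below builds the M-1 rigid-body vectors from a list of 3D joint positions; the 4-joint chain is a made-up example, not the 16-joint topology of Fig. 2(a):

```python
# Each rigid body is the vector between two adjacent skeleton joints.
def rigid_bodies(joints, edges):
    """joints: list of (x, y, z) tuples; edges: list of (i, j) index pairs.
    Returns one 3D vector per rigid body."""
    return [tuple(joints[j][k] - joints[i][k] for k in range(3))
            for i, j in edges]

# Toy 4-joint chain -> 3 rigid bodies (a skeleton with M joints has M-1).
joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (1.0, 2.0, 0.0)]
chain = [(0, 1), (1, 2), (2, 3)]
bodies = rigid_bodies(joints, chain)
```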
2 Adaptive skeleton-center algorithm
As shown in Fig. 2(a), the human skeleton is represented by 16 joints and 15 rigid bodies; the joints marked in red, joint 2 (neck) and joint 8 (hip), are the two preset coordinate centers. The motion of each rigid body can be regarded as circular motion whose radius is the rigid body's length, so each rigid body has an angular velocity and an angular acceleration through which its change can be judged. These two quantities vary differently across actions, and the same joint in the same action varies differently with respect to different coordinate centers; human motion can therefore be parameterized as the joint-angle features of each action's rigid bodies under different coordinate centers.
Taking (x, y) as the displayed screen coordinates and z as the depth measurement, a series of rigid-body trajectories in 3D space can be parameterized by the matrix
$ \boldsymbol{P}_F = [\boldsymbol{R}_F, \boldsymbol{G}_{iF}] $ | (2) |
where $\boldsymbol{R}_F$ is the relation matrix of Eq. (1) and $\boldsymbol{G}_{iF}$ is the joint-angle feature of rigid body $i$ in frame $F$, defined as
$ \boldsymbol{G}_{iF} = [\alpha_{iF}, \beta_{iF}, \delta_{iF}] $ | (3) |
where $\alpha_{iF}$, $\beta_{iF}$, and $\delta_{iF}$ denote the joint angle, angular velocity, and angular acceleration of rigid-body joint $i$ in frame $F$. The joint angle is given by
$ \alpha_{iF} = \arccos\frac{\boldsymbol{r}_i \cdot \boldsymbol{r}_j}{\left| \boldsymbol{r}_i \right| \left| \boldsymbol{r}_j \right|} $ | (4) |
where $\boldsymbol{r}_i$ and $\boldsymbol{r}_j$ are the vectors of the two adjacent rigid bodies forming the joint angle. The angular velocity is given by
$ \beta_{iF} = \frac{\alpha_{i(F+1)} - \alpha_{iF}}{t_2 - t_1} $ | (5) |
where $t_1$ and $t_2$ are the time stamps of frames $F$ and $F+1$. The angular acceleration is given by
$ \delta_{iF} = \frac{\beta_{i(F+1)} - \beta_{iF}}{(t_3 - t_2) - (t_2 - t_1)} $ | (6) |
where $t_3$ is the time stamp of frame $F+2$.
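Eqs. (4) and (5) can be sketched directly; the helper names below are ours, and the example vectors and timestamps are arbitrary:

```python
import math

def joint_angle(ri, rj):
    """Eq. (4): angle between the two rigid-body vectors forming a joint."""
    dot = sum(a * b for a, b in zip(ri, rj))
    norm = (math.sqrt(sum(a * a for a in ri))
            * math.sqrt(sum(a * a for a in rj)))
    return math.acos(dot / norm)

def angular_velocity(alpha_next, alpha_cur, t1, t2):
    """Eq. (5): finite-difference angular velocity between two frames."""
    return (alpha_next - alpha_cur) / (t2 - t1)

# Perpendicular rigid bodies give a joint angle of pi/2 rad.
a = joint_angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```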
When a rigid-body joint angle changes noticeably relative to a coordinate center, its angular velocity and angular acceleration vary over wide ranges. Table 1 lists these ranges for the main rigid-body joint angles of two actions; angular velocity is in rad/s and angular acceleration in rad/s². Rigid-body joint 3.6.11 denotes the joint angle between the rigid body formed by joints 3 and 6 and the rigid body formed by joints 6 and 11; hand catch(hip) denotes the hand catch action computed with hip as the coordinate center, and hand catch(neck) the same action with neck as the center. For hand catch(hip), the angular velocities and angular accelerations of moving joints 3.6.11, 1.3.6, 2.4.7, and 13.9.8 vary more than those of the other rigid bodies; for draw tick, the angular velocities and angular accelerations of the markedly moving joint angles 3.6.11, 4.7.12, 10.8.5, 2.4.7, 1.3.6, and 2.5.8 vary widely.
Table 1
Ranges of angular velocity and angular acceleration of joint angles of different actions relative to the two coordinate centers
| Rigid-body joint angle | hand catch(hip) ang. vel. | hand catch(hip) ang. acc. | hand catch(neck) ang. vel. | hand catch(neck) ang. acc. | draw tick(hip) ang. vel. | draw tick(hip) ang. acc. | draw tick(neck) ang. vel. | draw tick(neck) ang. acc. |
|---|---|---|---|---|---|---|---|---|
| 3.6.11 | 0.18~8.97 | 5.31~269.25 | 0.15~8.69 | 4.62~260.72 | 0.52~15.90 | 15.70~477.15 | 0.54~14.61 | 16.12~438.32 |
| 4.7.12 | 0.08~2.90 | 2.40~87.14 | 0.08~2.89 | 2.33~86.66 | 0.09~4.32 | 2.83~129.65 | 0.13~4.46 | 4.02~133.92 |
| 1.3.6 | 0.02~3.46 | 0.54~103.82 | 0.04~3.50 | 1.23~104.93 | 0.19~3.04 | 3.26~91.34 | 0.16~2.96 | 4.79~88.69 |
| 2.4.7 | 0.14~4.02 | 4.33~120.62 | 0.14~4.02 | 4.33~120.62 | 0.14~5.96 | 4.18~178.73 | 0.14~5.96 | 4.18~178.7 |
| 2.5.8 | 0.09~2.67 | 2.72~80.19 | 0.08~2.78 | 2.38~83.39 | 0.14~3.08 | 4.27~92.31 | 0.15~3.19 | 4.36~95.80 |
| 9.8.5 | 0.25~2.38 | 7.55~71.38 | 0.21~2.03 | 6.30~60.98 | 0.18~2.80 | 5.54~84.04 | 0.10~2.93 | 2.99~87.83 |
| 10.8.5 | 0.10~2.04 | 2.99~61.32 | 0.10~2.04 | 2.99~61.32 | 0.08~5.08 | 2.48~152.50 | 0.08~5.08 | 2.48~152.5 |
| 13.9.8 | 0.02~3.48 | 0.64~104.29 | 0.05~3.55 | 1.44~105.80 | 0.15~2.02 | 4.59~60.57 | 0.11~2.46 | 3.41~73.69 |
Between the two coordinate centers, the one with respect to which a rigid body's angular velocity and angular acceleration vary over the wider range reflects the motion of the action more distinctly; the coordinate center of each action is therefore selected adaptively, as summarized in the following algorithm.
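The selection rule can be sketched as follows; this is our reading of the adaptive-value comparison, with hypothetical function names, threshold values, and data:

```python
def adaptive_center(ranges_hip, ranges_neck, vel_thr, acc_thr):
    """ranges_*: dict mapping joint-angle id -> (velocity range, acceleration
    range). Returns 'hip' or 'neck', whichever center makes more joint
    angles exceed the thresholds, i.e. shows the motion more distinctly."""
    def score(ranges):
        return sum((v > vel_thr) + (a > acc_thr) for v, a in ranges.values())
    return 'hip' if score(ranges_hip) >= score(ranges_neck) else 'neck'

# Toy example: motion is more pronounced relative to the hip center.
hip = {'3.6.11': (8.97, 269.25), '1.3.6': (3.46, 103.82)}
neck = {'3.6.11': (2.00, 50.00), '1.3.6': (1.00, 40.00)}
center = adaptive_center(hip, neck, vel_thr=3.0, acc_thr=100.0)
```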
Algorithm: human action recognition with an adaptive skeleton center
Input: skeleton action sequence frames
Output: recognition accuracy
1) Read the skeleton action sequence frames and remove redundant frames; normalize the sequence twice, with the hip joint and the neck joint as coordinate centers, to obtain the hip-centered coordinate matrix dh and the neck-centered coordinate matrix dn;
2) From the two coordinate matrices dh and dn obtained in step 1), extract the joint-angle features of each rigid body;
3) From the two coordinate matrices, compute the angular velocity and angular acceleration of each joint angle according to Eqs. (4)-(6);
4) Determine the angular-velocity and angular-acceleration thresholds of the joint angles from the sequence;
5) Compare each joint angle's angular velocity and angular acceleration against the thresholds under the hip and neck centers to obtain four adaptive values;
6) Compare the four adaptive values obtained in step 5), select the coordinate center of each action accordingly, and renormalize the original coordinate matrix;
7) For the resulting action coordinate matrix, handle execution-rate variation with dynamic time warping (DTW) [8], remove high-frequency coefficients with the Fourier temporal pyramid (FTP) representation [9], and classify with a support vector machine (SVM) to obtain the recognition accuracy.
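The rate-alignment part of step 7) can be illustrated with a bare-bones DTW distance [8] (a textbook sketch, not the paper's implementation; the FTP and SVM stages are omitted):

```python
def dtw_distance(seq_a, seq_b):
    """Classic O(len_a * len_b) dynamic time warping distance between two
    1D feature sequences, absorbing differences in execution rate."""
    inf = float('inf')
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# The same action performed at half speed still matches exactly.
dist = dtw_distance([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
```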
3 Experimental results and analysis
3.1 UTKinect-Action dataset
3.2 MSRAction3D dataset
The MSRAction3D dataset [10] consists of human action sequences captured with a depth camera similar to the Kinect, with 3D joint positions extracted from the depth sequences. It contains 20 actions performed by 10 subjects, each subject repeating every action three times, for a total of 557 action sequences. One challenge of this dataset is that many actions are very similar because the motion variation is small. It comprises three subsets, AS1, AS2, and AS3, where AS1 and AS2 group simple similar actions and AS3 groups complex actions. Table 3 lists the adaptively selected coordinate center and the recognition rate of each action, and Table 4 compares our method with others on MSRAction3D, showing that our method achieves a higher recognition rate.
Table 3
Coordinate center and recognition rate of each action on the MSRAction3D dataset
| Action set | Action | Coordinate center | Recognition rate/% |
|---|---|---|---|
| AS1 | horizontal arm wave | neck | 97.5 |
| AS1 | hammer | hip | 78.5 |
| AS1 | forward punch | neck | 98.33 |
| AS1 | high throw | hip | 98.18 |
| AS1 | hand clap | hip | 100 |
| AS1 | bend | neck | 85.67 |
| AS1 | tennis serve | neck | 97.33 |
| AS1 | pickup & throw | hip | 61.63 |
| AS2 | high arm wave | neck | 70.33 |
| AS2 | hand catch | hip | 62.33 |
| AS2 | draw x | neck | 83.01 |
| AS2 | draw tick | hip | 82.67 |
| AS2 | draw circle | neck | 37.33 |
| AS2 | two-hand wave | neck | 99.33 |
| AS2 | forward kick | neck | 75.48 |
| AS2 | side boxing | hip | 100 |
| AS3 | high throw | hip | 100 |
| AS3 | forward kick | hip | 100 |
| AS3 | side kick | neck | 92.67 |
| AS3 | jogging | neck | 94.00 |
| AS3 | tennis swing | hip | 92.67 |
| AS3 | tennis serve | neck | 99.33 |
| AS3 | golf swing | hip | 100 |
| AS3 | pickup & throw | hip | 80.82 |
Table 4
Comparison of recognition rates of our method and other algorithms on the MSRAction3D dataset
4 Conclusion
Summarizing the shortcomings of existing action recognition methods, we proposed a human action recognition algorithm with an adaptive skeleton center. It parameterizes the motion features, representing each rigid body by the variation of its angular velocity and angular acceleration; the feature extraction stage in Fig. 1 is the highlight of this work. Adaptively selecting the coordinate center according to the action's features exploits each action's most salient characteristics and improves the recognition rate. Tests on the benchmark datasets show that our method recognizes actions more accurately than existing methods. On the UTKinect dataset, its recognition rate is at least 3% higher than those of the other methods, and the average single-action recognition rate reaches 90%. On MSRAction3D, the advantage is large on subset AS1 and present on AS3, but the rate on AS2 is unsatisfactory: similar upper-limb actions are poorly recognized, which is where the method needs improvement. At present the method works well for single actions; recognizing multiple complex actions is the next research direction.
References
[1] Rahmani H, Mian A, Shah M. Learning a deep model for human action recognition from novel viewpoints[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. [DOI:10.1109/TPAMI.2017.2691768]
[2] Lu T W, Peng L, Miao S J. Human action recognition of hidden Markov model based on depth information[C]//Proceedings of the 15th International Symposium on Parallel and Distributed Computing. Piscataway: IEEE, 2016: 354-357. [DOI:10.1109/ISPDC.2016.58]
[3] Xia L, Chen C C, Aggarwal J K. View invariant human action recognition using histograms of 3D joints[C]//Proceedings of 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, RI: IEEE, 2012: 20-27. [DOI:10.1109/CVPRW.2012.6239233]
[4] Han L, Xu X X, Liang W, et al. Discriminative human action recognition in the learned hierarchical manifold space[J]. Image and Vision Computing, 2010, 28(5): 836-849. [DOI:10.1016/j.imavis.2009.08.003]
[5] Yang X D, Tian Y L. Effective 3D action recognition using EigenJoints[J]. Journal of Visual Communication and Image Representation, 2014, 25(1): 2-11. [DOI:10.1016/j.jvcir.2013.03.001]
[6] Vemulapalli R, Arrate F, Chellappa R. Human action recognition by representing 3D skeletons as points in a Lie group[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH: IEEE, 2014: 588-595. [DOI:10.1109/CVPR.2014.82]
[7] Ding W W, Liu K, Li G, et al. Human action recognition using spectral embedding to similarity degree between postures[C]//Proceedings of 2016 Visual Communications and Image Processing. Chengdu: IEEE, 2016: 1-4. [DOI:10.1109/VCIP.2016.7805441]
[8] Müller M. Information Retrieval for Music and Motion[M]. Berlin, Heidelberg: Springer-Verlag, 2007: 2-5.
[9] Wang J, Liu Z C, Wu Y, et al. Mining actionlet ensemble for action recognition with depth cameras[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI: IEEE, 2012: 1290-1297. [DOI:10.1109/CVPR.2012.6247813]
[10] Ding W W, Liu K, Fu X J, et al. Profile HMMs for skeleton-based human action recognition[J]. Signal Processing: Image Communication, 2016, 42: 109-119. [DOI:10.1016/j.image.2016.01.010]