Human action recognition based on privileged information

Ling Peipei; Qiu Song; Cai Mingming; Xu Wei; Feng Ying

doi:10.11834/jig.20170408

Image Processing and Coding

2017, Vol. 22, No. 4, pp. 482-491
Published online: 2017-04-07

Published in print: 2017

Ling Peipei, Qiu Song, Cai Mingming, Xu Wei, Feng Ying. Human action recognition based on privileged information[J]. Journal of Image and Graphics, 2017, 22(4): 482-491. DOI: 10.11834/jig.20170408.

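Abstract (translated from the Chinese)

With traditional 2D feature extraction, it is difficult to accurately locate human joint positions in video, which caps the achievable recognition rate. 3D feature extraction based on depth information improves the recognition rate, but the high computational complexity of high-dimensional operations makes real-time recognition hard to achieve and restricts the application scenarios. To overcome these difficulties, a human action recognition method based on 3D privileged learning is proposed, in which 3D information is introduced as privileged information into the traditional 2D action recognition process. Motion boundary histogram dense optical-flow features, MoSIFT (Motion SIFT) features, and hybrid features combining several descriptors serve as the basic 2D features. Human joint information is estimated from the depth information captured by a Kinect device and processed with a Lie group algorithm to obtain 3D features, which serve as the privileged information. Under a classical support vector machine, the privileged information yields better recognition than the basic 2D features. The training data contain both basic 2D features and 3D privileged information, whereas the test data contain only basic 2D features. A support vector machine with privileged information (SVM+) is learned from the training samples and then used to classify the test samples, giving the human action recognition result. Experiments were carried out on two human action datasets, UTKinect-Action and Florence3D-Action. With privileged information, the recognition rate improves over traditional 2D recognition by 2% on average and by up to 9%. The SVM+ classifier is also less sensitive to its parameters than SVM. The experimental results show that, compared with previous methods, the proposed method improves recognition accuracy while reducing the classifier's sensitivity to parameters. The method needs both basic 2D features and 3D privileged information only during training; at test time, no depth acquisition device is needed to extract 3D privileged features. Learning is fast and computational complexity is low, so the method can be widely applied to low-cost, real-time human action recognition.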

Abstract

The study of human action recognition has significant academic and practical value. It is widely applied in intelligent surveillance, video retrieval, human interaction, live entertainment, virtual reality, and health care. In human learning, a teacher can provide students with information hidden in examples, explanations, comments, and comparisons. However, the information offered by a teacher is seldom applied to the field of human action recognition. This study treats 3D depth features as privileged information to help solve human action recognition problems and to demonstrate the superiority of this new learning paradigm over the classical one. This paper reports on the details of the new paradigm and its corresponding algorithms.

The human body can be represented as an articulated system of rigid segments connected by joints, and human motion can be regarded as a continuous evolution of the spatial configuration of these segments. With the recent release of depth cameras, an increasing number of studies have extracted the 3D positions of tracked joints to represent human activities, and these studies have achieved relatively good performance. However, such 3D algorithms face numerous application limits resulting from inconvenient equipment and costly computation. Extracting joints from RGB video sequences is difficult, which limits recognition accuracy. This study applies 3D depth features as privileged information to address these challenges. In particular, we apply a new skeletal representation that explicitly models the 3D geometric relationships among different body parts using rotations and translations in 3D space, which lie in a Lie group. We use different algorithms, including motion scale-invariant feature transform (MoSIFT), motion boundary histograms, and different combined descriptors, for the basic 2D features to be combined with the privileged information.

Privileged information is available in the training stage but not in the testing stage. As in the traditional classification problem, the new algorithm focuses on learning a classifier, namely the support vector machine+ (SVM+). The SVM+ algorithm, which considers both privileged and unprivileged information, is highly similar to SVM algorithms in how it determines solutions in the classical pattern recognition framework. In particular, it finds the optimal separating hyperplane, which incurs few training errors and exhibits a large margin. However, the SVM+ algorithm is computationally costlier than SVM. This study applies the new algorithm to human activity recognition to simplify the testing stage, because 3D information is required only for the training set.

We evaluate our method on two challenging datasets, namely UTKinect-Action and Florence3D-Action, with three different 2D features. The SVM+ algorithm considers both 2D basic features and 3D privileged information, whereas SVM uses only 2D basic features. Results show that our proposed SVM+ outperforms SVM. Moreover, SVM+ is less sensitive to relevant parameters than SVM. This paper reports the details of the recognition performance, varying numbers of training samples, different parameters, and confusion matrices for both SVM and SVM+ on the two datasets. The privileged information can help reduce the noise of the original 2D basic features and increase the robustness of human activity recognition. The role of a teacher in providing remarks, explanations, and analogies is highly important.

This study proposes a new human action recognition method based on privileged information. The experimental results on the two datasets show the effectiveness of our method. The proposed method needs to extract 3D privileged information only during the training process; a depth information acquisition device is not required during testing. The method exhibits high learning speed and low computational complexity, and it can be extensively used in low-cost, real-time human action recognition.
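
For readers unfamiliar with SVM+, the train/test asymmetry described in the abstract can be made concrete with the binary SVM+ problem as it is usually written in the learning-using-privileged-information literature. This is a sketch under assumptions, not necessarily the exact variant used in the paper: the abstract gives no formula, and the symbols below are introduced here for illustration. Here $x_i$ stands for the basic 2D feature of training sample $i$, $x_i^{*}$ for its 3D privileged feature, $y_i \in \{-1, +1\}$ for its label, and $C$ and $\gamma$ for regularization parameters.

```latex
\begin{aligned}
\min_{w,\,b,\,w^{*},\,b^{*}} \quad
  & \frac{1}{2}\,\lVert w \rVert^{2}
    + \frac{\gamma}{2}\,\lVert w^{*} \rVert^{2}
    + C \sum_{i=1}^{n} \bigl( \langle w^{*}, x_i^{*} \rangle + b^{*} \bigr) \\
\text{s.t.} \quad
  & y_i \bigl( \langle w, x_i \rangle + b \bigr)
    \;\ge\; 1 - \bigl( \langle w^{*}, x_i^{*} \rangle + b^{*} \bigr),
    \qquad i = 1, \dots, n, \\
  & \langle w^{*}, x_i^{*} \rangle + b^{*} \;\ge\; 0,
    \qquad i = 1, \dots, n. \\[4pt]
\text{Decision rule:} \quad
  & f(x) = \operatorname{sign}\bigl( \langle w, x \rangle + b \bigr).
\end{aligned}
```

In this formulation the privileged features only model the slack of each training sample, taking the place of the free slack variables of a standard soft-margin SVM. The decision rule depends on $w$ and $b$ alone, which is why no 3D input is needed at test time.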