Human action recognition based on privileged information
- 2017, Vol. 22, No. 4, pp. 482-491
Published online: 2017-04-07
Published in print: 2017
DOI: 10.11834/jig.20170408
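With traditional 2D feature extraction, it is difficult to capture the positions of human joints accurately from video, which caps the achievable recognition rate. 3D feature extraction based on depth information improves the recognition rate, but computation in the high-dimensional space is expensive, real-time recognition is hard to achieve, and the range of application scenarios is limited. To overcome these difficulties, we propose a human action recognition method based on 3D privileged learning, in which 3D information is introduced as privileged information into the traditional 2D action recognition process. Motion boundary histogram dense optical flow features, MoSIFT (Motion SIFT) features, and hybrid features that combine several features serve as the basic 2D features. Human joint information is estimated from the depth information captured by a Kinect device and processed with a Lie group algorithm to obtain 3D features, which serve as the privileged information. Under a classical support vector machine (SVM), the privileged information yields better recognition than the basic 2D features. The training data contain both the basic 2D features and the 3D privileged information, whereas the test data contain only the basic 2D features. A support vector machine that incorporates privileged information (SVM+) is learned from the training samples and then used to classify the test samples, giving the action recognition result. Experiments were conducted on two human action datasets, UTKinect-Action and Florence3D-Action. With privileged information, the recognition rate improved over traditional 2D recognition by 2% on average and by up to 9%. The SVM+ classifier was also less sensitive to its parameters than SVM. The experimental results show that, compared with previous methods, the proposed method improves recognition accuracy while reducing the sensitivity of the classifier to its parameters. The method needs both the basic 2D features and the 3D privileged information only during training; at test time, no depth acquisition device is needed to extract 3D privileged features. Learning is fast and computational complexity is low, so the method is widely applicable to low-cost, highly real-time human action recognition.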
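The abstract does not spell out the SVM+ optimization problem. For orientation, the formulation commonly given in the learning-using-privileged-information literature (Vapnik and Vashist) is sketched here for the binary, linear case. The symbols are notation assumed for this sketch rather than taken from the paper: $x_i$ is a basic 2D feature vector, $x_i^{*}$ the matching 3D privileged feature vector, $y_i \in \{-1, +1\}$ the label, and $C, \gamma > 0$ trade-off parameters. The paper's multi-class setup and kernels may differ.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,w^{*},\,b^{*}} \quad
  & \tfrac{1}{2}\lVert w\rVert^{2}
    + \tfrac{\gamma}{2}\lVert w^{*}\rVert^{2}
    + C \sum_{i=1}^{n} \bigl(\langle w^{*}, x_i^{*}\rangle + b^{*}\bigr) \\
\text{s.t.} \quad
  & y_i\bigl(\langle w, x_i\rangle + b\bigr)
    \ge 1 - \bigl(\langle w^{*}, x_i^{*}\rangle + b^{*}\bigr), \\
  & \langle w^{*}, x_i^{*}\rangle + b^{*} \ge 0,
    \qquad i = 1,\dots,n .
\end{aligned}
\]
```

In this form the slack of each training sample is modeled as a function of its privileged features, while the decision rule $f(x) = \operatorname{sign}(\langle w, x\rangle + b)$ depends only on the basic features. This is consistent with the statement that test data need no 3D information.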
The study of human action recognition has important academic and practical value. It is widely applied in intelligent surveillance, video retrieval, human interaction, live entertainment, virtual reality, and health care. In human learning, a teacher can provide students with information hidden in examples, explanations, comments, and comparisons. However, information of the kind a teacher offers is seldom exploited in human action recognition. This study treats 3D depth features as privileged information to help solve human action recognition problems and to demonstrate the superiority of this new learning paradigm over the classical one. This paper reports the details of the new paradigm and its corresponding algorithms.

The human body can be represented as an articulated system of rigid segments connected by joints, and human motion can be regarded as a continuous evolution of the spatial configuration of these segments. Since the recent release of depth cameras, an increasing number of studies have extracted the 3D positions of tracked joints to represent human activities, and these studies have achieved relatively good performance. However, the related 3D algorithms face numerous application limits because of inconvenient equipment and costly computation. Extracting joints from RGB video sequences is difficult, which limits recognition performance. This study uses 3D depth features as privileged information to address these challenges. In particular, we apply a skeletal representation that explicitly models the 3D geometric relationships among body parts using rotations and translations in 3D space, treated as elements of a Lie group. For the basic 2D features to be combined with privileged information, we use several descriptors, including motion scale-invariant feature transform, motion boundary histograms, and combinations of descriptors. Privileged information is available in the training stage but not in the testing stage. As in the traditional classification problem, the new algorithm focuses on learning a classifier, namely, support vector machine+ (SVM+). The SVM+ algorithm, which considers both privileged and unprivileged information, closely resembles SVM in how it determines solutions in the classical pattern recognition framework. In particular, it finds the optimal separating hyperplane, which incurs few training errors and exhibits a large margin. However, the SVM+ algorithm is computationally costlier than SVM. Applying the new algorithm to human activity recognition simplifies the testing stage because 3D information is required only for the training set.

We evaluate our method on two challenging datasets, namely, UTKinect-Action and Florence3D-Action, with three different 2D features. The SVM+ algorithm considers both 2D basic features and 3D privileged information, whereas SVM uses only 2D basic features. Results show that the proposed SVM+ outperforms SVM. Moreover, SVM+ is less sensitive to relevant parameters than SVM. This paper reports, for both SVM and SVM+ on the two datasets, the recognition performance, the effect of varying the number of training samples and the parameters, and the confusion matrices. The privileged information helps to reduce the noise in the original 2D basic features and increases the robustness of human activity recognition.

The role of a teacher in providing remarks, explanations, and analogies is highly important. This study proposes a new human action recognition method based on privileged information. The experimental results on the two datasets show the effectiveness of our method in human action recognition. The proposed method needs to extract 3D privileged information only during training; a depth information acquisition device is not required during testing. The method exhibits high learning speed and low computational complexity, and it can be used extensively in low-cost, real-time human action recognition.
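To make the skeletal representation described above more concrete, the following is a minimal sketch of describing one pair of body segments by a relative rotation and translation. It is an illustration under stated assumptions, not the paper's implementation: the function names, the joint and bone layout, and the choice of the minimal rotation between bone directions are all hypothetical. The translation is kept as a plain offset, which simplifies the full Lie group logarithm map.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3D points."""
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def norm(a):
    return math.sqrt(dot(a, a))


def rotation_vector(u, v, eps=1e-12):
    """Axis-angle vector of the minimal rotation taking direction u onto v.

    Assumes u and v are non-zero. The result is the so(3) logarithm of that
    rotation: its direction is the axis, its length the angle in radians.
    """
    c = cross(u, v)
    s = norm(c)
    d = dot(u, v)
    angle = math.atan2(s, d)
    if s > eps:
        return [angle * ci / s for ci in c]
    if d >= 0:
        return [0.0, 0.0, 0.0]  # parallel directions: no rotation
    # Anti-parallel directions: rotate by pi about any axis perpendicular to u.
    k = min(range(3), key=lambda i: abs(u[i]))
    e = [0.0, 0.0, 0.0]
    e[k] = 1.0
    axis = cross(u, e)
    n = norm(axis)
    return [math.pi * ai / n for ai in axis]


def pair_feature(joints, bone_a, bone_b):
    """6D descriptor for two bones given as (start, end) joint indices."""
    a0, a1 = joints[bone_a[0]], joints[bone_a[1]]
    b0, b1 = joints[bone_b[0]], joints[bone_b[1]]
    rot = rotation_vector(sub(a1, a0), sub(b1, b0))
    trans = sub(b0, a0)  # simplification: plain offset, not the se(3) log
    return rot + trans


def skeleton_feature(joints, bones):
    """Concatenate pair descriptors over all unordered bone pairs."""
    feat = []
    for i in range(len(bones)):
        for j in range(i + 1, len(bones)):
            feat.extend(pair_feature(joints, bones[i], bones[j]))
    return feat


if __name__ == "__main__":
    # Hypothetical three-joint chain with a right angle at the middle joint.
    joints = [(0, 0, 0), (1, 0, 0), (1, 2, 0)]
    bones = [(0, 1), (1, 2)]
    print(skeleton_feature(joints, bones))
```

A per-frame vector of this kind, computed from Kinect joint estimates, would play the role of the 3D privileged feature during training only. The 2D descriptors would be computed separately from the RGB video.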