Human action recognition based on multi-perspective depth motion maps
2019, Vol. 24, No. 3: 400-409
Received: 2018-06-15; Revised: 2018-09-14; Published in print: 2019-03-16
DOI: 10.11834/jig.180375

Objective
Methods that use a motion history point cloud (MHPC) for human action recognition incur high computational complexity during feature extraction because of the large number of points, whereas methods based on depth motion maps (DMMs) extract features easily but capture incomplete motion information, which caps the achievable recognition accuracy. To address these problems, a human action recognition algorithm based on multi-perspective depth motion maps is proposed.
Method
First, an MHPC is generated from the depth map sequence to represent the action, and the MHPC is then rotated by specific angles to supplement motion information from additional viewpoints. The original and rotated MHPCs are projected onto the Cartesian coordinate planes to generate multi-perspective depth motion maps, from which histograms of oriented gradients are extracted and concatenated into a feature vector. Finally, a support vector machine classifies the feature vectors. The algorithm is validated on the MSR Action3D dataset and a self-built dataset.
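The rotation step described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the MHPC is assumed here to be an N x 4 array of (x, y, z, t) points, and the function name is hypothetical.

```python
import numpy as np

def rotate_mhpc_y(points, angle_deg):
    """Rotate an N x 4 motion history point cloud (x, y, z, t) about the Y axis.

    Only the spatial coordinates are rotated; the time component t is
    left untouched. The rotated cloud corresponds to the action as seen
    from a different horizontal viewpoint.
    """
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
    rotated = points.copy()
    rotated[:, :3] = points[:, :3] @ rot_y.T
    return rotated
```

Projecting each rotated cloud onto the XOY plane then yields the extra viewpoints from which the additional depth motion maps are built.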
Result
Two experimental settings are used on the MSR Action3D dataset. Under setting 1, the algorithm achieves a recognition rate of 96.8%, which is 2.5% higher than the APS_PHOG (axonometric projections and PHOG feature) algorithm, 1.9% higher than the DMM algorithm, and 1.1% higher than the DMM_CRC (depth motion maps and collaborative representation classifier) algorithm. Under setting 2, the recognition rate is 93.82%, which is 5.09% higher than the DMM algorithm and 4.93% higher than the HON4D (histogram of oriented 4D surface normals) algorithm. On the self-built dataset, the algorithm reaches 97.98%, which is 3.98% higher than the MHPC algorithm.
Conclusion
Experimental results show that multi-perspective depth motion maps not only avoid the complexity of extracting features from the MHPC but also enrich DMMs with motion information from additional viewpoints, effectively improving human action recognition accuracy.
Objective
Action recognition based on depth data has attracted increasing attention because depth data are insensitive to illumination. Two main methods are used: one relies on point clouds converted from depth maps, and the other relies on depth motion maps (DMMs) generated by projecting depth maps. The motion history point cloud (MHPC) was proposed to represent actions, but the large number of points in an MHPC makes feature extraction computationally expensive. DMMs are generated by stacking the motion energy of a depth map sequence projected onto three orthogonal Cartesian planes; projecting the depth maps onto a specific plane provides additional body shape and motion information. However, although extracting features from DMMs is simple, DMMs contain inadequate motion information, which caps human action recognition accuracy. In other words, an action is represented by DMMs from only three views, so action information from other perspectives is missing. Multi-perspective DMMs for human action recognition are proposed to solve these problems.
Method
In the proposed algorithm, an MHPC is first generated from a depth map sequence to represent the action. Motion information under different perspectives is supplemented by rotating the MHPC around the Y axis by certain angles. The original MHPC is then projected onto the three orthogonal Cartesian planes, and each rotated MHPC is projected onto the XOY plane. The multi-perspective DMMs are generated from these projected MHPCs. After projection, the point clouds are distributed in the plane, where many overlapping points share the same coordinates. These points may come from the same depth frame or from different frames. These overlapping points are used to generate DMMs and capture the spatial distribution of motion energy. For example, a pixel of the DMM generated from the MHPC projected onto the XOY plane is the sum of the absolute differences in z between adjacent overlapping points belonging to different frames. DMM generation from the MHPC projected onto the YOZ and XOZ planes is similar, with z replaced by x and y, respectively. Projecting the MHPC onto the three orthogonal Cartesian planes yields DMMs from the front, side, and top views, while projecting the rotated MHPCs onto the XOY plane yields DMMs under further views. Multi-perspective DMMs, which encode the 4D information of an action into 2D maps, are thus used to represent the action, replenishing the action information under many perspectives. The x, y, and z values of points in the projected MHPCs are normalized to fixed ranges that serve as multi-perspective DMM image coordinates, which reduces the intraclass variability caused by different performers. Based on experience, this study normalizes x and z to 511 and y to 1 023. A histogram of oriented gradients is extracted from each DMM, and the histograms are concatenated into the feature vector of an action. Finally, an SVM classifier is trained to recognize the actions. Experiments are performed on the MSR Action3D dataset and our own dataset.
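The overlapping-point rule for the XOY projection described above can be sketched as follows. This is a minimal sketch under an assumed data layout (an N x 4 array of x, y, z, frame index, with coordinates already normalized to the image range); the function name and grid sizes are illustrative, not the authors' code.

```python
import numpy as np

def dmm_from_xoy_projection(points, width=512, height=1024):
    """Accumulate a front-view DMM from an MHPC projected onto the XOY plane.

    `points` is an N x 4 array of (x, y, z, frame). After projection,
    points from different frames overlap at the same (x, y) pixel; the
    DMM value at that pixel is the sum of absolute z-differences between
    temporally adjacent overlapping points, i.e. the motion energy there.
    """
    dmm = np.zeros((height, width))
    # Group overlapping points by their projected pixel coordinate.
    buckets = {}
    for x, y, z, f in points:
        buckets.setdefault((int(x), int(y)), []).append((f, z))
    for (x, y), samples in buckets.items():
        samples.sort()                      # order overlapping points by frame
        for (f0, z0), (f1, z1) in zip(samples, samples[1:]):
            if f1 != f0:                    # only points from different frames
                dmm[y, x] += abs(z1 - z0)
    return dmm
```

The YOZ and XOZ projections would follow the same pattern with x or y, respectively, taking the role of z.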
Result
The proposed algorithm performs well on the MSR Action3D dataset and our dataset. Two experimental settings are considered for MSR Action3D. The proposed algorithm achieves a recognition rate of 96.8% in the first setting, clearly better than most algorithms: 2.5% higher than APS_PHOG (axonometric projections and PHOG feature), 1.9% higher than DMM, and 1.1% higher than DMM_CRC (DMMs and collaborative representation classifier). In the second setting, the recognition rate reaches 93.82%, which is 5.09% higher than DMM, 4.93% higher than HON4D (histogram of oriented 4D surface normals), 2.18% higher than HOPC (histogram of oriented principal components), and 1.92% higher than DMM_LBP feature fusion. On our dataset, the recognition rate of the proposed algorithm is 97.98%, which is 3.98% higher than that of the MHPC algorithm.
Conclusion
The MHPC is used to represent actions, and rotating it by certain angles supplements the action information from different perspectives. Multi-perspective DMMs are generated by computing the distribution of overlapping points in the projected MHPCs, which captures the spatial distribution of absolute motion energy, and coordinate normalization reduces intraclass variability. Experimental results show that multi-perspective DMMs not only avoid the difficulty of extracting features from MHPCs but also supplement the motion information of traditional DMMs. Human action recognition based on multi-perspective DMMs outperforms several existing methods. The new approach combines the point cloud method with the depth motion map method, exploiting the advantages of both while weakening their disadvantages.
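The feature extraction stage summarized above (a HOG descriptor per DMM, concatenated into one action vector) could be sketched as follows. The descriptor here is a deliberately simplified orientation-histogram stand-in that omits HOG's cell/block normalization, and all names are hypothetical rather than the authors' implementation; the resulting vectors would then be fed to an SVM classifier (e.g., scikit-learn's `LinearSVC`).

```python
import numpy as np

def hog_like_descriptor(dmm, n_bins=9):
    """A simplified orientation-histogram descriptor over a whole DMM.

    Real HOG (Dalal & Triggs) histograms gradient orientations per cell
    with block normalization; this stand-in builds one global histogram
    of gradient orientations weighted by gradient magnitude, just to
    show the shape of the feature pipeline.
    """
    gy, gx = np.gradient(dmm.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-12)  # L2-normalize the histogram

def action_feature(dmms):
    """Concatenate the descriptors of all multi-perspective DMMs of one action."""
    return np.concatenate([hog_like_descriptor(d) for d in dmms])
```

With five views, for example, this yields a 45-dimensional vector per action; one such vector per training sample is what the SVM is trained on.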
Weinland D, Boyer E. Action recognition using exemplar-based embedding[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-7. [DOI: 10.1109/CVPR.2008.4587731]
Guo K, Ishwar P, Konrad J. Action recognition in video by sparse representation on covariance manifolds of silhouette tunnels[C]//Proceedings of ICPR 2010 Contests. Istanbul, Turkey: Springer, 2010: 294-305. [DOI: 10.1007/978-3-642-17711-8_30]
Zhang L, Lu M M, Jiang H. An improved scheme of visual words description and action recognition using local enhanced distribution information[J]. Journal of Electronics & Information Technology, 2016, 38(3): 549-556. [DOI: 10.11999/JEIT150410]
Ran X Y, Liu K, Li G, et al. Human action recognition algorithm based on adaptive skeleton center[J]. Journal of Image and Graphics, 2018, 23(4): 519-525. [DOI: 10.11834/jig.170420]
Xia L, Chen C C, Aggarwal J K. View invariant human action recognition using histograms of 3D joints[C]//Proceedings of 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, RI, USA: IEEE, 2012: 20-27. [DOI: 10.1109/CVPRW.2012.6239233]
Wang C Y, Wang Y Z, Yuille A L. Mining 3D key-pose-motifs for action recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2639-2647. [DOI: 10.1109/CVPR.2016.289]
Liu M Y, Liu H, Chen C. 3D action recognition using multiscale energy-based global ternary image[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 28(8): 1824-1838. [DOI: 10.1109/TCSVT.2017.2655521]
Rahmani H, Mahmood A, Du Q H, et al. HOPC: histogram of oriented principal components of 3D pointclouds for action recognition[C]//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 742-757. [DOI: 10.1007/978-3-319-10605-2_48]
Oreifej O, Liu Z C. HON4D: histogram of oriented 4D normals for activity recognition from depth sequences[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 716-723. [DOI: 10.1109/CVPR.2013.98]
Yang X D, Tian Y L. Super normal vector for human activity recognition with depth cameras[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(5): 1028-1039. [DOI: 10.1109/TPAMI.2016.2565479]
Liu W P, Jiang Y F, Wang H L, et al. Action description using point clouds[C]//Proceedings Volume 10443, Second International Workshop on Pattern Recognition. Singapore: SPIE, 2017: 104430X. [DOI: 10.1117/12.2280342]
Shen X X, Zhang H, Gao Z, et al. Human behavior recognition based on axonometric projections and PHOG feature[J]. Journal of Computational Information Systems, 2014, 10(8): 3455-3463. [DOI: 10.12733/jcis9956]
Yang X D, Zhang C Y, Tian Y L. Recognizing actions using depth motion maps-based histograms of oriented gradients[C]//Proceedings of the 20th ACM International Conference on Multimedia. Nara, Japan: ACM, 2012: 1057-1060. [DOI: 10.1145/2393347.2396382]
Chen C, Liu K, Kehtarnavaz N. Real-time human action recognition based on depth motion maps[J]. Journal of Real-Time Image Processing, 2016, 12(1): 155-163. [DOI: 10.1007/s11554-013-0370-1]
Chen C, Jafari R, Kehtarnavaz N. Action recognition from depth sequences using depth motion maps-based local binary patterns[C]//Proceedings of 2015 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, HI, USA: IEEE, 2015: 1092-1099. [DOI: 10.1109/WACV.2015.150]
Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA: IEEE, 2005: 886-893. [DOI: 10.1109/CVPR.2005.177]
Suykens J A K, Vandewalle J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9(3): 293-300. [DOI: 10.1023/A:1018628609742]
Li W Q, Zhang Z Y, Liu Z C. Action recognition based on a bag of 3D points[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. San Francisco, CA, USA: IEEE, 2010: 9-14. [DOI: 10.1109/CVPRW.2010.5543273]
Jalal A, Kim Y H, Kim Y J, et al. Robust human activity recognition from depth video using spatiotemporal multi-fused features[J]. Pattern Recognition, 2016, 61: 295-308. [DOI: 10.1016/j.patcog.2016.08.003]