Published: 2019-03-16  DOI: 10.11834/jig.180375  2019 | Volume 24 | Number 3  Image Analysis and Recognition

Received: 2018-06-15; revised: 2018-09-14. Supported by: National Natural Science Foundation of China (61179045). First author: Liu Tingting, born in 1992, female, master's student; research interests: computer vision and machine learning. E-mail: ttliu_123@163.com. Li Yupeng, male, master's student; research interests: machine learning and deep learning. E-mail: yupengli666@126.com. CLC number: TP391.4. Document code: A. Article ID: 1006-8961(2019)03-0400-10


Human action recognition based on multi-perspective depth motion maps
Liu Tingting, Li Yupeng, Zhang Liang
Key Laboratory of Advanced Signal and Image Processing, Civil Aviation University of China, Tianjin 300300, China
Supported by: National Natural Science Foundation of China (61179045)

# Abstract

Objective Action recognition based on depth data has attracted increasing attention because depth data are insensitive to illumination. Two main approaches are used: one operates on point clouds converted from depth maps, and the other on depth motion maps (DMMs) generated by projecting depth maps. The motion history point cloud (MHPC) has been proposed to represent actions, but the large number of points in an MHPC makes feature extraction computationally expensive. DMMs are generated by stacking the motion energy of a depth map sequence projected onto three orthogonal Cartesian planes; projecting the depth maps onto a specific plane provides additional body shape and motion information. However, although extracting features from DMMs is simple, a DMM contains inadequate motion information, which caps human action recognition accuracy: an action is represented by DMMs from only three views, so action information from other perspectives is missing. Multi-perspective DMMs for human action recognition are proposed to solve these problems. Method In the proposed algorithm, an MHPC is first generated from a depth map sequence to represent the action. Motion information under different perspectives is supplemented by rotating the MHPC around the $Y$ axis by certain angles. The primary MHPC is then projected onto the three orthogonal Cartesian planes, and each rotated MHPC is projected onto the $XOY$ plane. The multi-perspective DMMs are generated from these projected MHPCs. After projection, the points are distributed in the plane, where many overlapping points share the same coordinates. These points may come from the same depth frame or from different frames. We use these overlapping points to generate DMMs and capture the spatial energy distribution of motion.
For example, a pixel of the DMM generated from the MHPC projected onto the $XOY$ plane is the sum of the absolute differences of the $z$ values of adjacent overlapping points belonging to different frames. DMM generation from the MHPC projected onto the $YOZ$ and $XOZ$ planes is similar; the $z$ value is simply replaced by $x$ and $y$, respectively. Projecting the MHPC onto the three orthogonal Cartesian planes yields DMMs from the front, side, and top views, and projecting the rotated MHPCs onto the $XOY$ plane yields DMMs from additional views. Multi-perspective DMMs, which encode the 4D information of an action into 2D maps, are used to represent the action; thus, action information from many more perspectives is captured. The $x$, $y$, $z$ values of points in the projected MHPC are normalized to fixed ranges that serve as multi-perspective DMM image coordinates, which reduces the intraclass variability caused by different action performers. Based on experience, this study normalizes the values of $x$ and $z$ to 511 and those of $y$ to 1 023. The histogram of oriented gradients (HOG) is extracted from each DMM and concatenated into a feature vector for the action. Finally, an SVM classifier is trained to recognize the action. Experiments are performed on the MSR Action3D dataset and our own dataset. Result The proposed algorithm performs well on the MSR Action3D dataset and our dataset. Two experimental settings are considered for MSR Action3D. The proposed algorithm achieves a recognition rate of 96.8% in the first experimental setting, which is clearly better than those of most compared algorithms: 2.5% higher than the APS_PHOG (axonometric projections and PHOG feature) algorithm, 1.9% higher than the DMM algorithm, and 1.1% higher than the DMM_CRC (DMMs with a collaborative representation classifier) algorithm.
In the second experimental setting, the recognition rate of the proposed algorithm reaches 93.82%, which is 5.09% higher than that of the DMM algorithm, 4.93% higher than that of the HON4D algorithm, 2.18% higher than that of the HOPC algorithm, and 1.92% higher than that of DMM_LBP feature fusion. On our dataset, the recognition rate of the proposed algorithm is 97.98%, which is 3.98% higher than that of the MHPC algorithm. Conclusion An MHPC is used to represent an action, and rotating it by certain angles supplements action information from different perspectives. Multi-perspective DMMs are generated by computing the distribution of overlapping points in the projected MHPC, which captures the spatial distribution of the absolute motion energy. Coordinate normalization reduces intraclass variability. Experimental results show that multi-perspective DMMs not only avoid the difficulty of extracting features directly from MHPCs but also supplement the motion information missing from traditional DMMs. Human action recognition based on multi-perspective DMMs outperforms several existing methods. The approach combines the point cloud method with the depth motion map method, exploiting the advantages of both while mitigating their disadvantages.
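The pipeline described in the Method paragraph can be sketched in code. The following is a simplified illustration only, not the authors' implementation: the point cloud, grid size, and rotation angle are hypothetical. Each point carries (x, y, z, frame index); the cloud is rotated about the $Y$ axis to supplement a new viewpoint, and a DMM pixel on the $XOY$ plane accumulates the absolute $z$ differences of consecutive overlapping points that come from different frames.

```python
import numpy as np
from collections import defaultdict

def rotate_y(points, angle_deg):
    """Rotate the (x, y, z) part of an N x 4 array [x, y, z, frame] about the Y axis."""
    a = np.radians(angle_deg)
    rot = np.array([[np.cos(a), 0.0, np.sin(a)],
                    [0.0,       1.0, 0.0],
                    [-np.sin(a), 0.0, np.cos(a)]])
    out = points.copy()
    out[:, :3] = points[:, :3] @ rot.T
    return out

def dmm_xoy(points, width, height):
    """Project onto the XOY plane; each pixel sums |dz| between consecutive
    overlapping points (same pixel) that belong to different frames."""
    eps = 1e-9
    xs = ((points[:, 0] - points[:, 0].min()) / (np.ptp(points[:, 0]) + eps)
          * max(width - 1, 1)).astype(int).clip(0, width - 1)
    ys = ((points[:, 1] - points[:, 1].min()) / (np.ptp(points[:, 1]) + eps)
          * max(height - 1, 1)).astype(int).clip(0, height - 1)
    buckets = defaultdict(list)                     # pixel -> [(frame, z), ...]
    for px, py, row in zip(xs, ys, points):
        buckets[(px, py)].append((row[3], row[2]))
    dmm = np.zeros((height, width))
    for (px, py), pts in buckets.items():
        pts.sort()                                  # order overlapping points by frame
        for (f0, z0), (f1, z1) in zip(pts, pts[1:]):
            if f1 != f0:                            # same-frame overlaps contribute nothing
                dmm[py, px] += abs(z1 - z0)
    return dmm

# Hypothetical cloud: two points land on the same pixel, from frames 0 and 1
cloud = np.array([[0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 3.0, 1.0]])
dmm = dmm_xoy(cloud, width=1, height=1)             # single pixel holds |3 - 1| = 2
rotated = rotate_y(cloud, 90.0)                     # same cloud seen from another view
```

DMMs for the $YOZ$ and $XOZ$ planes follow by swapping which coordinate is accumulated, as described above.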

# Key words

human action recognition; depth maps; depth motion maps; multi-perspective depth motion maps; motion history point cloud; histogram of oriented gradient; support vector machine

# 1.1 Motion history point cloud

$\boldsymbol{X}_{\rm norm} = C_0 \times \dfrac{\boldsymbol{X} - \boldsymbol{X}_{\min}}{\boldsymbol{X}_{\max} - \boldsymbol{X}_{\min}}$ (4)

# 2.2 SVM classifier

$\boldsymbol{T} = \{(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \cdots, (\boldsymbol{x}_m, y_m)\}, \quad \boldsymbol{x}_i \in \boldsymbol{X} = \mathbf{R}^n$

$\gamma = \dfrac{2}{\|\boldsymbol{w}\|}$ (5)

$\max\limits_{\boldsymbol{w}, b} \dfrac{2}{\|\boldsymbol{w}\|} \quad {\rm s.t.} \quad y_i(\boldsymbol{w}^{\rm T}\boldsymbol{x}_i + b) \ge 1, \; i = 1, 2, \cdots, m$ (6)

$\min\limits_{\boldsymbol{w}, b} \dfrac{1}{2}\|\boldsymbol{w}\|^2 \quad {\rm s.t.} \quad y_i(\boldsymbol{w}^{\rm T}\boldsymbol{x}_i + b) \ge 1, \; i = 1, 2, \cdots, m$ (7)
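Eq. (7) is the hard-margin primal problem. As an illustrative sketch only (the paper uses a standard SVM library; this toy solver and its data are assumptions), the closely related soft-margin objective can be minimized by subgradient descent on the hinge loss:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the soft-margin relaxation of Eq. (7):
       min_{w,b}  lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w.x_i + b)))."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points violating the margin
        grad_w = lam * w - (y[active] @ X[active]) / m
        grad_b = -y[active].sum() / m
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical linearly separable toy data with labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)                         # decision rule sign(w.x + b)
```

Maximizing the margin $2/\|\boldsymbol{w}\|$ in Eq. (6) is equivalent to minimizing $\frac{1}{2}\|\boldsymbol{w}\|^2$ in Eq. (7); the hinge term above simply penalizes violations of the constraint $y_i(\boldsymbol{w}^{\rm T}\boldsymbol{x}_i + b) \ge 1$.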

# 3.2.1 MSR Action3D dataset

Table 1 Three action subsets of MSR Action3D dataset

| Action subset 1 (AS1) | Action subset 2 (AS2) | Action subset 3 (AS3) |
| --- | --- | --- |
| Horizontal wave (2) | High wave (1) | High throw (6) |
| Hammer (3) | Hand catch (4) | Forward kick (14) |
| Forward punch (5) | Draw x (7) | Side kick (15) |
| High throw (6) | Draw tick (8) | Jogging (16) |
| Hand clap (10) | Draw circle (9) | Tennis swing (17) |
| Bend (13) | Two hand wave (11) | Tennis serve (18) |
| Tennis serve (18) | Forward kick (14) | Golf swing (19) |
| Pickup throw (20) | Side boxing (12) | Pickup throw (20) |

Note: Numbers in parentheses are the action indices in the dataset.

Table 2 Comparison of our method with other existing methods in experiment setting 1

(unit: %)

| Algorithm | Test 1 AS1 | Test 1 AS2 | Test 1 AS3 | Test 1 avg. | Test 2 AS1 | Test 2 AS2 | Test 2 AS3 | Test 2 avg. | Cross AS1 | Cross AS2 | Cross AS3 | Cross avg. | Overall avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DMM[13] | **97.3** | 92.2 | 98.0 | 95.8 | 98.7 | 94.7 | 98.7 | 97.4 | **96.2** | 84.1 | **94.6** | 91.6 | 94.9 |
| APS_PHOG[12] | 94.8 | 95.2 | 97.9 | 96.0 | 97.8 | 98.8 | 98.0 | 98.2 | 90.6 | 81.4 | **94.6** | 88.8 | 94.3 |
| DMM_CRC[14] | **97.3** | 96.1 | **98.7** | **97.4** | 98.6 | 98.7 | **100.0** | 99.1 | **96.2** | 83.2 | 92.0 | 90.5 | 95.7 |
| Ours | 97.0 | **98.2** | 96.5 | 97.2 | **100.0** | **100.0** | 99.1 | **99.7** | 91.7 | **95.1** | 93.9 | **93.6** | **96.8** |

Note: Bold indicates the best result in each column.

Table 3 Comparison of our method with other existing methods in experiment setting 2

(unit: %)

| Method | Accuracy |
| --- | --- |
| HOPC[8] | 91.64 |
| HON4D[9] | 88.89 |
| SNV[10] | 93.45 |
| DMM[13] | 88.73 |
| DMM_LBP_FF[15] | 91.90 |
| DMM_LBP_DF[15] | 93.00 |
| Multi_Fused Features[19] | 93.30 |
| Ours | **93.82** |

Note: Bold indicates the best result.

# 3.2.2 Our dataset

Table 4 Comparison of our method with other existing methods in our database

(unit: %)

| Method | Accuracy |
| --- | --- |
| MHPC[11] | 94.00 |
| Multi_perspective_MHPCs | 96.97 |
| Ours | **97.98** |

Note: Bold indicates the best result.

# References

• [1] Weinland D, Boyer E. Action recognition using exemplar-based embedding[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-7.[DOI: 10.1109/CVPR.2008.4587731]
• [2] Guo K, Ishwar P, Konrad J. Action recognition in video by sparse representation on covariance manifolds of silhouette tunnels[C]//Proceedings of ICPR 2010 Contests. Istanbul, Turkey: Springer, 2010: 294-305.[DOI: 10.1007/978-3-642-17711-8_30]
• [3] Zhang L, Lu M M, Jiang H. An improved scheme of visual words description and action recognition using local enhanced distribution information[J]. Journal of Electronics & Information Technology, 2016, 38(3): 549–556. [DOI:10.11999/JEIT150410]
• [4] Xia L, Chen C C, Aggarwal J K. View invariant human action recognition using histograms of 3D joints[C]//Proceedings of 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, RI, USA: IEEE, 2012: 20-27.[DOI: 10.1109/CVPRW.2012.6239233]
• [5] Ran X Y, Liu K, Li G, et al. Human action recognition algorithm based on adaptive skeleton center[J]. Journal of Image and Graphics, 2018, 23(4): 519–525. [DOI:10.11834/jig.170420]
• [6] Wang C Y, Wang Y Z, Yuille A L. Mining 3D key-pose-motifs for action recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2639-2647.[DOI: 10.1109/CVPR.2016.289]
• [7] Liu M Y, Liu H, Chen C. 3D action recognition using multiscale energy-based global ternary image[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 28(8): 1824–1838. [DOI:10.1109/TCSVT.2017.2655521]
• [8] Rahmani H, Mahmood A, Du Q H, et al. HOPC: histogram of oriented principal components of 3D pointclouds for action recognition[C]//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 742-757.[DOI: 10.1007/978-3-319-10605-2_48]
• [9] Oreifej O, Liu Z C. HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 716-723.[DOI: 10.1109/CVPR.2013.98]
• [10] Yang X D, Tian Y L. Super normal vector for human activity recognition with depth cameras[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(5): 1028–1039. [DOI:10.1109/TPAMI.2016.2565479]
• [11] Liu W P, Jiang Y F, Wang H L, et al. Action description using point clouds[C]//Proceedings Volume 10443, Second International Workshop on Pattern Recognition. Singapore: SPIE, 2017: 104430X.[DOI: 10.1117/12.2280342]
• [12] Shen X X, Zhang H, Gao Z, et al. Human behavior recognition based on axonometric projections and PHOG feature[J]. Journal of Computational Information Systems, 2014, 10(8): 3455–3463. [DOI:10.12733/jcis9956]
• [13] Yang X D, Zhang C Y, Tian Y L. Recognizing actions using depth motion maps-based histograms of oriented gradients[C]//Proceedings of the 20th ACM International Conference on Multimedia. Nara, Japan: ACM, 2012: 1057-1060.[DOI: 10.1145/2393347.2396382]
• [14] Chen C, Liu K, Kehtarnavaz N. Real-time human action recognition based on depth motion maps[J]. Journal of Real-Time Image Processing, 2016, 12(1): 155–163. [DOI:10.1007/s11554-013-0370-1]
• [15] Chen C, Jafari R, Kehtarnavaz N. Action recognition from depth sequences using depth motion maps-based local binary patterns[C]//Proceedings of 2015 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, HI, USA: IEEE, 2015: 1092-1099.[DOI: 10.1109/WACV.2015.150]
• [16] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA: IEEE, 2005: 886-893.[DOI: 10.1109/CVPR.2005.177]
• [17] Suykens J A K, Vandewalle J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9(3): 293–300. [DOI:10.1023/A:1018628609742]
• [18] Li W Q, Zhang Z Y, Liu Z C. Action recognition based on a bag of 3D points[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops. San Francisco, CA, USA: IEEE, 2010: 9-14.[DOI: 10.1109/CVPRW.2010.5543273]
• [19] Jalal A, Kim Y H, Kim Y J, et al. Robust human activity recognition from depth video using spatiotemporal multi-fused features[J]. Pattern Recognition, 2016, 61: 295–308. [DOI:10.1016/j.patcog.2016.08.003]