Human Behavior Recognition Based on Directional Weighting of Local Space-Time Features

Li Junfeng, Zhang Feiyan (Institute of Automation, Zhejiang Sci-Tech University, Hangzhou 310012, China)

Abstract
Objective The description of human behavior is a key problem in behavior recognition. To make full use of the training data and thereby ensure that the features describe behavior with high fidelity, a human behavior recognition method based on directional weighting of local space-time features is proposed. Method First, the brightness-gradient feature of the local space-time features is decomposed into three directions (X, Y, Z), each describing the behavior from a different perspective. Standard visual vocabulary codebooks for the three-direction descriptor sets of the different behaviors are obtained by directly constructing a visual vocabulary, and the standard three-direction vocabulary distribution of each behavior is computed from the training videos. Then, based on the standard codebooks of the three-direction descriptor sets for the different behaviors, the corresponding three-direction vocabulary distributions of a test video are computed, and its behavior is recognized by a weighted similarity measure against each behavior's standard three-direction vocabulary distribution. Result Experiments were conducted on the Weizmann and KTH datasets; the average recognition rate reached 96.04% on the Weizmann dataset and 96.93% on the KTH dataset. Conclusion Compared with other behavior recognition methods, the proposed method clearly improves the average recognition rate.
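The three-direction decomposition of the brightness gradient can be illustrated with a minimal numpy sketch. This assumes a video is a (T, H, W) array of grayscale frames; the function name `directional_gradients` and the use of `np.gradient` are illustrative choices, not the paper's implementation.

```python
import numpy as np

def directional_gradients(video):
    """Split the brightness gradient of a video volume into its
    Z (time), Y (row), and X (column) components.

    video: (T, H, W) array of grayscale frames.
    Returns (gx, gy, gz), each with the same shape as `video`.
    """
    # np.gradient returns one array per axis, in axis order:
    # axis 0 = time (Z), axis 1 = rows (Y), axis 2 = columns (X).
    gz, gy, gx = np.gradient(video.astype(float))
    return gx, gy, gz
```

In the method described above, descriptors built from each of the three component fields would then be quantized against that direction's own codebook, rather than pooling all components into a single descriptor.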
Human behavior recognition based on directional weighting local space-time features

Li Junfeng, Zhang Feiyan(Institute of Automation, Zhejiang Sci-Tech University, Hangzhou 310012, China)

Abstract
Objective Human action recognition aims to detect and analyze human behavior intelligently on the basis of information captured by cameras. Applications of this technology include surveillance, video content retrieval, robotics, and human-computer interfaces. Human behavior description is a key problem in behavior recognition. To utilize training data fully and to ensure a highly descriptive feature representation of behavior, a new human activity recognition method is proposed in this study. Method First, the brightness gradient of the local space-time features is decomposed into three directions (X, Y, Z) to describe the behavior from different perspectives. Second, the standard visual vocabulary codebooks of the three directions for the different behaviors are obtained by directly constructing a visual vocabulary, and the standard three-direction vocabulary distribution of each behavior is computed from the training videos. The standard codebooks then serve as bases to calculate the corresponding three-direction vocabulary distributions of a test video, whose behavior is recognized by a weighted similarity measure between the standard vocabulary distribution of each behavior and the vocabulary distributions of the test video. Result The performance was evaluated on the KTH and Weizmann action datasets, yielding average recognition rates of 96.04% on the Weizmann dataset and 96.93% on the KTH dataset. Conclusion The proposed method generates a comprehensive and effective representation of action videos and reduces clustering time by producing a separate codebook for each direction. Experimental results show that it significantly improves action recognition performance compared with other available methods.
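The codebook-and-weighted-similarity stage described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the histogram-intersection similarity, and the per-direction weights are all assumptions, and the codebooks are taken as given (e.g., k-means centers over training descriptors of each direction).

```python
import numpy as np

def vocab_histogram(descriptors, codebook):
    """Quantize descriptors to their nearest visual word and return a
    normalized vocabulary distribution over the codebook.

    descriptors: (N, F) array; codebook: (K, F) array of visual words.
    """
    # Pairwise squared distances between descriptors and codebook words.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def weighted_similarity(test_hists, class_hists, weights):
    """Weighted sum of per-direction histogram intersections
    (one histogram per direction X, Y, Z)."""
    return sum(w * np.minimum(h, g).sum()
               for w, h, g in zip(weights, test_hists, class_hists))

def recognize(test_hists, standard_dists, weights):
    """Return the behavior label whose standard three-direction vocabulary
    distribution is most similar to the test video's distributions.

    standard_dists: dict mapping label -> list of 3 per-direction histograms.
    """
    return max(standard_dists,
               key=lambda label: weighted_similarity(
                   test_hists, standard_dists[label], weights))
```

With uniform weights this reduces to plain bag-of-words matching; the directional weighting lets directions that discriminate a given behavior better contribute more to the similarity score.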