Yi Yun, Wang Hanli. Action recognition from unconstrained videos via salient and robust trajectory[J]. Journal of Image and Graphics, 2015, 20(2): 245-253. DOI: 10.11834/jig.20150211.
In recent years, we have witnessed the great success of social networks and multimedia technologies, leading to the generation of a vast amount of Internet videos. To organize these videos and to provide value-added services to users, human activities in videos should be recognized automatically, and a number of research studies have focused on this challenging topic. Human action recognition is a significant research topic in computer vision, and recognizing human actions from unconstrained videos is difficult because of complex backgrounds and camera motion. A salient and robust trajectory-based approach is proposed to address this problem. Dense optical flow is utilized to track scale-invariant feature transform (SIFT) keypoints at multiple spatial scales. The histogram of oriented gradients (HOG), histogram of optical flow (HOF), and motion boundary histogram (MBH) descriptors are employed to describe each trajectory efficiently. To eliminate the influence of camera motion, a camera motion estimation approach based on adaptive background segmentation is utilized to improve the robustness of the trajectories. The Fisher vector model is then used to compute one Fisher vector over the complete video for each descriptor separately, and a linear support vector machine is employed for classification. On four challenging datasets, the salient trajectory algorithm improves over the dense trajectory algorithm by 1% on average. After applying the camera motion elimination approach,
the average result of the salient trajectory approach is further improved by 2%. On the four datasets (Hollywood2, YouTube, Olympic Sports, and UCF50), the proposed algorithm achieves accuracies of 65.8%, 91.6%, 93.6%, and 92.1%, improving on the previous state-of-the-art results by 1.5%, 2.6%, 2.5%, and 0.9%, respectively. Experimental results on these four challenging datasets demonstrate that the proposed algorithm can effectively recognize human actions from unconstrained videos and is more computationally efficient than a number of state-of-the-art approaches.
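The Fisher vector encoding step mentioned in the abstract can be illustrated as follows. This is a minimal numpy sketch of the standard Fisher vector formulation (mean and variance deviations with power and L2 normalization), assuming a pre-trained diagonal-covariance Gaussian mixture model; the function name, toy dimensions, and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Encode T local descriptors X (T x D) into a Fisher vector using a
    diagonal-covariance GMM with K components (weights: K, means/sigmas: K x D).
    Returns a 2*K*D vector of mean and variance deviations, power- and
    L2-normalized. Illustrative sketch, not the paper's implementation."""
    T, D = X.shape
    # Posterior responsibilities gamma (T x K), computed in log space for stability.
    diff = X[:, None, :] - means[None, :, :]                      # T x K x D
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * sigmas ** 2), axis=1)[None, :]
             - 0.5 * np.sum((diff / sigmas[None, :, :]) ** 2, axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # T x K
    # Deviations of descriptors from each Gaussian, in units of sigma.
    u = diff / sigmas[None, :, :]                                 # T x K x D
    g_mu = (gamma[:, :, None] * u).sum(axis=0) / (T * np.sqrt(weights)[:, None])
    g_sig = (gamma[:, :, None] * (u ** 2 - 1)).sum(axis=0) / (T * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                        # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                      # L2 normalization
```

In the paper's pipeline, one such vector would be computed per video and per descriptor channel (HOG, HOF, MBH) and then fed to a linear SVM; a toy call with K = 2 Gaussians and D = 3 dimensional descriptors yields a 2 x 2 x 3 = 12 dimensional, unit-norm encoding.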