Wang Taiqing, Wang Shengjin. Human action recognition using mid-level spatial-temporal features[J]. Journal of Image and Graphics, 2015, 20(4): 520-526. DOI: 10.11834/jig.20150408.
Human action recognition is an important research topic in computer vision with promising potential applications. To address the limitations of local and global spatial-temporal features, a novel and effective mid-level spatial-temporal feature is proposed for action recognition. The proposed feature encodes the structural distribution of local features in the neighborhood of each spatial-temporal interest point (STIP), thereby improving the discriminative power of STIPs. It can also model flexible intra-action variations. Pointwise mutual information is introduced to measure the correlation between the mid-level features and each action, and a video clip is finally classified as the action category that has the greatest mutual information with its mid-level features. Experimental results validated the advantage of the proposed mid-level feature over local-feature-based baseline methods and other published results: it achieved recognition accuracies of 96.3% and 98.0% on the KTH and ADL (Activities of Daily Living) action datasets, respectively. By harnessing the spatial-temporal distribution of local spatial-temporal features, the proposed mid-level feature improves discrimination between actions and is therefore capable of recognizing realistic human actions.
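The pointwise-mutual-information classification step can be illustrated with a small sketch. For a mid-level feature $f$ and action $a$, PMI is $\log \frac{p(f,a)}{p(f)\,p(a)}$. The abstract does not state several details, so the following are assumptions and not the authors' method:

- Mid-level features are already quantized to integer codeword IDs.
- Probabilities are estimated from co-occurrence counts with additive smoothing (`alpha`).
- A clip's score for each action is the sum of the PMI values of its features.

The names `train_pmi` and `classify` are hypothetical.

```python
import math
from collections import Counter


def train_pmi(clips, alpha=1.0):
    """Estimate PMI(f, a) from training clips.

    clips: list of (features, action) pairs, where features is a list of
    integer codeword IDs (an assumed quantization of the mid-level features).
    alpha: additive smoothing applied to every (feature, action) cell (assumption).
    """
    joint = Counter()  # co-occurrence counts of (feature, action)
    feat = Counter()   # total count of each feature
    act = Counter()    # total number of features observed for each action
    for features, action in clips:
        for f in features:
            joint[(f, action)] += 1
            feat[f] += 1
            act[action] += 1

    vocab, actions = sorted(feat), sorted(act)
    # Normalizer of the smoothed joint table; marginals are derived from the
    # same smoothed table so that p(f) and p(a) stay consistent with p(f, a).
    z = sum(feat.values()) + alpha * len(vocab) * len(actions)
    pmi = {}
    for f in vocab:
        p_f = (feat[f] + alpha * len(actions)) / z
        for a in actions:
            p_a = (act[a] + alpha * len(vocab)) / z
            p_fa = (joint[(f, a)] + alpha) / z
            pmi[(f, a)] = math.log(p_fa / (p_f * p_a))
    return pmi, actions


def classify(features, pmi, actions):
    """Return the action with the largest summed PMI, plus all scores.

    Features never seen in training contribute 0 (an assumption).
    """
    scores = {a: sum(pmi.get((f, a), 0.0) for f in features) for a in actions}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: codewords 0/1 occur in "walk" clips, 2/3 in "run" clips.
    train = [([0, 0, 1], "walk"), ([0, 1, 0], "walk"),
             ([2, 2, 3], "run"), ([3, 2, 2], "run")]
    pmi, actions = train_pmi(train)
    label, scores = classify([0, 0, 2], pmi, actions)
    print(label, scores)  # label is "walk"
```

Summing PMI over a clip's features corresponds to a naive-Bayes-style score without a class prior. Other aggregation rules, such as averaging, would be equally plausible readings of the abstract.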