复合时空特征的双模态情感识别
Dual-modality emotion recognition based on composite spatio-temporal features
2017, Vol. 22, No. 1, pp. 39-48
Online publication: 2016-12-29
Print publication: 2017
DOI: 10.11834/jig.20170105
To address the high feature dimensionality and poor robustness to illumination and noise that arise when the volume local binary pattern is applied to video-frame feature extraction, a new feature descriptor, the temporal-spatial local ternary pattern moment (TSLTPM), is proposed. Since TSLTPM describes only texture, the 3D histogram of oriented gradients (3DHOG) feature is further fused with it to enrich the description of emotional videos. First, the emotional videos are preprocessed to obtain facial-expression and body-posture sequences; then TSLTPM and 3DHOG features are extracted from the expression and posture sequences, and the minimum Euclidean distance between the features of a test sequence and those of the labeled emotion training set is computed and used as independent evidence to construct basic probability assignments; finally, the D-S evidence combination rule yields the emotion recognition result. In experiments on the FABO database, the single expression and posture modalities achieve average recognition rates of 83.06% and 94.78%, respectively. On expressions, these rates are 9.27%, 12.89%, 1.87%, and 1.13% higher than those of VLBP (volume local binary pattern), LBP-TOP (local binary pattern on three orthogonal planes), TSLTPM, and 3DHOG, respectively; on postures, they are 24.61%, 27.55%, 1.18%, and 0.98% higher. After fusing the two modalities, the average recognition rate reaches 96.86%, demonstrating the effectiveness of fusing expression and posture for emotion recognition. The proposed TSLTPM extends VLBP into a spatio-temporal ternary pattern, effectively reducing dimensionality and the influence of illumination and noise; combined with 3DHOG, it forms a composite spatio-temporal feature that markedly improves the classification of emotional videos. Comparisons with typical feature extraction algorithms confirm the effectiveness of the proposed algorithm, and comparisons with other fusion methods verify the superiority of the proposed fusion approach.
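The ternary coding that underlies TSLTPM can be illustrated with a minimal sketch. The snippet below shows plain 2D local ternary coding for a single 3x3 patch, which is the building block the abstract describes; the paper's TSLTPM additionally extends this coding across adjacent frames and summarizes it with co-occurrence-based energy moments, which are not reproduced here. The threshold value and patch are illustrative assumptions.

```python
import numpy as np

def local_ternary_code(patch, t=5):
    # Local ternary coding of one 3x3 patch: each of the 8 neighbors is
    # coded +1 / 0 / -1 relative to the center pixel within tolerance t.
    # The dead zone of width 2t is what makes ternary patterns less
    # sensitive to noise than a plain binary (LBP-style) threshold.
    c = patch[1, 1]
    neighbors = np.delete(patch.flatten(), 4)  # drop the center, keep 8 neighbors
    code = np.zeros(8, dtype=int)
    code[neighbors >= c + t] = 1
    code[neighbors <= c - t] = -1
    return code

# Illustrative patch: center value 50, tolerance 5, so values in [45, 55]
# are coded 0 regardless of small noise around the center.
patch = np.array([[52, 60, 48],
                  [55, 50, 50],
                  [40, 57, 63]])
print(local_ternary_code(patch, t=5))
```

In a spatio-temporal extension such as the paper's, the same three-valued comparison is applied to neighbors drawn from the preceding and following frames as well, so the code captures pixel variation over time.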
The volume local binary pattern (VLBP), when applied to feature extraction from video frames, suffers from high feature dimensionality and weak robustness to illumination changes and noise. This study proposes a new feature descriptor, the temporal-spatial local ternary pattern moment (TSLTPM). The algorithm introduces a ternary pattern and extends it to the temporal-spatial domain to describe the variation of pixel values across adjacent frames; the texture feature is represented by the energy values of the ternary pattern matrices, computed from the gray-level co-occurrence matrix. Because TSLTPM describes only texture, it lacks edge and orientation information and therefore cannot fully characterize emotional videos. The 3D histogram of oriented gradients (3DHOG) feature is thus fused with TSLTPM, and the two are combined into a composite spatio-temporal feature. First, the emotional videos are preprocessed: five key frames are selected by K-means clustering and used as the facial-expression and body-posture emotion sequences. Second, TSLTPM and 3DHOG features are extracted from the expression and posture sequences, and the minimum Euclidean distance between the features of a test sequence and those of the labeled emotion training set is computed; these distances serve as independent evidence for constructing basic probability assignments (BPAs). Finally, the recognition result is obtained by combining the BPAs under the rules of D-S evidence theory. Experimental results on the bimodal facial-expression and body-posture emotion database (FABO) show that the composite spatio-temporal feature achieves good recognition performance. Average recognition rates of 83.06% and 94.78% are obtained for the single expression and gesture modalities, respectively. On expressions, the proposed feature outperforms VLBP, LBP-TOP, TSLTPM, and 3DHOG by 9.27%, 12.89%, 1.87%, and 1.13%, respectively; on gestures, it outperforms them by 24.61%, 27.55%, 1.18%, and 0.98%, respectively. After fusing the two modalities, the average recognition rate reaches 96.86%, higher than that of either single modality, which confirms the effectiveness of fusing expression and gesture for emotion recognition. The proposed TSLTPM extends VLBP, which effectively describes local features of video images, into a temporal-spatial local ternary pattern; it has low dimensionality and improved robustness to illumination and noise. The composite spatio-temporal feature combining TSLTPM and 3DHOG fully describes the effective information in emotional videos and enhances their classification performance. Comparisons with typical feature extraction algorithms demonstrate the effectiveness of the proposed algorithm, which proves well suited to recognizing emotion in videos with static backgrounds, and comparisons with other fusion methods verify the superiority of the proposed fusion approach.
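The decision-level fusion described above, turning per-class minimum Euclidean distances into BPAs and combining them with Dempster's rule, can be sketched as follows. The exponential distance-to-mass mapping and the restriction to singleton hypotheses (one mass per emotion class, no compound sets) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def bpa_from_distances(dists):
    # Map per-class minimum Euclidean distances to a basic probability
    # assignment: smaller distance -> larger mass. The exp(-d) mapping is
    # one common choice; the paper's exact normalization may differ.
    sim = np.exp(-np.asarray(dists, dtype=float))
    return sim / sim.sum()

def dempster_combine(m1, m2):
    # Dempster's rule for BPAs over singleton hypotheses only:
    # masses on the same class multiply (B ∩ C = {class i}), while the
    # total conflicting mass K is renormalized away.
    joint = np.outer(m1, m2)
    agree = np.diag(joint)
    K = joint.sum() - agree.sum()
    return agree / (1.0 - K)

# Hypothetical 3-class example: the expression modality mildly prefers
# class 0 and the gesture modality strongly prefers class 0, so the
# fused evidence is more confident than either modality alone.
m_expr = bpa_from_distances([0.2, 0.5, 0.9])
m_gest = bpa_from_distances([0.1, 0.8, 1.2])
m_fused = dempster_combine(m_expr, m_gest)
print(m_fused.argmax())  # index of the winning emotion class
```

Combining the two modalities this way rewards classes on which both sources of evidence agree, which is consistent with the reported gain of the fused model (96.86%) over either single modality.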