Classification of small spontaneous expression database based on deep transfer learning network
2019, Vol. 24, No. 5, Pages 753-761
Received: 2018-07-23
Revised: 2018-11-24
Published in print: 2019-05-16
DOI: 10.11834/jig.180462
Objective
Compared with conventional posed expressions, spontaneous expressions better reveal a person's true emotions and have great application potential in fields such as national security and medical care. Because spontaneous expressions are difficult to induce and their samples are hard to collect, the available data samples are scarce. To discriminate the categories of spontaneous expressions, this paper draws on neural network learning methods, which are being widely applied in more and more scenarios, and proposes an expression category discrimination method based on deep transfer networks.
Method
To preserve the features of the original spontaneous expression images, no data augmentation is used even on the small data samples, and three-dimensional images built from optical flow features serve as comparison samples. The samples are fed into different transfer network models for training, and the trained networks of the same structure are then combined into an isomorphic network whose output discriminates the category of the spontaneous expression.
Result
Experimental results show that the proposed method exhibits excellent spontaneous expression classification performance on different databases. The average test accuracies on the public spontaneous expression databases CASME, CASME II, and CAS(ME)² reach 94.3%, 97.3%, and 97.2%, respectively, which is 7% higher than the best previously reported results.
Conclusion
This paper applies transfer learning to the discrimination of spontaneous expression categories and compares different network models as well as different kinds of samples, achieving the best average accuracy reported to date for spontaneous expression category discrimination.
Objective
Expression plays an important role in human-computer interaction. As a special kind of expression, a spontaneous expression has a shorter duration and weaker intensity than traditional expressions. Spontaneous expressions can reveal a person's true emotions and hold immense potential in detection, anti-detection, and medical diagnosis. Therefore, identifying the categories of spontaneous expression can make human-computer interaction smoother and fundamentally change the relationship between people and computers. Because spontaneous expressions are difficult to induce and collect, the scale of a spontaneous expression dataset is relatively small for training a new deep neural network: only ten thousand spontaneous samples are present in each database. Convolutional neural networks show excellent performance and are thus widely used in many scenarios; in particular, they outperform traditional feature extraction methods in improving the accuracy of discriminating the categories of spontaneous expression.
Method
This study proposes a method for discriminating the categories of spontaneous expression on the basis of different deep transfer network models. To preserve the characteristics of the original spontaneous expressions, we do not apply data augmentation, even on the small data samples. At the same time, training samples consisting of three-dimensional images composed of optical flow and grayscale images are compared with the original RGB images; such a three-dimensional image contains both spatial information and temporal displacement information.
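As a rough illustration of how such a comparison sample might be assembled (the paper does not specify the optical flow algorithm or the choice of frames; OpenCV's Farneback method and an onset/apex frame pair are assumptions of this sketch):

```python
# Hypothetical sketch: assemble a 3-channel "optical flow + grayscale" sample from two frames.
# Farneback optical flow and onset/apex frame selection are assumptions, not the authors' stated procedure.
import cv2
import numpy as np

def make_flow_sample(onset_path, apex_path, size=(224, 224)):
    onset = cv2.cvtColor(cv2.imread(onset_path), cv2.COLOR_BGR2GRAY)
    apex = cv2.cvtColor(cv2.imread(apex_path), cv2.COLOR_BGR2GRAY)
    # Dense optical flow between the two frames captures the temporal displacement.
    flow = cv2.calcOpticalFlowFarneback(onset, apex, None, pyr_scale=0.5, levels=3,
                                        winsize=15, iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)

    def to_uint8(channel):
        # Rescale a float channel to 0-255 so it can be stacked with the grayscale frame.
        return cv2.normalize(channel, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Channels: horizontal flow, vertical flow, grayscale apex frame (spatial information).
    sample = np.dstack([to_uint8(flow[..., 0]), to_uint8(flow[..., 1]), apex])
    return cv2.resize(sample, size)
```

Stacking the two flow components with a grayscale frame yields a three-channel input that pretrained image networks can consume directly.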
In this study, we compare three network models trained on different samples. The first model is based on AlexNet; only the number of output-layer neurons is changed so that it equals the number of spontaneous expression categories. The network is then fine-tuned by fixing the parameters of different layers several times to obtain the best training and testing results.
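A minimal sketch of such an AlexNet-based transfer model is given below, assuming PyTorch and torchvision's ImageNet-pretrained AlexNet (the paper does not name a framework); the number of frozen layers is a hypothetical knob standing in for the repeated fixing of different layers' parameters.

```python
# Illustrative AlexNet transfer model (PyTorch/torchvision assumed): replace the output layer
# and freeze a configurable number of early layers before fine-tuning.
import torch.nn as nn
from torchvision import models

def build_alexnet_transfer(num_classes, n_frozen_layers=10):
    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    # Resize the output layer to one neuron per spontaneous expression category.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
    # Fix the parameters of the first n feature-extraction layers; deeper layers stay trainable.
    for layer in list(model.features)[:n_frozen_layers]:
        for p in layer.parameters():
            p.requires_grad = False
    return model
```

Training several variants with different values of n_frozen_layers mirrors the repeated fixing of layer parameters described above.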
The second model is based on Inception-v3. Two fully connected layers, with 512 neurons and one neuron per spontaneous expression category, respectively, are added after the backbone output, so only the parameters of these two layers need to be fine-tuned. Network depth increases while the number of parameters is reduced because 3×3 convolution kernels replace the 7×7 convolution kernel. The third model is based on Inception-ResNet-v2; as in the first model, we only change the number of output-layer neurons.
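An analogous sketch for the Inception-v3 model, again assuming torchvision's pretrained weights, freezes the backbone and appends the two fully connected layers (512 units, then one neuron per category); disabling the auxiliary classifier is a simplification made only for this sketch.

```python
# Illustrative Inception-v3 transfer model (torchvision assumed): frozen backbone,
# two new fully connected layers as described in the text.
import torch.nn as nn
from torchvision import models

def build_inception_transfer(num_classes):
    model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1,
                                aux_logits=False)  # auxiliary head dropped for simplicity
    for p in model.parameters():
        p.requires_grad = False                    # only the new head is fine-tuned
    model.fc = nn.Sequential(
        nn.Linear(model.fc.in_features, 512),      # first added fully connected layer (512 units)
        nn.ReLU(),
        nn.Linear(512, num_classes),               # second layer: one neuron per expression category
    )
    return model
```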
Finally, an isomorphic network model is proposed to identify the categories of spontaneous expression. The model is composed of two transfer learning networks of the same type that are trained on different samples, and it takes the maximum of their outputs as the final result. The isomorphic network makes decisions with high accuracy because when the two sub-networks agree, their shared output is very likely the correct answer; from the perspective of probability, we take the maximum of the different outputs as the prediction value.
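One plausible reading of this fusion rule is sketched below: two same-architecture networks, for example one trained on RGB samples and one on the optical flow samples, are evaluated, and the class-wise maximum of their softmax outputs is taken as the prediction; the exact input pairing and the softmax fusion are assumptions of the sketch.

```python
# Hypothetical max-fusion of two same-architecture networks trained on different samples.
import torch

@torch.no_grad()
def isomorphic_predict(model_rgb, model_flow, rgb_batch, flow_batch):
    model_rgb.eval()
    model_flow.eval()
    p_rgb = torch.softmax(model_rgb(rgb_batch), dim=1)     # probabilities from the RGB-trained network
    p_flow = torch.softmax(model_flow(flow_batch), dim=1)  # probabilities from the flow-trained network
    fused = torch.maximum(p_rgb, p_flow)                   # keep the larger confidence per category
    return fused.argmax(dim=1)                             # predicted expression categories
```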
Result
Experimental results indicate that the proposed method exhibits excellent classification performance on different samples. The single-network outputs clearly show that the features extracted from RGB images are as effective as those extracted from the three-dimensional optical flow images. This result indicates that the spatiotemporal features extracted by the optical flow method can be replaced by features extracted by the deep neural network. It also shows that, to a certain degree, features extracted by the neural network can compensate for missing information, such as the temporal features absent from RGB images or the color features absent from OF+ images. The high average accuracy of each single network indicates good testing performance on every dataset. Networks with high complexity perform well because the spontaneous expression samples can train the deep transfer learning networks effectively. The proposed models achieve state-of-the-art performance, with an average accuracy of over 96%. Analysis of the isomorphic network model shows that it is not always better than a single network, because a single network already discriminates the categories of spontaneous expression with high confidence; thus, the isomorphic network cannot easily improve the average accuracy further. The Titan Xp used for this research was donated by the NVIDIA Corporation.
Conclusion
Compared with traditional expressions, spontaneous expressions change subtly, and their features are difficult to extract. In this study, different transfer learning networks are applied to discriminate the categories of spontaneous expression, and the testing accuracies of the different networks, which are trained on different kinds of samples, are compared. Experimental results show that, in contrast to traditional methods, deep learning has obvious advantages in spontaneous expression feature extraction. The findings also show that deep networks can extract complete features from spontaneous expressions and that they are robust across databases, as evidenced by their good testing results. In the future, we will extract spontaneous expressions directly from videos and identify the categories of spontaneous expression with high accuracy by removing distracting occurrences, such as blinking.