Posed and spontaneous expression distinction through multi-task and adversarial learning
2020, Vol. 25, No. 11, Pages: 2370-2379
Received: 2020-06-05; Revised: 2020-09-03; Accepted: 2020-09-10; Published in print: 2020-11-16
DOI: 10.11834/jig.200264
Objective
How to extract facial features that are independent of individual identity and how to model the spatiotemporal patterns of facial behavior are the two core problems in posed and spontaneous expression recognition, yet existing work on this task has not addressed both at once. To this end, this paper proposes a posed and spontaneous expression recognition method that combines multi-task learning and adversarial learning, using them to capture the spatiotemporal patterns of facial behavior and to learn identity-independent facial features, thereby achieving effective distinction between posed and spontaneous expressions.
Method
The proposed method consists of four parts: a feature extractor, a multi-task learner, an identity discriminator, and a multi-task discriminator. The feature extractor obtains features related to posed and spontaneous expressions. The identity discriminator supervises the feature extractor so that the learned features are unrelated to identity labels. The multi-task learner predicts the landmark displacements of the expression apex frame relative to the onset frame as well as the expression category, and tries to fool the multi-task discriminator. The multi-task discriminator judges whether its input is a real or a predicted pair of facial landmark displacements and expression category. Adversarial learning between the multi-task learner and the multi-task discriminator captures the spatiotemporal patterns of facial behavior, while cooperative learning among the feature extractor, the multi-task learner, and the identity discriminator yields facial features that are related to facial behavior but independent of individual identity.
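The paper's implementation is not given here; the following is a minimal PyTorch sketch of how the four components described above could be wired together. All module names, layer sizes, the landmark count, and the subject count are hypothetical illustrations, not the authors' architecture.

```python
# Minimal sketch of the four components; sizes and names are assumptions.
import torch
import torch.nn as nn

N_LANDMARKS = 68          # assumed number of facial landmarks
N_SUBJECTS = 50           # assumed number of training subjects
FEAT_DIM = 256

class FeatureExtractor(nn.Module):
    """Maps an onset/apex frame pair to an expression-related feature."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),  # 6 = two stacked RGB frames
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, FEAT_DIM)

    def forward(self, onset, apex):
        x = torch.cat([onset, apex], dim=1)      # stack the frame pair channel-wise
        return self.fc(self.conv(x).flatten(1))

class MultiTaskLearner(nn.Module):
    """Jointly predicts posed/spontaneous label and landmark displacements."""
    def __init__(self):
        super().__init__()
        self.cls_head = nn.Linear(FEAT_DIM, 2)                 # posed vs. spontaneous
        self.disp_head = nn.Linear(FEAT_DIM, 2 * N_LANDMARKS)  # (dx, dy) per landmark

    def forward(self, feat):
        return self.cls_head(feat), self.disp_head(feat)

class IdentityDiscriminator(nn.Module):
    """Subject classifier; the feature extractor is trained to fool it."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(FEAT_DIM, N_SUBJECTS)

    def forward(self, feat):
        return self.net(feat)

class MultiTaskDiscriminator(nn.Module):
    """Tells real (expression, displacement) pairs from predicted ones."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + 2 * N_LANDMARKS, 128), nn.ReLU(),
            nn.Linear(128, 1),                   # real/fake logit
        )

    def forward(self, expr, disp):
        return self.net(torch.cat([expr, disp], dim=1))
```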
Result
Experimental results on the MMI (M&M Initiative), NVIE (natural visible and infrared facial expression), and BioVid (biopotential and video) datasets show that the proposed method learns features with low correlation to individual identity and, by jointly predicting landmark displacements and expression categories, effectively captures the spatiotemporal patterns of posed and spontaneous expressions, thereby achieving good recognition performance.
Conclusion
Experiments show that the proposed adversarial network not only effectively learns facial features that are identity-independent yet expression-related, but also captures the spatiotemporal patterns of facial behavior, and that this information substantially improves posed and spontaneous expression distinction.
Objective
Posed and spontaneous expression distinction is a major problem in the field of facial expression analysis. Posed expressions are deliberately performed to confuse or deceive others, while spontaneous expressions occur naturally. Because posed expressions are intentionally faked, the difference between the posed and spontaneous delivery of the same expression by the same person is small. At the same time, posed and spontaneous expression distinction suffers from high intra-class variation caused by individual differences. These limitations make the task difficult. However, behavioral studies have shown that significant differences exist between posed and spontaneous expressions in their spatial patterns. For example, compared with spontaneous smiles, the contraction of the zygomatic muscles is more likely to be asymmetric in posed smiles. Moreover, contraction of the orbicularis oculi muscle is present in spontaneous smiles but absent in posed smiles. Such inherent spatial patterns can be exploited to facilitate posed and spontaneous expression distinction. Therefore, modeling the spatial patterns inherent in facial behavior and extracting subject-independent facial features are both important for this task. Previous works typically focused on modeling the spatial patterns of facial behavior. Because the motion patterns of facial muscles are difficult to obtain directly, researchers commonly use facial landmarks to approximate them and capture the spatial patterns inherent in facial behavior from landmark information. According to how they model these spatial patterns, studies on posed and spontaneous expression distinction can be categorized into two approaches, namely, feature-based and probabilistic graphical model (PGM)-based methods. Feature-based methods implicitly capture spatial patterns using handcrafted low-level features or deep features extracted by deep convolutional networks. However, handcrafted low-level features have difficulty describing the complex spatial patterns inherent in facial behavior. PGM-based methods model the distribution over landmarks and explicitly capture the spatial patterns of facial behavior using PGMs. However, PGMs frequently simplify inference and computation through independence or energy-distribution assumptions, which are sometimes inconsistent with the ground-truth distribution. Moreover, PGM-based methods typically rely on handcrafted low-level features and thus share their defects. To address these problems, we propose an adversarial network for posed and spontaneous expression distinction.
Method
On the one hand, we use landmark displacements between onset and corresponding apex frames to approximately describe the motion patterns of facial muscles, and we explicitly capture the spatial patterns inherent in facial behavior by modeling the joint distribution of expressions and landmark displacements. On the other hand, we alleviate the problem of high intra-class variation by extracting subject-independent features. Specifically, the proposed adversarial network consists of a feature extractor, a multitask learner, a multitask discriminator, and a feature discriminator. The feature extractor attempts to extract facial features that are discriminative for posed and spontaneous expression distinction yet robust to subject variation. The multitask learner classifies posed versus spontaneous expressions and simultaneously predicts facial landmark displacements. The multitask discriminator distinguishes the predicted expression and landmark displacement from the ground-truth ones. The feature discriminator is a subject classifier used to measure the dependence between the extracted facial features and subject identities. The feature extractor is trained cooperatively with the multitask learner but adversarially against the feature discriminator; it therefore learns facial features that are good for expression distinction and landmark-displacement regression but poor for subject recognition. The multitask learner competes with the multitask discriminator, so the distribution of predicted expressions and landmark displacements converges to the distribution of the ground-truth labels through adversarial learning. Thus, spatial patterns can be thoroughly exploited for posed and spontaneous expression distinction.
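To make the two adversarial games concrete, here is a hedged sketch of one alternating training step, reusing the hypothetical modules from the earlier sketch. The loss forms (cross-entropy, MSE, non-saturating GAN losses, an entropy-style confusion term) and the 0.1 weights are assumptions, not the authors' exact objective.

```python
import torch
import torch.nn.functional as F

# One hypothetical training step over the two adversarial games.
# F_ext/M_learn/D_id/D_mt are the modules sketched earlier; opt_fm optimizes
# the extractor + learner, opt_d optimizes both discriminators.
def train_step(batch, F_ext, M_learn, D_id, D_mt, opt_fm, opt_d):
    onset, apex, expr_gt, disp_gt, subj_gt = batch
    feat = F_ext(onset, apex)
    expr_pred, disp_pred = M_learn(feat)

    # --- update the two discriminators on detached predictions ---
    d_id_loss = F.cross_entropy(D_id(feat.detach()), subj_gt)
    real = D_mt(F.one_hot(expr_gt, 2).float(), disp_gt)
    fake = D_mt(expr_pred.detach().softmax(1), disp_pred.detach())
    d_mt_loss = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
                 + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))
    opt_d.zero_grad(); (d_id_loss + d_mt_loss).backward(); opt_d.step()

    # --- update the feature extractor and multitask learner ---
    feat = F_ext(onset, apex)
    expr_pred, disp_pred = M_learn(feat)
    task_loss = (F.cross_entropy(expr_pred, expr_gt)
                 + F.mse_loss(disp_pred, disp_gt))
    # fool the identity discriminator: push subject posteriors toward uniform
    # (one common confusion loss; a gradient-reversal layer is an alternative)
    fool_id = -F.log_softmax(D_id(feat), dim=1).mean()
    # fool the multitask discriminator: make predictions look real
    fake = D_mt(expr_pred.softmax(1), disp_pred)
    fool_mt = F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
    opt_fm.zero_grad(); (task_loss + 0.1 * fool_id + 0.1 * fool_mt).backward(); opt_fm.step()
```

Alternating discriminator and extractor/learner updates in this way is the standard GAN training recipe; the key point is that the extractor receives gradients that reward expression and displacement prediction while penalizing recoverable identity information.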
Result
Experimental results on three benchmark datasets, i.e., MMI (M&M Initiative), NVIE (natural visible and infrared facial expression), and BioVid (biopotential and video), demonstrate that the proposed adversarial network not only effectively learns subject-independent yet expression-discriminative facial features, improving the generalization of the model to unseen subjects, but also makes full use of the spatial and temporal patterns inherent in facial behavior, leading to superior performance compared with the state of the art.
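The abstract does not state how subject independence of the learned features was quantified. One plausible probe is sketched below, under the assumption that the extracted features and labels are available as NumPy arrays; the scikit-learn usage and all function and variable names are illustrative, not the paper's evaluation protocol.

```python
# Hypothetical probe of subject-independence: if identity information has
# been removed, a linear classifier should recover subjects poorly while
# still recovering posed/spontaneous labels well.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.manifold import TSNE

def probe(feats: np.ndarray, subj_ids: np.ndarray, expr_labels: np.ndarray):
    id_acc = cross_val_score(LogisticRegression(max_iter=1000), feats, subj_ids, cv=5).mean()
    ex_acc = cross_val_score(LogisticRegression(max_iter=1000), feats, expr_labels, cv=5).mean()
    print(f"identity probe accuracy:   {id_acc:.3f} (lower is better)")
    print(f"expression probe accuracy: {ex_acc:.3f} (higher is better)")
    # 2-D embedding for visual inspection, e.g. colored by subject vs. expression
    return TSNE(n_components=2, init="pca").fit_transform(feats)
```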
Conclusion
Experiments on three benchmark datasets demonstrate the effectiveness of the proposed method.