Facial expression recognition improved by continual learning
2020, Vol. 25, No. 11: 2361-2369
Received: 2020-07-17; Revised: 2020-09-07; Accepted: 2020-09-14; Published in print: 2020-11-16
DOI: 10.11834/jig.200315

目的 (Objective)
Large amounts of labeled data and deep learning methods have greatly improved image recognition performance. However, labeled data for expression recognition are scarce, so trained deep models overfit easily; prior work shows that networks pre-trained for face recognition can alleviate this problem. Yet a pre-trained face network may retain a large amount of identity information, which is unhelpful for expression recognition. This paper investigates how to effectively exploit a pre-trained face recognition network to improve expression recognition performance.
方法 (Method)
We introduce the idea of continual learning and use the connection between face recognition and expression recognition to guide expression recognition. The method observes that the parameters contributing most to the decrease of the overall face recognition loss are those that capture common facial features; such parameters are therefore important for expression recognition and help the network perceive facial characteristics. The method consists of two stages: first, a face recognition network is trained while the importance of every parameter is computed and recorded; then the pre-trained model is trained for expression recognition, with the changes of important parameters restricted so as to preserve the model's strong perception of facial features, while less important parameters are allowed to change more and thus learn additional expression-specific information. We call this method parameter-wise importance regularization.
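The two stages described above can be sketched in code. The following NumPy toy is only illustrative, not the authors' implementation: a quadratic objective stands in for the face recognition loss, and the learning rate, damping term, and variable names are our assumptions. It shows how a synaptic-intelligence-style importance can be accumulated online during stage one: each parameter is credited with its per-step contribution to the loss decrease, then normalized by its total displacement.

```python
import numpy as np

# Toy "face recognition" objective: a quadratic loss whose gradient we can
# evaluate exactly. All names and constants here are illustrative only.
rng = np.random.default_rng(0)
target = rng.normal(size=4)        # optimum of the stage-1 (FR) task
theta = np.zeros(4)                # flattened network parameters

def grad(th):
    # Gradient of 0.5 * ||th - target||^2
    return th - target

lr = 0.1
omega = np.zeros_like(theta)       # running per-parameter path integral
theta_start = theta.copy()

# Stage 1: train on the FR task while accumulating each parameter's
# contribution to the total loss decrease: omega_k += -g_k * delta_theta_k.
for _ in range(200):
    g = grad(theta)
    step = -lr * g
    omega += -g * step             # this update's contribution to the drop
    theta += step

# Normalize by the total squared displacement (xi avoids division by zero),
# giving the importance weights used to regularize stage-2 training.
xi = 1e-3
importance = omega / ((theta - theta_start) ** 2 + xi)
```

Parameters that moved the loss down the most end up with the largest `importance`, which is exactly the signal the second stage uses to decide which weights to hold near their pre-trained values.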
结果 (Result)
The method is evaluated on three datasets: RAF-DB (real-world affective faces database), CK+ (the extended Cohn-Kanade database), and Oulu-CASIA. On the mainstream RAF-DB dataset, the method reaches an accuracy of 88.04%, a 1.83% improvement over directly fine-tuning the pre-trained network. Results on the other datasets also demonstrate the effectiveness of the method.
结论 (Conclusion)
The proposed parameter-wise importance regularization exploits the connection between face recognition and expression recognition to make full use of the pre-trained face recognition model, yielding a more robust expression recognition model.
Objective
Facial expression recognition (FER) has become an important research topic in computer vision and plays an important role in human-computer interaction. Most studies focus on classifying basic discrete expressions (i.e., anger, disgust, fear, happiness, sadness, and surprise) using static image-based approaches. Recognition performance of deep learning-based methods has progressed considerably. Deep neural networks, especially convolutional neural networks (CNNs), achieve outstanding performance in image classification tasks, but a large amount of labeled data is needed to train them. However, insufficient samples in many widely used FER datasets lead to overfitting in the trained model. Fine-tuning a network that has been well pre-trained on a large face recognition dataset is commonly performed to compensate for the shortage of samples in FER datasets and prevent overfitting. The pre-trained network can capture facial information, and the similarity between the face recognition (FR) and FER domains facilitates the transfer of features. Although this transfer learning strategy demonstrates satisfactory performance, the fine-tuned FR network may still contain identity-dominated information, which can weaken the network's ability to represent different expressions. On the one hand, we expect to preserve the strong ability of the FR network to capture important facial information, such as face contour, and use it to guide FER network training in real cases. On the other hand, we want the network to learn additional expression-specific information. We therefore propose training the FER model with a continual learning approach to effectively exploit the close relationship between FR and FER and the ability of the pre-trained FR network.
Method
This study aims to train an expression recognition network with auxiliary information from a face recognition network instead of relying on fine-tuning alone. We first introduce a continual learning approach into the field of FER. Continual learning addresses learning from an infinite stream of data, with the objective of gradually extending acquired knowledge and using it for future learning. Synaptic intelligence consolidates the important parameters of previous tasks to solve catastrophic forgetting, alleviating the drop in performance by preventing those important parameters from changing in future tasks. Similar to continual learning, we conduct the FR task before the FER task is added. However, we only focus on the performance of the later task, whereas continual learning also aims to alleviate the catastrophic forgetting of the original task. Sequential tasks in continual learning commonly contain a small number of classes, so important parameters are tied to the current classes. In contrast, because the FR task contains a large number of categories, important parameters are more likely to capture common facial features rather than specific classes, which remarkably increases their contribution to the total loss. Hence, a two-stage training strategy is proposed in this study. In the first stage, we train an FR network and compute each parameter's importance during training. In the second stage, we refine the pre-trained network under the supervision of expression labels while preventing important parameters from changing excessively. The loss function for expression classification is composed of two parts, namely, the softmax loss and parameter-wise importance regularization.
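The second-stage objective can be written as a minimal sketch. A toy linear classifier over the seven basic expressions stands in for the real network, and the regularization weight `lam` and all names are illustrative assumptions, not the paper's code: the total loss adds to the softmax cross-entropy a penalty that anchors important parameters to their pre-trained FR values while leaving unimportant ones free to move.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    # Numerically stable cross-entropy for a single sample.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def fer_loss(theta, theta_pretrained, importance, x, label, lam=1.0):
    """Stage-2 objective: softmax loss plus parameter-wise importance
    regularization sum_k Omega_k * (theta_k - theta*_k)^2."""
    W = theta.reshape(7, -1)            # toy linear classifier, 7 expressions
    ce = softmax_cross_entropy(W @ x, label)
    reg = np.sum(importance * (theta - theta_pretrained) ** 2)
    return ce + lam * reg

rng = np.random.default_rng(1)
dim = 8
theta_star = rng.normal(size=7 * dim)    # pre-trained FR parameters
importance = rng.uniform(size=7 * dim)   # Omega_k recorded in stage 1
x = rng.normal(size=dim)                 # a face feature vector
loss_at_start = fer_loss(theta_star.copy(), theta_star, importance, x, label=3)
```

At `theta == theta_star` the penalty vanishes and the loss reduces to pure cross-entropy; as training moves parameters away from the pre-trained values, those with high importance are penalized more strongly, which is how the facial-feature perception of the FR network is preserved.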
Result
We conduct experiments on three widely used FER datasets: CK+ (the extended Cohn-Kanade database), Oulu-CASIA, and RAF-DB (real-world affective faces database). RAF-DB is an in-the-wild database, while the other two are laboratory-controlled. On RAF-DB, our method achieves an accuracy of 88.04%, which improves on direct fine-tuning by 1.83% and surpasses the state-of-the-art self-cure network (SCN) by 1.01%. On CK+, the result improves on the fine-tuning baseline by 1.1%. The experiment on Oulu-CASIA also indicates that the network generalizes well with the addition of parameter-wise importance regularization. Meanwhile, the regularization improves performance more remarkably on in-the-wild datasets, whose faces are more complex due to occlusion and pose variations.
Conclusion
In this study, we exploit the relationship between FR and FER and adopt the idea and algorithms of continual learning in FER to avoid overfitting. The main purpose and effect of our use of continual learning is to preserve the powerful feature extraction ability of the FR network via parameter-wise importance regularization while allowing less-important parameters to learn additional expression-specific information. The experimental results show that our training strategy helps the FER network learn additional discriminative features and thus improves recognition performance.
References
Acharya D, Huang Z W, Paudel D P and Van Gool L. 2018. Covariance pooling for facial expression recognition//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City: IEEE: 367-374 [DOI: 10.1109/cvprw.2018.00077]
Cai J, Meng Z B, Khan A S, Li Z Y, O'Reilly J and Tong Y. 2018. Probabilistic attribute tree in convolutional neural networks for facial expression recognition [EB/OL]. [2020-06-21]. https://arxiv.org/pdf/1812.07067.pdf
Chen S, Liu Y, Gao X and Han Z. 2018. MobileFaceNets: efficient CNNs for accurate real-time face verification on mobile devices//Proceedings of the 13th Chinese Conference on Biometric Recognition. Cham: Springer: 428-438 [DOI: 10.1007/978-3-319-97909-0_46]
De Lange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G and Tuytelaars T. 2019. A continual learning survey: defying forgetting in classification tasks [EB/OL]. [2020-06-21]. https://arxiv.org/pdf/1909.08383.pdf
Deng J K, Guo J, Xue N N and Zafeiriou S. 2019. ArcFace: additive angular margin loss for deep face recognition//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE: 4690-4699 [DOI: 10.1109/cvpr.2019.00482]
Ding H, Zhou S K and Chellappa R. 2017. FaceNet2ExpNet: regularizing a deep face recognition net for expression recognition//Proceedings of the 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017). Washington: IEEE: 118-126 [DOI: 10.1109/fg.2017.23]
Fan Y R, Li V and Lam J C K. 2020. Facial expression recognition with deeply-supervised attention network. IEEE Transactions on Affective Computing: #9075283 [DOI: 10.1109/taffc.2020.2988264]
Gan Y L, Chen J Y and Xu L H. 2019. Facial expression recognition boosted by soft label with a diverse ensemble. Pattern Recognition Letters, 125: 105-112 [DOI: 10.1016/j.patrec.2019.04.002]
Gera D and Balasubramanian S. 2020. Landmark guidance independent spatio-channel attention and complementary context information based facial expression recognition [EB/OL]. [2020-06-21]. https://arxiv.org/pdf/2007.10298.pdf
Jung H, Lee S, Yim J, Park S and Kim J. 2015. Joint fine-tuning in deep neural networks for facial expression recognition//Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago: IEEE: 2983-2991 [DOI: 10.1109/iccv.2015.341]
LeCun Y, Cortes C and Burges C J C. 1998. The MNIST database of handwritten digits [EB/OL]. [2020-06-21]. http://yann.lecun.com/exdb/mnist/
Levi G and Hassner T. 2015. Emotion recognition in the wild via convolutional neural networks and mapped binary patterns//Proceedings of the 2015 ACM International Conference on Multimodal Interaction. New York: ACM: 503-510 [DOI: 10.1145/2818346.2830587]
Li S, Deng W H and Du J P. 2017. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 2852-2861 [DOI: 10.1109/cvpr.2017.277]
Li Z Y, Han S Z, Khan A S, Cai J, Meng Z B, O'Reilly J and Tong Y. 2019. Pooling map adaptation in convolutional neural network for facial expression recognition//Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME). Shanghai: IEEE: 1108-1113 [DOI: 10.1109/icme.2019.00194]
Liu X F, Kumar B V K V, Jia P and You J. 2019. Hard negative generation for identity-disentangled facial expression recognition. Pattern Recognition, 88: 1-12 [DOI: 10.1016/j.patcog.2018.11.001]
Lucey P, Cohn J F, Kanade T, Saragih J, Ambadar Z and Matthews I. 2010. The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. San Francisco: IEEE: 94-101 [DOI: 10.1109/cvprw.2010.5543262]
Mirza M and Osindero S. 2014. Conditional generative adversarial nets [EB/OL]. [2020-08-21]. https://arxiv.org/pdf/1411.1784.pdf
Ng H W, Nguyen V D, Vonikakis V and Winkler S. 2015. Deep learning for emotion recognition on small datasets using transfer learning//Proceedings of the 2015 ACM International Conference on Multimodal Interaction. New York: ACM: 443-449 [DOI: 10.1145/2818346.2830593]
Parkhi O M, Vedaldi A and Zisserman A. 2015. Deep face recognition//Proceedings of the British Machine Vision Conference. Swansea, UK: BMVA Press: #6 [DOI: 10.5244/C.29.41]
Sun N, Li Q, Huan R Z, Liu J X and Han G. 2019. Deep spatial-temporal feature fusion for facial expression recognition in static images. Pattern Recognition Letters, 119: 49-61 [DOI: 10.1016/j.patrec.2017.10.022]
Wang K, Peng X J, Yang J F, Lu S J and Qiao Y. 2020. Suppressing uncertainties for large-scale facial expression recognition//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE: 6897-6906 [DOI: 10.1109/cvpr42600.2020.00693]
Yang H Y, Ciftci U and Yin L J. 2018a. Facial expression recognition by de-expression residue learning//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 2168-2177 [DOI: 10.1109/cvpr.2018.00231]
Yang H Y, Zhang Z and Yin L J. 2018b. Identity-adaptive facial expression recognition through expression regeneration using conditional generative adversarial networks//Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018). Xi'an: IEEE: 294-301 [DOI: 10.1109/FG.2018.00050]
Yi D, Lei Z, Liao S C and Li S Z. 2014. Learning face representation from scratch [EB/OL]. [2020-06-21]. https://arxiv.org/pdf/1411.7923.pdf
Zeng J B, Shan S G and Chen X L. 2018. Facial expression recognition with inconsistently annotated datasets//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 222-237 [DOI: 10.1007/978-3-030-01261-8_14]
Zenke F, Poole B and Ganguli S. 2017. Continual learning through synaptic intelligence//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: ACM: 3987-3995
Zhang K P, Zhang Z P, Li Z F and Qiao Y. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10): 1499-1503 [DOI: 10.1109/lsp.2016.2603342]
Zhao G Y, Huang X H, Taini M, Li S Z and Pietikäinen M. 2011. Facial expression recognition from near-infrared videos. Image and Vision Computing, 29(9): 607-619 [DOI: 10.1016/j.imavis.2011.07.002]
Zhao X Y, Liang X D, Liu L Q, Li T, Han Y G, Vasconcelos N and Yan S C. 2016. Peak-piloted deep network for facial expression recognition//Proceedings of the 14th European Conference on Computer Vision. Cham: Springer: 425-442 [DOI: 10.1007/978-3-319-46475-6_27]