A lightweight facial expression recognition method combining an improved convolutional neural network and channel weighting
2022, Vol. 27, No. 12, pp. 3491-3502
Received: 2021-09-22
Revised: 2021-12-10
Accepted: 2021-12-17
Published in print: 2022-12-16
DOI: 10.11834/jig.210945
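Objective
Facial expressions are an important channel for conveying information in human-computer interaction, so facial expression recognition is of significant research value. Existing methods suffer from strong background interference, complex network parameters, and poor generalization. To address these problems, this paper proposes a lightweight facial expression recognition method that combines an improved convolutional neural network (CNN) with channel weighting.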
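Method
First, the network is built from a combination of standard convolutions and depthwise separable convolutions, with a global average pooling layer as the output layer; this simplifies the network and substantially reduces the number of parameters. Second, squeeze-and-excitation (SE) modules are introduced for channel weighting, and different compression rates are set after different convolutional layers to strengthen expression feature extraction and improve model accuracy. Finally, a softmax classifier assigns each input to an expression class.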
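To illustrate why depthwise separable convolutions and a global average pooling output reduce the parameter count, the following sketch counts weights for a single layer. The layer sizes (3 x 3 kernel, 128 input channels, 256 output channels, a 6 x 6 x 256 final feature map, 7 classes) are hypothetical and are not taken from the paper's architecture; only the counting formulas are standard.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights and biases of a k x k standard convolution."""
    return k * k * c_in * c_out + c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in + c_in      # one k x k filter per input channel
    pointwise = c_in * c_out + c_out     # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def flatten_dense_params(h, w, c, n_classes):
    """Fully connected classifier on a flattened h x w x c feature map."""
    return h * w * c * n_classes + n_classes


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(3, 128, 256)
sep = separable_conv_params(3, 128, 256)
dense = flatten_dense_params(6, 6, 256, 7)

print(f"standard conv:  {std:,} parameters")
print(f"separable conv: {sep:,} parameters ({std / sep:.1f}x fewer)")
print(f"flatten + dense head: {dense:,} parameters; global average pooling adds none")
```

For these assumed sizes the standard convolution needs 295,168 parameters and the separable one 34,304. Global average pooling itself has no trainable weights, which is why using it as the output layer avoids the cost of a large fully connected head.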
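Result
The network has 6 108 519 parameters, 63% fewer than Xception, a network with comparatively good recognition performance, and in real-time tests the model reaches an average recognition speed of 128 frames/s. The model is evaluated on seven expression classes across five public expression datasets. Compared with seven CNN-based methods, recognition accuracy improves by 5.72%, 0.51%, and 0.28% on FER2013 (Facial Expression Recognition 2013), CK+ (the extended Cohn-Kanade), and JAFFE (Japanese Female Facial Expression), respectively, and by 2.04% and 0.68% on the two in-the-wild databases RAF-DB (Real-world Affective Faces Database) and AffectNet.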
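As a rough reading of the figures above, the sketch below converts the stated parameter count and frame rate into a storage size and a per-frame time budget. It assumes 32-bit floating-point weights, which the abstract does not state, and the baseline size is back-computed on the assumption that the 63% figure is exact, so it should be treated as an estimate rather than a reported number.

```python
PARAMS = 6_108_519   # parameter count stated in the abstract
REDUCTION = 0.63     # stated reduction relative to Xception
FPS = 128            # stated average recognition speed in frames/s


def float32_size_mib(n_params):
    """Storage for the weights at 4 bytes per parameter (assumed float32), in MiB."""
    return n_params * 4 / 2**20


def implied_baseline(n_params, reduction):
    """Baseline size implied by the reduction, assuming the percentage is exact."""
    return n_params / (1.0 - reduction)


def frame_latency_ms(fps):
    """Average time budget per frame in milliseconds."""
    return 1000.0 / fps


print(f"float32 weights: {float32_size_mib(PARAMS):.2f} MiB")
print(f"implied baseline: about {implied_baseline(PARAMS, REDUCTION) / 1e6:.1f} M parameters")
print(f"per-frame latency: {frame_latency_ms(FPS):.4f} ms")
```

At 128 frames/s the average budget is about 7.8 ms per frame, comfortably below the roughly 33 ms per frame of 30 frames/s video.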
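Conclusion
The proposed lightweight method weights different channels differently, which captures more key expression features and improves the generalization of the model. Experimental results show that the method recognizes facial expressions accurately while simplifying the network and reducing computation, effectively improving the recognition ability of the network.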
Objective
Facial expressions convey human emotion and carry information in human-robot interaction. With the development of artificial intelligence (AI), facial expression recognition (FER) has been applied to emotion understanding, human-robot interaction, safe driving, medical treatment, and communications. However, current FER methods still face problems such as strong background interference, complex network model parameters, and poor generalization. We develop a lightweight facial expression recognition method based on an improved convolutional neural network (CNN) and channel weighting, in order to improve recognition and classification and to better mine the key feature information of facial expressions.
Method
A facial expression recognition pipeline consists of face image acquisition, image preprocessing, feature extraction, and expression classification, of which feature extraction is the key step. Our approach is as follows: 1) Expression datasets covering indoor and outdoor scenarios are collected. 2) Data augmentation is used to preprocess the expression images, which suppresses distracting background information and mitigates the over-fitting and poor robustness of deep learning algorithms. 3) A lightweight expression network is designed and trained around an enhanced depthwise separable convolution channel feature module. To reduce the network parameters effectively, depthwise separable convolution and a global average pooling layer are used. The squeeze-and-excitation (SE) module is also embedded to optimize the model. Different compression rates are set for the SE modules so that facial expression features are extracted more efficiently, which improves the recognition ability of the network. Our main contributions are as follows: 1) Data preprocessing module: it is based on data augmentation operations such as image size normalization, random rotation and cropping, and random noise addition, which remove interfering information and improve the generalization of the model. 2) Network model: a convolutional neural network (CNN) is adopted, and an enhanced depthwise separable convolution channel feature module (also called the basic block) is designed for channel weighting. Spatial and channel information in the local receptive field is enriched by setting different compression rates after different convolution layers. 3) Verification: the method is evaluated on several widely used public datasets and achieves high recognition accuracy.
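The channel weighting described above can be sketched in plain Python as a generic squeeze-and-excitation block. This is a minimal illustration, not the paper's basic block: the weights are random, biases are omitted, the tensor sizes are hypothetical, and reading the "compression rate" as the SE reduction ratio r (the bottleneck has C // r units) is an assumption.

```python
import math
import random


def se_block(feature_maps, w1, w2):
    """Apply squeeze-and-excitation channel weighting.

    feature_maps: list of C channels, each an H x W list of lists.
    w1: (C // r) x C weight matrix of the bottleneck layer.
    w2: C x (C // r) weight matrix of the expansion layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation, step 1: reduce to C // r units and apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C units and apply a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_maps)]
    return scaled, gates


def se_params(channels, reduction):
    """Weight count of the two fully connected layers (biases omitted)."""
    return 2 * channels * (channels // reduction)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
channels, reduction = 8, 4          # hypothetical sizes
x = [[[rng.random() for _ in range(5)] for _ in range(5)] for _ in range(channels)]
w1 = random_matrix(channels // reduction, channels, rng)
w2 = random_matrix(channels, channels // reduction, rng)
y, gates = se_block(x, w1, w2)

print("channel gates:", [round(g, 3) for g in gates])
print("SE weights for C=256, r=16:", se_params(256, 16))
print("SE weights for C=256, r=4: ", se_params(256, 4))
```

A larger reduction ratio gives a narrower bottleneck and fewer weights, while a smaller one gives the gate more capacity, which is one plausible reason to choose the ratio separately for each convolution layer.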
Result
The best combination of compression ratios for the SE modules is determined experimentally and embedded into the lightweight network, which is then evaluated on five commonly used expression datasets. The recognition accuracy on the three datasets FER2013 (Facial Expression Recognition 2013), CK+ (the extended Cohn-Kanade), and JAFFE (Japanese Female Facial Expression) is 79.73%, 99.32%, and 98.48%, an improvement of 5.72%, 0.51%, and 0.28%, respectively. On the two in-the-wild expression datasets RAF-DB (Real-world Affective Faces Database) and AffectNet, the recognition accuracy is 86.14% and 61.78%, an improvement of 2.01% and 0.67%, respectively. Compared with the Xception neural network, the number of parameters is reduced by 63%, which makes the network lightweight. The average recognition speed reaches 128 frames/s, which meets real-time requirements.
Conclusion
Our lightweight expression recognition method weights different channels differently, which captures key expression information and enhances the generalization of the model. The method recognizes facial expressions accurately while simplifying the network and reducing computation cost, effectively improving the recognition ability of the network.