Micro-expression classification algorithm guided by apex frame spotting

Li Bokai; Wu Congzhong; Xiang Baiyang; Zang Huaijuan; Ren Yongsheng; Zhan Shu

Published: 2024-05-20
DOI: 10.11834/jig.230537
2024 | Volume 29 | Number 5

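Li Bokai^1,2, Wu Congzhong^1,2, Xiang Baiyang^1,2, Zang Huaijuan^1,2, Ren Yongsheng^3, Zhan Shu^1,2 (1. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230601; 2. School of Computer and Information, Hefei University of Technology, Hefei 230601; 3. School of Metallurgy and Energy Engineering, Kunming University of Science and Technology, Kunming 650093)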

Abstract

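Objective Micro-expressions are unconscious facial movements that people make in response to external information and stimuli. They are important evidence for judging a subject's emotions and behavior and are widely applied in fields such as public security, business negotiation, and psychological counseling. Unlike ordinary expressions, micro-expressions are difficult to classify and spot. To address this, we propose a dual-branch optical flow spotting network (DFSN) based on optical flow windows, together with a micro-expression classification network that uses the optical flow information of the apex frame, to recognize micro-expressions in videos. Method In the spotting task, facial images are first extracted, the size and position of the optical flow window are selected, and the facial optical flow is computed and preprocessed. The result is then fed into the dual-branch network for two classifications: one determines whether a micro-expression is present, and the other, given that a micro-expression is present, determines which phase it is in. Two loss functions are combined to suppress overfitting. Finally, a micro-expression intensity curve is drawn, and the position of the curve's peak is taken as the apex frame. In the classification task, the first frame of the video and the apex frame obtained by the spotting network are used as the two ends of the optical flow window, the micro-expression is amplified with Eulerian motion magnification (EMM), and the video is classified using the optical flow information of the apex frame. Result The spotting network was evaluated with leave-one-subject-out cross-validation on the CASME II (Chinese Academy of Sciences Micro-expression Database II) and CASME datasets. Compared with the current best spotting methods, it obtains the lowest NMAE (normalized mean absolute error) on CASME II, 0.1017, a 9% improvement over the Optical flow+UPC method. On CASME it obtains an NMAE of 0.1378, the second best among the compared methods. Using the apex frames produced by the spotting network, the classification network achieves 89.79% accuracy on CASME II and 66.06% accuracy on CASME. Using the apex frames annotated in the datasets, it achieves 91.83% accuracy on CASME II and 76.96% accuracy on CASME. Conclusion The proposed spotting network can effectively locate the apex frame of a micro-expression in a video and thereby help the subsequent network classify it, and the classification network can effectively distinguish different categories of micro-expression videos.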
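The abstract says the spotting head makes two classifications, the second one conditional on a micro-expression being present, and combines two losses. A minimal pure-Python sketch of one way to combine them follows. The softmax cross-entropy form, restricting the phase loss to windows labeled as containing a micro-expression, and the `phase_weight` factor are all assumptions for illustration, not details given in the abstract.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def dual_branch_loss(presence_logits, phase_logits, presence_labels,
                     phase_labels, phase_weight: float = 1.0) -> float:
    """Hypothetical combined loss for a two-branch spotting head.

    presence_logits[i]: 2 scores (no micro-expression / micro-expression)
    phase_logits[i]:    2 scores (rising / falling); only used when
                        presence_labels[i] == 1
    phase_weight:       assumed balancing factor, not taken from the paper
    """
    n = len(presence_labels)
    presence_loss = sum(
        cross_entropy(p, y) for p, y in zip(presence_logits, presence_labels)
    ) / n
    # The phase branch is supervised only on windows that contain a micro-expression.
    positive = [i for i, y in enumerate(presence_labels) if y == 1]
    if positive:
        phase_loss = sum(
            cross_entropy(phase_logits[i], phase_labels[i]) for i in positive
        ) / len(positive)
    else:
        phase_loss = 0.0
    return presence_loss + phase_weight * phase_loss
```

Masking the phase term keeps windows without a micro-expression from pushing the rising/falling branch toward arbitrary labels; a deep learning framework would express the same idea with a per-sample mask on its cross-entropy loss.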

Keywords

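micro-expression spotting; affective computing; apex frame; micro-expression classification; image recognition; deep learning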

Apex frame spotting and recognition of micro-expression by optical flow

Li Bokai^1,2, Wu Congzhong^1,2, Xiang Baiyang^1,2, Zang Huaijuan^1,2, Ren Yongsheng^3, Zhan Shu^1,2 (1. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230601, China; 2. School of Computer and Information, Hefei University of Technology, Hefei 230601, China; 3. School of Metallurgy and Energy Engineering, Kunming University of Science and Technology, Kunming 650093, China)

Abstract

Objective Micro-expressions are unconscious facial movements that people make in response to external information and stimuli. They provide crucial evidence for judging a person's emotions and thoughts and are widely used in fields such as public security, business negotiation, and psychological counseling. Unlike general macro-expressions, micro-expressions are short in duration, low in intensity, and fast-changing, which makes them more difficult to recognize and spot. Before the emergence of deep learning, researchers mostly relied on traditional hand-crafted methods, which extract features with manually designed micro-expression descriptors and complex parameter-tuning procedures. Some of these algorithms, such as local binary patterns on three orthogonal planes (LBP-TOP) and main directional mean optical flow (MDMO), achieve competitive results. However, they mostly extract only shallow features, and their accuracy is difficult to improve further. With the development of machine learning in computer vision, deep learning-based micro-expression research quickly became mainstream. Such methods generally use convolutional neural networks to extract and classify image or video features, and their powerful feature extraction and learning capabilities have markedly improved the accuracy of micro-expression recognition. Nevertheless, spotting and classifying micro-expressions remain difficult because micro-expressions are subtle and effective features are hard to extract. This paper therefore proposes a dual-branch optical flow spotting network based on an optical flow window to help address these problems.
Method First, the size of the optical flow window is selected according to the number of video frames, and three frames at each end of the window are taken to stabilize the optical flow intensity. Faces are detected with the Dlib library, facial optical flow features are extracted with the Farneback method, and the optical flow images are preprocessed and finally resized to 224×224 pixels. The result is then fed into the dual-branch network, which performs two classifications: whether a micro-expression is present, and whether the micro-expression is in its rising or falling state. Because both classifications are judged from the same features, a shared backbone is used, followed by two branches that process the features with different focuses. Combining the two loss functions suppresses overfitting and improves network performance. Finally, the window is slid across the video to obtain the micro-expression state at each window position, and an intensity curve is drawn. Because micro-expressions vary in duration, several window sizes are used for spotting, and the highest point among all resulting curves is taken as the apex frame. The classification network differs from the spotting network in two respects. First, the front end of the window is placed at the second to fourth frame of the video, and the back end lies within the micro-expression part of the video. Second, Eulerian motion magnification (EMM) is applied to the video. EMM amplifies facial motion and increases expression intensity but destroys some optical flow features, so it is not used in the spotting network. When classifying a video, the apex frame output by the spotting network is taken as the center, and five surrounding positions are selected as input to the classification network.
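The sliding-window spotting and apex selection described above can be sketched in a few lines. This is a hypothetical outline, not the authors' code: `score_window` stands in for the trained dual-branch network, writing each window's score to its end frame is an assumption, scores are assumed non-negative (for example, probabilities), and taking a contiguous run of five frames clamped to the video is only one plausible reading of "five surrounding positions".

```python
from typing import Callable, Dict, List, Sequence


def intensity_curves(num_frames: int, window_sizes: Sequence[int],
                     score_window: Callable[[int, int], float]) -> Dict[int, List[float]]:
    """Slide each window size over the video and record one score per position.

    score_window(start, end) is a hypothetical stand-in for the spotting
    network applied to the optical flow between frames `start` and `end`.
    As an assumption, each score is written to the window's end frame;
    frames that no window ends on keep the value 0.0.
    """
    curves = {}
    for w in window_sizes:
        curve = [0.0] * num_frames
        for start in range(0, num_frames - w + 1):
            end = start + w - 1
            curve[end] = score_window(start, end)
        curves[w] = curve
    return curves


def spot_apex(curves: Dict[int, List[float]]) -> int:
    """Return the frame index of the highest point across all curves."""
    best_frame, best_score = 0, float("-inf")
    for curve in curves.values():
        for frame, score in enumerate(curve):
            if score > best_score:
                best_frame, best_score = frame, score
    return best_frame


def positions_around(apex: int, num_frames: int, count: int = 5) -> List[int]:
    """Pick `count` consecutive frame indices centred on the apex, clamped to the video."""
    half = count // 2
    start = min(max(apex - half, 0), max(num_frames - count, 0))
    return list(range(start, min(start + count, num_frames)))
```

For example, with a toy scorer that peaks when a window ends on frame 17, `spot_apex(intensity_curves(40, [5, 9], scorer))` returns 17, and `positions_around(17, 40)` gives frames 15 to 19 as classification inputs.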
The classification network uses a simple structure yet obtains good results, which demonstrates the importance of apex frame spotting.
Result The micro-expression spotting network was evaluated with leave-one-subject-out cross-validation, the most commonly used validation protocol in current micro-expression research, on the Chinese Academy of Sciences Micro-expression Database II (CASME II) and the Chinese Academy of Sciences Micro-expression Database (CASME). On CASME II, it obtains the lowest normalized mean absolute error (NMAE) among the compared methods, 0.1017, which is 9% lower than that of the Optical flow+UPC method. On CASME, its NMAE is 0.1378, the second lowest among the compared methods. With apex frames from this spotting network, the classification network achieves 89.79% accuracy on three categories (positive, negative, and surprise) in the CASME II micro-expression classification experiment and 66.06% accuracy on four categories (disgust, tense, repression, and surprise) in the CASME experiment. With the apex frames annotated in the datasets, the classification network achieves 91.83% and 76.96% accuracy on CASME II and CASME, respectively.
Conclusion The proposed micro-expression spotting network can effectively locate the apex frame in a video and thereby extract effective micro-expression information. Extensive experimental evaluation shows that the spotting network performs well. The results of the subsequent classification network show that extracting effective micro-expression information, such as the apex frame, significantly helps the network classify micro-expressions. Overall, the proposed spotting network can substantially improve the accuracy of micro-expression recognition.
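For readers reproducing the evaluation, the following sketch shows leave-one-subject-out fold generation and an NMAE computation in plain Python. It assumes that NMAE divides each absolute apex error (in frames) by the length of that sequence before averaging over sequences; the abstract does not state the formula, so this definition is an assumption to check against the full paper before comparing numbers.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def loso_folds(subjects: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Leave-one-subject-out: yield (subject, train_indices, test_indices).

    All samples of one subject form the test set; every other sample trains.
    """
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subjects):
        by_subject[subject].append(idx)
    for subject in sorted(by_subject):
        test = by_subject[subject]
        train = [i for i in range(len(subjects)) if subjects[i] != subject]
        yield subject, train, test


def nmae(predicted: Sequence[int], ground_truth: Sequence[int],
         lengths: Sequence[int]) -> float:
    """Assumed NMAE: mean over sequences of |pred - gt| / sequence length."""
    if not predicted:
        raise ValueError("need at least one sequence")
    errors = [abs(p - g) / n for p, g, n in zip(predicted, ground_truth, lengths)]
    return sum(errors) / len(errors)
```

In a full run, the spotting network would be retrained on each fold's training indices, apex predictions collected for the held-out subject, and `nmae` computed once over all collected predictions.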

Keywords

micro-expression spotting; affective computing; apex frame; micro-expression classification; image recognition; deep learning