Micro-expression classification algorithm guided by apex frame spotting
Li Bokai1, Wu Congzhong2, Xiang Baiyang1, Zang Huaijuan1, Ren Yongsheng3, Zhan Shu1 (1. Hefei University of Technology; 2. School of Computer Science, Hefei University of Technology; 3. Faculty of Metallurgical and Energy Engineering, Kunming University of Science and Technology)
Abstract
Objective Micro-expressions are unconscious facial movements made in response to external information and stimuli. They are important evidence for judging a subject's emotions and behavior, and they are widely applied in fields such as public security, business negotiation, and psychological counseling. Unlike ordinary expressions, micro-expressions are difficult to classify and locate. To address this, this paper proposes a dual-branch optical flow spotting network (DFSN) based on optical flow windows and a micro-expression classification network that exploits the optical flow information of the apex frame, in order to recognize micro-expressions in videos.

Method In the spotting task, the facial image is first extracted, the size and position of the optical flow window are selected, and the facial optical flow is computed and preprocessed. The result is then fed into the dual-branch network for two classifications, one deciding whether a micro-expression is present and the other, given that one is present, which phase the micro-expression is in; two loss functions are combined to suppress over-fitting. Finally, a micro-expression intensity curve is drawn, and the position of its peak is the desired apex frame. In the classification task, the starting frame of the video and the apex frame obtained by the spotting network are taken as the optical flow window, the micro-expression is magnified with Eulerian motion magnification (EMM), and the optical flow information of the apex frame is then used to classify the micro-expression video.

Result The micro-expression spotting network was evaluated on the CASME II and CASME datasets with leave-one-subject-out cross-validation and compared with the current best spotting methods. It obtained the lowest NAME value, 0.1017, on CASME II, a 9% improvement over the current best spotting method, and a NAME value of 0.1378 on CASME, the second-best spotting result on that dataset. Based on the apex frames obtained by the spotting network, the classification network achieved 89.79% accuracy on CASME II and 66.06% on CASME. Using the apex frames annotated in the datasets, it achieved 91.83% accuracy on CASME II and 76.96% on CASME.

Conclusion The proposed micro-expression spotting network can effectively locate the apex frame of a micro-expression in a video and help the subsequent network with classification, and the micro-expression classification network can effectively distinguish different kinds of micro-expression videos.
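A minimal sketch of the window-based optical flow extraction step described above, assuming the dlib frontal face detector and OpenCV's Farneback implementation (both named in the English abstract); the window endpoints, cropping, channel layout, and normalisation are illustrative choices, not the paper's exact preprocessing:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()

def flow_input(frames, start, end, size=224):
    """Compute one optical-flow image between the two ends of a window."""
    gray0 = cv2.cvtColor(frames[start], cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(frames[end], cv2.COLOR_BGR2GRAY)
    faces = detector(gray0)
    if not faces:
        return None
    r = faces[0]
    top, left = max(r.top(), 0), max(r.left(), 0)
    crop0 = gray0[top:r.bottom(), left:r.right()]
    crop1 = gray1[top:r.bottom(), left:r.right()]
    # Dense Farneback optical flow between the two window endpoints.
    flow = cv2.calcOpticalFlowFarneback(crop0, crop1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Stack horizontal flow, vertical flow, and magnitude as a 3-channel image,
    # normalise to [0, 255], and resize to the network input size.
    img = np.dstack([flow[..., 0], flow[..., 1], mag])
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.resize(img, (size, size))
```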
Keywords
micro-expression spotting; affective computing; apex frame; micro-expression classification; image recognition; deep learning
Apex frame spotting and recognition of micro-expression by optical flow
(1. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; 2. National Engineering Research Center for Vacuum Metallurgy)
Abstract
Objective A micro-expression is an unconscious facial action made by people in response to external information and stimulation. It is important evidence for judging people's emotions and thoughts, and it is widely used in fields such as public security, business negotiation, and psychological counseling. A micro-expression differs from a general macro-expression: it has a short duration, low intensity, and a fast rate of change, so it is more difficult to recognize and locate. Before the emergence of deep learning, researchers mostly used traditional hand-crafted methods, relying on manually designed micro-expression feature extractors and complex parameter-tuning procedures. Some of these algorithms, such as Local Binary Patterns from Three Orthogonal Planes (LBP-TOP) and Main Directional Mean Optical Flow (MDMO), achieve competitive results, but they can usually extract only shallow features, and their accuracy is difficult to improve. With the development of machine learning in computer vision, deep-learning-based micro-expression research has become the mainstream. Such methods usually use convolutional neural networks to extract and classify the features of images or videos, and their powerful feature extraction and learning ability has greatly improved the accuracy of micro-expression recognition. However, because micro-expressions are subtle and effective features are hard to extract, spotting and classifying them is still a difficult task. This paper therefore proposes a dual-branch optical flow spotting network (DFSN) based on optical flow windows to help address these problems.

Method First, the size of the optical flow window is selected according to the number of frames in the video, and three frames are taken at each end of the window to stabilize the optical flow intensity. The dlib library is used to detect the face, the Farneback method is used to extract the facial optical flow, the optical flow image is preprocessed, and the image is finally resized to 224 × 224 pixels. The result is then fed into the dual-branch network for two classifications: whether a micro-expression is present, and whether the micro-expression is in a rising or falling state. Both decisions should be made from the same characteristics, so a shared network backbone is used, followed by separate branches that process the features for each task, so that each branch focuses on a different aspect. Combining the two loss functions suppresses over-fitting, completes the classification, and improves network performance. Finally, the micro-expression state in each video window is obtained by sliding the window, and the intensity curve is drawn. Because micro-expressions differ in duration, multiple windows are used for spotting, and the highest point among them is taken as the apex frame. The classification network differs from the spotting network in two respects. First, the front end of its window is the second to fourth frames of the video, while the back end uses the micro-expression part of the video. Second, Eulerian motion magnification is used to process the video: it amplifies facial motion and increases expression intensity, but it also destroys some optical flow features, so it is not used in the spotting network. When classifying a video, the apex frame predicted by the spotting network is taken as the center, and the five surrounding positions are selected as input to the classification network. The classification network uses an uncomplicated structure and still obtains good results, which demonstrates the importance of apex frame spotting.
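A minimal PyTorch sketch of the dual-branch idea described above: a shared backbone whose features feed one head for micro-expression presence and one for the rising/falling phase, trained with two combined cross-entropy losses. The ResNet-18 backbone, head sizes, and loss combination are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class DualBranchSpotter(nn.Module):
    """Shared backbone with two classification branches (illustrative only)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)                    # assumed backbone
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.presence_head = nn.Linear(512, 2)                      # micro-expression vs. none
        self.phase_head = nn.Linear(512, 2)                         # rising vs. falling

    def forward(self, x):
        f = self.features(x).flatten(1)                             # shared features
        return self.presence_head(f), self.phase_head(f)

def dual_loss(presence_logits, phase_logits, presence_y, phase_y):
    """Combine the two losses; the phase loss only applies to windows
    that actually contain a micro-expression."""
    ce = nn.CrossEntropyLoss()
    loss = ce(presence_logits, presence_y)
    has_me = presence_y == 1
    if has_me.any():
        loss = loss + ce(phase_logits[has_me], phase_y[has_me])
    return loss

if __name__ == "__main__":
    model = DualBranchSpotter()
    p, s = model(torch.randn(2, 3, 224, 224))   # batch of 224×224 optical-flow images
    print(p.shape, s.shape)
```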
Result The micro-expression spotting network is evaluated with leave-one-subject-out (LOSO) cross-validation, the most commonly used validation protocol in current micro-expression research, on the Chinese Academy of Sciences Micro-expression Database II (CASME II) and the Chinese Academy of Sciences Micro-expression Database (CASME). Compared with the current best spotting methods, it obtains the lowest NAME value of 0.1017 on CASME II, which is 10% lower than the previous best spotting method, and a NAME value of 0.1378 on CASME, the second-lowest value reported to date. Using the apex frames predicted by this spotting network, the classification network achieves 89.79% accuracy on the three-class task (positive, negative, and surprise) of CASME II and 66.67% accuracy on the four-class task (disgust, tense, repression, and surprise) of CASME. Using the apex frames annotated in the datasets, it achieves 91.83% accuracy on CASME II and 76.21% on CASME.

Conclusion The proposed micro-expression spotting network can effectively locate the apex frame in a video and thereby extract the effective micro-expression information of the video. Extensive experimental evaluation shows that the spotting network has a good spotting effect, and the subsequent classification network shows that extracting effective micro-expression information, such as the apex frame, significantly helps the network classify micro-expressions. In conclusion, the proposed micro-expression spotting network can significantly improve the accuracy of micro-expression recognition.
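The LOSO protocol mentioned in the results could be reproduced with scikit-learn's LeaveOneGroupOut splitter, as in the sketch below; the `train_fn`/`predict_fn` callables and the data arrays are placeholders standing in for the actual training pipeline and for the per-video features, emotion labels, and subject IDs of CASME / CASME II.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def loso_accuracy(samples, labels, subjects, train_fn, predict_fn):
    """Leave-one-subject-out evaluation: each subject in turn is the test set,
    all remaining subjects form the training set."""
    logo = LeaveOneGroupOut()
    correct, total = 0, 0
    for train_idx, test_idx in logo.split(samples, labels, groups=subjects):
        model = train_fn(samples[train_idx], labels[train_idx])
        preds = predict_fn(model, samples[test_idx])
        correct += int(np.sum(preds == labels[test_idx]))
        total += len(test_idx)
    return correct / total
```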
Keywords
micro-expression spotting; affective computing; apex frame; micro-expression classification; image recognition; deep learning