Expression recognition algorithm based on a parallel convolutional neural network

Xu Linlin, Zhang Shumei, Zhao Junli (College of Data Science and Software Engineering, Qingdao University, Qingdao 266071, China)

Abstract
Objective Expression recognition has broad application prospects in commerce, security, medicine, and other fields, so recognizing facial expressions quickly and accurately is of great significance for its research and application. Traditional machine learning methods require hand-crafted features, and their accuracy is difficult to guarantee. In recent years, convolutional neural networks (CNNs) have been widely applied owing to their good self-learning and generalization abilities, but problems such as the difficulty of extracting expression features and overly long network training times remain. To address these problems, an expression recognition method based on a parallel CNN is proposed. Method First, the facial expression images are preprocessed by face localization, gray-level normalization, and angle adjustment, which removes the influence of complex backgrounds, illumination, and pose and yields an accurate face region. Then a CNN with two parallel convolution-pooling units is designed for expression images, which can extract subtle expression features. The parallel structure has three different channels that extract different image features and fuse them; the fused features are finally fed into a SoftMax layer for classification. Result Using the proposed parallel CNN, 10-fold cross-validation was performed on the CK+ and FER2013 expression datasets, with the final result taken as the average of the 10 runs; accuracies of 94.03% on CK+ and 65.6% on FER2013 were obtained, with per-iteration times of 0.185 s and 0.101 s, respectively. Conclusion This work provides a new idea for CNN design: the breadth of the network can be extended while its depth is controlled, so that more expression features are extracted. Experimental results show that the model achieves high recognition rates and shortened training time on expression datasets that differ considerably in number of samples, resolution, and size.
Expression recognition algorithm for parallel convolutional neural networks

Xu Linlin, Zhang Shumei, Zhao Junli(College of Data Science and Software Engineering, Qingdao University, Qingdao 266071, China)

Abstract
Objective Facial expression recognition is widely applied in the commercial, security, and medical fields, and rapid, accurate identification of facial expressions is of great significance for its research and application. Several traditional machine learning methods, such as support vector machine (SVM), principal component analysis (PCA), and local binary pattern (LBP), have been used to identify facial expressions. However, these traditional algorithms require manual feature extraction; in this process, some features are hidden or deliberately enlarged by human intervention, which affects accuracy. In recent years, convolutional neural networks (CNNs) have been used extensively in image recognition because of their good self-learning and generalization capabilities. However, several problems remain, such as the difficulty of extracting facial expression features and the long training time of neural networks. This study presents an expression recognition method based on a parallel CNN to solve these problems. Method First, a series of preprocessing operations is performed on the facial expression images. The original image is processed with an AdaBoost cascade classifier to detect the face, removing the complex background and retaining the face region. The face image is then compensated for illumination: histogram equalization stretches the image nonlinearly and reallocates its pixel values. Finally, affine transformation aligns the face. This preprocessing removes complex background effects, compensates for lighting, and adjusts the pose, yielding a more accurate face region than the original image. A CNN with two parallel convolution-pooling units, which can extract subtle expression features, is then designed for facial expression images.
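The illumination-compensation step described above, histogram equalization that nonlinearly stretches the image and reallocates its pixel values, can be sketched in NumPy as follows. The function name and the 8-bit grayscale assumption are illustrative, not taken from the paper.

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Histogram equalization for an 8-bit grayscale face image.

    Maps each pixel through the normalized cumulative histogram, so the
    output intensities are reallocated over the full 0-255 range.
    """
    hist = np.bincount(gray.ravel(), minlength=256)   # per-level pixel counts
    cdf = hist.cumsum().astype(np.float64)            # cumulative histogram
    cdf_min = cdf[cdf > 0][0]                         # first nonzero bin
    # Classic equalization mapping: rescale the CDF to [0, 255].
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Example: a dark, low-contrast 48x48 image gets spread over the full range.
dark = np.clip(np.random.default_rng(0).normal(60, 10, (48, 48)), 0, 255).astype(np.uint8)
flat = equalize_histogram(dark)
```

In practice the face-detection step before this would typically use an AdaBoost cascade such as OpenCV's `cv2.CascadeClassifier`; the pure-NumPy version here only illustrates the pixel-reallocation idea.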
This parallel unit is the core of the CNN and comprises convolutional layers, pooling layers, and the ReLU activation function. The parallel structure has three different channels, each with a different combination of convolutional layers, pooling layers, and ReLU activations, so each channel extracts different image features; the extracted features are then fused. The second parallel unit performs convolution and pooling on the features extracted by the first parallel unit, reducing the dimensionality of the feature maps and shortening the training time of the CNN. Finally, the fused features are sent to the SoftMax layer for expression classification. Result The CK+ and FER2013 expression datasets, after preprocessing and data augmentation, are divided into 10 equal parts; training and testing are performed across the 10 folds, and the final accuracy is the average of the 10 results. Experimental results show that accuracy increases and training time decreases markedly compared with traditional machine learning methods such as SVM, PCA, and LBP (alone or in combination) and with classical CNNs such as AlexNet and GoogLeNet. CK+ and FER2013 achieve 94.03% and 65.6% accuracy, and one iteration takes 0.185 s and 0.101 s, respectively. Conclusion This study presents a new parallel CNN structure that extracts facial expression features through three different convolution-pooling paths. The three paths have different combinations of convolutional and pooling layers and therefore extract different image features; the extracted features are combined and sent to the next layer for processing. This study provides a new concept for CNN design: the breadth of the network can be extended while its depth is controlled. The proposed CNN can extract many expression features that are otherwise ignored or difficult to extract. The CK+ and FER2013 expression datasets differ greatly in quantity, size, and resolution.
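The three-channel parallel unit with feature fusion and SoftMax classification can be sketched in plain NumPy. The kernel sizes, layer counts per channel, and the 7-class output here are illustrative assumptions; the paper does not specify these hyperparameters in the abstract.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D cross-correlation on a single channel (the basic building block)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping size x size max pooling (odd borders truncated)."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
img = rng.standard_normal((24, 24))                # stand-in for a face image
k3, k5 = rng.standard_normal((3, 3)), rng.standard_normal((5, 5))

# Three channels with different conv/pool combinations, as in the parallel unit.
branch_a = max_pool(relu(conv2d(img, k3)))                    # conv3 -> ReLU -> pool
branch_b = max_pool(relu(conv2d(img, k5)))                    # conv5 -> ReLU -> pool
branch_c = max_pool(relu(conv2d(relu(conv2d(img, k3)), k3)))  # two stacked conv3

# Feature fusion: flatten each branch and concatenate before the next unit.
fused = np.concatenate([branch_a.ravel(), branch_b.ravel(), branch_c.ravel()])

# SoftMax classification head over 7 expression classes (hypothetical weights).
W = rng.standard_normal((7, fused.size)) * 0.01
probs = softmax(W @ fused)
```

The design point this illustrates is breadth over depth: each channel sees the same input but applies a different conv/pool combination, so the fused vector carries complementary features without making any single path deeper.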
The experiments on CK+ and FER2013 show that the model can extract precise and subtle features of facial expression images in a relatively short time while maintaining the recognition rate.
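The evaluation protocol described above, splitting each dataset into 10 equal parts and averaging the 10 fold accuracies, can be sketched as follows. The `predict` callback stands in for a model trained on the other nine folds; its interface is an assumption for illustration.

```python
import numpy as np

def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds (near-)equal parts."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, n_folds)

def cross_validate(labels, predict, n_folds=10):
    """Test on each held-out fold and return the mean accuracy,
    mirroring the 'average of the 10 results' protocol."""
    folds = ten_fold_indices(len(labels), n_folds)
    accs = []
    for test_idx in folds:
        preds = predict(test_idx)   # model trained on the remaining folds
        accs.append(np.mean(preds == labels[test_idx]))
    return float(np.mean(accs))

# Toy check with a perfect "model": the averaged accuracy must be exactly 1.0.
y = np.arange(100) % 7             # 100 samples, 7 expression classes
acc = cross_validate(y, predict=lambda idx: y[idx])
```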
