Face super-resolution reconstruction based on multitask joint learning

Wang Huan1,2, Wu Chengdong2, Chi Jianning2, Yu Xiaosheng2, Hu Qian1,2 (1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China; 2. College of Robot Science and Engineering, Northeastern University, Shenyang 110819, China)

Abstract
Objective Face super-resolution reconstruction is a domain-specific super-resolution problem. To make full use of facial prior knowledge, this paper proposes a deep face super-resolution reconstruction algorithm based on multitask joint learning. Method First, a network with residual learning and symmetric cross-layer connections is used to extract multilevel features from the low-resolution face image; loss weights and loss thresholds are set according to the learning difficulty of each task, and the network is trained by joint multi-attribute learning. Then, a perceptual loss function is used to measure the semantic gap between HR (high-resolution) and SR (super-resolution) images, and the effectiveness of the perceptual loss in improving the reconstruction of facial semantic information is demonstrated. Finally, the face attribute dataset is augmented, and joint multitask learning is conducted on this basis to obtain super-resolution results that are perceptually more realistic. Result The experimental results are evaluated with two objective criteria, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), and compared with other mainstream methods. On the face attribute dataset CelebA at ×8 magnification, the proposed algorithm improves PSNR by approximately 2.15 dB over the general-purpose super-resolution algorithm MemNet (persistent memory network) and by approximately 1.2 dB over the face super-resolution algorithm FSRNet (end-to-end learning face super-resolution network). Conclusion The experimental data and result images show that the proposed algorithm makes better use of face prior knowledge and produces facial edges and texture details that are more realistic and sharper in visual perception.
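The following is a minimal sketch of the kind of threshold-gated, weighted multitask loss described in the abstract. The task names, weight values, and threshold values are illustrative assumptions, not the settings reported in the paper.

```python
import torch

def multitask_loss(losses, weights, thresholds, main_task="sr"):
    """Combine per-task losses with fixed weights.

    A subtask whose loss has already dropped below its threshold is skipped,
    so a subtask that has fit the training set no longer disturbs the main
    super-resolution task through the shared layers.
    """
    total = 0.0
    for task, loss in losses.items():
        if task != main_task and loss.item() < thresholds.get(task, 0.0):
            continue  # subtask considered converged; stop back-propagating it
        total = total + weights[task] * loss
    return total

# Example with assumed task names and values (not the paper's settings)
losses = {
    "sr": torch.tensor(0.031, requires_grad=True),          # pixel-wise SR loss
    "landmark": torch.tensor(0.004, requires_grad=True),    # facial landmark detection
    "gender": torch.tensor(0.120, requires_grad=True),      # gender classification
    "expression": torch.tensor(0.250, requires_grad=True),  # expression recognition
}
weights = {"sr": 1.0, "landmark": 0.5, "gender": 0.1, "expression": 0.1}
thresholds = {"landmark": 0.01, "gender": 0.05, "expression": 0.05}
print(multitask_loss(losses, weights, thresholds))
```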
Keywords
Face super-resolution reconstruction based on multitask joint learning

Wang Huan1,2, Wu Chengdong2, Chi Jianning2, Yu Xiaosheng2, Hu Qian1,2(1.College of Information Science and Engineering, Northeastern University, Shenyang 110819, China;2.College of Robot Science and Engineering, Northeastern University, Shenyang 110819, China)

Abstract
Objective Face images captured by surveillance cameras are degraded by atmospheric blur and target motion, so the captured faces often have such low resolution that they cannot be reliably recognized by humans or machines, and improving their clarity is an urgent need. Enhancing the resolution of face images with super-resolution (SR) restoration technology has become an important means of solving this problem. Face SR reconstruction is the process of predicting a high-resolution (HR) face image from one or more observed low-resolution (LR) face images, which is a typical ill-posed problem. Because face SR is a domain-specific super-resolution task, facial prior knowledge can be exploited to improve the reconstruction quality. We propose a deep end-to-end face SR reconstruction algorithm based on multitask joint learning. Multitask learning is an inductive transfer mechanism that improves the generalization performance of the backbone model by exploiting the domain-specific information hidden in the training signals of related tasks. Existing face SR methods fuse facial prior information in various ways and thereby substantially improve performance. However, these networks generally integrate the prior features by directly fusing facial geometry information with image features and do not fully utilize semantic information such as facial landmarks, gender, and facial expression. Moreover, at large magnifications the prior features obtained by these methods are too coarse to reconstruct fine facial edges and texture details. To solve this problem, we propose a face SR reconstruction algorithm based on multitask joint learning (MTFSR). MTFSR combines face SR with auxiliary tasks, such as facial landmark detection, gender classification, and facial expression recognition; multitask learning is used to obtain a shared representation of facial features among the related tasks, acquire rich facial prior knowledge, and optimize the performance of the face SR algorithm. Method First, a face SR reconstruction algorithm based on multitask joint learning is proposed. The model uses residual learning and symmetric cross-layer connections to extract multilevel features. Local residual mapping improves the expressive capability of the network, alleviates gradient vanishing during training, and reduces the number of convolution kernels in the model through feature reuse. To further reduce the kernel parameters, the input of each residual block is transformed by 1×1 convolutions that first reduce and then restore the feature dimension. The network adopts an encoder-decoder structure. In the encoder, the dimension of the feature space is gradually reduced and redundant information in the input image is discarded, yielding a high-level visual feature representation of the face image. These visual features are sent to the decoder through the cross-layer connections. The decoder cascades and fuses the visual features of all levels to filter the effective information accurately, and deconvolution layers gradually restore the spatial dimension and recover the details and texture features of the face.
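A rough PyTorch sketch of the building blocks just described is given below: a bottleneck residual block with 1×1 dimension reduction/restoration, and an encoder-decoder with a symmetric cross-layer connection and a deconvolution layer. Channel counts, layer counts, and class names are illustrative assumptions rather than the paper's exact architecture, and the final upscaling to the HR size is omitted for brevity.

```python
import torch
import torch.nn as nn

class BottleneckResBlock(nn.Module):
    """Local residual mapping with 1x1 convolutions that first reduce and
    then restore the feature dimension, limiting kernel parameters."""
    def __init__(self, channels=64, reduced=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, reduced, kernel_size=1),            # reduce dimension
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, kernel_size=1),            # restore dimension
        )

    def forward(self, x):
        return x + self.body(x)  # local residual connection

class EncoderDecoderSketch(nn.Module):
    """Encoder compresses the feature space; a symmetric cross-layer
    connection feeds the encoder feature to the decoder, which fuses it and
    restores the spatial dimension with a deconvolution layer."""
    def __init__(self, channels=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            BottleneckResBlock(channels),
        )
        self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * 2, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        f1 = self.enc1(x)                             # full-resolution feature
        f2 = self.enc2(f1)                            # downsampled high-level feature
        up = self.up(f2)                              # deconvolution restores spatial size
        return self.fuse(torch.cat([up, f1], dim=1))  # symmetric cross-layer fusion

# Quick shape check with a random 32x32 LR face batch (sizes are assumptions)
if __name__ == "__main__":
    print(EncoderDecoderSketch()(torch.rand(1, 3, 32, 32)).shape)
```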
We design a joint training method for the face multi-attribute learning tasks: loss weights and loss thresholds are set on the basis of the learning difficulty of each task, which prevents subtasks that have already fit the training set from interfering with the learning of the main task and yields abundant face prior knowledge. The perceptual loss function is used to measure the semantic gap between HR and SR images, and the output feature maps of the perceptual loss network are visualized to demonstrate the effectiveness of the perceptual loss in improving the reconstruction of facial semantic information. Finally, we augment the face attribute dataset, filter out samples that are missing the relevant attribute labels, and use a landmark detection algorithm to re-extract the facial landmark attributes. On this basis, joint multitask learning is conducted to obtain SR results that are perceptually more realistic. Result In the experiments, a total of 35 000 face images are selected, and two sets of LR/HR face image pairs with different magnifications are produced via bicubic downsampling at ×4 and ×8 scales; the image sizes of each pair are 32×32/128×128 and 16×16/128×128 pixels, respectively. The first 30 000 face images are used as the training set, and the last 5 000 face images are used as the test set. Six models are trained, one for each combination of the ×4 and ×8 magnifications with three network configurations: single-task, multitask joint learning, and multitask joint learning with the perceptual loss function. The single-task face SR network trained with a pixel-wise loss is denoted STFSR-MSE, the multitask face SR network trained with a pixel-wise loss is denoted MTFSR-MSE, and the multitask face SR network trained with the perceptual loss is denoted MTFSR-Perce. Two objective evaluation criteria, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), are used to evaluate the experimental results. At ×8 magnification, the PSNR of the proposed algorithm is approximately 2.15 dB higher than that of the general-purpose SR algorithm MemNet and approximately 1.2 dB higher than that of the face SR algorithm FSRNet. Conclusion Experimental data and results show that the proposed algorithm can better utilize face prior knowledge and produce facial edges and texture details that are more realistic and sharper in visual perception.
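As an illustration of the perceptual loss described in the Method, the sketch below compares SR and HR images in the feature space of a fixed, pretrained VGG16 network. The choice of VGG16, of the relu3_3 feature layer, and of torchvision as the source of the loss network are assumptions for illustration; the paper's exact loss network and layer are not specified here.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights  # requires torchvision >= 0.13

class PerceptualLoss(nn.Module):
    """MSE between feature maps of a frozen VGG16 (up to relu3_3),
    measuring the semantic gap between SR and HR images."""
    def __init__(self, layer_index=16):
        super().__init__()
        features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:layer_index]
        for p in features.parameters():
            p.requires_grad = False  # the loss network stays fixed during training
        self.features = features.eval()
        self.mse = nn.MSELoss()

    def forward(self, sr, hr):
        # Compare the two images in feature (semantic) space rather than pixel space
        return self.mse(self.features(sr), self.features(hr))

# Example usage with random tensors standing in for 128x128 SR/HR batches
if __name__ == "__main__":
    loss_fn = PerceptualLoss()
    sr, hr = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
    print(loss_fn(sr, hr).item())
```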
Keywords
