联合多任务学习的人脸超分辨率重建
Face super-resolution reconstruction based on multitask joint learning
2020, Vol. 25, No. 2, pp. 229-240
Received: 2019-05-27; Revised: 2019-08-16; Accepted: 2019-08-23; Published in print: 2020-02-16
DOI: 10.11834/jig.190233
Objective
Face super-resolution reconstruction is a domain-specific super-resolution problem. To make full use of facial prior knowledge, a deep face super-resolution reconstruction algorithm based on multitask joint learning is proposed.
Method
First, residual learning and a symmetric skip-connection network are used to extract multilevel features from the low-resolution face. Loss weights and loss thresholds are set according to the learning difficulty of each task, and the network is trained by joint learning over multiple facial attributes. A perceptual loss function then measures the semantic gap between the HR (high-resolution) and SR (super-resolution) images, and the effectiveness of perceptual loss in improving the reconstruction of facial semantic information is demonstrated. Finally, the face attribute dataset is augmented, and joint multitask learning is performed on it to obtain super-resolution results that are more realistic in visual perception.
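The perceptual loss above compares images in the feature space of a fixed network rather than pixel by pixel. The following toy sketch illustrates the idea only: `phi` is a hypothetical 2×2 average-pooling stand-in for the pretrained feature extractor (e.g., a VGG network) used in practice, not the paper's network.

```python
# Illustrative sketch of a perceptual loss (not the paper's implementation):
# instead of comparing images pixel by pixel, compare them in a feature
# space phi(.). Here phi is a toy 2x2 average-pooling "feature extractor"
# standing in for the fixed pretrained network used in practice.

def phi(img):
    """Toy feature map: 2x2 average pooling over a 2D list of floats."""
    h, w = len(img), len(img[0])
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def mse(a, b):
    """Mean squared error between two equally shaped 2D lists."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def perceptual_loss(sr, hr):
    """Distance between SR and HR images measured in feature space."""
    return mse(phi(sr), phi(hr))

hr = [[0.0, 0.0, 1.0, 1.0],
      [0.0, 0.0, 1.0, 1.0],
      [1.0, 1.0, 0.0, 0.0],
      [1.0, 1.0, 0.0, 0.0]]
# An SR estimate whose pixels are shifted: large pixel-wise loss,
# but similar local structure, hence a smaller perceptual loss.
sr = [[0.0, 1.0, 1.0, 0.0],
      [0.0, 1.0, 1.0, 0.0],
      [1.0, 0.0, 0.0, 1.0],
      [1.0, 0.0, 0.0, 1.0]]
print(mse(sr, hr), perceptual_loss(sr, hr))  # pixel loss 0.5 > feature loss 0.25
```

The point of the toy example: a reconstruction can be far from the ground truth pixel-wise yet close in a feature space that summarizes local structure, which is why perceptual loss favors semantically plausible detail.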
Result
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as objective evaluation criteria, and the results are compared with those of other mainstream methods. On the CelebA face attribute dataset, at ×8 magnification, the proposed algorithm improves PSNR by approximately 2.15 dB over the general-purpose MemNet (persistent memory network) super-resolution algorithm and by approximately 1.2 dB over the FSRNet (end-to-end learning face super-resolution network) face super-resolution algorithm.
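For context, PSNR is computed from the mean squared reconstruction error as 10·log10(MAX²/MSE). A minimal sketch, with the MSE values chosen only to show what a 2.15 dB gain means for the reconstruction error:

```python
import math

def psnr(mse, max_val=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A 2.15 dB PSNR gain corresponds to cutting the reconstruction MSE
# by a factor of 10 ** (2.15 / 10), i.e. about 1.64x.
print(psnr(100.0))         # ~28.13 dB (illustrative MSE, not from the paper)
print(psnr(100.0 / 1.64))  # ~2.15 dB higher
```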
Conclusion
Experimental data and result images show that the proposed algorithm makes better use of facial prior knowledge and produces facial edges and texture details that are more realistic and sharper in visual perception.
Objective
The image captured by a surveillance camera is affected by atmospheric blur and by the motion of the target, resulting in low-resolution face images that can be recognized by neither humans nor machines. The clarity of such face images therefore urgently needs to be improved, and enhancing their resolution with super-resolution (SR) restoration technology has become an important means of solving this problem. Face SR reconstruction is the process of predicting a high-resolution (HR) face image from one or more observed low-resolution (LR) face images, which is a typical ill-posed problem. Because face SR is a domain-specific super-resolution task, facial prior knowledge can be exploited to improve the reconstruction. We propose a deep end-to-end face SR reconstruction algorithm based on multitask joint learning (MTFSR). Multitask learning is an inductive transfer mechanism that improves the generalization performance of a backbone model by exploiting the domain-specific information hidden in the training signals of related tasks. Existing SR methods fuse facial prior information in various ways, which substantially improves the performance of face SR algorithms. However, these networks generally fuse facial geometry information directly with image features and do not fully utilize semantic information such as facial landmarks, gender, and facial expression. Moreover, at large magnifications, the priors obtained by these methods are too coarse to reconstruct detailed facial edges and textures. To solve this problem, we propose MTFSR, which combines face SR with auxiliary tasks, such as facial landmark detection, gender classification, and facial expression recognition, using multitask learning to obtain a shared representation of facial features among related tasks, acquire rich facial prior knowledge, and optimize the performance of the face SR task.
Method
First, a face SR reconstruction network based on multitask joint learning is constructed. The model uses residual learning and symmetric skip connections to extract multilevel features. Local residual mapping improves the expressive capability of the network, alleviates gradient vanishing during training, and reduces the number of convolution kernels through feature reuse. To further reduce the parameters, each residual block transforms the dimension of its input with 1×1 convolutions, first reducing and then restoring the channel dimension. The network adopts an encoder-decoder structure. In the encoder, the dimension of the feature space is gradually reduced and redundant information in the input image is discarded, yielding a high-level visual feature representation of the face image. These visual features are passed to the decoder through skip connections. The decoder concatenates and fuses the visual features of all levels to filter the effective information accurately, and deconvolution layers gradually restore the spatial dimensions and recover the details and textures of the face. Second, a joint training method for multiple face attribute learning tasks is designed: loss weights and loss thresholds are set according to the learning difficulty of each task, so that a subtask that has already fitted the training set no longer interferes with the learning of the main task, and abundant facial prior knowledge is obtained. The perceptual loss function is used to measure the semantic gap between the HR and SR images, and the output feature maps of the perceptual loss network are visualized to demonstrate the effectiveness of perceptual loss in improving the reconstruction of facial semantic information. Finally, the face attribute dataset is augmented: samples with missing attribute labels are filtered out, and a keypoint detection algorithm is used to re-extract the landmark attributes. On this basis, joint multitask learning is conducted to obtain SR results that are more realistic in visual perception.
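The parameter savings from the 1×1 reduce-then-restore design in the residual blocks can be checked with simple arithmetic. The channel counts below are illustrative assumptions, not the paper's configuration:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution layer (bias terms ignored)."""
    return k * k * c_in * c_out

# Plain residual block: two 3x3 convolutions at full channel width.
c = 256  # illustrative channel count, not taken from the paper
plain = 2 * conv_params(3, c, c)

# Bottleneck block: 1x1 reduce -> 3x3 at reduced width -> 1x1 restore.
r = 64   # reduced width, also illustrative
bottleneck = (conv_params(1, c, r)     # 1x1 reduce: 256 -> 64 channels
              + conv_params(3, r, r)   # 3x3 convolution at reduced width
              + conv_params(1, r, c))  # 1x1 restore: 64 -> 256 channels
print(plain, bottleneck, plain / bottleneck)  # ~17x fewer weights here
```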
Result
In the experiments, 35 000 face images are selected, and two sets of LR/HR face image pairs with different magnifications are produced by bicubic downsampling at ×4 and ×8 scales; the image sizes of each pair are 32×32/128×128 and 16×16/128×128 pixels, respectively. The first 30 000 face images are used as the training set, and the last 5 000 as the test set. Six models are trained, covering three factors: ×4 or ×8 magnification, whether multitask joint learning is used, and whether the perceptual loss function is used. The single-task face SR network trained with pixel-wise loss is denoted STFSR-MSE, the multitask face SR network trained with pixel-wise loss is denoted MTFSR-MSE, and the multitask face SR network trained with perceptual loss is denoted MTFSR-Perce. Two objective evaluation criteria, namely, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), are used to assess the experimental results. At ×8 magnification, the proposed algorithm improves PSNR by approximately 2.15 dB compared with the general-purpose MemNet SR algorithm and by approximately 1.2 dB compared with the FSRNet face SR algorithm. In addition, the joint training method for the multiattribute face learning tasks, in which loss weights and loss thresholds are set according to the learning difficulty of each task, prevents subtasks that have fitted the training set from interfering with the learning of the main task and provides abundant facial prior knowledge.
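The loss-weight and loss-threshold scheme described above can be sketched as follows. The task names, weights, and thresholds here are hypothetical placeholders, not the paper's settings:

```python
# Sketch of a weighted multitask loss with per-task thresholds: once an
# auxiliary task's loss falls below its threshold (i.e., the task has
# fitted the training set), its contribution is switched off so it no
# longer perturbs the main SR task. All numbers below are illustrative.

def joint_loss(task_losses, weights, thresholds):
    """Combine per-task losses; drop tasks whose loss is below threshold."""
    total = 0.0
    for name, loss in task_losses.items():
        if loss >= thresholds[name]:  # task still worth training on
            total += weights[name] * loss
    return total

weights = {"sr": 1.0, "landmarks": 0.1, "gender": 0.05, "expression": 0.05}
thresholds = {"sr": 0.0, "landmarks": 0.01, "gender": 0.02, "expression": 0.02}

early = {"sr": 0.80, "landmarks": 0.30, "gender": 0.50, "expression": 0.40}
late = {"sr": 0.20, "landmarks": 0.005, "gender": 0.01, "expression": 0.03}

print(joint_loss(early, weights, thresholds))  # all tasks contribute
print(joint_loss(late, weights, thresholds))   # fitted subtasks dropped
```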
Conclusion
Experimental data and result images show that the proposed algorithm makes better use of facial prior knowledge and produces more realistic and sharper facial edges and texture details in visual perception.
Baker S and Kanade T. 2000. Hallucinating faces//Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition. Grenoble, France: IEEE: 83-88 [DOI:10.1109/AFGR.2000.840616]
Cao Q X, Lin L, Shi Y K, Liang X D and Li G B. 2017. Attention-aware face hallucination via deep reinforcement learning//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 1656-1664 [DOI:10.1109/CVPR.2017.180]
Caruana R. 1994. Learning many related tasks at the same time with backpropagation//Proceedings of the 7th International Conference on Neural Information Processing Systems. Denver, CO, USA: MIT Press: 657-664
Chen Y, Tai Y, Liu X M, Shen C H and Yang J. 2018. FSRNet: end-to-end learning face super-resolution with facial priors//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 2492-2501 [DOI:10.1109/CVPR.2018.0026]
Dong C, Loy C C, He K M and Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI:10.1109/TPAMI.2015.2439281]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1026-1034 [DOI:10.1109/ICCV.2015.123]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 770-778 [DOI:10.1109/CVPR.2016.90]
Johnson J, Alahi A and Li F F. 2016. Perceptual losses for real-time style transfer and super-resolution//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 694-711 [DOI:10.1007/978-3-319-46475-6_43]
Kazemi V and Sullivan J. 2014. One millisecond face alignment with an ensemble of regression trees//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE: 1867-1874 [DOI:10.1109/CVPR.2014.241]
Kim J, Lee J K and Lee K M. 2016. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 1646-1654 [DOI:10.1109/CVPR.2016.182]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, CA, USA: [s.n.]: 1-15
Kolouri S and Rohde G K. 2015. Transport-based single frame super resolution of very low resolution face images//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE: 4876-4884 [DOI:10.1109/CVPR.2015.7299121]
Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 5835-5843 [DOI:10.1109/CVPR.2017.618]
Le V, Brandt J, Lin Z, Bourdev L and Huang T S. 2012. Interactive facial feature localization//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer: 679-692 [DOI:10.1007/978-3-642-33712-3_49]
Liu Z W, Luo P, Wang X and Tang X O. 2015. Deep learning face attributes in the wild//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3730-3738 [DOI:10.1109/ICCV.2015.425]
Mao X J, Shen C H and Yang Y B. 2016. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections//Proceedings of the 29th Conference on Neural Information Processing Systems. Barcelona, Spain: [s.n.]: 2810-2818
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. [s.l.]: [s.n.]
Song Y B, Zhang J W, He S F, Bao L C and Yang Q X. 2017. Learning to hallucinate face images via component generation and enhancement//Proceedings of the 26th International Joint Conference on Artificial Intelligence. [s.l.]: IJCAI: 4537-4543 [DOI:10.24963/ijcai.2017/633]
Tai Y, Yang J and Liu X M. 2017. Image super-resolution via deep recursive residual network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 2790-2798 [DOI:10.1109/CVPR.2017.298]
Tai Y, Yang J, Liu X M and Xu C Y. 2017. MemNet: a persistent memory network for image restoration//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4549-4557 [DOI:10.1109/ICCV.2017.486]
Tong J C, Fei J L, Chen J S, Li H and Ding D D. 2019. Multi-level feature fusion image super-resolution algorithm with recursive neural network. Journal of Image and Graphics, 24(2): 302-312 [DOI:10.11834/jig.180410]
Wang X G and Tang X O. 2005. Hallucinating face by eigentransformation. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 35(3): 425-434 [DOI:10.1109/TSMCC.2005.848171]
Yang C Y, Liu S F and Yang M H. 2013. Structured face hallucination//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE: 1099-1106 [DOI:10.1109/CVPR.2013.146]
Yu X and Porikli F. 2016. Ultra-resolving face images by discriminative generative networks//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 318-333 [DOI:10.1007/978-3-319-46454-1_20]
Zhang Y and Yeung D Y. 2010. A convex formulation for learning task relationships in multi-task learning//Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. Catalina Island, CA, USA: [s.n.]: 733-742
Zhu S Z, Liu S F, Loy C C and Tang X O. 2016. Deep cascaded bi-network for face hallucination//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 614-630 [DOI:10.1007/978-3-319-46454-1_37]