A high-quality image generation model for face aging/de-aging

Song Haoze, Wu Xiaojun (School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China)

Abstract
Objective In recent years, research on face aging/de-aging has advanced rapidly with the development of deep learning. The conditional adversarial autoencoder (CAAE) model proposed in 2017 generates faces that are both highly believable and close to the target age. However, face aging/de-aging still suffers from low-resolution output and severe ghosting artifacts (distorted facial organs in the generated faces). To address these problems, we propose a high-quality image generation model (HQGM) for face aging/de-aging built on CAAE. Method The generative adversarial network (GAN) in CAAE is replaced with a boundary equilibrium GAN (BEGAN), which generates face images with higher resolution and better visual quality. On this basis, two loss functions are added to improve the quality of the generated images: an image gradient difference loss and a face feature loss. The image gradient difference loss narrows the gap between the image gradients of the generated and real images, so the generated images retain more high-frequency information such as contours. For the face feature loss, the generated and real images are each fed into a VGG-FACE network with pretrained parameters to obtain their respective feature maps; minimizing the pointwise difference between the two feature maps makes the generated images carry more of the facial feature information of the real images. Result Experiments use the UTKface, FGnet, and Morph datasets. After training, ten images at different ages are generated for each test image. Compared with CAAE, HQGM effectively removes ghosting artifacts, with a peak signal-to-noise ratio higher by 3.2 dB and a structural similarity higher by 0.06, a significant improvement. Conclusion HQGM can generate face aging/de-aging images rich in texture and facial feature information.
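The image gradient difference loss described above can be illustrated with a minimal numpy sketch (this is an assumption-level illustration of the general technique, not the paper's code; the function name and exponent parameter are hypothetical):

```python
import numpy as np

def gradient_difference_loss(real, fake, alpha=1.0):
    """Penalize the mismatch between the horizontal/vertical intensity
    gradients of a real image and a generated image, encouraging sharp
    high-frequency detail (edges, contours).
    real, fake: float arrays of shape (H, W) or (H, W, C)."""
    # Forward differences along rows (vertical gradient) and columns (horizontal).
    dy_real, dx_real = np.abs(np.diff(real, axis=0)), np.abs(np.diff(real, axis=1))
    dy_fake, dx_fake = np.abs(np.diff(fake, axis=0)), np.abs(np.diff(fake, axis=1))
    # Mean absolute difference of the gradient magnitudes, raised to alpha.
    loss_y = np.mean(np.abs(dy_real - dy_fake) ** alpha)
    loss_x = np.mean(np.abs(dx_real - dx_fake) ** alpha)
    return loss_x + loss_y
```

A blurred or flat generated image loses the sharp gradients of the real image, so this term pushes the generator toward reproducing contours rather than only matching pixel averages.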
Keywords
High-quality image generation model for face aging/de-aging

Song Haoze, Wu Xiaojun(School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China)

Abstract
Objective Face aging/de-aging aims to render a face with age information added or removed while preserving the personalized features of the face. It has considerable influence on a wide range of applications, such as predicting the faces of wanted or missing persons, age-invariant verification, and entertainment. Our work builds on the conditional adversarial autoencoder (CAAE) model proposed by Zhang et al. In the CAAE network, the face is first mapped to a latent vector by a convolutional encoder, and the vector is then projected to the face manifold conditioned on age by a deconvolutional generator. The latent vector preserves personalized face features, and the age condition controls aging/de-aging. The images generated by the CAAE network are more believable than those from traditional face aging approaches, and the CAAE model reduces the demand for training data. However, face aging/de-aging still faces many challenges, such as low resolution and ghosting artifacts. Therefore, we propose a high-quality image generation model (HQGM) for face aging/de-aging that enriches texture and facial feature information. Method HQGM is based on the CAAE model, in which two generative adversarial networks (GANs) are imposed on the encoder and the generator. HQGM adds a GAN only on the generator and replaces it with the boundary equilibrium GAN (BEGAN). BEGAN is one of the best face generation models, and its discriminator is an autoencoder. Two loss functions are imposed on the encoder and generator: an image gradient loss and a face feature loss based on the VGG-FACE model, which serves as a face feature extractor. The image gradient loss reduces the difference between the image gradients of the input and generated images. The face feature loss reduces the difference between the facial feature information of the input images and that of the generated images.
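The face feature (perceptual) loss above compares feature maps rather than raw pixels. A minimal sketch, assuming a stand-in feature extractor (the paper uses a pretrained VGG-FACE network; `avg_pool2` here is only a toy placeholder so the sketch stays self-contained):

```python
import numpy as np

def face_feature_loss(real, fake, extract):
    """Mean squared distance between the feature maps of a real and a
    generated image, both produced by the same fixed extractor.
    `extract` stands in for the pretrained VGG-FACE forward pass."""
    f_real = extract(real)
    f_fake = extract(fake)
    return np.mean((f_real - f_fake) ** 2)

def avg_pool2(img):
    """Toy 2x2 average pooling used as a placeholder 'extractor'.
    NOT VGG-FACE; purely illustrative."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0
```

Because the extractor is trained on faces, matching its feature maps pushes the generator to reproduce identity-relevant structure, which is why this term suppresses the noise that distorts facial organs.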
The input and generated images are fed into the VGG-FACE model to obtain their facial feature information. HQGM has one encoder, one decoder (generator), and one discriminator. The encoder E maps the input face to a vector z. The label l (age) is concatenated to z, and the new latent vector [z, l] is fed to the generator G. The encoder and generator are updated with the L2, image gradient, face feature, and BEGAN losses. The discriminator forces the output face to be photorealistic and plausible for a given age label. The generator and discriminator are trained alternately until the loss imposed on the generator converges well. In practice, only the encoder and generator are used to generate the face at the target age. Result The UTKface, FGnet, and Morph datasets are used in the experiments. Approximately 21 000 faces and their corresponding labels are used to train the network. During testing, four groups of contrast experiments are conducted: with/without the image gradient loss, with/without the face feature loss, with/without the BEGAN loss, and HQGM versus the CAAE model. The comparisons use two methods: inspecting the generated images with the naked eye, and measuring their quality with assessment criteria, namely peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The comparative results show that the BEGAN, image gradient, and face feature losses all produce higher-quality images. Moreover, the generated images show that the face feature loss reduces the noise that causes twisted facial organs.
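The alternating BEGAN training mentioned above is governed by an equilibrium variable k_t that balances the discriminator's autoencoder reconstruction losses on real and generated faces. A minimal sketch of one update step, following the standard BEGAN formulation (the hyperparameter values and function name are illustrative assumptions):

```python
import numpy as np

def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN equilibrium update.
    loss_real: autoencoder reconstruction loss L(x) on real faces.
    loss_fake: reconstruction loss L(G(z)) on generated faces.
    k: current equilibrium variable k_t in [0, 1].
    Returns (discriminator loss, generator loss, k_{t+1}, convergence M)."""
    d_loss = loss_real - k * loss_fake           # L_D = L(x) - k_t * L(G(z))
    g_loss = loss_fake                           # L_G = L(G(z))
    # k_t tracks the balance gamma * L(x) = L(G(z)); clip keeps it in [0, 1].
    k_next = np.clip(k + lambda_k * (gamma * loss_real - loss_fake), 0.0, 1.0)
    # Global convergence measure: low M indicates good image quality/diversity.
    m_global = loss_real + abs(gamma * loss_real - loss_fake)
    return d_loss, g_loss, k_next, m_global
```

The convergence measure M gives a concrete stopping signal for the alternating training loop described in the text, replacing visual inspection of generator loss curves.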
Compared with the CAAE network, the PSNR computed on the images generated by HQGM is higher by approximately 3.2 dB, the SSIM is higher by approximately 0.06, and the images generated by HQGM, with the ghosting artifacts removed, are more believable than those from the CAAE network. Experimental results demonstrate the appealing performance and flexibility of the proposed framework in comparison with state-of-the-art works. Conclusion Compared with the GAN, the images generated by BEGAN have more texture information and better visual quality. With the image gradient loss, the generated images retain more high-frequency information such as contours. With the face feature loss, they carry more texture and facial feature information, and the additional facial feature information reduces ghosting artifacts. In summary, the images generated by our HQGM have richer texture and facial feature information. In future work, the proposed method will be improved and optimized with new network structures or new loss functions.
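The PSNR gain reported above (about 3.2 dB) can be made concrete with the standard definition; a minimal sketch for 8-bit images (the function name is illustrative):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    generated image; higher values mean closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')                    # identical images
    return 10.0 * np.log10(peak ** 2 / mse)    # PSNR = 10 log10(MAX^2 / MSE)
```

Because PSNR is logarithmic in the mean squared error, a 3.2 dB gain corresponds to roughly halving the MSE against the real image, which is consistent with the reported removal of ghosting artifacts.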
Keywords
