Single face image-based panoramic texture map generation

Liu Yang1, Fan Yangyu1, Guo Zhe1, Lyu Guoyun1, Liu Shiya2 (1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China; 2. Content Production Center of Virtual Reality, Beijing 101318, China)

Abstract
Objective Face texture map generation is a key part of face identification research, in which the face texture maps the pixel information of a two-dimensional (2D) image onto the corresponding 3D face model. There are currently two main ways to acquire a face texture: one is based on full-coverage head scanning with a laser scanner, and the other is based on face image information. The high-accuracy scanning process requires a controlled environment and captures appearance information well, but it is mostly used to collect images for databases. The original image-based face texture map is obtained by simply stitching captured images of the target head from various viewing angles. Some researchers jointly use raw texture images from five views, which means face texture reconstruction is performed under restricted conditions. This approach can precisely recover the details of the whole head from the complementary pixel information across the face images, but it is difficult to apply in practice, and images captured from different angles exhibit variations in facial lighting and camera parameters that cause discontinuous pixel changes in the generated texture. Because the pixel information in a single face image is incomplete, the general method is to perform texture mapping based on the pixel distribution of the 3D face model in UV (texture coordinate) space. The overall head-and-face texture can then be recovered by filling the missing areas with pixel averaging and interpolation, but the obtained pixel distribution is quite inconsistent with the original image. A 3D morphable model (3DMM) can restore the facial texture map from a single image, and the 3DMM texture assigns 3D pixel data to the 2D plane with per-pixel alignment based on the UV map. Nevertheless, such a statistical texture model must be built from scans acquired under constrained conditions to obtain the low- and high-frequency and albedo information; this kind of texture model is difficult to obtain, and it also struggles with "in-the-wild" image analysis. Meanwhile, such methods cannot recover complicated skin pigment changes or fine texture details (such as freckles, pores, moles, and surface hair). In general, reconstructing a facial texture map from a single face image remains challenging. First, because of the fixed pose, effective pixel information for the profile and the head region is lost in a single face image, so the UV texture map obtained by conventional means is incomplete. Second, it is difficult to recover a photorealistic texture from an unrestricted image because the lighting conditions and camera parameters cannot be determined in unconstrained circumstances.

Method A method for generating panoramic face texture maps based on generative adversarial networks is proposed. The feature correspondence between the 2D face image and the 3D face model is converted into condition parameters in the encoder, so that face parameters are obtained from the input image. A structure that integrates the characteristics of the variational auto-encoder and the generative adversarial network is designed: the probability distribution of the latent variables is derived from the multivariate Gaussian distribution over the image data and the face condition parameters, and the generator uses it to learn head-and-face texture features. The face parameter vectors are thus converted into latent vectors and act as condition attributes that constrain the generation process of the network. A panoramic texture map generation model is trained on our newly built facial texture dataset.
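The abstract does not specify the exact network, but the conditioning scheme just described can be illustrated with a minimal sketch. The PyTorch module below is an assumption for illustration only (the module name, layer sizes, and fusion MLP are invented): image features and face condition parameters jointly parameterize a Gaussian latent distribution, from which a latent code for the generator is sampled via the reparameterization trick.

```python
# Minimal sketch (not the paper's actual architecture): image features and
# face condition parameters jointly parameterize a Gaussian latent space,
# following the variational auto-encoder + GAN idea described above.
import torch
import torch.nn as nn

class ConditionalLatentEncoder(nn.Module):
    def __init__(self, img_feat_dim=512, cond_dim=128, z_dim=256):
        super().__init__()
        # fuse image features with the face condition parameters
        self.fuse = nn.Sequential(
            nn.Linear(img_feat_dim + cond_dim, 512),
            nn.ReLU(inplace=True),
        )
        self.to_mu = nn.Linear(512, z_dim)      # mean of the latent Gaussian
        self.to_logvar = nn.Linear(512, z_dim)  # log-variance of the latent Gaussian

    def forward(self, img_feat, face_params):
        h = self.fuse(torch.cat([img_feat, face_params], dim=1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # reparameterization trick: draw z ~ N(mu, sigma^2) differentiably;
        # z then conditions the texture generator
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar
```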
Meanwhile, discriminators for different attributes evaluate the output results and feed them back to improve the integrity and authenticity of the generated texture. A face UV texture database was built for training. Some of the samples come from the WildUV dataset, which contains texture images of nearly 2 000 individuals with different identities and 5 638 unique facial UV texture maps. In addition, some texture data were obtained with a professional 3D scanning platform: approximately 400 subjects with different identities (250 males, 150 females) provided 2 000 UV texture maps. Data augmentation was then applied to the complete texture images, and a total of 10 143 texture samples were finally used in the experiments. These samples provide reliable data for the generative model.

Result The results were compared with state-of-the-art face texture map generation methods. Single frontal face test images were randomly selected from the CelebA-HQ and Labeled Faces in the Wild (LFW) datasets. In the visual comparison, the generated textures were mapped onto the corresponding 3D models; our results cover the models more completely and appear more realistic. A quantitative evaluation of the completeness of the generated face texture map and the accuracy of the facial region was also conducted: the reliability of restoring the areas invisible in the original image and the capability to retain the facial features were measured with the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). Compared with UVGAN, the global PSNR and global SSIM improve by 7.9 dB and 0.088, and the local (facial-region) PSNR and SSIM improve by 2.8 dB and 0.053, respectively. Compared with OSTeC, the global PSNR and global SSIM improve by 5.45 dB and 0.043, and the local PSNR and SSIM improve by 0.4 dB and 0.044, respectively. Compared with MVF-Net (multi-view 3D face network), the local PSNR and local SSIM improve by 0.6 dB and 0.119, respectively.

Conclusion The comparative tests demonstrate that the proposed method for generating a panoramic texture map from a single face image resolves the incompleteness of facial textures reconstructed from a single image and improves the texture details of the generated map. By exploiting the face parameters and the characteristics of the generative network model, the output texture maps are more complete, especially in the regions invisible in the original image, where the pixels are restored naturally and coherently and the texture details are more realistic.
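As a reference for the metrics above, the following sketch shows how global and facial-region PSNR/SSIM scores of this kind can be computed with scikit-image (version 0.19 or later). The rectangular face crop is an assumed convention for illustration, not the paper's exact evaluation protocol.

```python
# Sketch of a global/local PSNR and SSIM evaluation; the facial-region crop
# convention is an assumption, not the paper's protocol.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def texture_scores(generated, reference, face_box=None):
    """generated/reference: uint8 UV maps of shape (H, W, 3);
    face_box: optional (row0, row1, col0, col1) facial-region crop."""
    scores = {
        "global_psnr": peak_signal_noise_ratio(reference, generated,
                                               data_range=255),
        "global_ssim": structural_similarity(reference, generated,
                                             channel_axis=-1, data_range=255),
    }
    if face_box is not None:
        r0, r1, c0, c1 = face_box
        ref, gen = reference[r0:r1, c0:c1], generated[r0:r1, c0:c1]
        scores["local_psnr"] = peak_signal_noise_ratio(ref, gen,
                                                       data_range=255)
        scores["local_ssim"] = structural_similarity(ref, gen,
                                                     channel_axis=-1,
                                                     data_range=255)
    return scores
```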
Keywords
