Single face image-based panoramic texture map generation
Vol. 27, Issue 2, Pages: 602-613 (2022)
Published: 16 February 2022
Accepted: 22 September 2021
DOI: 10.11834/jig.210332
Yang Liu, Yangyu Fan, Zhe Guo, Guoyun Lyu, Shiya Liu. Single face image-based panoramic texture map generation[J]. Journal of Image and Graphics, 27(2): 602-613 (2022)
Objective
To address the problems of incomplete information and insufficiently realistic texture details when recovering a facial texture map from a single face image, a face panoramic texture map generation method based on generative adversarial networks is proposed.
Method
The feature relationship between the 2D face image and the 3D face model is converted into conditional parameters in the encoder, and the probability distribution of the latent data is obtained from the multivariate Gaussian distribution of the image data and the face condition parameters; this distribution is used in the generator to learn the head-and-face texture features of the subject. A panoramic texture map generation model is trained on a newly created face texture map dataset, and discriminators with different attributes evaluate the output and feed the results back to improve the completeness and realism of the generated texture maps.
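As a concrete illustration of this conditional latent sampling, the following is a minimal PyTorch sketch (the class name, layer sizes and feature dimensions are illustrative assumptions, not the paper's actual architecture):

import torch
import torch.nn as nn

class ConditionalLatentEncoder(nn.Module):
    # Hypothetical sketch: fuse image features with face condition
    # parameters, predict a Gaussian over the latent space, and sample
    # with the reparameterization trick.
    def __init__(self, feat_dim=512, cond_dim=64, z_dim=256):
        super().__init__()
        self.fc_mu = nn.Linear(feat_dim + cond_dim, z_dim)
        self.fc_logvar = nn.Linear(feat_dim + cond_dim, z_dim)

    def forward(self, img_feat, face_cond):
        h = torch.cat([img_feat, face_cond], dim=1)   # fuse image and condition
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)          # reparameterization trick
        return z, mu, logvar

The sampled latent would then condition the generator that produces the panoramic texture map.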
Result
The method was compared with the current state-of-the-art methods. Single frontal face test images were randomly selected from the CelebA-HQ and LFW (labeled faces in the wild) datasets. Visual comparison of the generated results and of their 3D-mapped display shows that both the completeness and the display quality of the texture maps are superior to those of the other methods. Quantitative pixel metrics were compared over the whole map and over the facial region. Compared with UV-GAN, the global peak signal-to-noise ratio (PSNR) and global structural similarity (SSIM) improved by 7.9 dB and 0.088, and the local PSNR and local SSIM improved by 2.8 dB and 0.053, respectively. Compared with OSTeC, the global PSNR and global SSIM improved by 5.45 dB and 0.043, and the local PSNR and local SSIM improved by 0.4 dB and 0.044. Compared with MVF-Net (multi-view 3D face network), the local PSNR and local SSIM improved by 0.6 dB and 0.119. The experimental results show that the proposed method solves the problem of incomplete facial texture reconstruction from a single face image and improves the displayed detail of the generated texture maps.
Conclusion
The proposed face panoramic texture map generation method exploits the face parameters and the characteristics of the network model to make the generated face texture maps more complete. Especially in regions invisible in the original image, the pixels are restored naturally and coherently, and the texture details are more realistic.
Objective
Face texture map generation is a key part of face identification research, in which the face texture is used to map the pixel information of a two-dimensional (2D) image onto the corresponding 3D face model. Currently, there are two main ways to acquire a face texture: the first is based on full-coverage laser scanning of the head, and the other is based on face image information. The high-accuracy scanning process is performed in a controlled environment and captures appearance information well; however, it is mostly used to collect images for databases. The original face texture map based on 2D images is obtained by simply stitching images of the target head captured from various viewing angles. Some researchers jointly use raw texture images from five views, meaning the face texture is reconstructed under restricted conditions. This approach can precisely recover the details of the whole head from the complementary pixel information between the face images, but it is difficult to apply in practice, and images captured from different angles exhibit variations in facial lighting and camera parameters that cause discontinuous pixel changes in the generated texture. Because the pixel information in a single face image is incomplete, the common approach is to perform texture mapping based on the pixel distribution of the 3D face model in UV space. The overall face-and-head texture can then be recovered by filling the missing areas through pixel averaging and pixel interpolation, but the resulting pixel distribution is quite inconsistent with the original image. A 3D morphable model (3DMM) can restore the facial texture map from a single image, and the 3DMM texture can assign 3D pixel data to the 2D plane with per-pixel alignment based on the UV map. Nevertheless, the texture statistical model must be built from scans acquired under constrained conditions to obtain the low- and high-frequency and albedo information. Such a texture model is difficult to obtain, and it remains challenging to apply to "in-the-wild" images. Moreover, these methods cannot recover complicated skin pigment changes or layered texture details (such as freckles, pores, moles and surface hair). In general, reconstructing a facial texture map from a single face image remains challenging. First, effective pixel information of the profile and the head region is lost in a single face image due to the fixed pose, so the UV texture map obtained by conventional methods is incomplete. Second, it is difficult to recover a photorealistic texture from an unrestricted image because the lighting conditions and camera parameters cannot be determined in unconstrained circumstances.
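To make the conventional filling baseline described above concrete, the following is a minimal sketch using OpenCV's diffusion-based inpainting as a stand-in for pixel averaging and interpolation (the file names and the zero-pixel missing-area mask are placeholder assumptions; the method proposed here replaces this kind of filling with a learned generator):

import cv2
import numpy as np

# Load a partial UV texture map; missing regions are assumed to be black.
uv_texture = cv2.imread("partial_uv_texture.png")        # H x W x 3, uint8
mask = (uv_texture.sum(axis=2) == 0).astype(np.uint8)    # 1 where pixels are missing
# Fill the missing area by propagating neighboring pixels inward.
filled = cv2.inpaint(uv_texture, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("filled_uv_texture.png", filled)

As noted above, such interpolation yields pixel distributions that can be quite inconsistent with the original image, which motivates the generative approach.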
Method
A method for generating face panoramic texture maps based on generative adversarial networks is proposed. The method exploits the correlative features between the 2D face image and the 3D face model to obtain face parameters from the input face image, and a structure integrating the characteristics of the variational auto-encoder and generative adversarial networks is designed to learn the face-and-head texture features. The face parameter vectors are converted into latent vectors and added as condition attributes to constrain the generation process of the networks. A panoramic texture map generation model is trained on our facial texture dataset. Simultaneously, discriminators with different attributes evaluate the output results and feed them back to improve the integrity and authenticity of the results. A face UV texture database was built. Some of its samples come from the WildUV dataset, which contains nearly 2 000 texture images of individuals with different identities and 5 638 unique facial UV texture maps. In addition, some texture data were obtained with a professional 3D scanning rig: approximately 400 subjects with different identities (250 males, 150 females) provided 2 000 UV texture maps. Moreover, data augmentation was applied to the complete texture images. Finally, a total of 10 143 texture samples were used in the experiments; these samples provide reliable data for the generative model.
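The feedback from several attribute-specific discriminators could be aggregated as in the following minimal PyTorch sketch (the split into discriminators, the non-saturating loss and the equal weighting are assumptions for illustration, not the paper's exact formulation):

import torch
import torch.nn.functional as F

def multi_discriminator_loss(discriminators, fake_texture):
    # Average the adversarial feedback of all discriminators, e.g. one
    # judging the whole texture map and one judging the facial region.
    total = 0.0
    for d in discriminators:
        score = d(fake_texture)                     # scalar or patch logits
        # Non-saturating GAN loss: push generator output toward "real".
        total = total + F.binary_cross_entropy_with_logits(
            score, torch.ones_like(score))
    return total / len(discriminators)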
Result
The results were compared with state-of-the-art face texture map generation methods. Test images were randomly selected from the CelebA-HQ and labeled faces in the wild (LFW) datasets. For visual comparison, the generated textures were mapped onto the corresponding 3D models; the results of the proposed method cover the models more completely and show more realistic detail. Meanwhile, a quantitative evaluation of the completeness of the generated face texture maps and the accuracy of the facial region was conducted. The fidelity of the restoration of areas invisible in the original image and the ability to retain facial features were evaluated quantitatively with the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
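The kind of global and facial-region comparison reported here can be sketched as follows (a minimal example using scikit-image; the facial crop coordinates are placeholder assumptions):

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(gt, gen, face_box=(96, 352, 96, 352)):
    # gt, gen: H x W x 3 uint8 texture maps; face_box = (top, bottom, left, right).
    t, b, l, r = face_box
    global_psnr = peak_signal_noise_ratio(gt, gen, data_range=255)
    global_ssim = structural_similarity(gt, gen, channel_axis=-1, data_range=255)
    gt_face, gen_face = gt[t:b, l:r], gen[t:b, l:r]
    local_psnr = peak_signal_noise_ratio(gt_face, gen_face, data_range=255)
    local_ssim = structural_similarity(gt_face, gen_face, channel_axis=-1, data_range=255)
    return global_psnr, global_ssim, local_psnr, local_ssim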
Conclusion
The comparative experiments demonstrate that the proposed method for generating a panoramic texture map from a single face image alleviates the incompleteness of facial texture reconstruction and improves the texture details of the generated texture map. Exploiting the characteristics of the face parameters and the generative network model makes the output facial texture maps more complete, especially in areas invisible in the original image; the pixels are restored naturally and coherently, and the texture details are more realistic.
Keywords: face image; face texture map; generative adversarial networks (GAN); texture mapping; 3D morphable model (3DMM)
Bagautdinov T, Wu C L, Saragih J, Fua P and Sheikh Y. 2018. Modeling facial geometry using compositional VAEs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3877-3886[DOI: 10.1109/CVPR.2018.00408]
Bao J M, Chen D, Wen F, Li H Q and Hua G. 2017. CVAE-GAN: fine-grained image generation through asymmetric training//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2764-2773[DOI: 10.1109/ICCV.2017.299]
Blanz V and Vetter T. 1999. A morphable model for the synthesis of 3D faces//Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques. New York, USA: ACM Press: 187-194[DOI: 10.1145/311535.311556]
Blanz V and Vetter T. 2003. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9): 1063-1074[DOI:10.1109/TPAMI.2003.1227983]
Booth J, Roussos A, Ponniah A, Dunaway D and Zafeiriou S. 2018. Large scale 3D morphable models. International Journal of Computer Vision, 126(2): 233-254[DOI:10.1007/s11263-017-1009-7]
Cao C, Weng Y L, Zhou S, Tong Y Y and Zhou K. 2014. FaceWarehouse: a 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3): 413-425[DOI:10.1109/TVCG.2013.249]
Cao X, Chen Z, Chen A P, Chen X, Li S Y and Yu J Y. 2018. Sparse photometric 3D face reconstruction guided by morphable models//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4635-4644[DOI: 10.1109/CVPR.2018.00487]
Dai H, Pears N, Smith W and Duncan C. 2017. A 3D morphable model of craniofacial shape and texture variation//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 3085-3093[DOI: 10.1109/ICCV.2017.335]
Deng J K, Cheng S Y, Xue N N, Zhou Y X and Zafeiriou S. 2018. UV-GAN: adversarial facial UV map completion for pose-invariant face recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7093-7102[DOI: 10.1109/CVPR.2018.00741]
Deng J K, Guo J, Xue N N and Zafeiriou S. 2019. ArcFace: additive angular margin loss for deep face recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4685-4694[DOI: 10.1109/CVPR.2019.00482]
Fan Q N, Yang J L, Wipf D, Chen B Q and Tong X. 2018. Image smoothing via unsupervised learning. ACM Transactions on Graphics, 37(6): 1-14[DOI:10.1145/3272127.3275081]
Feng Y, Wu F, Shao X, Wang Y and Zhou X. 2018. Joint 3D face reconstruction and dense alignment with position map regression network//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 534-551[DOI: 10.1007/978-3-030-01264-9_33]
Gao W S, Zhao X, Gao Z M, Zou J H, Dou P F and Kakadiaris I A. 2019. 3D face reconstruction from volumes of videos using a MapReduce framework. IEEE Access, 7: 165559-165570[DOI:10.1109/ACCESS.2019.2938671]
Gecer B, Deng J K and Zafeiriou S. 2021. OSTeC: one-shot texture completion//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021). Nashville, USA: IEEE: 7628-7638[DOI: 10.1109/CVPR46437.2021.00754]
Gecer B, Ploumpis S, Kotsia I and Zafeiriou S. 2019. GANFIT: generative adversarial network fitting for high fidelity 3D face reconstruction//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 1155-1164[DOI: 10.1109/CVPR.2019.00125]
Hu L W, Saito S, Wei L Y, Nagano K, Seo J, Fursund J, Sadeghi I, Sun C, Chen Y C and Li H. 2017. Avatar digitization from a single image for real-time rendering. ACM Transactions on Graphics, 36(6): 1-14[DOI:10.1145/3130800.3130887]
Huang G B, Mattar M, Berg T and Learned-Miller E. 2008. Labeled faces in the wild: a database for studying face recognition in unconstrained environments//Workshop on Faces in "Real-Life" Images: Detection, Alignment, and Recognition. Marseille, France: [s.p.] (inria-00321923)
Huber P, Hu G S, Tena R, Mortazavian P, Koppen W P, Christmas W J, Rätsch M and Kittler J. 2016. A multiresolution 3D morphable face model and fitting framework//Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP. Rome, Italy: SciTePress[DOI: 10.5220/0005669500790086]
Ichim A E, Bouaziz S and Pauly M. 2015. Dynamic 3D avatar creation from hand-held video input. ACM Transactions on Graphics, 34(4): #45[DOI:10.1145/2766974]
Jiang L, Zhang J Y, Deng B L, Li H and Liu L G. 2018. 3D face reconstruction with geometry details from a single image. IEEE Transactions on Image Processing, 27(10): 4756-4770[DOI:10.1109/TIP.2018.2845697]
Karras T, Aila T, Laine S and Lehtinen J. 2018. Progressive growing of GANs for improved quality, stability, and variation//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: Vancouver Convention Center: [s.p.]
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90[DOI:10.1145/3065386]
Li C, Lin S, Zhou K and Ikeuchi K. 2017. Specular highlight removal in facial images//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3107-3116[DOI: 10.1109/CVPR.2017.297]
Li T Y, Bolkart T, Black M J, Li H and Romero J. 2017. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, 36(6): 1-17[DOI:10.1145/3130800.3130813]
Lin J K, Yuan Y, Shao T J and Zhou K. 2020. Towards high-fidelity 3D face reconstruction from in-the-wild images using graph convolutional networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5891-5900[DOI: 10.1109/CVPR42600.2020.00593]
Liu Y, Shen Z H, Lin Z X, Peng S D, Bao H J and Zhou X W. 2019. GIFT: learning transformation-invariant dense visual descriptors via group CNNs//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 6992-7003
Liu Z W, Luo P, Wang X G and Tang X O. 2015. Deep learning face attributes in the wild//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3730-3738[DOI: 10.1109/ICCV.2015.425]
Ma D S, Correll J and Wittenbrink B. 2015. The Chicago face database: a free stimulus set of faces and norming data. Behavior Research Methods, 47(4): 1122-1135[DOI:10.3758/s13428-014-0532-5]
Paysan P, Knothe R, Amberg B, Romdhani S and Vetter T. 2009. A 3D face model for pose and illumination invariant face recognition//Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance. Genova, Italy: IEEE: 296-301[DOI: 10.1109/AVSS.2009.58]
Ploumpis S, Ververas E, O'Sullivan E, Moschoglou S, Wang H Y, Pears N, Smith W A P, Gecer B and Zafeiriou S. 2021. Towards a complete 3D morphable model of the human head. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11): 4142-4160[DOI:10.1109/TPAMI.2020.2991150]
Ploumpis S, Wang H Y, Pears N, Smith W A P and Zafeiriou S. 2019. Combining 3D morphable models: a large scale face-and-head model//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10926-10935[DOI: 10.1109/CVPR.2019.01119]
Saito S, Wei L Y, Hu L W, Nagano K and Li H. 2017. Photorealistic facial texture inference using deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5144-5153[DOI: 10.1109/CVPR.2017.250]
Tewari A, Bernard F, Garrido P, Bharaj G, Elgharib M, Seidel H P, Pérez P, Zollhöfer M and Theobalt C. 2019. FML: face model learning from videos//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10804-10814[DOI: 10.1109/CVPR.2019.01107]
Wu F Z, Bao L C, Chen Y J, Ling Y G, Song Y B, Li S N, Ngan K N and Liu W. 2019. MVF-Net: multi-view 3D face morphable model regression//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 959-968[DOI: 10.1109/CVPR.2019.00105]
Yao G M, Yuan Y, Shao T J, Li S, Liu S Q, Liu Y, Wang M M and Zhou K. 2021. One-shot face reenactment using appearance adaptive normalization//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtual: AAAI Press
Zheng Z R, Yu T, Wei Y X, Dai Q H and Liu Y B. 2019. DeepHuman: 3D human reconstruction from a single image//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7738-7748[DOI: 10.1109/ICCV.2019.00783]