A survey of 3D face imaging and reconstruction techniques

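Liu Fei1, Zhang Kunbo2, Yang Qing1, Zhou Shubo3, Wang Yunlong2, Sun Zhenan2 (1. Capital University of Economics and Business; 2. Institute of Automation, Chinese Academy of Sciences; 3. Donghua University)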

Abstract
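Objective In recent years, driven by rapid progress in new 3D vision measurement techniques and deep learning models, 3D vision has become a key supporting technology for artificial intelligence, virtual reality and related fields. 3D face imaging and reconstruction have made breakthrough progress: they cope better with variations in illumination, occlusion, expression and pose, raise the difficulty of forgery attacks, greatly advance the reconstruction and rendering of realistic “virtual digital humans”, and effectively improve the security of face systems. Method This paper comprehensively reviews 3D face imaging technology and reconstruction models, with a systematic and in-depth analysis of deep-learning-based 3D face reconstruction in particular. First, 3D face imaging devices and capture systems are surveyed, compared and summarized in detail, and face imaging systems based on new sensing technologies are introduced. Then, deep-learning-based 3D face reconstruction models are systematically analyzed and divided, by input data source, into four categories: reconstruction from monocular images, from multi-view images, from video and from audio. Result Through in-depth analysis, the current state of research on 3D face imaging and its difficulties and challenges are summarized, and future directions and applications are discussed. Conclusion This paper is a comprehensive and systematic review covering classic technologies and research on 3D face imaging and reconstruction from the last five years, and it provides a useful reference for face research, development and application.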
Keywords
3D face imaging and reconstruction technology: a review

(Institute of Automation, Chinese Academy of Sciences)

Abstract
Objective As a breakthrough technology of artificial intelligence (AI) in the big data era, deep learning has driven a new surge in face technology. In recent years, powered by the rapid development of new technologies such as three-dimensional (3D) vision measurement, image processing chips and deep learning models, 3D vision has become a key supporting technology in AI, virtual reality (VR) and related fields. Research on and applications of 3D face imaging and reconstruction have made important breakthroughs. 3D face data represent multi-dimensional facial attributes precisely because they carry rich visual information such as texture, shape and spatial structure. Moreover, 3D face data are more robust to large occlusions and to changes in expression and pose, and they make forgery attacks harder. 3D face imaging and reconstruction therefore improve the reconstruction and rendering of realistic “virtual digital humans” and contribute to better security of face systems. Method In this paper, we present a comprehensive study of 3D face imaging technology and reconstruction models. In particular, 3D face reconstruction methods based on deep learning (DL) are analyzed systematically and in detail. First, the development and innovation of 3D face imaging devices and capture systems are discussed by summarizing the public 3D face datasets. These range from consumer imaging devices (such as Kinect) to complex hybrid systems that fuse active and passive 3D imaging technologies to achieve precise geometry and appearance. 3D face imaging based on new sensing technologies is also introduced. Then, from the perspective of the input source, DL-based 3D face reconstruction is categorized into monocular, multi-view, video and audio reconstruction methods. Specifically, the part on 3D face imaging technology introduces the classic public 3D face datasets and the popular 3D face imaging devices and capture systems.
We find that most high-quality 3D face datasets, such as BU-3DFE, FaceScape and FaceVerse, are captured in a large imaging volume with a large number of high-resolution cameras under controlled lighting conditions. They play key roles in applications such as realistic rendering, driven animation and retargeting. On the other hand, it is important to develop novel optical devices and imaging modules that are small and run lightweight algorithms, for tiny AI applications such as intelligent mobile devices. For DL-based 3D face reconstruction, monocular reconstruction is currently the most popular approach. State-of-the-art (SOTA) 3D face reconstruction methods are generally trained in a self-supervised manner on a large-scale 2D face database. The main difficulties for 3D face reconstruction include the lack of large-scale 3D face datasets, occlusions and pose variations in in-the-wild 2D face images, and continuous expression deformations. According to the network structure, the methods are categorized into general deep convolutional neural networks (DCNN, such as ResNet, U-Net and autoencoders), generative adversarial networks (GAN), implicit neural representations (INR, such as NeRF and SDF) and Transformers. In particular, 3DMM and FLAME are the two widely used 3D face representation models. The StyleGAN model performs excellently in recovering high-quality face texture. Recently, INR has achieved remarkable results in 3D scene reconstruction, and the NeRF model plays an important role in reconstructing accurate head avatars. Combining NeRF with GAN shows great potential for reconstructing high-fidelity 3D face geometry and realistically rendered appearance. Moreover, the Transformer model is mainly used in audio-driven 3D face reconstruction, where it greatly improves accuracy and speed. Result Through in-depth analysis, the research difficulties and challenges of 3D face imaging and reconstruction are summarized, and future developments are discussed.
Although recent research has made remarkable progress, many challenges remain: how to improve robustness and generalization to real-world lighting and to extreme expressions and poses, how to effectively disentangle facial attributes (such as identity, expression, albedo and specular reflectance), and how to recover accurate, detailed geometry of facial motions (such as wrinkles). Conclusion This study presents a comprehensive and systematic review. It covers the classic technologies and research on 3D face imaging and reconstruction from the last five years and should serve as a good reference for face research, development and applications.
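
The abstract names 3DMM as one of the two widely used 3D face representation models. As an illustration only, the sketch below assumes the common linear formulation, in which a face shape is a mean shape plus weighted identity and expression bases. The basis matrices are random toy data, and the names make_toy_model and synthesize_shape are hypothetical; they are not taken from the reviewed work or from any library.

```python
import numpy as np


def make_toy_model(n_vertices=4, n_id=3, n_exp=2, seed=0):
    """Build random stand-ins for a linear morphable model (toy data, not a real 3DMM)."""
    rng = np.random.default_rng(seed)
    mean_shape = rng.normal(size=3 * n_vertices)           # flattened (x, y, z) per vertex
    id_basis = rng.normal(size=(3 * n_vertices, n_id))     # identity basis (assumed linear)
    exp_basis = rng.normal(size=(3 * n_vertices, n_exp))   # expression basis (assumed linear)
    return mean_shape, id_basis, exp_basis


def synthesize_shape(mean_shape, id_basis, exp_basis, alpha, beta):
    """Return vertices of shape (N, 3): mean + id_basis @ alpha + exp_basis @ beta."""
    flat = mean_shape + id_basis @ alpha + exp_basis @ beta
    return flat.reshape(-1, 3)


mean_shape, id_basis, exp_basis = make_toy_model()

# Zero coefficients reproduce the mean face.
neutral = synthesize_shape(mean_shape, id_basis, exp_basis, np.zeros(3), np.zeros(2))

# Activating only the first expression coefficient moves the vertices along that basis vector.
expressive = synthesize_shape(
    mean_shape, id_basis, exp_basis, np.zeros(3), np.array([1.0, 0.0])
)

print(neutral.shape)
```

In this formulation, a reconstruction network would regress the coefficient vectors alpha and beta from an image rather than predicting every vertex directly, which keeps the output low-dimensional.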
Keywords
