深度伪造及其取证技术综述

丁峰1, 匡仁盛1, 周越1, 孙珑2, 朱小刚3,4, 朱国普2(1.南昌大学软件学院, 南昌 330047;2.哈尔滨工业大学计算机科学与技术学院, 哈尔滨 150006;3.南昌大学公共政策与管理学院, 南昌 330047;4.江西省物联网产业技术研究院, 鹰潭 335003)

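Abstract
Deep learning, a promising and important branch of machine learning, has achieved major breakthroughs in computer vision. Deepfake usually refers to multimedia forgery techniques that use deep learning to manipulate human faces and voices; if maliciously abused, it can bring disaster to society. Deepfake is not limited to face replacement; it also includes modifying facial attributes, modifying expressions, lip synchronization, pose transfer, entire face synthesis, audio-to-video manipulation, and text-to-video manipulation. Because human faces are sensitive in social, political, and economic contexts, Deepfake technology threatens the security of society and individuals. Detecting Deepfake products has therefore become an important research topic in digital forensics. To provide an up-to-date overview of research on Deepfake detection, this paper describes the various approaches proposed to address Deepfake-related problems. It draws mainly on Deepfake papers retrieved from Google Scholar over the five years from 2018 to 2022 and analyzes and compares them by category. It describes in detail the characteristics of Deepfake datasets and the forgery methods used to build them, outlines Deepfake techniques and their basic principles, reports the performance of detectors on Deepfake datasets, classifies Deepfake detection techniques by input dimension, shallow features, and deep features, and discusses prospects for future development.
Keywords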
A survey of Deepfake and related digital forensics

Ding Feng1, Kuang Rensheng1, Zhou Yue1, Sun Long2, Zhu Xiaogang3,4, Zhu Guopu2 (1. School of Software, Nanchang University, Nanchang 330047, China; 2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China; 3. School of Public Policy and Administration, Nanchang University, Nanchang 330047, China; 4. Jiangxi Institute of Internet of Things Industry Technology, Yingtan 335003, China)

Abstract
Deep learning, a promising branch of machine learning, has made significant breakthroughs in computer vision. However, Deepfake, which refers to the set of techniques for forging human-related multimedia data using deep learning, can bring disaster to society if used maliciously. It is not limited to facial replacement; it also covers other manipulations, such as fabricating facial features, manipulating expressions, synchronizing lips, modifying head poses, synthesizing entire faces, and audio-to-video and text-to-video manipulation. Moreover, it can be used to generate fake pornographic videos or even fake speeches intended to subvert state power. Deep forgery technology can thus greatly threaten society and individuals, and detecting Deepfake has therefore become an important research topic in digital forensics.

We conducted a systematic and critical survey to provide an overview of the latest research on Deepfake detection by exploring recent developments in Deepfake and related forensic techniques. The survey draws mainly on Deepfake papers indexed in Google Scholar from 2018 to 2022. It divides Deepfake detection techniques into two categories for analysis and comparison: input dimensions and forensic features. First, a comprehensive and systematic introduction to digital forensics is presented from the following aspects: 1) the development and security of deep forgery detection technology, 2) Deepfake technology architecture, and 3) the prevailing datasets and evaluation metrics. Then, the survey presents Deepfake techniques in several categories. Finally, future challenges and development prospects are discussed.

In terms of image and video effects, Deepfake techniques are usually divided into four categories: face replacement, lip synchronization, head puppetry, and attribute modification. The most commonly used Deepfake algorithms are based on autoencoders, generative adversarial networks, and diffusion models. A typical autoencoder consists of two convolutional neural networks acting as an encoder and a decoder. The encoder reduces the dimensionality of the input facial image and encodes it into a vector corresponding to facial features. The encoder parameters are shared; that is, the same encoder is used for different faces so that it learns only their common feature information. A generative adversarial network consists of a generator and a discriminator. The generator is similar to the decoder in an autoencoder: it converts input noise into an image, which is sent to the discriminator together with real images for discrimination. The discriminator and the generator optimize their parameters through back-propagation. A diffusion model, in turn, is a parameterized Markov chain trained using variational inference to produce samples that match the data after a finite time. A diffusion model involves two processes. One is the forward process, also called the diffusion process, which gradually adds noise to the image. The other is reverse diffusion, also known as the reverse process, which slowly restores the original image from noise through continuous sampling.

In the Deepfake detection task, the datasets have also evolved to fill past gaps. In general, this survey divides the Deepfake datasets into two generations. The first-generation datasets are often not large enough, and the quality of their content is unsatisfactory because research interest was limited at the time. Their source videos usually come from video sites or existing face datasets, which can raise copyright and privacy concerns. The main first-generation datasets are UADFV, DF-TIMIT, FaceForensics, and the Diverse Fake Face Dataset (DFFD). The second generation of face forgery datasets offers improved forgery effects and image clarity. The main second-generation datasets include Celeb-DF, the Deepfake Detection Challenge (DFDC) Preview dataset, DeeperForensics-1.0, DFDC, and the Korean Deepfake Detection Dataset (KoDF).

In terms of input dimension, Deepfake detection can be roughly divided into three categories: 1) The first takes a single image or a key frame extracted from the video as input and judges it by its visual appearance. This category is commonly used because it generalizes easily to other computer vision classification models and because most Deepfake videos are generated frame by frame. 2) The second takes continuous frames from the video as input. In particular, multiple consecutive frames are fed to the model so that it can perceive how the relationships between frames differ in real and fake videos. 3) The third takes multiple frames and audio from the video as input simultaneously; that is, the video's authenticity is assessed by examining its frames and audio together.

The features on which Deepfake detection techniques focus also vary. This survey divides them into four categories: 1) Frequency domain-based approaches look for anomalies in the video at the signal level, treating the video as a sequence of frames and a synchronized audio signal. Such anomalies, including image mismatches and mismatches in audio-video synchronization, usually arise from signal-level inconsistencies introduced during Deepfake video generation. 2) Texture and spatio-temporal approaches exploit the fact that forged video generation tends to focus only on face position and feature matching, so breakdowns that violate the laws of physics and human physiology may occur. 3) Reconstruction-classification learning methods emphasize common compact representations of genuine faces and enhance the learned representations to be aware of unknown forgery patterns. Classification learning involves mining the essential discrepancy between real and fake images, which facilitates the understanding of forgeries. 4) Data-driven methods do not target specific features. Instead, they use supervised learning, feeding real and fake videos into the model for training.

The road ahead for research on deep forgery techniques and deep forgery detection is still long. We must overcome the existing shortcomings and face the challenges of future technological advances.
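
To make the forward (diffusion) process mentioned in the abstract more concrete, the following is a minimal sketch, not the surveyed authors' implementation. It assumes the closed-form noising step commonly used in DDPM-style models, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps drawn from a standard normal distribution. The linear noise schedule, its endpoint values, and all function names are illustrative assumptions that do not come from the survey.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule beta_1..beta_T (values are illustrative)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in one shot; t is 1-indexed."""
    a_bar = alpha_bars[t - 1]
    signal_scale = math.sqrt(a_bar)        # how much of the clean image survives
    noise_scale = math.sqrt(1.0 - a_bar)   # how much Gaussian noise is mixed in
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image": four pixel values in [-1, 1]
    for t in (1, 500, 1000):
        xt = forward_diffuse(x0, t, alpha_bars, rng)
        print(t, round(math.sqrt(alpha_bars[t - 1]), 4), [round(v, 3) for v in xt])
```

Under this assumed schedule the signal scale shrinks toward zero as t grows, so late samples are close to pure noise. The reverse process described above would train a network to undo these steps one at a time, which this sketch does not attempt.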
Keywords
