Multi-level features global consistency for human facial deepfake detection
2022, Vol. 27, No. 9: 2708-2720
Print publication date: 2022-09-16
Accepted: 2022-05-26
DOI: 10.11834/jig.211254
Shaocong Yang, Jian Wang, Yunlian Sun, Jinhui Tang. Multi-level features global consistency for human facial deepfake detection[J]. Journal of Image and Graphics, 2022,27(9):2708-2720.
Objective
With the rapid development of deepfake technology, forged facial images have become increasingly difficult to identify, posing potential security threats to people's daily lives and to social stability. Although many current methods achieve satisfactory performance in intra-domain tests, they perform poorly when detecting unseen forgery types. Since the forged and non-forged regions of a fake facial image carry inconsistent source features, we propose a deepfake face detection method based on the global consistency of multi-level features.
Method
A facial-structure destruction module is used to strengthen the model's attention to local details and subtle anomalies. A multi-level feature fusion module enables features from different levels of the backbone network to learn from one another, fully mining the forgery information contained in each level. A global consistency module guides the model to better extract feature representations of forged regions, finally achieving accurate classification of facial images.
Result
Experiments are conducted on two datasets. In intra-domain experiments, our method outperforms current state-of-the-art detectors on all metrics, reaching AUC (area under the curve) values of 99.02% and 90.06% on the high-quality and low-quality FaceForensics++ datasets, respectively. In generalization experiments, our method surpasses mainstream forgery detectors on multiple evaluation metrics. In addition, ablation experiments further verify the effectiveness of each module of the model.
Conclusion
The proposed method detects deepfake faces accurately, generalizes well, and can serve as an effective countermeasure against current facial forgery threats.
Objective
Human facial images carry personal identity information used in applications such as communication, access control, and payment. However, advanced deep forgery technology is producing fake facial information at scale, and distinguishing forged faces from real ones has become challenging. Most existing deep learning methods generalize poorly to unseen forgeries. Our method focuses on the consistency of source features: in a deepfake, the source features of forged and untouched regions are inconsistent, whereas in a real image they are consistent.
Method
First, a facial-structure destruction module is designed to reshuffle image patches. It directs the model toward local details and abnormal regions and restricts overfitting to facial-structure semantics, which are irrelevant to the deepfake detection task. Next, we extract shallow, medium, and deep features from the backbone network and develop a multi-level feature fusion module to guide their fusion across levels. Specifically, shallower features provide more detailed forgery clues to deeper levels, while deeper features suppress irrelevant details in shallower features and extend the coverage of the abnormal region, so the network attends better to real-versus-fake semantics at all levels. In the backbone network, the shallow, medium, and deep features first pass through a channel attention module and are then merged in a guided dual feature fusion module, which performs a guided fusion from shallow features to deep features and from deep features to shallow features; the feature maps output by the two fusion branches are added together. In this way, forgery-related information in each layer can be mined more effectively. Third, we extract a global feature vector from the fused features. To obtain a consistency map, we calculate the similarity between the global feature vector and each local feature vector (i.e., the feature vector at each spatial position); inconsistent areas are highlighted in this map. We multiply the output of the multi-level feature fusion module by the consistency map, and the result is combined with the output of the backbone network and fed to the classifier for the final binary classification. To supervise this learning, a forged-area label is generated for each face image in three steps: 1) align the forged face image with its corresponding real face image and compute the difference between corresponding pixel values, yielding a difference image of the same spatial size as the real and fake images; 2) convert the difference image to a grayscale image by linearly mapping each pixel value to [0, 1]; 3) binarize the difference image with a threshold of 0.1, producing the final forged-area label. Our main contributions are as follows: 1) a global consistency module that captures the inconsistent source features in forged images; 2) a multi-level feature guided fusion module that makes the network pay more attention to forgery information while suppressing irrelevant background details; 3) a facial-structure destruction module that prevents the model from overfitting to facial-structure semantics when distinguishing fake faces from real ones. Our method performs well in both intra-dataset and cross-dataset tests, achieving highly competitive detection accuracy and generalization. In the experiments, we take 30 equally spaced frames from each video in the training set and 100 frames from each video in the test set. For each image, we keep the largest detected face and resize it to 320×320 pixels. We train with the Adam optimizer, a learning rate of 0.0002, and a batch size of 16.
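The facial-structure destruction step above can be sketched as a simple patch reshuffle. This is a minimal illustration, not the paper's exact implementation; the function name and the NumPy-based layout are assumptions:

```python
import numpy as np

def shuffle_patches(image, patch_size, rng=None):
    """Split an image into non-overlapping patches and reshuffle them.

    Destroying the global facial layout forces a detector to rely on
    local details rather than face-structure semantics.
    (Illustrative sketch; not the paper's exact implementation.)
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = image.shape
    ph, pw = h // patch_size, w // patch_size   # patch grid dimensions
    # View the image as a (ph, pw) grid of patch_size x patch_size patches
    patches = image.reshape(ph, patch_size, pw, patch_size, c).swapaxes(1, 2)
    patches = patches.reshape(ph * pw, patch_size, patch_size, c)
    patches = patches[rng.permutation(ph * pw)]  # reshuffle the patches
    # Reassemble the shuffled grid back into a full image
    out = patches.reshape(ph, pw, patch_size, patch_size, c).swapaxes(1, 2)
    return out.reshape(h, w, c)
```

Shuffling keeps every local patch intact while destroying the global facial layout, which is what pushes the detector toward local artifacts instead of face structure.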
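The consistency map described above, a similarity between the global feature vector and each local feature vector, can be sketched with cosine similarity over an average-pooled global vector. The pooling choice and the similarity measure here are assumptions; the paper's precise formulation may differ:

```python
import numpy as np

def consistency_map(feat):
    """Similarity between the global (average-pooled) feature vector
    and the feature vector at every spatial position.

    feat: (C, H, W) fused feature map. Returns an (H, W) map; positions
    whose features disagree with the image-wide statistics score low.
    (Sketch with cosine similarity; the paper may use another measure.)
    """
    c, h, w = feat.shape
    local = feat.reshape(c, h * w)        # one C-dim vector per position
    g = local.mean(axis=1)                # global feature vector
    eps = 1e-8                            # guard against zero norms
    sim = (g @ local) / (np.linalg.norm(g) * np.linalg.norm(local, axis=0) + eps)
    return sim.reshape(h, w)
```

Multiplying the fused feature map by such a map down-weights positions whose source features disagree with the rest of the image, which is the mechanism the text describes.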
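The three-step forged-area label generation can be sketched directly from the description. `forged_area_label` is a hypothetical name and the channel reduction (mean over colour channels) is an assumption; the linear mapping to [0, 1] and the 0.1 threshold come from the text:

```python
import numpy as np

def forged_area_label(real, fake, threshold=0.1):
    """Generate a binary forged-area label from an aligned real/fake pair.

    1) pixel-wise difference between the aligned images;
    2) reduce to one channel and scale linearly to [0, 1];
    3) binarize with a fixed threshold (0.1 in the paper).
    (Sketch; channel reduction by mean is an assumption.)
    """
    diff = np.abs(real.astype(np.float64) - fake.astype(np.float64))
    gray = diff.mean(axis=-1)              # collapse colour channels
    span = gray.max() - gray.min()
    if span == 0:                          # identical images: nothing forged
        return np.zeros_like(gray, dtype=np.uint8)
    gray = (gray - gray.min()) / span      # linear map to [0, 1]
    return (gray > threshold).astype(np.uint8)
```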
Result
Our method is compared with eight recent methods on two datasets covering five forgery methods. On the FaceForensics++ (FF++) dataset, we obtain the best performance; on low-quality FF++, the area under the curve (AUC) is 1.6% higher than the second-best result. In the generalization experiment across the four forgery methods in FF++, we outperform the baseline. In the cross-dataset generalization experiment (trained on FF++ and tested on Celeb-DF), we achieve the best AUC on both datasets. In addition, ablation experiments on FF++ verify the effectiveness of each module.
Conclusion
This method detects deepfakes accurately and generalizes well, and it has the potential to counter the threat of deep forgery.
Keywords: face forgery detection; deepfakes; multi-level feature learning; global consistency; attention mechanism
Afchar D, Nozick V, Yamagishi J and Echizen I. 2018. MesoNet: a compact facial video forgery detection network//Proceedings of 2018 IEEE International Workshop on Information Forensics and Security (WIFS). Hong Kong, China: IEEE: 1-7 [DOI: 10.1109/WIFS.2018.8630761]
Chai L, Bau D, Lim S N and Isola P. 2020. What makes fake images detectable? Understanding properties that generalize//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 103-120 [DOI: 10.1007/978-3-030-58574-7_7]
Cozzolino D, Poggi G and Verdoliva L. 2017. Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection//Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. Philadelphia, USA: ACM: 159-164 [DOI: 10.1145/3082031.3083247]
Cozzolino D, Thies J, Rössler A, Riess C, Nießner M and Verdoliva L. 2019. ForensicTransfer: weakly-supervised domain adaptation for forgery detection [EB/OL]. [2021-11-27]. https://arxiv.org/pdf/1812.02510.pdf
Dang H, Liu F, Stehouwer J, Liu X M and Jain A K. 2020. On the detection of digital face manipulation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 5780-5789 [DOI: 10.1109/CVPR42600.2020.00582]
Das S, Seferbekov S, Datta A, Islam M S and Amin M R. 2021. Towards solving the DeepFake problem: an analysis on improving DeepFake detection using dynamic face augmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Montreal, Canada: IEEE: 3769-3778 [DOI: 10.1109/ICCVW54120.2021.00421]
Gao S H, Cheng M M, Zhao K, Zhang X Y, Yang M H and Torr P. 2021. Res2Net: a new multi-scale backbone architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2): 652-662 [DOI: 10.1109/TPAMI.2019.2938758]
Gunawan T S, Hanafiah S A M, Kartiwi M, Ismail N, Za′bah N F and Nordin A N. 2017. Development of photo forensics algorithm by detecting photoshop manipulation using error level analysis. Indonesian Journal of Electrical Engineering and Computer Science, 7(1): 131-137 [DOI: 10.11591/ijeecs.v7.i1.pp131-137]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Huh M, Liu A, Owens A and Efros A A. 2018. Fighting fake news: image splice detection via learned self-consistency//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 106-124 [DOI: 10.1007/978-3-030-01252-6_7]
Jiang X Y and Liu C X. 2021. Edge and region inconsistency-guided image splicing tamper detection network. Journal of Image and Graphics, 26(10): 2411-2420 [DOI: 10.11834/jig.200298]
Li L Z, Bao J M, Zhang T, Yang H, Chen D, Wen F and Guo B N. 2020a. Face X-ray for more general face forgery detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 5000-5009 [DOI: 10.1109/CVPR42600.2020.00505]
Li Y Z, Yang X, Sun P, Qi H G and Lyu S. 2020b. Celeb-DF: a large-scale challenging dataset for deepfake forensics//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 3204-3213 [DOI: 10.1109/CVPR42600.2020.00327]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu H G, Li X D, Zhou W B, Chen Y F, He Y, Xue H, Zhang W M and Yu N H. 2021. Spatial-phase shallow learning: rethinking face forgery detection in frequency domain//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 772-781 [DOI: 10.1109/CVPR46437.2021.00083]
Liu Z Z, Qi X J and Torr P H S. 2020. Global texture enhancement for fake face detection in the wild//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 8057-8066 [DOI: 10.1109/CVPR42600.2020.00808]
Lukas J, Fridrich J and Goljan M. 2005. Determining digital image origin using sensor imperfections//Proceedings Volume 5685, Image and Video Communications and Processing 2005. San Jose, USA: SPIE: 249-260 [DOI: 10.1117/12.587105]
Masi I, Killekar A, Mascarenhas R M, Gurudatt S P and AbdAlmageed W. 2020. Two-branch recurrent network for isolating deepfakes in videos//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 667-684 [DOI: 10.1007/978-3-030-58571-6_39]
Matern F, Riess C and Stamminger M. 2019. Exploiting visual artifacts to expose deepfakes and face manipulations//Proceedings of 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW). Waikoloa, USA: IEEE: 83-92 [DOI: 10.1109/WACVW.2019.00020]
Nguyen H H, Fang F M, Yamagishi J and Echizen I. 2019a. Multi-task learning for detecting and segmenting manipulated facial images and videos//Proceedings of the 10th IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS). Tampa, USA: IEEE: 1-8 [DOI: 10.1109/BTAS46853.2019.9185974]
Nguyen H H, Yamagishi J and Echizen I. 2019b. Capsule-forensics: using capsule networks to detect forged images and videos//Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brighton, UK: IEEE: 2307-2311 [DOI: 10.1109/ICASSP.2019.8682602]
Qian Y Y, Yin G J, Sheng L, Chen Z X and Shao J. 2020. Thinking in frequency: face forgery detection by mining frequency-aware clues//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 86-103 [DOI: 10.1007/978-3-030-58610-2_6]
Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J and Nießner M. 2019. FaceForensics++: learning to detect manipulated facial images//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 1-11 [DOI: 10.1109/ICCV.2019.00009]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Xu Y Y, Yan W, Yang G K, Luo J L, Li T and He J N. 2020. CenterFace: joint face detection and alignment using face as point. Scientific Programming, 2020: #7845384 [DOI: 10.1155/2020/7845384]
Yu N, Davis L and Fritz M. 2019. Attributing fake images to GANs: learning and analyzing GAN fingerprints//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 7555-7565 [DOI: 10.1109/ICCV.2019.00765]
Zhao H Q, Wei T Y, Zhou W B, Zhang W M, Chen D D and Yu N H. 2021. Multi-attentional deepfake detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 2185-2194 [DOI: 10.1109/CVPR46437.2021.00222]
Zhou P, Han X T, Morariu V I and Davis L S. 2017. Two-stream neural networks for tampered face detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, USA: IEEE: 1831-1839 [DOI: 10.1109/CVPRW.2017.229]