Semantic consistency-relevant multitask splicing-tampered detection

Yulin Zhang; Hongxia Wang; Rui Zhang; Jingyuan Zhang

doi:10.11834/jig.220549

Image Forensics


Vol. 28, Issue 3, Pages: 775-788(2023)
Published: 16 March 2023
Accepted: 28 September 2022
DOI: 10.11834/jig.220549

Yulin Zhang, Hongxia Wang, Rui Zhang, Jingyuan Zhang. Semantic consistency-relevant multitask splicing-tampered detection [J]. Journal of Image and Graphics, 28(3): 775-788 (2023). DOI: 10.11834/jig.220549.

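Summary

Objective

With the widespread use of digital images and editing software, forged images appear in large numbers and affect fields such as news media and legal forensics. Splicing is a common form of forgery that typically adds new objects to an original image, so that the semantics of the original image are altered or misrepresented. Many existing tampering detection methods based on convolutional neural networks focus on extracting features of tampering traces but ignore the semantic inconsistency in forged images. To address the semantic changes that splicing introduces into the original image, a multi-resolution fully convolutional neural network is proposed, with tampering detection as the main task and semantic segmentation and noise reconstruction as auxiliary tasks.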

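Method

A multi-task strategy uses semantic segmentation and noise reconstruction as auxiliary tasks. The semantic segmentation task aims to capture the semantic inconsistency produced when an image is spliced, and the noise reconstruction task allows the network to obtain a more comprehensive image noise distribution. So that the network obtains more comprehensive and accurate features, the RGB stream, the noise stream and the fusion module all follow a multi-resolution design, extracting and processing spliced objects of different shapes and sizes at multiple resolutions.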

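Result

The proposed method is compared with several state-of-the-art tampering detection networks and with a baseline network based on HRNet (high-resolution network). On both the Fantastic Reality and Spliced Dataset datasets, the proposed method achieves the best performance, with scores of 0.946 and 0.961, respectively. Robustness experiments with JPEG (joint photographic experts group) compression, brightness adjustment, contrast adjustment and added noise show that the method is robust to common image post-processing operations.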
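The post-processing operations used in the robustness experiments can be illustrated with a minimal, dependency-free sketch. It applies brightness adjustment, contrast adjustment and additive Gaussian noise to a grayscale image stored as nested lists. The function names, the pivot value of 128 and all parameter values are assumptions made for illustration; the abstract does not state the settings the authors used. JPEG compression is left out because it needs an image codec.

```python
import random


def clip(value):
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def adjust_brightness(img, offset):
    """Shift every pixel by a constant offset (hypothetical parameter)."""
    return [[clip(p + offset) for p in row] for row in img]


def adjust_contrast(img, factor, pivot=128):
    """Scale pixel distances from an assumed mid-gray pivot by `factor`."""
    return [[clip(pivot + factor * (p - pivot)) for p in row] for row in img]


def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise; the seed makes the result repeatable."""
    rng = random.Random(seed)
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


if __name__ == "__main__":
    image = [[10, 128, 250], [60, 200, 90]]
    print(adjust_brightness(image, 20))
    print(adjust_contrast(image, 1.5))
    print(add_gaussian_noise(image, 5.0))
```

A robustness study of this kind would typically run the trained detector on each perturbed copy of the test set and report how the score changes with the perturbation strength.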

结论

提出的语义一致性引导的多任务多分辨率拼接篡改检测网络检测更加准确，具有良好的鲁棒性，拓展了数字图像取证研究新思路。

Abstract

Objective

With the wide use of digital images and image editing software, forged images have become increasingly common. Splicing is a commonly used forgery method: it adds new objects to the original image, which alters or misrepresents the semantics of that image. Existing convolutional neural network based (CNN-based) forgery detection methods are mainly concerned with the statistical information and physical features of the image itself, such as edge features and noise features, but they overlook the semantic inconsistencies in forged images. In addition, image tampering detection is challenged by image post-processing such as compression or image filters.

Method

To detect image splicing forgeries, we propose a multi-resolution CNN that uses semantic segmentation and noise reconstruction as auxiliary tasks. The proposed network consists of four parts: 1) an RGB stream, 2) a noise stream, 3) a fusion module, and 4) a multi-task module. The RGB stream extracts the boundary artifacts left by tampering together with semantic information. The noise stream uses a filter layer taken from steganalysis to extract the noise features of the forged regions, so that the RGB and noise information together give complementary evidence for forgery detection. The semantic segmentation task captures semantic inconsistencies, the noise reconstruction task allows the network to obtain a more comprehensive image noise distribution, and the forgery detection task locates the tampered regions. As in recent multi-task networks, each task has its own loss function, and the sum of the per-task loss functions is the overall loss function of the network. To further exploit the spatial co-occurrence of the two kinds of features, the fusion module fuses the features of the RGB stream and the noise stream before they are passed to the forgery detection task. Additionally, to obtain more comprehensive and accurate features, a multi-resolution pathway is used in the RGB stream, the noise stream and the fusion module. The multi-resolution pathway improves the network's ability to perceive both semantic information and precise location information, which benefits the localization-oriented forgery detection task.
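The summed multi-task objective described above can be sketched as follows. The abstract states only that the overall loss is the sum of the per-task losses; the specific choices here (binary cross-entropy for forgery detection, cross-entropy for semantic segmentation, mean squared error for noise reconstruction) and all function names are assumptions made for illustration, not the authors' confirmed formulation.

```python
import math

EPS = 1e-12  # guards against log(0)


def binary_cross_entropy(probs, labels):
    """Assumed detection loss: per-pixel tampered (1) vs. authentic (0)."""
    terms = (
        y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS)
        for p, y in zip(probs, labels)
    )
    return -sum(terms) / len(probs)


def cross_entropy(class_probs, labels):
    """Assumed segmentation loss: per-pixel class probabilities vs. class index."""
    return -sum(math.log(p[y] + EPS) for p, y in zip(class_probs, labels)) / len(labels)


def mean_squared_error(pred, target):
    """Assumed noise reconstruction loss."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def total_loss(detection_loss, segmentation_loss, noise_loss):
    """Overall loss as the plain (unweighted) sum of the three task losses."""
    return detection_loss + segmentation_loss + noise_loss


if __name__ == "__main__":
    l_det = binary_cross_entropy([0.9, 0.2, 0.7], [1, 0, 1])
    l_seg = cross_entropy([[0.1, 0.9], [0.8, 0.2]], [1, 0])
    l_noise = mean_squared_error([0.1, 0.4], [0.0, 0.5])
    print(total_loss(l_det, l_seg, l_noise))
```

An unweighted sum keeps the objective free of extra hyperparameters; a weighted sum would be a natural variant, but the abstract does not mention task weights.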

Result

Comparative experiments are carried out on Fantastic Reality and Spliced Dataset against six tamper detection networks: 1) manipulation tracing network (ManTra-Net), 2) coarse to refined network (C2Rnet), 3) multi-task wavelet corrected network (MWC-Net), 4) compression artifact tracing network (CAT-Net), 5) ringed residual U-Net (RRU-Net), and 6) a high-resolution network (HRNet)-based baseline network. Models are trained and tested on an Intel Core i7-9700K CPU and an NVIDIA GeForce RTX 2080 Ti GPU. During training, stochastic gradient descent with a momentum of 0.9 is used as the optimizer, with an initial learning rate of 0.005 and exponential decay. The proposed method obtains scores of 0.946 and 0.961 on Fantastic Reality and Spliced Dataset, respectively. A time comparison experiment shows that our optimization effectively balances computational cost and network capability. In practice, the most common compression format is JPEG, and image filters are commonly used to adjust contrast and brightness. Therefore, to reflect real-world conditions, we design robustness experiments on the Fantastic Reality dataset with four kinds of image post-processing: JPEG compression, contrast adjustment, brightness adjustment and noise distortion.
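The training configuration reported above (stochastic gradient descent with momentum 0.9, initial learning rate 0.005, exponential decay) can be sketched in a few lines. The decay factor `gamma` is not given in the abstract, so the value below is a hypothetical placeholder, and the momentum update follows one common formulation (velocity accumulates the gradient, then the parameter moves against the velocity) rather than a confirmed detail of the authors' setup.

```python
def exponential_lr(initial_lr, gamma, epoch):
    """Learning rate after `epoch` epochs of exponential decay by factor `gamma`."""
    return initial_lr * gamma ** epoch


def sgd_momentum_step(param, grad, velocity, lr, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


if __name__ == "__main__":
    initial_lr = 0.005  # from the abstract
    gamma = 0.95        # hypothetical decay factor, not stated in the abstract
    param, velocity = 1.0, 0.0
    for epoch in range(3):
        lr = exponential_lr(initial_lr, gamma, epoch)
        grad = 2.0 * param  # gradient of the toy objective param ** 2
        param, velocity = sgd_momentum_step(param, grad, velocity, lr)
        print(epoch, lr, param)
```

The toy objective only shows the mechanics of the update; in an actual training run the gradient would come from backpropagating the network's overall loss.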

Conclusion

To detect forged regions effectively and accurately, a semantic consistency-relevant multi-task and multi-resolution tampering detection network is proposed. The multi-task strategy lets the network extract semantic features and detect forged regions through the semantic inconsistencies in forged images, while the multi-resolution design enables the network to obtain more comprehensive image information. Furthermore, the robustness experiments show that the network is robust to common image post-processing such as JPEG compression.

Keywords

image tampering detection; semantic consistency; multi-task strategy; multi-resolution; high-resolution network (HRNet)

