Published: 2021-10-16
DOI: 10.11834/jig.200298
2021 | Volume 26 | Number 10

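Image Analysis and Recognition

Image splicing tamper detection network guided by edge and region inconsistency

Jiang Xiaoyu, Liu Chunxiao

School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018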

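Received: 2020-06-23; Revised: 2020-09-22; Preprint: 2021-09-29

Funding: National Natural Science Foundation of China (61003188); Natural Science Foundation of Zhejiang Province (LY14F020004)

About the authors: Jiang Xiaoyu, born in 1994, female, master's student; her main research interests are digital image processing and pattern recognition. E-mail: jxy94jxy@163.com
Liu Chunxiao, corresponding author, male, associate professor and master's supervisor; his main research interests are image processing and understanding, machine learning and computer vision, and pattern recognition and intelligent systems. E-mail: cxliu@zjgsu.edu.cn
*Corresponding author: Liu Chunxiao cxliu@zjgsu.edu.cn

CLC number: TP391.4

Document code: A

Article ID: 1006-8961(2021)10-2411-10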

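Abstract

Objective Existing image splicing tamper detection methods suffer from low authenticity classification accuracy and inaccurate localization of spliced regions. To address these problems, this paper designs a convolutional neural network for image splicing tamper detection that is guided by the inconsistency between the two sides of tamper edges and between the inside and outside of tamper regions, and that focuses on tamper regions and tamper edges. Method When image content is tampered with, the edges of the spliced object retain tampering traces, which are an important cue for splicing detection. A tamper edge extraction branch is therefore designed that learns the inconsistency between the two sides of the spliced object's edge and extracts the contour of the spliced region. Because too few tamper edge pixels make the network hard to converge, an edge thickening strategy is proposed that forms a thickened-edge "doughnut", making the extracted tamper edges more complete. Images are captured with different camera equipment and under different lighting conditions, so the information carried by each image differs. A tamper region localization branch is therefore designed to learn discriminative features of the inconsistency between a region spliced in from another image and its surroundings, and an attention mechanism is introduced into this branch to further increase the attention paid to spliced regions. For authenticity discrimination, a binary classification branch decides whether an image has been spliced; it gives a fast decision on whether the input image is tampered, and its output can be presented to the user together with the outputs of the other two branches so that the user can make a combined judgment using visual semantic information. Result The proposed algorithm is compared with four representative existing methods on four benchmark datasets. Authenticity classification accuracy improves by 8.3%, 4.6%, 1.0% and 1.0% on the Dresden, COCO (common objects in context), RAISE (a raw images dataset for digital image forensics) and IFS-TC (information forensics and security technical committee) datasets, respectively; for tamper region localization, the F1 score and intersection over union (IOU) improve by 9.4% and 8.6%, respectively, over existing methods. Conclusion The proposed algorithm fuses authenticity classification, tamper region localization and tamper edge extraction so that they reinforce one another, which considerably improves the performance of each branch. It outperforms existing methods in image splicing tamper detection and offers new ideas for research on digital image forensics.

Keywords

image splicing tamper detection; convolutional neural network (CNN); tamper region localization; tamper edge extraction; authenticity discrimination classification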

Edge and region inconsistency-guided image splicing tamper detection network

Jiang Xiaoyu, Liu Chunxiao

School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China

Supported by: National Natural Science Foundation of China (61003188)

Abstract

Objective With the rapid development of internet technology, digital image processing techniques have become increasingly sophisticated in recent years, and ordinary users can now easily edit digital images with a wide range of software. Although edited images can produce special visual or entertainment effects, images can also be tampered with maliciously. Maliciously tampered images can have a huge impact on litigation evidence collection, criminal investigation, and national political and military affairs. Image forensics research is therefore of great significance. Although images can be tampered with in various ways, this study focuses on detecting the image splicing operation. Splicing copies a small region of a real image and inserts it into a region of another real image, thereby altering the original image content. When the spliced object is inserted, post-processing operations such as blurring, smoothing, retouching, and blending may also be used to hide the tampering traces, making the tampered image look more realistic and natural. To address the problems of state-of-the-art image splicing tamper detection methods, such as low classification accuracy and coarse localization of the spliced regions, a convolutional neural network for image splicing tamper detection is designed under the guidance of the inconsistency around the tamper edges and the tamper regions, so that it pays more attention to tamper regions and tamper edges. Method First, in the image tampering process, the tamper edges of the spliced objects leave tampering traces, which are important cues for image splicing tamper detection. Therefore, a tamper edge extraction network branch is designed in this study. By learning the inconsistency on both sides of the tamper edges of spliced objects, the branch extracts the tamper edges of spliced regions.
Because the spliced objects have relatively few tamper edge pixels, the network is difficult to train to fast and good convergence, so this study expands the tamper edges of the spliced objects by 6 pixels inward and outward, which forms a "doughnut" with a bold tamper edge. This drives the tamper edge extraction network branch to focus on the edge contour of the tampered object by learning the inconsistencies on both sides of the tamper edges. Second, the information contained in each image differs because of factors in the image capturing process (e.g., camera equipment, lighting conditions, environmental noise), which helps to discriminate the tamper regions (i.e., the spliced objects copied from one image to another) from their surroundings. Therefore, this study designs a tamper region localization network branch to learn the inconsistency between the spliced region and other regions. We also introduce the attention mechanism into this network branch for the first time to focus on the learning of tamper regions. Finally, a two-category classification network branch for authenticity discrimination is designed, in which 0 denotes untampered images and 1 denotes tampered images. This network branch can quickly and effectively determine whether the input image is a tampered image and helps users to determine the final tamper detection result jointly with the results obtained by the above two network branches. Model training and testing are carried out on the Keras platform with an NVIDIA GeForce GTX 1080Ti GPU card. The stochastic gradient descent method is used to train our model, and the related parameters are batch size of 16, momentum of 0.95, and attenuation rate of 0.000 5. The learning rate is initialized with 0.001 and updated every 6 250 iterations with an update coefficient of 0.99. The total number of iterations is 312 500.
Result Our model is compared with four state-of-the-art methods (i.e., multi-task fully convolutional network, fully convolutional network, manipulation tracing network, and MobileNets) on four public datasets, namely, Dresden, a raw images dataset for digital image forensics (RAISE), information forensics and security technical committee (IFS-TC), and common objects in context (COCO) datasets. The classification accuracy of authenticity discrimination of our model increases by 8.3% on the Dresden dataset, 4.6% on the COCO dataset, and 1.0% on the RAISE and IFS-TC datasets. In terms of the localization accuracy of tamper regions, the F1 score and intersection over union (IOU) index are improved by 9.4% and 8.6%, respectively, compared with the existing methods. The network model designed in this study shows excellent generalization ability for images with different resolutions. Our method not only locates the tamper region and extracts the tamper edge well, but also improves the classification accuracy of image authenticity discrimination. Conclusion The image splicing tamper detection network proposed in this study consists of three network branches, namely, authenticity discrimination classification, tamper region localization, and tamper edge extraction. The three sub-tasks are fused together to promote each other, which greatly improves the performance of each network branch. The proposed method surpasses most existing methods in image splicing tamper detection, and its main performance advantages are as follows: 1) compared with existing methods, the proposed algorithm judges more effectively whether the images have been tampered with; 2) the proposed algorithm is more accurate than existing methods in locating tamper regions. This study expands the ideas and methods for the research work on digital image forensics techniques.

Key words

image splicing tamper detection; convolutional neural network(CNN); tamper region localization; tamper edge extraction; authenticity discrimination classification

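0 Introduction

Image splicing copies a region (usually containing an object) from one authentic image and pastes it at some position in another authentic image; after insertion, post-processing operations such as blurring, smoothing, retouching and blending are applied to hide the tampering traces and make the tampered image look more realistic and natural, thereby altering the image content. Image splicing can produce false evidence and rumors and is therefore socially harmful, so image splicing tamper detection techniques are needed to determine whether an image is authentic.

Image splicing tamper detection involves two technical requirements: authenticity classification and tamper region localization. Authenticity discrimination for splicing is at its core a classification problem. Farid (1999) used normalized bispectral features and found that spliced regions disturb the correlations among the frequency-domain components of the original image, so that splicing can be detected by examining these correlations with polyspectral analysis. Ng and Chang (2004) proposed an image splicing detection model based on bipolar perturbation of an authentic signal, with a detection rate of 63% on the CASIA dataset. Shi et al. (2007) proposed a feature analysis method based on image moment features and Markov transition probabilities to detect spliced images. Zhao et al. (2011) used third-order statistical features of conditional co-occurrence probability matrices for splicing detection and, after dimensionality reduction, classified with a support vector machine (SVM), reaching an accuracy of 80.80%. Rao and Ni (2016) used convolutional neural networks (CNNs) to decide whether an input image has been tampered with. The classification accuracy of existing authenticity discrimination methods for image splicing evidently still leaves room for improvement.

Image splicing tamper detection must also localize the tamper region accurately. Mahdian and Saic (2008) localized tamper regions from the statistical changes in the image signal caused by interpolation during scaling and rotation. Wei et al. (2010) improved the method of Mahdian and Saic (2008) by introducing an estimate of the image rotation angle. Cao et al. (2014) proposed a method based on contrast enhancement. All of these methods, however, are easily disturbed by low-quality JPEG compression and cannot localize small spliced regions. Given the strong feature representation ability of convolutional neural networks and their excellent performance in computer vision tasks such as object detection (Girshick et al., 2014) and image segmentation (Long et al., 2015), Bappy et al. (2017) adopted a hybrid CNN-LSTM (long short-term memory)-CNN model to capture the discriminative features between tamper regions and their surroundings. Exploiting edge differences localizes tamper regions well in high-resolution images, but the approach fails on fairly smooth low-resolution images. Wei et al. (2019) integrated the rich models of image steganalysis into a convolutional network, using several kinds of hand-crafted prior knowledge to guide training. Rao et al. (2020) designed a two-branch convolutional neural network to learn local features of spliced images; it detects tamper regions effectively, but its localization precision needs improvement.

In summary, existing image splicing tamper detection methods suffer from low authenticity classification accuracy and inaccurate localization of spliced regions. This paper therefore designs a convolutional neural network for image splicing tamper detection that, guided by the inconsistency between the two sides of tamper edges and between the inside and outside of tamper regions, focuses on tamper regions and tamper edges. The main contributions are: 1) tamper region localization, tamper edge extraction and authenticity classification are integrated into a three-branch multi-task learning framework; 2) to make tamper edge extraction more complete, a thickened tamper edge "doughnut" is designed, which obtains a more complete tamper edge by expanding pixels to both sides; 3) an attention mechanism is introduced into the tamper region localization task of image splicing tamper detection, guiding the network to localize tamper regions more accurately as it trains and converges.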

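1 Proposed algorithm

Building on deep convolutional networks, this paper designs an image splicing tamper detection algorithm guided by the inconsistency between the two sides of tamper edges and between the inside and outside of tamper regions. It not only classifies the input image as authentic or tampered, but also localizes the spliced region and extracts the spliced edge.

1.1 Three-branch image splicing tamper detection network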

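Image splicing tamper detection usually comprises three sub-tasks: authenticity classification, tamper region localization and tamper edge extraction. If the authenticity classification result is authentic or tampered, the tamper region localization and tamper edge extraction results should be empty or non-empty, respectively, and vice versa. If the tamper region localization result is empty or non-empty, the tamper edge extraction result should likewise be empty or non-empty, and vice versa. When the results are non-empty, the boundary of the tamper region should agree with the extracted tamper edge, and the filled tamper edge should agree with the localized tamper region. On this basis, authenticity classification, tamper region localization and tamper edge extraction are considered strongly correlated, and a three-branch image splicing tamper detection network is designed accordingly, as shown in Fig. 1. The network input is a 256×256 pixel RGB color image, normalized before being fed to the network so that pixel color values lie between 0 and 1. The input image first passes through four shared combined convolution layers, followed by three network branches for the three sub-tasks. Each combined convolution layer consists, in order, of a 3×3 convolution layer (purple block in Fig. 1), a 1×1 convolution layer (gray block in Fig. 1), a MaxPooling layer and a ReLU activation function.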
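The coupling among the three outputs described above can be written as a simple consistency check. The following is a minimal sketch rather than part of the original method; the function name `outputs_consistent` and the list-of-lists mask format are assumptions made for illustration.

```python
def outputs_consistent(is_tampered, region_mask, edge_mask):
    """Check the coupling between the three branch outputs.

    is_tampered: 1 for a tampered image, 0 for an authentic one.
    region_mask, edge_mask: 2D lists of 0/1 values (hypothetical format).
    """
    region_nonempty = any(any(row) for row in region_mask)
    edge_nonempty = any(any(row) for row in edge_mask)
    # Authentic image <=> empty region map <=> empty edge map.
    return bool(is_tampered) == region_nonempty == edge_nonempty
```

A check of this kind only tests emptiness; it does not verify that the region boundary and the extracted edge actually coincide.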

Fig. 1 Schematic diagram of our network architecture

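1) Tamper region localization branch. An attention mechanism is introduced after the first three combined convolution layers, followed by a 3 × 3 convolution layer and a 1 × 1 convolution layer; convolutions are then stacked according to the structure in Fig. 1 to produce the tamper region label. The 1 × 1 convolution applies a linear transformation to its input without changing the input and output dimensions, and the subsequent activation function adds nonlinearity, increasing the nonlinear expressive power of the network. All convolutions in this paper use a stride of 1.

2) Authenticity classification branch. After the four combined convolution layers, the input passes through three further 3×3 convolution layers and then through a global average pooling layer, a fully connected layer with 64 nodes (pink block in Fig. 1) and a sigmoid function, which yields the authenticity label. All pooling layers in this paper use a 3×3 sliding window with a stride of 2.

3) Tamper edge extraction branch. A 3×3 gradient convolution kernel (green block in Fig. 1) extracts an edge map from the input image, which is stacked with the feature map produced by the four combined convolution layers; three 3 × 3 convolution layers then produce the final tamper edge label.

Cross-entropy and root mean square loss functions are used to measure the prediction error of each network branch, and the prediction errors of the tamper region localization, tamper edge extraction and authenticity classification branches are combined with weights of 1, 1 and 256 × 256, respectively, to form a combined loss function. During error back-propagation and model training, the four shared combined convolution layers, driven by the combined loss, extract the high-level tampering semantic features needed by all three sub-tasks; the three branches are trained under the joint constraint of the combined loss, so that the three sub-tasks reinforce one another and improve in performance.

1.2 Thickened tamper edge "doughnut" label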
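The weighted combination can be sketched as follows. This is an illustration under stated assumptions: the source does not say which loss (cross-entropy or root mean square) is applied to which branch, so `binary_cross_entropy` is only a placeholder for a per-branch loss, and both function names are hypothetical. Only the weights 1, 1 and 256 × 256 come from the text.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over flat lists of labels and probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

def combined_loss(region_loss, edge_loss, cls_loss,
                  weights=(1.0, 1.0, 256.0 * 256.0)):
    """Weighted sum of the three branch losses (weights taken from the text)."""
    w_region, w_edge, w_cls = weights
    return w_region * region_loss + w_edge * edge_loss + w_cls * cls_loss
```

For example, `combined_loss(0.5, 0.25, 0.001)` weights the single classification term by 65 536, so even a small classification error contributes strongly to the total.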

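Analysis of spliced images shows that extracting the edge of the tamper region is an effective route to localizing it. The tamper edge is therefore expanded into a "doughnut", as shown in Fig. 2, for two reasons: 1) tamper edge pixels carry very little information, so the network struggles to learn the corresponding features accurately, and moderately expanding the pixel set lets the network learn the edge of the tamper region faster from the manipulation traces along the edge; 2) the image content on the two sides of the tamper edge comes from different images. Because camera equipment, lighting conditions, environmental noise and other capture factors differ, each image carries different information, so when a spliced object is copied from its source image into another image, the regions originating from different images differ in their hidden information. Once the network learns this difference, it can quickly find the edge of the tamper region. The second reason follows the same idea as global tamper region localization; the difference is that one looks for inconsistency from a global perspective, while the other looks locally for inconsistency between the two sides of the tamper edge. In Fig. 2(a), the region inside the circular tamper edge is the tampered object, and regions $A$ and $B$ contain content from different images.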

Fig. 2 Schematic diagram of the expanded tamper edges doughnut

((a)schematic diagram of the tamper edge boundary; (b)schematic diagram of the "doughnut")

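Based on the above analysis, the edge of the tamper region is expanded by 6 pixels inward and 6 pixels outward, forming a thickened-edge "doughnut", as shown in Fig. 2(b). The procedure is: 1) extract a grayscale map of the tampered object from the tampered image and its label map; 2) obtain the edge of the tampered object from the grayscale map; 3) starting from the edge pixels, expand 6 pixels inward and 6 pixels outward to widen the contour. Because the "doughnut" expands pixels inward and outward, some precision in tamper edge extraction is lost, but detection accuracy for the overall tamper region improves; this mitigates inaccurate localization and also speeds up network training to some extent.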
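The three steps above can be sketched on a binary label map. This is a minimal illustration, not the authors' code: it starts directly from a 0/1 mask instead of a grayscale map, defines an edge pixel as a mask pixel with a background 4-neighbour, and measures the 6-pixel expansion in city-block distance. All three choices, and the name `doughnut_label`, are assumptions.

```python
from collections import deque

def doughnut_label(mask, width=6):
    """Thicken the boundary of a binary mask by `width` pixels on both sides."""
    h, w = len(mask), len(mask[0])

    def neighbours(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield ny, nx

    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Step 2: edge pixels are mask pixels that touch the background.
    for y in range(h):
        for x in range(w):
            if mask[y][x] and any(not mask[ny][nx] for ny, nx in neighbours(y, x)):
                dist[y][x] = 0
                queue.append((y, x))
    # Step 3: breadth-first expansion inward and outward up to `width` pixels.
    while queue:
        y, x = queue.popleft()
        if dist[y][x] == width:
            continue
        for ny, nx in neighbours(y, x):
            if dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return [[0 if d is None else 1 for d in row] for row in dist]
```

Calling `doughnut_label(mask, width=6)` returns a ring-shaped label of the same size as `mask`; a smaller or larger `width` reproduces the 4-pixel and 8-pixel variants compared in Section 2.2.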

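To extract the edge of the tamper region better, the weights of the gradient convolution kernel are constrained to sum to 0 and, inspired by the method of Zhang et al. (2018), a randomly selected 25% of the kernels are initialized with the Laplacian kernel to extract edge information from the input image; the output feature map has size 254×254×16. The filtered result is then downsampled and stacked with the later feature map, which further strengthens the learning of tamper edge information in the tamper edge extraction branch.

1.3 Attention mechanism for tamper region localization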
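A zero-sum 3×3 kernel and the 254×254 output size can be illustrated with a small sketch. The 4-neighbour Laplacian below is one common variant and is an assumption, as is enforcing the zero-sum constraint by subtracting the mean; the source states neither. The helper names are hypothetical.

```python
LAPLACIAN_3X3 = [[0, 1, 0],
                 [1, -4, 1],
                 [0, 1, 0]]

NUM_KERNELS = 16                   # depth of the 254x254x16 output
NUM_LAPLACIAN = NUM_KERNELS // 4   # 25% of the kernels start as Laplacian

def zero_sum(kernel):
    """Shift the kernel weights so that they sum to zero."""
    count = sum(len(row) for row in kernel)
    mean = sum(sum(row) for row in kernel) / count
    return [[w - mean for w in row] for row in kernel]

def conv2d_valid(image, kernel):
    """Unpadded sliding-window filter: each side shrinks by kernel size - 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]
```

With no padding, a 3×3 kernel maps a 256×256 input to 256 − 3 + 1 = 254 pixels per side, which matches the stated feature map size. A zero-sum kernel gives no response on flat regions, so its output reflects only local intensity changes such as edges.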

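Attention models are widely used in deep learning, with applications across image processing, speech recognition and natural language processing. Many visual attention models in image processing focus attention on one region of the image, but they often localize tamper regions incompletely, and the localization tends to break at thin, narrow structures. Inspired by the attention models of Sun et al. (2018) and Ronneberger et al. (2015), this paper introduces an attention mechanism into the tamper region localization branch of image splicing tamper detection for the first time, strengthening the network's focus on tamper regions, as shown in Fig. 3. The main idea is to multiply the weights obtained by convolving the feature map with the original feature map, so that feature maps reflecting tamper region characteristics receive large weights and the others small weights, which lets the network attend more closely to spliced regions.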
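The multiply-by-learned-weights idea can be sketched as a simple gate. The exact layer layout of Fig. 3 is not given in the text, so the 1×1 convolution that scores each pixel and the sigmoid that maps the score to a weight are assumptions, and `attention_gate` is a hypothetical name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_gate(features, channel_weights, bias=0.0):
    """Re-weight an H x W x C feature map (nested lists) pixel by pixel.

    channel_weights is a length-C list acting as a 1x1 convolution that
    produces one score per pixel (an assumed design, not from the source).
    """
    gated = []
    for row in features:
        gated_row = []
        for pixel in row:
            score = sum(w * v for w, v in zip(channel_weights, pixel)) + bias
            a = sigmoid(score)  # attention weight in (0, 1)
            gated_row.append([a * v for v in pixel])  # weight x original feature
        gated.append(gated_row)
    return gated
```

Pixels whose features score highly keep most of their activation, while low-scoring pixels are damped, which is the behaviour the paragraph above describes.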

Fig. 3 Schematic diagram of the attention mechanism network branch

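2 Experimental results and analysis

Model training and testing are implemented in Keras, and the model is trained with stochastic gradient descent on one NVIDIA GeForce GTX 1080Ti GPU. The training parameters are: momentum 0.95, decay rate 0.000 5, and initial learning rate 0.001, updated every 6 250 iterations with an update coefficient of 0.99. The batch size is 16, and the total number of iterations is 312 500.

2.1 Experimental data collection and preparation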
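The learning rate schedule can be made concrete with a short sketch. It assumes that "update coefficient" means the rate is multiplied by 0.99 at each update, which the source does not state explicitly; the function names are hypothetical, and the training set size of 200 000 images is taken from Section 2.1.

```python
def learning_rate(iteration, base_lr=0.001, step=6250, factor=0.99):
    """Step decay: multiply the rate by `factor` every `step` iterations."""
    return base_lr * factor ** (iteration // step)

def epochs_seen(iterations=312500, batch_size=16, dataset_size=200000):
    """Number of passes over the training set implied by the settings."""
    return iterations * batch_size / dataset_size
```

Under these assumptions the rate is reduced 49 times during training and ends near 6.1 × 10⁻⁴, and 312 500 iterations at batch size 16 correspond to 25 passes over the 200 000 training images.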

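The training samples come from three datasets: COCO (common objects in context) (Lin et al., 2014), RAISE (a raw images dataset for digital image forensics) (Dang-Nguyen et al., 2015) and Dresden (Gloe and Böhme, 2010). Samples are produced as follows: 1) so that the trained model generalizes across image resolutions, the center block of each image in the three datasets is extracted. For RAISE and Dresden, the center block is 1 024 × 1 024 pixels and is split into 16 non-overlapping 256 × 256 pixel patches; for COCO, the center block is 256 × 256 pixels. 2) Spliced objects are obtained from the pixelmap data provided with COCO and pasted onto the patches from step 1), with each spliced object pasted onto 12 randomly chosen background images. During pasting, the object is rescaled according to its size relative to the background: it is enlarged when its total area is less than 5% of the background area and shrunk when its total area exceeds 45% of the background area. Finally, 100 000 original images and 100 000 tampered images are selected for training, and the remaining data are used for testing.

Fig. 4 illustrates this process: the person-with-skateboard part of Fig. 4(b), given by the pixelmap in Fig. 4(c), is cut out, rescaled and pasted into Fig. 4(a), producing the final spliced image (Fig. 4(d)).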
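The patch layout of step 1) and the rescaling rule of step 2) can be sketched as follows. The function names are hypothetical, and the source does not say by how much an object is enlarged or shrunk, so the sketch only returns which action applies.

```python
def center_tiles(height, width, crop=1024, tile=256):
    """Top-left (y, x) corners of the tiles that cover the central crop."""
    top, left = (height - crop) // 2, (width - crop) // 2
    per_side = crop // tile
    return [(top + r * tile, left + c * tile)
            for r in range(per_side) for c in range(per_side)]

def scaling_action(object_area, background_area, low=0.05, high=0.45):
    """Decide how to rescale a spliced object relative to its background."""
    ratio = object_area / background_area
    if ratio < low:
        return "enlarge"
    if ratio > high:
        return "shrink"
    return "keep"
```

For a hypothetical 3 000 × 4 000 pixel source image, `center_tiles(3000, 4000)` yields 16 corners, starting at `(988, 1488)`.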

Fig. 4 Sample data production example

((a)original image 1; (b)original image 2; (c)pixelmap; (d)tampered image)

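The public datasets used for testing are CASIA2 (Dong et al., 2013), EXIF-SC (exchangeable image file format for digital still cameras) (Huh et al., 2018; Moreira et al., 2018) and IFS-TC (information forensics and security technical committee). Because CASIA2 does not provide ground-truth labels for tamper regions, the labels were obtained manually by subtracting each tampered image from its original and denoising the difference.

2.2 Validation of effectiveness

Fig. 5 compares the results of the tamper edge extraction branch when the tamper edge is expanded inward and outward by 4, 6 and 8 pixels. With 4 pixels the detected edges are less complete than with 6 pixels, whereas with 8 pixels more precision is lost. All training, testing and comparison experiments in this paper therefore use a 6-pixel expansion.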
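The subtract-and-denoise step for CASIA2 might look like the sketch below. The source gives no details, so the grayscale input, the difference threshold of 30 and the neighbour-count filter used for denoising are all assumptions, as is the name `difference_mask`.

```python
def difference_mask(original, tampered, threshold=30, min_neighbours=2):
    """Label pixels that differ between an original and a tampered image.

    Both inputs are equally sized 2D lists of grayscale values.
    """
    h, w = len(original), len(original[0])
    raw = [[1 if abs(original[y][x] - tampered[y][x]) > threshold else 0
            for x in range(w)] for y in range(h)]

    def support(y, x):
        # Count changed pixels among the 8 neighbours of (y, x).
        return sum(raw[ny][nx]
                   for ny in range(max(0, y - 1), min(h, y + 2))
                   for nx in range(max(0, x - 1), min(w, x + 2))
                   if (ny, nx) != (y, x))

    # Denoise: keep a changed pixel only if enough neighbours changed too.
    return [[1 if raw[y][x] and support(y, x) >= min_neighbours else 0
             for x in range(w)] for y in range(h)]
```

Isolated differing pixels, such as those caused by compression noise, are dropped, while compact changed areas survive as the region label.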

Fig. 5 Comparison of tamper edges extraction with the expansion of different pixels

((a)tampered images; (b)tamper region masks; (c)4 pixels expansion; (d)6 pixels expansion; (e)8 pixels expansion)

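Fig. 6 shows the effect of the attention mechanism in the tamper region localization branch. Models with and without the attention mechanism were trained and tested on the EXIF-SC dataset. Without attention, only part of the tamper region is localized and some pixels are misclassified. With attention, the tamper region is localized well and breaks at thin, narrow structures are largely avoided. The results show that adding the attention mechanism is effective for tamper region localization.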

Fig. 6 Comparison of tamper regions localization with or without attention mechanism

((a)tampered images; (b)tamper region masks; (c)without attention mechanism; (d)with attention mechanism)

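Fig. 7 compares the tamper edge extraction branch with and without the gradient convolution kernel. With the gradient convolution kernel, the "doughnut" is detected more clearly than without it, which confirms that the gradient convolution kernel is a useful aid to tamper edge extraction.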

Fig. 7 Comparison of tamper edges extraction results with or without gradient convolution kernel

((a)tampered images; (b)tamper region masks; (c)without gradient convolution kernel; (d)with gradient convolution kernel)

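2.3 Results and performance comparison

To decide quickly whether a test image is tampered, the proposed algorithm outputs a binary classification from the authenticity classification produced by the middle branch. Table 1 compares the authenticity classification performance of the proposed algorithm with FCN (fully convolutional networks for semantic segmentation) (Salloum et al., 2018), MFCN (multi-task fully convolutional network) (Salloum et al., 2018), ManTra-Net (Wu et al., 2019) and Mobilenets (Howard et al., 2017) on different datasets. In the experiments, FCN and MFCN were re-implemented from the literature, while ManTra-Net and Mobilenets use the code provided with the papers.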

Table 1 Comparison of different methods in authenticity classification accuracy on different datasets

Method	Dresden	RAISE	COCO	IFS-TC
FCN	0.750	0.916	0.708	0.800
MFCN	0.875	0.836	0.916	0.602
ManTra-Net	0.833	0.820	0.791	0.710
Mobilenets	0.708	0.958	0.820	0.900
Ours	0.958	0.968	0.962	0.910
Note: bold indicates the best result in each column.

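Table 1 shows that: 1) the proposed method performs particularly well on the RAISE dataset and reaches 0.910 on the more difficult IFS-TC dataset; 2) compared with the other four methods, it achieves the best authenticity classification accuracy on every dataset; 3) its accuracy is stable across datasets, ranging from 0.910 to 0.968, which indicates that the method is both accurate and stable.

Table 2 reports the area under curve (AUC) for authenticity classification. The AUC of the proposed method is good overall, reaching 0.910 on the COCO dataset; it is slightly lower than that of Mobilenets only on the Dresden and IFS-TC datasets.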
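The accuracy gains quoted in the abstract can be reproduced from Table 1 as the margin over the best competing method on each dataset. The sketch below copies the table values; the dictionary layout and the helper name `margin_over_best_baseline` are illustrative.

```python
TABLE_1 = {
    "FCN":        {"Dresden": 0.750, "RAISE": 0.916, "COCO": 0.708, "IFS-TC": 0.800},
    "MFCN":       {"Dresden": 0.875, "RAISE": 0.836, "COCO": 0.916, "IFS-TC": 0.602},
    "ManTra-Net": {"Dresden": 0.833, "RAISE": 0.820, "COCO": 0.791, "IFS-TC": 0.710},
    "Mobilenets": {"Dresden": 0.708, "RAISE": 0.958, "COCO": 0.820, "IFS-TC": 0.900},
    "Ours":       {"Dresden": 0.958, "RAISE": 0.968, "COCO": 0.962, "IFS-TC": 0.910},
}

def margin_over_best_baseline(table, dataset, ours="Ours"):
    """Accuracy gain, in percentage points, over the best other method."""
    best = max(scores[dataset] for name, scores in table.items() if name != ours)
    return round((table[ours][dataset] - best) * 100, 1)
```

The margins come out as 8.3, 4.6, 1.0 and 1.0 percentage points on Dresden, COCO, RAISE and IFS-TC, in agreement with the abstract.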

Table 2 Comparison of different methods in authenticity classification AUC on different datasets

Method	Dresden	RAISE	COCO	IFS-TC	EXIF-SC	CASIA2
FCN	0.630	0.567	0.629	0.561	0.603	0.672
MFCN	0.652	0.665	0.768	0.586	0.712	0.656
ManTra-Net	0.718	0.667	0.568	0.679	0.664	0.796
Mobilenets	0.760	0.758	0.850	0.812	0.641	0.649
Ours	0.752	0.851	0.910	0.810	0.792	0.868
Note: bold indicates the best result in each column.

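To verify the effectiveness of the proposed method for tamper region localization, the F1 score and intersection over union (IOU) are used as evaluation metrics, and the method is compared with the other four methods. Table 3 lists the tamper region localization performance of each method averaged over the Dresden, RAISE, COCO, IFS-TC, EXIF-SC and CASIA2 datasets; the results confirm the effectiveness of the proposed method.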
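For reference, the two metrics can be computed from a predicted and a ground-truth mask as sketched below. This uses the standard foreground-only definitions; the source does not state its exact evaluation protocol, so the handling of empty masks (returning 1.0) and the name `f1_and_iou` are assumptions.

```python
def f1_and_iou(pred, truth):
    """F1 score and IOU of the tampered class for two binary 2D masks."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # tampered pixel correctly detected
            elif p and not t:
                fp += 1      # authentic pixel flagged as tampered
            elif t and not p:
                fn += 1      # tampered pixel missed
    if tp + fp + fn == 0:
        return 1.0, 1.0      # both masks empty (assumed convention)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou
```

Under these per-mask definitions F1 = 2·IOU/(1 + IOU), so F1 is never smaller than IOU; the averaged values reported in Table 3 were presumably aggregated under a different convention that the source does not specify.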

Table 3 Performance comparison of different methods in tamper region localization on 6 datasets

Method	F1 (upper branch)	IOU (upper branch)
FCN	0.324 8	0.635 3
MFCN	0.430 8	0.762 0
ManTra-Net	0.384 3	0.651 6
Mobilenets	0.389 1	0.660 0
Ours	0.524 8	0.848 6
Note: bold indicates the best result in each column.

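Fig. 8 compares the tamper region localization results of the proposed method with those of existing methods on the CASIA2, EXIF-SC and IFS-TC datasets. Fig. 8(a) shows the input tampered images, Fig. 8(b) the ground-truth tamper region labels, and Fig. 8(c) to (g) the localization results of FCN, MFCN, ManTra-Net, Mobilenets and the proposed method, respectively. Rows 1 and 2, rows 3 and 4, and rows 5 and 6 of Fig. 8 come from the CASIA2, EXIF-SC and IFS-TC datasets, respectively. FCN and MFCN localize local tamper regions poorly: only a rough outline is visible, with many scattered false-positive noise points. ManTra-Net and Mobilenets produce many misjudgments, mistaking original content for tamper regions and true tamper regions for original content. In comparison, the proposed method localizes tamper regions well with a low misjudgment rate, which supports the effectiveness of the algorithm.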

Fig. 8 Comparison of tamper region localization results from different methods

((a)tampered images; (b)tamper region masks; (c)FCN; (d)MFCN; (e)ManTra-Net; (f)Mobilenets; (g)ours)

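Fig. 9 shows some image splicing tamper detection results on the CASIA2, EXIF-SC and IFS-TC datasets; rows 1 and 2, rows 3 and 4, and rows 5 and 6 come from CASIA2, EXIF-SC and IFS-TC, respectively. Fig. 9(a) shows the input tampered images, Fig. 9(b) the ground-truth tamper region labels, Fig. 9(c) the tamper region localization results, Fig. 9(d) the tamper edge extraction results, i.e., the "doughnut", and Fig. 9(e) the authenticity classification results, where 1 denotes a tampered image and 0 an untampered image. Fig. 9 shows that the proposed algorithm performs well in both localizing tamper regions and extracting tamper edges, and also gives good authenticity classification results.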

Fig. 9 More image splicing tamper detection results on different data sets

((a)tampered images; (b)tamper region masks; (c)tamper region location results; (d)tamper edge extraction results; (e)authenticity discrimination)

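Fig. 10 shows three failure cases found during testing. The input in row 1 is an original image without any tampering; the tamper region localization and tamper edge extraction branches give wrong results, but the authenticity classification branch is correct. In this case the user can judge from visual semantic information that the tamper region and tamper edge produced by the first two branches are meaningless, and conclude from the authenticity classification branch that the input is an untampered image. The inputs in rows 2 and 3 are misjudged completely, for two possible reasons: 1) some test images have very low resolution, which makes them hard to discriminate; 2) some test images combine several tampering operations, which makes them hard to judge and to localize precisely. Solving these problems is the main goal of future work.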

Fig. 10 Failure examples

((a)input images; (b)tamper region masks; (c)tamper regions; (d)tamper edges; (e)authenticity discrimination)

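3 Conclusion

To address the low authenticity classification accuracy and inaccurate tamper region localization of existing image splicing tamper detection methods, this paper integrates three sub-tasks, tamper region localization, tamper edge extraction and authenticity classification, into a convolutional neural network framework for image splicing tamper detection guided by the inconsistency between the two sides of tamper edges and between the inside and outside of tamper regions. It also designs a "doughnut" that thickens the tamper edge and introduces an attention mechanism for tamper region localization, which further improve the accuracy of tamper region localization and tamper edge extraction. Comparison with four mainstream methods on three public datasets and three self-made datasets shows that the proposed method outperforms existing methods in authenticity classification accuracy and in the precision of tamper region localization and tamper edge extraction, and that it generalizes well to image datasets of different resolutions. It also has the following shortcomings: 1) it has difficulty with tampered images that combine several manipulation operations; 2) it has difficulty with tampered images of very low resolution. Future work will therefore proceed in two directions: 1) adjusting the network structure so that it can detect mixed tampering; 2) adding low-resolution samples to further improve the robustness of the algorithm.

References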

Bappy J H, Roy-Chowdhury A K, Bunk J, Nataraj L and Manjunath B S. 2017. Exploiting spatial structure for localizing manipulated image regions//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4970-4979 [DOI: 10.1109/ICCV.2017.532]

Cao G, Zhao Y, Ni R R and Li X L. 2014. Contrast enhancement-based forensics in digital images. IEEE Transactions on Information Forensics and Security, 9(3): 515-525 [DOI: 10.1109/TIFS.2014.2300937]

Dang-Nguyen D T, Pasquini C, Conotter V and Boato G. 2015. RAISE: a raw images dataset for digital image forensics//Proceedings of the 6th ACM Multimedia Systems Conference. New York, USA: ACM: 219-224 [DOI: 10.1145/2713168.2713194]

Dong J, Wang W and Tan T N. 2013. CASIA image tampering detection evaluation database//Proceedings of 2013 IEEE China Summit and International Conference on Signal and Information Processing. Beijing, China: IEEE: 422-426 [DOI: 10.1109/ChinaSIP.2013.6625374]

Farid H. 1999. Detecting Digital Forgeries Using Bispectral Analysis. Cambridge: Massachusetts Institute of Technology

Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]

Gloe T and Böhme R. 2010. The Dresden image database for benchmarking digital image forensics. Journal of Digital Forensic Practice, 3(2/4): 150-159 [DOI: 10.1080/15567281.2010.531500]

Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M and Adam H. 2017. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. https://arxiv.org/pdf/1704.04861.pdf

Huh M, Liu A, Owens A and Efros A A. 2018. Fighting fake news: image splice detection via learned self-consistency//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 101-117 [DOI: 10.1007/978-3-030-01252-6_7]

Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]

Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965]

Mahdian B and Saic S. 2008. Blind authentication using periodic properties of interpolation. IEEE Transactions on Information Forensics and Security, 3(3): 529-538 [DOI: 10.1109/tifs.2004.924603]

Moreira D, Bharati A, Brogan J, Pinto A, Parowski M, Bowyer K W, Flynn J P, Rocha A and Scheirer W J. 2018. Image provenance analysis at scale. IEEE Transactions on Image Processing, 27(12): 6109-6123 [DOI: 10.1109/TIP.2018.2865674]

Ng T T and Chang S F. 2004. A model for image splicing//Proceedings of 2004 International Conference on Image Processing. Singapore, Singapore: IEEE: 1169-1172 [DOI: 10.1109/ICIP.2004.1419512]

Rao Y and Ni J Q. 2016. A deep learning approach to detection of splicing and copy-move forgeries in images//Proceedings of 2016 IEEE International Workshop on Information Forensics and Security. Abu Dhabi, United Arab Emirates: IEEE: 1-6 [DOI: 10.1109/WIFS.2016.7823911]

Rao Y, Ni J Q and Zhao H M. 2020. Deep learning local descriptor for image splicing detection and localization. IEEE Access, 8: 25611-25625 [DOI: 10.1109/ACCESS.2020.2970735]

Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]

Salloum R, Ren Y Z and Kuo C C J. 2018. Image splicing localization using a multi-task fully convolutional network (MFCN). Journal of Visual Communication and Image Representation, 51: 201-209 [DOI: 10.1016/j.jvcir.2018.01.010]

Shi Y Q, Chen C H and Wen C. 2007. A natural image model approach to splicing detection//Proceedings of the 9th Workshop on Multimedia and Security. New York, USA: ACM: 51-62 [DOI: 10.1145/1288869.1288878]

Sun M, Yuan Y C, Zhou F and Ding E. 2018. Multi-attention multi-class constraint for fine-grained image recognition//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 834-850 [DOI: 10.1007/978-3-030-01270-0_49]

Wei B L, Yu M, Chen K and Jiang J G. 2019. Deep-BIF: blind image forensics based on deep learning//Proceedings of 2019 IEEE Conference on Dependable and Secure Computing. Hangzhou, China: IEEE: 1-6 [DOI: 10.1109/DSC47296.2019.8937712]

Wei W M, Wang S Z, Zhang X P and Tang Z J. 2010. Estimation of image rotation angle using interpolation-related spectral signatures with application to blind detection of image forgery. IEEE Transactions on Information Forensics and Security, 5(3): 507-517 [DOI: 10.1109/tifs.2010.2051254]

Wu Y, AbdAlmageed W and Natarajan P. 2019. ManTra-Net: manipulation tracing network for detection and localization of image forgeries with anomalous features//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9543-9552 [DOI: 10.1109/CVPR.2019.00977]

Zhang Z P, Zhang Y X, Zhou Z and Luo J B. 2018. Boundary-based image forgery detection by fast shallow CNN//Proceedings of the 24th International Conference on Pattern Recognition. Beijing, China: IEEE: 2658-2663 [DOI: 10.1109/ICPR.2018.8545074]

Zhao X D, Wang S L, Li S H and Li J H. 2011. A comprehensive study on third order statistical features for image splicing detection//Proceedings of the 10th International Workshop on Digital Watermarking. Atlantic City, USA: IEEE: 243-256 [DOI: 10.1007/978-3-642-32205-1_20]