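Objective Splicing tampering of substation images is a major security risk for power systems. Tampered images have complex backgrounds and tampered content of varying scales, which causes false and missed detections, and little research addresses this problem. This paper therefore proposes a dual-channel detection model for spliced tampered images of substations. Method Both channels use deep learning to adaptively extract features from the tampered image and the residual image. The tampered image contains rich color features and content information, while the residual image highlights the edges of the tampered region, which effectively addresses the difficulty of extracting tampering features caused by the diversity of tampered images. A Transformer channel with a feature pyramid structure serves as the main branch: through a global interaction mechanism it captures global image information and establishes connections between key points, giving the model good generalization and multi-scale feature processing capability. A shallow convolutional neural network (CNN) channel is introduced as an auxiliary branch that focuses on extracting edge features of the tampered region, making it easier for the model to locate the tampered region from its overall contour. Result The model is compared with four methods of the same type on the self-made substation splicing tampered datasets (SSSTD), CASIA, and NIST16. Quantitatively, on SSSTD the proposed model improves precision, recall, F1, and average precision by 0.12%, 2.17%, 1.24%, and 7.71%, respectively, over the second-best model; on CASIA and NIST16 it also achieves the best results. Qualitatively, the proposed model reduces false and missed detections and localizes tampered regions more accurately. Conclusion The proposed dual-channel splicing tampering detection model combines the strengths of Transformer and CNN in image tampering detection, improves detection accuracy, and is suitable for detecting tampered targets in complex substation scenes.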
Pyramid Transformer combined with shallow CNN for substation image tampering detection
Xing Jianhao, Tian Xiuxia, Han Yi (Shanghai University of Electric Power)
Objective With the widespread application of intelligent power inspection, image information has become especially important; however, the rapid development of image tampering technology gives malicious actors a new way to harm the power system. As an important component of the power system, substations convert between different voltage levels, and ensuring the continuous output of stable voltage and the reasonable use of substation resources is the basis for the safe and stable operation of the entire power network. If the collected substation images are maliciously tampered with, the smart grid system may fail and operators may misjudge the actual state of the substation, which can ultimately lead to power system failures and even major accidents such as large-scale blackouts, causing irreversible losses to national production. Detecting tampered substation images is therefore a key task in ensuring the stability of power systems. Because the complex backgrounds of tampered images and the varying scales of tampered content cause existing detection models to produce false and missed detections, and because research on image splicing tampering detection in power scenes is scarce, this paper proposes a dual-channel detection model for spliced tampered images of electrical equipment in substation scenes. Method The model consists of three parts: a Transformer channel with a feature pyramid structure, a shallow CNN channel, and a network head. The input tampered image is 512×512×3, and the output is the detection and localization result for the tampered image. 
Both channels use deep learning to adaptively extract features of the original color image and the residual image: the original color image contains rich color features and content information, while the residual image highlights the edges of the tampered region. This combination effectively addresses the difficulty of extracting tampering features caused by the diversity of tampered images. The feature pyramid Transformer channel serves as the main feature extraction channel and consists of a pyramid-structured Transformer and a progressive local decoder (PLD). Thanks to global attention, the Transformer has a global receptive field from the first layer, so it can extract features efficiently and establish connections between feature points. Meanwhile, the pyramid structure gives the network better generalization and multi-scale feature processing capability. The PLD lets features of different depths and expressiveness guide and fuse with one another, which mitigates attention scattering and the underestimation of local features and thus improves detail processing. The shallow CNN channel serves as an auxiliary detection channel: a shallow network extracts edge features of the tampered region from the residual image, making it easier for the model to locate the tampered region from its overall contour. In this channel, Residual blocks (residual network modules) form the backbone of the shallow network, whose input is the residual image produced from the tampered image by a high-pass filtering layer. The parallel axial attention block (PAA block) introduces dilated convolutions of different sizes to enlarge the receptive field of the shallow network, and the parallel axial attention mechanism (PAA) helps the network extract contextual semantic information. 
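The abstract does not specify which high-pass filter produces the residual image. As a minimal sketch, assuming a fixed 3×3 Laplacian kernel (a hypothetical stand-in; forensics models often use SRM kernels instead), the code below shows how such a layer maps flat regions to zero and keeps only intensity transitions such as a splicing boundary. The names `LAPLACIAN_3x3` and `high_pass_residual` are illustrative, not taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical high-pass kernel (8-neighbour Laplacian); the paper's actual
# filter is not given in the abstract. Its weights sum to zero, so flat
# regions map to 0 and only intensity transitions survive.
LAPLACIAN_3x3 = np.array([[-1, -1, -1],
                          [-1,  8, -1],
                          [-1, -1, -1]], dtype=np.float64)

def high_pass_residual(image: np.ndarray, kernel: np.ndarray = LAPLACIAN_3x3) -> np.ndarray:
    """Filter each channel of an H x W x C image and return the residual map."""
    img = image.astype(np.float64)
    r = kernel.shape[0] // 2
    # Reflect-pad the spatial axes so the residual keeps the input size.
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    # windows has shape (H, W, C, kh, kw): one kernel-sized patch per pixel and channel.
    windows = sliding_window_view(padded, kernel.shape, axis=(0, 1))
    # Weighted sum of each patch (correlation, identical to convolution
    # here because the kernel is symmetric).
    return np.einsum("hwcij,ij->hwc", windows, kernel)

# Toy "spliced" image: dark left half, bright right half starting at column 16.
toy = np.zeros((32, 32, 3), dtype=np.uint8)
toy[:, 16:, :] = 255
res = high_pass_residual(toy)
print(res.shape)                      # (32, 32, 3)
print(res[0, 15, 0], res[0, 16, 0])   # -765.0 765.0
```

Only the two columns on either side of the boundary respond; everything else in the residual is zero, which is the edge-emphasizing behavior the shallow CNN channel relies on.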
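The abstract gives no internal details of the PAA block, so the sketch below illustrates only the generic idea of parallel axial attention under stated assumptions: self-attention is computed separately along the height axis and the width axis, the two results are summed with the input, and scaled random matrices stand in for learned projections. The dilated convolutions are omitted, and `axial_attention` and `parallel_axial_attention` are hypothetical names, not the authors' code.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, wq, wk, wv, axis):
    """Scaled dot-product self-attention along a single spatial axis.

    x has shape (H, W, C). axis=0 lets each column attend over its H pixels;
    axis=1 lets each row attend over its W pixels.
    """
    q, k, v = x @ wq, x @ wk, x @ wv              # (H, W, d), (H, W, d), (H, W, C)
    if axis == 0:                                 # put H in the sequence position
        q, k, v = (t.transpose(1, 0, 2) for t in (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (batch, L, L)
    out = softmax(scores, axis=-1) @ v            # (batch, L, C)
    return out.transpose(1, 0, 2) if axis == 0 else out

def parallel_axial_attention(x, params):
    """Height-axis and width-axis attention computed in parallel, then summed
    with the input (hypothetical stand-in for the paper's PAA)."""
    h_out = axial_attention(x, *params["height"], axis=0)
    w_out = axial_attention(x, *params["width"], axis=1)
    return x + h_out + w_out

rng = np.random.default_rng(0)
C, d = 8, 4
# Scaled random matrices stand in for learned query/key/value projections.
params = {name: (rng.normal(size=(C, d)) / np.sqrt(C),
                 rng.normal(size=(C, d)) / np.sqrt(C),
                 rng.normal(size=(C, C)) / np.sqrt(C))
          for name in ("height", "width")}
x = rng.normal(size=(16, 16, C))
y = parallel_axial_attention(x, params)
print(y.shape)  # (16, 16, 8)
```

Each output pixel sees its whole row and column in one step, at a cost proportional to H + W attended positions rather than H·W; stacking two such layers lets information reach every position of the map.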
The features of the two branches are concatenated along the channel dimension and fed into the network head; experiments in this paper show that channel concatenation is more effective than element-wise addition. Finally, the network head determines whether the image contains tampered regions and localizes them accurately. Result The model is first trained on pre-training datasets to obtain pre-training weights, and the test results show that it detects various tampering targets well. The model is then fine-tuned from the pre-training weights and compared with four models of the same type on the self-made substation splicing tampered datasets (SSSTD), CASIA, and NIST16, using four evaluation metrics for quantitative analysis: precision, recall, F1, and average precision. On SSSTD, the precision, recall, F1, and average precision of the proposed model improve by 0.12%, 2.17%, 1.24%, and 7.71%, respectively, over the second-best model; on CASIA, the proposed model again achieves the best results on all four metrics; on NIST16, all detection models achieve high precision, while the proposed model substantially improves recall, F1, and average precision compared with the four comparison models. Qualitatively, the proposed model reduces false and missed detections, localizes tampered regions more accurately, and achieves a better overall detection effect than the other models. Conclusion Detecting spliced tampering in substation images is a key task in ensuring the stability of power systems. This paper designs a new splicing tampering detection model for complex substation images based on a dual-channel structure that combines a feature pyramid Transformer and a shallow CNN. 
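Channel concatenation keeps the Transformer and CNN features in separate channels, whereas element-wise addition merges them irreversibly. One general way to see why concatenation is at least as expressive: a learnable 1×1 convolution applied after concatenation can reproduce element-wise addition as a special case. The sketch below assumes both branches output maps of the same spatial size; the shapes and function names are illustrative, not taken from the paper, and the sketch says nothing about the accuracy gain the paper measures experimentally.

```python
import numpy as np

def fuse_concat(f_main: np.ndarray, f_aux: np.ndarray) -> np.ndarray:
    """Channel concatenation: (H, W, C1) and (H, W, C2) -> (H, W, C1 + C2)."""
    return np.concatenate([f_main, f_aux], axis=-1)

def fuse_add(f_main: np.ndarray, f_aux: np.ndarray) -> np.ndarray:
    """Element-wise addition: both inputs must have identical shapes."""
    return f_main + f_aux

def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """A 1x1 convolution without bias is a per-pixel linear map over channels."""
    return x @ weight  # (H, W, Cin) @ (Cin, Cout) -> (H, W, Cout)

rng = np.random.default_rng(0)
f_transformer = rng.normal(size=(64, 64, 32))  # hypothetical main-branch output
f_cnn = rng.normal(size=(64, 64, 32))          # hypothetical auxiliary-branch output

fused = fuse_concat(f_transformer, f_cnn)
print(fused.shape)  # (64, 64, 64)

# With weights [I; I], the 1x1 conv over the concatenated map reproduces
# element-wise addition, so concatenation loses nothing relative to it.
w_sum = np.vstack([np.eye(32), np.eye(32)])
print(np.allclose(conv1x1(fused, w_sum), fuse_add(f_transformer, f_cnn)))  # True
```

Concatenation also accepts branches with different channel counts, which addition does not.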
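For reference, the four evaluation metrics can be computed as sketched below. This assumes detections have already been matched to ground-truth tampered regions (the abstract does not state the matching rule, such as an IoU threshold) and uses all-point interpolation for average precision, as in PASCAL VOC; the helper names are hypothetical.

```python
from typing import Sequence

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """Area under the precision-recall curve with all-point interpolation.

    scores: confidence of each detection; is_tp: whether that detection was
    matched to a ground-truth tampered region; num_gt: number of ground-truth regions.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for j in range(len(mpre) - 2, -1, -1):
        mpre[j] = max(mpre[j], mpre[j + 1])
    # Sum rectangle areas; segments where recall does not change contribute 0.
    return sum((mrec[j + 1] - mrec[j]) * mpre[j + 1] for j in range(len(mrec) - 1))

print(precision_recall_f1(tp=8, fp=2, fn=4))  # precision 0.8, recall ≈ 0.667, F1 ≈ 0.727
print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=2))  # ≈ 0.833
```

Because F1 is the harmonic mean of precision and recall, equal to 2TP / (2TP + FP + FN), a large recall gain with nearly unchanged precision, as reported on SSSTD, shows up as a moderate F1 gain.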
The feature pyramid Transformer channel obtains rich semantic information and visual features of tampered images through a global interaction mechanism, which improves the accuracy and multi-scale processing capability of the detection model. The shallow CNN, as an auxiliary channel, focuses on extracting edge features from the residual image, making it easier for the model to locate tampered regions from their overall contour. Evaluated on different splicing tampering datasets, the proposed model achieves the best results, and visualizations further show that it has the best detection performance in actual substation scenarios. However, this paper mainly investigates image splicing tampering detection, whereas real-world tampering takes diverse forms; future work will investigate the detection of other tampering types to improve the compatibility of tampering detection models.