Objective In research on screen-shooting robust image watermarking, the main challenge is to improve robustness while preserving the visual quality of the watermarked image. To this end, this paper proposes an end-to-end network framework based on deep learning for robust watermark embedding and extraction. Method In this framework, we design a noise layer that includes Moiré-pattern simulation to model the distortion caused by real screen-shooting noise; through training, the network learns to resist screen-shooting noise, enhancing the robustness of the generated watermarked images. We also introduce a JND (just noticeable difference) loss function, which adaptively controls the embedding strength of the robust watermark by supervising the perceptual difference between the JND map of the image and the residual map carrying the watermark information, thereby improving the visual quality of the generated watermarked image. In addition, two automatic image-region localization methods are proposed: one for separating foreground from background in the captured photo, i.e., locating and rectifying the watermarked image region, and the other for decoding watermarked images after digital cropping attacks. Result Experimental results show that introducing the JND loss function improves the visual quality of watermarked images, with the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) reaching 30.9371 dB and 0.9424, respectively. With the Moiré noise simulation layer added, the bit error rate of the proposed scheme drops to 1%–3%, demonstrating resistance to screen-shooting noise. Moreover, by embedding an anti-cropping template in the R channel of the image, the scheme can effectively withstand substantial digital cropping attacks. The computational complexity is low: for a single image, the total time for localization and extraction is under 0.1 s, which meets the real-time requirements of practical application scenarios. Conclusion The proposed scheme achieves satisfactory embedding capacity and visual quality of the generated watermarked images, and its robustness under different shooting distances, angles, and capture/display devices is better than that of reported mainstream schemes.
Screen-shooting robust watermarking with end-to-end neural network
Wu Jiayi, Li Xiaomeng, Qin Chuan (University of Shanghai for Science and Technology)
Objective With the rapid development of the Internet and the upgrading of imaging devices, the security of digital image storage and file sharing has become an important concern. Robust watermarking techniques can address these problems: the general idea is to imperceptibly embed watermark information, such as a copyright label or user identification, into the image to be protected, and to extract the watermark from the watermarked image even after it has undergone attacks. The two most important properties of robust watermarking are robustness and the visual quality of the watermarked image; that is, the watermarked image should resist different kinds of attacks while retaining satisfactory visual quality. As a typical type of robust watermarking, screen-shooting robust watermarking should be able to resist the noise introduced during the screen-shooting procedure; in other words, the watermark should still be correctly extractable from the watermarked image after screen-shooting. Method In this paper, we propose an effective end-to-end network framework based on deep learning for screen-shooting robust watermarking. In this framework, a screen-shooting noise layer, including Moiré pattern simulation, is introduced to simulate the noise within the screen-shooting channel, so that the network learns, through training, to resist realistic screen-shooting noise. To further improve the visual quality of the generated watermarked image, we define and introduce a JND loss function, which controls the strength of the residual image carrying the watermark information by supervising the visual perceptual loss between the JND map of the original image and the residual image. In addition, two automatic localization methods for watermarked images are presented.
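The idea behind the JND loss can be sketched as follows. The exact formula is not given in this abstract, so the penalty below (mean squared magnitude of the residual in excess of the per-pixel JND threshold) is an illustrative assumption, not the paper's definition:

```python
import numpy as np

def jnd_loss(residual, jnd_map):
    """Hypothetical JND loss sketch: penalize only the part of the
    watermark residual that exceeds the per-pixel just-noticeable-
    difference threshold of the cover image.
    residual: H x W residual image carrying the watermark signal.
    jnd_map:  H x W per-pixel JND thresholds of the cover image.
    """
    # Residual energy below the JND threshold is assumed imperceptible;
    # only the mean squared excess above it contributes to the loss.
    excess = np.maximum(np.abs(residual) - jnd_map, 0.0)
    return float(np.mean(excess ** 2))

# A residual everywhere below the JND thresholds incurs zero loss,
# so the encoder is free to embed up to the perceptual budget.
r = np.full((4, 4), 1.0)
jnd = np.full((4, 4), 2.0)
assert jnd_loss(r, jnd) == 0.0
```

Minimizing such a term pushes the embedding strength toward the perceptual budget given by the JND map, which is how the abstract describes the adaptive control of embedding strength.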
The first method is applied to locate the watermarked image in a screen-shooting scenario. In this scenario, the captured photo may contain not only the image displayed on the screen but also part of the screen's background, and this background information can interfere with watermark extraction at the decoding end. We therefore propose a region localization method that combines deep learning with traditional image processing. The method assumes that the image region from which the watermark is to be extracted accounts for most of the pixels in the captured photo, and that the background color is relatively uniform with no obvious abrupt changes; under these assumptions, locating the watermarked image region is equivalent to a foreground extraction problem. The second localization method addresses watermark extraction under digital attacks. We argue that the robustness of a watermarking algorithm should not be limited to the screen-shooting process but should also cover attacks in the digital environment, such as image filtering, noise addition, and digital cropping. Although the vast majority of digital attacks can be approximated by the noise introduced during screen-shooting, digital cropping cannot be treated as a kind of screen-shooting noise. For this reason, we introduce an anti-cropping region localization method based on symmetric noise templates. The method divides the image into four sub-images (top-left, bottom-left, top-right, and bottom-right), generates a two-channel watermark residual map that is embedded in the green and blue channels so that four copies of the same watermark information are placed in one image, and additionally embeds a symmetric noise template in the red channel for anti-cropping localization.
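The four-copy layout can be sketched as below. The tiling mechanics and additive embedding are simplified illustrative assumptions; the paper's actual residual generation is learned by the network:

```python
import numpy as np

def embed_four_copies(image, residual_gb, template_r):
    """Illustrative sketch of the anti-cropping layout: tile a
    half-size two-channel watermark residual into all four quadrants
    of the G/B channels, and add a symmetric noise template to the
    R channel for localization. (Additive embedding here is an
    assumption for clarity.)
    image:       H x W x 3 cover image (float), H and W even.
    residual_gb: (H//2) x (W//2) x 2 watermark residual for G and B.
    template_r:  H x W localization template for R.
    """
    marked = image.copy()
    # Replicate the residual so each quadrant carries a full copy of
    # the watermark; any single surviving quadrant is then decodable.
    tiled = np.tile(residual_gb, (2, 2, 1))
    marked[:, :, 1:3] += tiled
    marked[:, :, 0] += template_r
    return marked
```

Because every quadrant holds a complete copy, a crop that preserves more than one quarter of the image still contains at least one decodable copy, and the symmetric template in the R channel lets the decoder re-align it.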
Even when the watermarked image suffers a cropping attack, the localization method can still correctly extract the watermark information as long as more than 1/4 of the image area remains. Result Experimental results show that, after introducing the JND loss function, the visual quality of the watermarked image is improved: the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) reach 30.9371 dB and 0.9424, respectively. After adding the Moiré noise simulation layer, the bit error rate (BER) of the proposed scheme can be reduced to 1%–3%, demonstrating its ability to resist the noise generated by screen-shooting. In addition, by embedding the anti-cropping template into the R channel of the image, the proposed scheme can effectively resist strong cropping attacks. The total running time for embedding and extraction on a single image is less than 0.1 s, which is suitable for application scenarios with real-time requirements. We also compare the performance of the proposed algorithm with state-of-the-art screen-shooting robust watermarking algorithms across various experimental settings, including screen-shooting and digital attacks. The BER comparison demonstrates that the proposed method not only enables the network to simulate screen-shooting noise, yielding high robustness against actual screen-shooting noise, but also equips the network to withstand digital cropping attacks. Conclusion This paper proposes an end-to-end embedding-extraction network for screen-shooting robust watermarking, in which a Moiré noise simulation layer and a JND loss function module are introduced to enhance the robustness and visual quality of the watermarked images generated by the network.
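The bit error rate used in these comparisons is simply the fraction of extracted bits that differ from the embedded bits:

```python
def bit_error_rate(sent, received):
    """BER: fraction of positions where the extracted bit differs
    from the embedded bit (both are equal-length 0/1 sequences)."""
    assert len(sent) == len(received)
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)

# One flipped bit out of four gives BER = 0.25.
assert bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]) == 0.25
```

A BER of 1%–3%, as reported above, means that at most a few bits per hundred are flipped after screen-shooting, which is typically low enough for error-correcting codes to recover the message exactly.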
Additionally, we design two watermark localization methods for two realistic scenarios: screen-shooting and digital cropping. Our experimental results demonstrate that the proposed scheme achieves satisfactory embedding capacity and visual quality of the generated watermarked image, and that its robustness under different shooting distances, angles, and capture/display devices is better than that of current state-of-the-art schemes. Our future work includes investigating watermark decoding when only a portion of the screen image is captured, which is more intricate than digital cropping alone, and further improving the visual quality of watermarked images in scenarios with high embedding capacity.