面向屏幕拍摄的端到端鲁棒图像水印算法
Screen-shooting robust watermarking with end-to-end neural network
2023, Vol. 28, No. 12, Pages: 3713-3730
Print publication date: 2023-12-16
DOI: 10.11834/jig.221141
吴嘉奕, 李晓萌, 秦川. 2023. 面向屏幕拍摄的端到端鲁棒图像水印算法. 中国图象图形学报, 28(12):3713-3730
Wu Jiayi, Li Xiaomeng, Qin Chuan. 2023. Screen-shooting robust watermarking with end-to-end neural network. Journal of Image and Graphics, 28(12):3713-3730
Objective
In research on screen-shooting robust image watermarking, the main challenge is to improve robustness while preserving the visual quality of the watermarked image. To this end, an end-to-end network framework based on deep learning is proposed for robust watermark embedding and extraction.
Method
In this framework, a noise layer that includes Moiré patterns is designed to simulate the distortion caused by real screen-shooting noise, so that the network learns through training to resist such noise and the watermarked images it generates become more robust. A just noticeable distortion (JND) loss function is also introduced: by supervising the perceptual difference between the JND coefficient map of the image and the residual map carrying the watermark information, it adaptively controls the embedding strength of the robust watermark and thereby improves the visual quality of the generated watermarked image. In addition, two automatic image-region localization methods are proposed: one separates foreground from background in the captured photo to locate and rectify the watermarked image region, and the other enables decoding after the watermarked image has undergone a digital cropping attack.
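As an illustration of the idea behind the Moiré noise layer, the sketch below overlays a sinusoidal-grating interference pattern on an image. This toy model and its `freq`, `angle`, and `strength` parameters are assumptions for demonstration only, not the simulation actually used in the paper.

```python
import numpy as np

def add_moire(image, freq=0.7, angle=0.3, strength=8.0):
    """Overlay a simple sinusoidal interference pattern on an image.

    Illustrative stand-in for a Moire simulation layer: two slightly
    misaligned gratings are multiplied so their beat pattern forms
    Moire-like fringes. All parameters are hypothetical.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    g1 = np.sin(freq * xs)                                   # straight grating
    g2 = np.sin(freq * (np.cos(angle) * xs + np.sin(angle) * ys))  # rotated grating
    fringe = strength * g1 * g2                              # interference term
    img = image.astype(np.float64)
    if img.ndim == 3:
        fringe = fringe[..., None]                           # broadcast over channels
    return np.clip(img + fringe, 0, 255).astype(np.uint8)
```

In a training pipeline, a differentiable layer of this kind would be applied to the watermarked image before decoding, forcing the decoder to tolerate the fringes.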
Result
Experimental results show that introducing the JND loss function improves the visual quality of the watermarked image, with the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) reaching 30.9371 dB and 0.9424, respectively. With the Moiré noise simulation layer added, the bit error rate of the proposed scheme drops by 1%–3%, demonstrating its resistance to screen-shooting noise. Moreover, embedding an anti-cropping template in the R channel of the image allows the scheme to withstand severe digital cropping attacks. The computational complexity is low: for a single image, localization and extraction together take less than 0.1 s, meeting the real-time requirements of practical applications.
Conclusion
The proposed scheme achieves a satisfactory embedding capacity and visual quality of the generated watermarked image, and its robustness under different shooting distances, angles, and capturing/display devices is better than that of mainstream reported schemes.
Objective
With the rapid development of the Internet and imaging devices, the security of digital image storage and sharing has become an important concern. Robust watermarking techniques can be used to address these problems. The general idea is to embed watermark information, such as copyright labels and user identification, imperceptibly into the to-be-protected image and then extract the watermark from the watermarked image even after it has undergone attacks. The two most important properties of robust watermarking are robustness and the visual quality of the watermarked image: the watermarked image should resist different kinds of attacks while showing satisfactory visual quality. As a typical robust watermarking technique, screen-shooting robust watermarking can resist the noise introduced during the screen-shooting procedure; in other words, watermark information can still be accurately extracted from the watermarked image after screen shooting.
Method
In this paper, we propose an effective end-to-end network framework based on deep learning for screen-shooting robust watermarking. In this framework, a screen-shooting noise layer, including a Moiré pattern simulation, is introduced to simulate the noise within the screen-shooting channel, so that through training the network learns to withstand realistic noise encountered during the screen-shooting procedure. To further improve the visual quality of the generated watermarked image, we define and introduce a just noticeable distortion (JND) loss function that controls the strength of the residual image carrying the watermark information by supervising the perceptual difference between the JND map of the original image and the residual image. We also propose two automatic localization methods for watermarked images. The first addresses the screen-shooting scenario, in which the captured photo may contain not only the image displayed on the screen but also some background, which can corrupt watermark extraction at the decoding end. This method combines deep learning with traditional image processing; it assumes that the image region from which the watermark must be extracted accounts for most of the pixels in the captured photo and that the background color is relatively uniform with no obvious mutation, so that localizing the watermarked region reduces to foreground extraction. The second method targets attacks in the digital domain: the robustness of a watermarking algorithm should cover not only the screen-shooting process but also digital attacks such as image filtering, noise addition, and cropping.
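The JND-based control described above can be illustrated with a minimal sketch. The hinge-style loss below, which penalizes only the part of the residual magnitude exceeding the per-pixel JND threshold, is an assumed form for demonstration; the paper's actual loss supervises the perceptual difference between the JND map and the watermark residual.

```python
import numpy as np

def jnd_loss(residual, jnd_map):
    """Penalize watermark residual magnitudes that exceed the per-pixel
    just-noticeable-distortion threshold (hinge-style; assumed form).

    residual: watermark residual added to the cover image, shape (H, W)
    jnd_map:  per-pixel JND thresholds, same shape
    """
    excess = np.maximum(np.abs(residual) - jnd_map, 0.0)  # only visible excess
    return float(np.mean(excess))
```

A loss of this shape leaves the network free to embed strongly in textured regions (large JND) while suppressing visible distortion in smooth regions (small JND).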
While the vast majority of digital attacks can be approximated by the noise introduced by the screen-shooting process, digital cropping cannot be regarded as a kind of screen-shooting noise. For this reason, this paper introduces an anti-cropping region localization method based on symmetric noise templates. This method divides the image into four sub-images: top-left, bottom-left, top-right, and bottom-right. A two-channel watermark residual map is generated and embedded into the green and blue channels, creating four copies of the same watermark information in one image, while a symmetric noise template is embedded into the red channel for anti-cropping localization. Even when the watermarked image suffers a cropping attack, the localization method can still accurately extract the watermark information as long as more than 1/4 of the image area survives.
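The quadrant replication scheme above can be sketched as follows. The code tiles one watermark residual into the four quadrants of the G and B channels; the symmetric R-channel template used for localization is omitted, and all names and shapes are illustrative.

```python
import numpy as np

def embed_quadrants(cover, residual):
    """Tile one watermark residual into all four quadrants of the
    G and B channels (illustrative sketch of quadrant replication).

    cover:    uint8 image of shape (2h, 2w, 3)
    residual: float residual of shape (h, w), added to G and B channels
    """
    h, w = residual.shape
    out = cover.astype(np.float64)
    for y0 in (0, h):                  # top / bottom rows of quadrants
        for x0 in (0, w):              # left / right columns of quadrants
            for c in (1, 2):           # G and B channels only; R holds the template
                out[y0:y0 + h, x0:x0 + w, c] += residual
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because every quadrant carries a full copy of the residual, any crop that preserves more than a quarter of the image area contains at least one complete copy, which the R-channel template can then locate for decoding.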
Result
Experimental results show that after introducing the JND loss function, the visual quality of the watermarked image is improved, with the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) reaching 30.9371 dB and 0.9424, respectively. After adding the Moiré noise simulation layer, the bit error rate of the proposed scheme is reduced by 1%–3%, demonstrating its ability to resist the noise generated during screen shooting. The scheme also effectively resists strong cropping attacks by embedding the anti-cropping template into the R channel of the image. The total running time of embedding and extraction for a single image is less than 0.1 s, which is suitable for deployment in application scenarios with real-time requirements. The performance of the proposed algorithm is also compared with that of state-of-the-art screen-shooting robust watermarking algorithms across various experimental settings, including screen-shooting and digital-attack settings. The bit error rate comparison demonstrates that the proposed design not only helps the network simulate screen-shooting noise, yielding high robustness against actual screen-shooting noise, but also equips the network to withstand digital cropping attacks.
Conclusion
This paper proposes an end-to-end embedding-extraction network for robust watermarking against screen shooting. In this network, a Moiré noise simulation layer and a JND loss function module are introduced to enhance the robustness and visual quality of the watermarked images generated by the network. We also design two watermark localization methods to address two realistic scenarios, namely, screen shooting and digital cropping. Our experimental results demonstrate that our proposed scheme achieves a satisfactory embedding capacity and visual quality of the generated watermarked image and that the robustness of our scheme under different shooting distances, angles, and capturing/displaying devices is better than those of some state-of-the-art schemes. In future research, we aim to investigate decoding when only a portion of the screen image is captured, which is more intricate than mere digital cropping, and to improve the visual quality of watermarked images in scenarios with high embedding capacity.
Keywords: robust watermarking; screen-shooting; visual quality; end-to-end network; automatic localization