Image deraining network fusing 3D attention and Transformer

Wang Meihua; Ke Fanhui; Liang Yun; Fan Zhun; Liao Lei

Published: 2022-05-20
DOI: 10.11834/jig.210794
2022 | Volume 27 | Number 5

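Wang Meihua¹, Ke Fanhui¹, Liang Yun¹, Fan Zhun², Liao Lei¹ (1. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China; 2. College of Engineering, Shantou University, Shantou 515063, China)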

Abstract

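Objective Because rain streaks in rainy images differ in direction, density, size, and other respects, single image deraining remains a challenging research problem. Existing algorithms still over-remove or under-remove rain on some complex images: high-frequency edge information in these images is erased during deraining, or rain components remain in the result. To address these problems, this paper proposes the three-dimension attention and Transformer deraining network (TDATDN).

Method The three-dimension attention mechanism is combined with the residual dense block structure to address the fusion of high-dimensional channel features in residual dense blocks. A Transformer computes the global correlation of features. To counter the destruction of high-frequency information and the erasure of structural information during deraining, a multi-scale structural similarity loss is combined with commonly used image deraining loss functions to train the network.

Result The proposed TDATDN network was evaluated on the Rain12000 rain streak dataset, where it reached a peak signal-to-noise ratio (PSNR) of 33.01 dB and a structural similarity (SSIM) of 0.927 8. The experimental results show that, compared with previous deep-learning-based deraining networks, the proposed algorithm significantly improves single image deraining.

Conclusion The proposed TDATDN image deraining network combines the advantages of the 3D attention mechanism, the Transformer, and the encoder-decoder architecture, and performs single image deraining well.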
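The abstract does not specify which 3D attention module is used or where it sits inside the residual dense block. The sketch below is a minimal NumPy illustration under stated assumptions. It swaps in a SimAM-style parameter-free 3D attention, which produces one weight per channel, row, and column position. To keep the block short, it uses 1×1 convolutions where a residual dense block would normally use 3×3 convolutions, and it applies attention after local feature fusion and before the residual addition. All function names and weight shapes are hypothetical.

```python
import numpy as np

def simam_3d_attention(x, lam=1e-4):
    """SimAM-style parameter-free 3D attention (assumed stand-in).

    x: feature map of shape (C, H, W). Every element gets its own
    sigmoid weight, so the weighting is full 3D (channel x H x W).
    """
    _, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)       # per-channel mean
    d = (x - mu) ** 2                              # squared deviation of each neuron
    v = d.sum(axis=(1, 2), keepdims=True) / n      # per-channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5              # inverse energy: distinctive neurons score higher
    return x / (1.0 + np.exp(-e_inv))              # x * sigmoid(e_inv), same shape as x

def conv1x1(x, weight):
    """1x1 convolution as channel mixing; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def residual_dense_block(x, growth_weights, fusion_weight):
    """Hypothetical 3D-attention residual dense block on a (C, H, W) map."""
    features = [x]
    for wgt in growth_weights:
        inp = np.concatenate(features, axis=0)                # dense connection to all earlier outputs
        features.append(np.maximum(conv1x1(inp, wgt), 0.0))  # conv + ReLU adds G new channels
    fused = conv1x1(np.concatenate(features, axis=0), fusion_weight)  # local fusion back to C channels
    return x + simam_3d_attention(fused)                      # 3D attention, then local residual
```

With C = 8 input channels, growth G = 4, and three dense layers, the i-th growth weight has shape (4, 8 + 4i) and the fusion weight has shape (8, 20). Because the sigmoid weight only rescales features, setting the fusion weight to zero makes the block return its input unchanged through the residual path.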
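The abstract states that the Transformer computes global correlations of deep features but gives no architectural details. As an assumed minimal sketch, the core operation is single-head scaled dot-product self-attention in which every spatial position of the bottleneck feature map is one token. The projection matrices `w_q`, `w_k`, and `w_v` are hypothetical. The sketch omits the multi-head splitting, positional encoding, layer normalization, and feed-forward sublayer of a full Transformer block.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention over all spatial positions.

    feat: (C, H, W) deep feature map; w_q, w_k, w_v: (C, D) projections.
    Returns the attended map as (D, H, W) and the (H*W, H*W) attention matrix.
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T                         # (H*W, C): one token per pixel
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # each pixel attends to every pixel
    out = attn @ v                                            # (H*W, D)
    return out.T.reshape(-1, h, w), attn
```

The attention matrix has (H·W)² entries. This quadratic cost is one reason global attention of this kind is usually applied at the deepest, lowest-resolution stage of an encoder-decoder.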
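The loss combination can be sketched as follows, under several assumptions. The "commonly used" deraining loss is taken to be MSE, the weight `alpha = 0.8` is hypothetical, and images are grayscale in [0, 1]. The multi-scale SSIM term is simplified to the average of a global (window-free) SSIM over a three-level average-pooled pyramid. Standard MS-SSIM instead uses Gaussian-windowed local statistics, evaluates luminance only at the coarsest scale, and combines scales with a weighted product. This is therefore an illustration of the idea, not a drop-in implementation.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """SSIM from global statistics of two grayscale images (no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def downsample2(img):
    """2x2 average pooling; odd trailing rows/columns are cropped."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def multiscale_ssim(x, y, scales=3):
    """Simplified multi-scale SSIM: mean global SSIM over an image pyramid."""
    vals = []
    for _ in range(scales):
        vals.append(ssim_global(x, y))
        x, y = downsample2(x), downsample2(y)
    return float(np.mean(vals))

def deraining_loss(pred, target, alpha=0.8):
    """Hypothetical weighting: alpha * (1 - MS-SSIM) + (1 - alpha) * MSE."""
    mse = float(np.mean((pred - target) ** 2))
    return alpha * (1.0 - multiscale_ssim(pred, target)) + (1.0 - alpha) * mse
```

The SSIM term penalizes differences in local contrast and structure that a pixel-wise MSE can average away. This is the motivation the abstract gives for adding a structural similarity term to protect edges and other high-frequency detail.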

Keywords

single image deraining; convolutional neural network (CNN); Transformer; 3D attention; U-Net

3D attention and Transformer-based single image deraining network

Wang Meihua¹, Ke Fanhui¹, Liang Yun¹, Fan Zhun², Liao Lei¹ (1. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China; 2. College of Engineering, Shantou University, Shantou 515063, China)

Abstract

Objective Vision-based computer systems must often process and analyze images and videos acquired in adverse weather such as rain, snow, sleet, or fog. The image degradation caused by severe weather significantly distorts visual quality and reduces the performance of computer vision systems, so automatic image deraining algorithms are important. This work focuses on removing rain streaks from a single image. Traditional rain removal models rely mainly on prior information: they regard a rainy image as a combination of a rain layer and a background layer, and define deraining as the task of separating the two. Because rain streaks differ in direction, density, and size, single image deraining remains a challenging computer vision task. Deep learning has advanced image deraining, but existing models still over-remove or under-remove rain in complex scenes: high-frequency edge information in some complex images is erased during deraining, or rain components remain in the derained image. This paper proposes the three-dimension attention and Transformer deraining network (TDATDN) for single image rain removal. It builds on an encoder-decoder deraining architecture and integrates 3D attention and a Transformer into it, combining the strengths of all three to improve deraining. The training set consists of 12 000 image pairs covering three rain densities, and 1 200 test images are used to evaluate rain removal. Input images are resized to 256×256 for both training and testing. The network is trained for 100 epochs with the Adam optimizer and an initial learning rate of 1×10^-4; the learning rate is multiplied by 0.5 when training reaches 15 epochs.

Method The three-dimension attention mechanism is embedded in the residual dense block to address the fusion of high-dimensional channel features in residual dense blocks. The resulting 3D attention residual dense block serves as the backbone of an encoder-decoder deraining network, and a Transformer computes the global contextual relevance of the network's deep semantic features. The self-attention feature encoding produced by the Transformer is then upsampled along the image restoration path formed by the decoder. To obtain a derained result with richer high-frequency detail, each upsampled feature map is concatenated along the channel dimension with the corresponding encoder feature map. To counter the loss of high-frequency information and the erasure of structural information during deraining, the multi-scale structural similarity loss is combined with commonly used deraining loss functions to train the network.

Result The TDATDN network is evaluated on the Rain12000 rain streak dataset, where it reaches a peak signal-to-noise ratio (PSNR) of 33.01 dB and a structural similarity (SSIM) of 0.927 8. Comparative experiments were carried out to verify the effect of the fused design, and they show that the proposed algorithm improves single image rain removal.

Conclusion The proposed image deraining network combines the advantages of the 3D attention mechanism, the Transformer, and the encoder-decoder architecture.
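The layer view of a rainy image described in the objective is commonly written as an additive model. This formulation is assumed here, since the abstract does not give the exact model. In it, O is the observed rainy image, B the rain-free background layer, and R the rain-streak layer. If a network predicts the rain layer \hat{R}, the background estimate follows by subtraction:

```latex
O = B + R, \qquad \hat{B} = O - \hat{R}
```

Both B and R are unknown and only O is observed, so the separation is ill-posed. Traditional methods resolve it with hand-crafted priors, and deep networks resolve it with priors learned from paired training data.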
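The decoder step in the method (upsample, then concatenate with the matching encoder feature map along the channel dimension) can be sketched in NumPy. Nearest-neighbour upsampling is an assumption, because the abstract does not say whether the network uses interpolation or transposed convolution. The function names are hypothetical. In a real decoder, convolutions after the concatenation would reduce the channel count.

```python
import numpy as np

def upsample_nearest2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def decoder_step(deep_feat, skip_feat):
    """One U-Net-style decoder step: upsample the deeper features, then
    concatenate with the same-resolution encoder features on the channel axis."""
    up = upsample_nearest2x(deep_feat)
    if up.shape[1:] != skip_feat.shape[1:]:
        raise ValueError(f"spatial size mismatch: {up.shape[1:]} vs {skip_feat.shape[1:]}")
    return np.concatenate([up, skip_feat], axis=0)
```

For example, a (2, 2, 2) deep feature map combined with a (3, 4, 4) encoder map yields a (5, 4, 4) tensor. The encoder channels carry the high-resolution detail that upsampling alone cannot restore.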
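PSNR, the headline metric in the result, is 10·log10(MAX²/MSE) in decibels. The following is a minimal sketch that assumes 8-bit data (MAX = 255). The abstract does not say whether PSNR is computed on RGB or on the luminance channel.

```python
import numpy as np

def psnr(pred, target, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Reading the formula backwards for a single 8-bit image, 33.01 dB corresponds to an MSE of about 255² / 10^3.301 ≈ 32.5, which is a root-mean-square error of roughly 5.7 gray levels. Dataset PSNR is usually an average of per-image values, so this conversion gives only an order-of-magnitude interpretation.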

Keywords

single image deraining; convolutional neural network (CNN); Transformer; 3D attention; U-Net
