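Objective: Many existing denoising algorithms over-smooth edge information and tend to produce color artifacts while removing noise. To address these shortcomings, this paper proposes a deep residual denoising network based on a joint perceptual loss. Method: First, a low-pass filter decomposes the noisy image into a high-frequency layer and a low-frequency layer. The high-frequency layer, which contains the noise and the edge information, is then fed into the designed residual network, which learns an end-to-end residual mapping with a conventional per-pixel loss and predicts the noise residual image. A global skip connection running directly from the input to the output then yields an initial, relatively blurry denoised result. Finally, a pre-trained semantic segmentation network is cascaded to define the perceptual loss, guiding the preceding denoising model to learn more semantic feature information so that blurred edge details are enhanced and a sharper, more realistic denoised result is obtained. Result: Comparative experiments are conducted both qualitatively and quantitatively, with peak signal-to-noise ratio (PSNR) as the quantitative metric. The results show that the proposed network achieves the best scores when it is trained with a per-pixel loss like the other compared methods: on the Set14 test set at noise level 25 it improves on BM3D, TNRD and DnCNN by about 0.63 dB, 0.55 dB and 0.17 dB, respectively. In the qualitative visual analysis, the proposed perceptual-loss model clearly produces sharper denoising results and, relative to the blurred regions produced by other methods, preserves more edge information and texture detail. A blind denoising experiment was also carried out on an image containing different noise levels; the results show that the trained model can handle several unknown noise levels in one pass and produce a satisfactory denoised output without extra artifacts. Conclusion: The experimental results show that the image denoising algorithm based on an edge-enhancing perceptual-loss residual network preserves more of the easily blurred edge details while removing noise, alleviates over-smoothing of the denoised result, and improves the visual quality of the image.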
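The data flow in the Method part can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the source does not name the low-pass filter, so a mean (box) filter is used here; `predict_residual` is a hypothetical stand-in for the trained residual network; and the sign convention (the network predicts the noise, which the global skip connection subtracts from the noisy input) is an assumption.

```python
def box_blur(img, radius=1):
    """Mean filter with edge replication (assumed stand-in for the low-pass filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp rows to the image
                    xx = min(max(x + dx, 0), w - 1)  # clamp columns to the image
                    total += img[yy][xx]
            out[y][x] = total / (2 * radius + 1) ** 2
    return out


def split_layers(img, radius=1):
    """Return (low, high) layers such that low + high reproduces img."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(img, low)]
    return low, high


def denoise(noisy, predict_residual, radius=1):
    """Feed the high-frequency layer to the network, then apply the global skip."""
    _, high = split_layers(noisy, radius)
    residual = predict_residual(high)  # hypothetical network: estimated noise map
    return [[p - r for p, r in zip(row, res_row)]
            for row, res_row in zip(noisy, residual)]
```

Because the network sees only the high-frequency layer and outputs only a residual, the smooth background never has to be reproduced by the convolution layers, which is the reduction of the solution space that the abstract refers to.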
Image denoising via residual network based on perceptual loss
Wu Congzhong, Chen Xi, Ji Dong, Zhan Shu (School of Computer and Information, Hefei University of Technology)
Objective: Image denoising is a classical image reconstruction problem in low-level computer vision: it estimates the latent clean image from a noisy observation. In practice, digital images are often corrupted by noise from the imaging equipment and the external environment during digitization and transmission. Although some methods have achieved reasonable results in recent years, they rarely address over-smoothing and the loss of edge details. For this reason, a novel image denoising method via residual learning based on edge enhancement is proposed. Method: Owing to their powerful learning ability, very deep convolutional neural networks have recently been widely used for image restoration. Inspired by ResNet, and unlike other direct denoising networks, we introduce identity mappings so that our residual network can be made deeper, and then slightly modify the architecture to better suit the denoising task. Pooling layers and batch normalization are removed to preserve details. In their place, high-frequency layer decomposition and a global skip connection are used to prevent over-fitting; they change the input and output of the network in order to reduce the solution space. To speed up training, we select the rectified linear unit (ReLU) as the activation function and move it before the convolution layer. Traditionally, image restoration work has used the per-pixel loss between the ground truth and the restored image as the optimization target, which can achieve excellent quantitative scores. However, recent research has shown that minimizing pixel-wise errors, which depend only on low-level pixels, may cause a loss of details and over-smooth the results. On the other hand, the perceptual loss function has been shown to generate high-quality images with better visual quality
by capturing differences in high-level feature representations, but it sometimes fails to preserve color and local spatial information. To combine the benefits of both, we propose a new joint loss function consisting of a normal pixel-to-pixel loss and a perceptual loss with appropriate weights. In summary, our method proceeds as follows. First, the background information is removed from the noisy image and its high-frequency layer is used as the input. Second, a residual mapping is trained to predict the difference between the clean and noisy images as output, instead of the final denoised image. To further improve the denoised result, a joint loss function is defined as the weighted sum of the pixel-to-pixel Euclidean loss and the perceptual loss. A well-trained convolutional neural network is connected to extract the semantic information that the perceptual loss measures. It encourages training to learn similar feature representations rather than to match each low-level pixel, which guides the preceding denoising network to reconstruct more edges and details. Unlike normal denoising models, which handle only one specific noise level, our single model can deal with noise of unknown levels (i.e., blind denoising). We employ CBSD400 as the training set and evaluate quality on Set5, Set14 and CBSD100 at noise levels of 15, 25 and 50. To train a network for a known specific noise level, we generate the noisy images by adding Gaussian noise with standard deviations of . Alternatively, we train a single blind network for the unknown noise range . Result: To verify the effectiveness of the proposed network, we report quantitative and qualitative results of our method in comparison with state-of-the-art methods, including BM3D, TNRD and DnCNN. The performance of the algorithm is evaluated with peak signal-to-noise ratio (PSNR) as the quantitative indicator.
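For readers unfamiliar with the metric, the sketch below shows how a noisy test input and its PSNR could be computed. It assumes 8-bit intensities (peak value 255), images flattened to lists of floats, and additive Gaussian noise that is not clipped; the function names and the seed argument are illustrative and do not come from the paper.

```python
import math
import random


def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma to each pixel."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [p + rng.gauss(0.0, sigma) for p in pixels]


def psnr(clean, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((c - r) ** 2 for c, r in zip(clean, restored)) / len(clean)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Usage: corrupt a simple intensity ramp at noise level 25 and score it.
clean = [float(v) for v in range(0, 256, 4)]
noisy = add_gaussian_noise(clean, sigma=25.0)
print(round(psnr(clean, noisy), 2))
```

Since PSNR is a monotone function of the mean squared error, a model trained only on a per-pixel loss optimizes this metric directly, which is worth keeping in mind when PSNR figures are compared across loss functions.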
The results show that the proposed network trained with the MSE loss alone produces the best scores. Compared with BM3D, TNRD and DnCNN, the proposed MSE-S model improves PSNR by about 0.63 dB, 0.55 dB and 0.17 dB, respectively (Set14, noise level 25). In the qualitative visual comparison, the proposed perceptual-loss model yields clearly sharper denoising results: where other methods produce blurred regions, it preserves more edge information and texture detail. To demonstrate blind denoising, we give an additional experiment in which the input is composed of parts corrupted at three noise levels, i.e., level 10, level 30 and level 50. The result shows that our blind model can generate a satisfying restored output without artifacts even when different parts of the input are corrupted by different levels of noise. Conclusion: In this paper, we have described a deep residual denoising network with 26 weight layers, in which a perceptual loss is adopted to enhance detail information. Residual learning and high-frequency layer decomposition are used to reduce the solution space and thus speed up training, without pooling layers or batch normalization. Unlike normal denoising models, which handle only one specific noise level, our model can deal with the blind denoising problem at different unknown noise levels. Experiments show that the proposed network achieves superior performance in both quantitative and qualitative results, and recovers the majority of missing details from low-quality observations. In the future, we will explore other kinds of noise, especially complex real-world noise, and consider a single comprehensive network for more image restoration tasks. In addition, we intend to investigate evaluation metrics that reflect visual perception better than PSNR.
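The joint loss described in the Method part can be illustrated with a minimal sketch. The abstract gives neither the weights nor the feature layers, so the weight values below are placeholders, and `toy_features` (neighbouring-pixel differences on a 1-D signal) is a hypothetical stand-in for the feature maps of the pre-trained network.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(denoised, clean, feature_fn,
               pixel_weight=1.0, perceptual_weight=0.1):
    """Weighted sum of a per-pixel loss and a feature-space (perceptual) loss.

    feature_fn stands in for the pre-trained network; both weights are
    placeholder values, not the ones used in the paper.
    """
    pixel_term = mse(denoised, clean)
    perceptual_term = mse(feature_fn(denoised), feature_fn(clean))
    return pixel_weight * pixel_term + perceptual_weight * perceptual_term


def toy_features(signal):
    """Hypothetical feature extractor: differences between neighbouring samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


# Usage: a constant brightness shift leaves the toy features unchanged,
# so only the pixel term penalizes it.
clean = [0.0, 1.0, 2.0, 3.0]
shifted = [v + 2.0 for v in clean]
print(joint_loss(shifted, clean, toy_features))
```

The usage lines hint at why the two terms are combined: a feature-space term alone can ignore a global intensity or color shift, while the per-pixel term anchors the output to the ground-truth values.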