Wu Congzhong, Chen Xi, Ji Dong, Zhan Shu (School of Computer and Information, Hefei University of Technology, Hefei 230009)
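Objective Many existing denoising algorithms over-smooth edge information and tend to produce color artifacts while removing noise. To address these shortcomings, this paper proposes a deep residual denoising network based on a joint perceptual loss. Method First, a low-pass filter decomposes the noisy image into a high-frequency layer and a low-frequency layer. The high-frequency layer, which contains the noise and the edge information, is fed into the designed residual network, which learns an end-to-end residual mapping with a conventional per-pixel loss and predicts the noise residual image. A global skip connection running directly from the input to the output then yields an initial, relatively blurry denoised result. Finally, a pretrained semantic segmentation network is cascaded to define the perceptual loss, guiding the preceding denoising model to learn more semantic feature information so that the blurred edge details are enhanced and a sharper, more realistic denoised result is obtained. Result Comparative experiments are conducted both qualitatively and quantitatively. With the peak signal-to-noise ratio (PSNR) as the quantitative metric, the results show that the proposed network, when trained with the per-pixel loss like the compared methods, yields the best scores: 30.51 dB, 30.60 dB, and 29.38 dB on the Set5, Set14, and BSD100 test sets at noise level 25, respectively. In the qualitative visual analysis, the proposed perceptual loss model achieves visibly sharper denoising results; compared with the blurred regions produced by other methods, it preserves more edge information and texture detail. A blind denoising experiment is also conducted, in which an image containing several different noise levels is denoised. The results show that the trained model can handle multiple unknown noise levels in a single pass and produce a satisfactory denoised output without extra artifacts. Conclusion The image denoising algorithm using a residual network with an edge-enhancing perceptual loss retains more of the easily blurred edge details while removing noise, alleviates over-smoothing of the denoised result, and improves the visual quality of the image.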
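The decomposition and global skip connection described in the Method part can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the box filter used as the low-pass filter, the kernel size `k`, the function names, the `predict_residual` callable standing in for the trained residual network, and the sign convention (the residual is taken to be the noise, so it is subtracted from the input) are all assumptions made for illustration.

```python
import numpy as np

def box_blur(img, k=5):
    """Low-pass filter (assumed here to be a k x k mean filter, edge-replicated)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    # Sum the k*k shifted copies of the image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def split_layers(noisy, k=5):
    """Decompose a 2-D noisy image into low- and high-frequency layers."""
    low = box_blur(noisy, k)
    high = noisy - low  # carries the noise and the edge information
    return low, high

def denoise(noisy, predict_residual, k=5):
    """Feed the high-frequency layer to the network, then apply the global skip.

    predict_residual is a hypothetical stand-in for the trained residual
    network; it maps the high-frequency layer to an estimate of the noise.
    """
    _, high = split_layers(noisy, k)
    residual = predict_residual(high)
    return noisy - residual  # global skip connection from input to output
```

With an oracle predictor that returns the true noise, `denoise` recovers the clean image exactly, which shows why the network only has to model the residual rather than the full image.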
Image denoising via residual network based on perceptual loss
Wu Congzhong, Chen Xi, Ji Dong, Zhan Shu (School of Computer and Information, Hefei University of Technology, Hefei 230009, China)
Objective Image denoising is a classical image reconstruction problem in low-level computer vision: it estimates the latent clean image from a noisy observation. Digital images are often corrupted by noise from the imaging equipment and the external environment during digitization and transmission. Although several methods have achieved reasonable results in recent years, they rarely address over-smoothing and the loss of edge details. We therefore propose a novel image denoising method via residual learning based on edge enhancement. Method Owing to their powerful learning ability, very deep convolutional neural networks have recently been widely used for image restoration. Inspired by ResNet, and unlike other direct denoising networks, we introduce identity mappings that allow our residual network to grow deeper, and we slightly modify the architecture to adapt it better to the denoising task. Pooling layers and batch normalization are removed to preserve details. In their place, high-frequency layer decomposition and a global skip connection are used to prevent over-fitting; they change the input and output of the network to reduce the solution space. To speed up training, we select the rectified linear unit (ReLU) as the activation function and move it before the convolution layer. Traditionally, image restoration has used the per-pixel loss between the ground truth and the restored image as the optimization target to obtain excellent quantitative scores. However, recent research has shown that minimizing pixel-wise errors alone tends to lose details and over-smooth the results. Meanwhile, the perceptual loss function can generate high-quality images with better visual performance by capturing the difference between high-level feature representations, but it sometimes fails to preserve color and local spatial information. To combine both benefits, we propose a new joint loss function that consists of a normal pixel-to-pixel loss and a perceptual loss with appropriate weights. In summary, our method proceeds as follows. First, the background information is removed from the noisy image and the remaining high-frequency layer is used as the input. Then, a residual mapping is trained to predict the difference between the clean and noisy images as output instead of the final denoised image. To improve the denoised result further, a joint loss function is defined as the weighted sum of the pixel-to-pixel Euclidean loss and the perceptual loss. A pretrained convolutional neural network is cascaded to extract the semantic information that our perceptual loss measures. This setup encourages the training process to learn similar feature representations rather than match each low-level pixel, which guides the preceding denoising network to reconstruct more edges and details. Unlike normal denoising models trained for only one specific noise level, our single model can deal with noise of unknown level (i.e., blind denoising). We employ CBSD400 as the training set and evaluate on Set5, Set14, and CBSD100 at noise levels of 15, 25, and 50. To train the network for a specific noise level, we generate the noisy images by adding Gaussian noise with standard deviation σ = 15, 25, or 50. Alternatively, we train a single blind network for the unknown noise range [1, 50]. Result To verify the effectiveness of the proposed network, we compare our method quantitatively and qualitatively with state-of-the-art methods, including BM3D, TNRD, and DnCNN. The peak signal-to-noise ratio (PSNR) is used as the quantitative indicator. Results show that the proposed network trained with the MSE loss alone produces the best PSNR values: the proposed algorithm (MSE-S) outperforms BM3D, TNRD, and DnCNN by 0.63 dB, 0.55 dB, and 0.17 dB, respectively. Qualitatively, the perceptual loss model proposed in this paper achieves visibly sharper denoising results. Compared with the blurred regions generated by other methods, it preserves more edge information and texture details. We perform another experiment to demonstrate blind denoising. The input is composed of parts corrupted at three noise levels: 10, 30, and 50. Results indicate that our blind model can generate a satisfactory restored output without artifacts even when different parts of the input are corrupted by different levels of noise. Conclusion In this paper, we describe a deep residual denoising network with 26 weight layers in which a perceptual loss is adopted to enhance detail information. Residual learning and high-frequency layer decomposition reduce the solution space and speed up training without pooling layers or batch normalization. Unlike normal denoising models trained for only one specific noise level, our model can handle blind denoising with different unknown noise levels. The experiments show that the proposed network achieves superior performance in both quantitative and qualitative results and recovers most of the missing details from low-quality observations. In the future, we will explore how to handle other kinds of noise, especially complex real-world noise, and consider a single comprehensive network for more image restoration tasks. We also plan to investigate perceptually motivated quality indicators beyond PSNR.
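The noise synthesis, the joint loss, and the PSNR evaluation mentioned in this abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function names are hypothetical, `features` is a stand-in for the feature maps of the pretrained network, and the single `weight` parameter and its default value are placeholders, since the abstract only states that "appropriate weights" are used.

```python
import numpy as np

def add_gaussian_noise(clean, sigma, rng):
    """Synthesize a noisy image by adding zero-mean Gaussian noise of std sigma."""
    return clean + rng.normal(0.0, sigma, size=clean.shape)

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def joint_loss(denoised, clean, features, weight=0.1):
    """Weighted sum of a per-pixel loss and a perceptual (feature-space) loss.

    features: hypothetical callable returning a feature array for an image.
    weight:   hypothetical trade-off factor; the source gives no value.
    """
    pixel = np.mean((denoised - clean) ** 2)
    perceptual = np.mean((features(denoised) - features(clean)) ** 2)
    return pixel + weight * perceptual
```

For a blind model, `sigma` would be drawn from the range [1, 50] for each training sample instead of being fixed at 15, 25, or 50. Setting `weight` to zero reduces `joint_loss` to the plain per-pixel MSE objective.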