Published: 2018-10-16 | DOI: 10.11834/jig.180069 | 2018, Volume 23, Number 10 | Image Processing and Coding

Received: 2018-02-27; revised: 2018-06-01. Supported by: National Natural Science Foundation of China (61371156). First author: Wu Congzhong, born in 1964, male, associate professor; research interests: signal and information processing. E-mail: 329161005@qq.com. Chen Xi, master's student; research interests: machine learning and image restoration. E-mail: 245699640@qq.com. Ji Dong, master's student; research interests: machine learning and image segmentation. E-mail: jidong_forever@163.com. CLC number: TP319; document code: A; article ID: 1006-8961(2018)10-1483-09


Image denoising via residual network based on perceptual loss
Wu Congzhong, Chen Xi, Ji Dong, Zhan Shu
School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Supported by: National Natural Science Foundation of China (61371156)

# Abstract

Objective Image denoising is a classical image reconstruction problem in low-level computer vision: it estimates the latent clean image from a noisy observation. Digital images are often corrupted by noise from imaging equipment and the external environment during digitization and transmission. Although several methods have achieved reasonable results in recent years, they rarely address over-smoothing effects and the loss of edge details. Thus, a novel image denoising method via residual learning based on edge enhancement is proposed.
Method Owing to its powerful learning ability, the very deep convolutional neural network has recently been widely used for image restoration. Inspired by ResNet, and unlike other direct denoising networks, identity mappings are introduced to enable our residual network to increase its depth, and the architecture is then slightly modified to better suit the denoising task. Pooling layers and batch normalization are removed to preserve details. Instead, high-frequency layer decomposition and a global skip connection are used to prevent over-fitting; they change the input and output of the network to reduce the solution space. To speed up training, we select the rectified linear unit (ReLU) as the activation function and remove it before the convolution layer. Traditionally, image restoration work used the per-pixel loss between the ground truth and the restored image as the optimization target to obtain excellent quantitative scores. However, recent research has shown that minimizing pixel-wise errors alone is prone to losing details and over-smoothing the results. Meanwhile, the perceptual loss function has been shown to generate high-quality images with better visual quality by capturing the difference between high-level feature representations, but it sometimes fails to preserve color and local spatial information. To combine both benefits, we propose a new joint loss function that consists of a normal pixel-to-pixel loss and a perceptual loss with appropriate weights. In summary, the flow of our method is as follows. First, the high-frequency layer of the noisy image is used as the input by removing the background information. Then, a residual mapping is trained to predict the difference between the clean and noisy images as the output, instead of the final denoised image. To improve the denoised result further, a joint loss function is defined as the weighted sum of the pixel-to-pixel Euclidean loss and the perceptual loss. A well-trained convolutional neural network is connected to extract the semantic information measured by our perceptual loss. This setup encourages training to learn similar feature representations rather than to match each low-level pixel, which guides the front denoising network in reconstructing more edges and details. Unlike normal denoising models trained for one specific noise level, our single model can deal with noise of unknown levels (i.e., blind denoising). We employ CBSD400 as the training set and evaluate quality on Set5, Set14, and CBSD100 with noise levels of 15, 25, and 50. To train the network for a specific noise level, we generate noisy images by adding Gaussian noise with standard deviations of σ = 15, 25, and 50. Alternatively, we train a single blind network for the unknown noise range [1, 50].
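The training-pair generation described above can be sketched in a few lines. This is an illustrative sketch, not the paper's code: function names are our own, and images are assumed to be float arrays on a [0, 255] scale.

```python
import numpy as np

def add_gaussian_noise(clean, sigma, rng=None):
    """Corrupt a clean image (float array in [0, 255]) with AWGN of std sigma."""
    rng = np.random.default_rng(rng)
    noise = rng.normal(0.0, sigma, size=clean.shape)
    return clean + noise, noise

def make_training_pair(clean, sigma_range=(1, 50), rng=None):
    """Blind-denoising pair: sigma is drawn uniformly from [1, 50], so one
    model sees all noise levels during training (level-specific models would
    instead fix sigma to 15, 25, or 50)."""
    rng = np.random.default_rng(rng)
    sigma = rng.uniform(*sigma_range)
    noisy, noise = add_gaussian_noise(clean, sigma, rng)
    return noisy, noise, sigma
```

The network's regression target is the returned `noise` (the residual), not the clean image itself.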
Result To verify the effectiveness of the proposed network, we compare the quantitative and qualitative results of our method with those of state-of-the-art methods, including BM3D, TNRD, and DnCNN. Performance is evaluated with the peak signal-to-noise ratio (PSNR) as the quantitative indicator. Results show that the proposed network trained with the MSE loss alone produces the best index results. The proposed algorithm (MSE-S) outperforms BM3D, TNRD, and DnCNN by 0.63 dB, 0.55 dB, and 0.17 dB, respectively. Qualitatively, the perceptual loss model proposed in this paper achieves a visibly clearer denoising result. Compared with the blurred regions generated by other methods, our method preserves more edge information and texture details. We perform another experiment to show the ability of blind denoising. The input is composed of noisy parts with three levels: 10, 30, and 50. Results indicate that our blind model can generate a satisfactory restored output without artifacts even when the input is corrupted by several levels of noise in different parts.
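The PSNR indicator used above is a direct function of the per-pixel MSE; a minimal sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape:
    PSNR = 10 * log10(peak^2 / MSE)."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```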
Conclusion In this paper, we describe a deep residual denoising network of 26 weight layers in which a perceptual loss is adopted to enhance detail information. Residual learning and high-frequency layer decomposition are used to reduce the solution space and speed up training without pooling layers or batch normalization. Unlike normal denoising models for one specific noise level, our model can deal with blind denoising problems with different unknown noise levels. Experiments show that the proposed network achieves superior quantitative and qualitative performance and recovers the majority of missing details from low-quality observations. In the future, we will explore how to handle other kinds of noise, especially complex real-world noise, and consider a single comprehensive network for more image restoration tasks. In addition, we plan to investigate more visually perceptual quality indicators beyond PSNR.

# Key words

image denoising; residual learning; perceptual loss; hierarchical mode

# 0 Introduction

1) To preserve more detail information, the denoising network removes pooling layers and batch normalization; to compensate, a hierarchical mode and residual mapping are proposed to compress the mapping range between the network input and output, reducing training difficulty and speeding up training.

2) Unlike other direct denoising networks, this network introduces identity mappings as shortcut connections, forming a chained structure composed of a series of residual units in which the signal can propagate directly from one unit to the others. Compared with the direct mode, this chained module facilitates learning and avoids vanishing gradients.

3) A semantic segmentation network is introduced as the loss network whose feature maps define the perceptual loss. Unlike using a per-pixel loss function alone, which causes loss of details, the joint perceptual loss allows the denoised image to retain more of the edge information that is easily blurred.
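The chained residual structure in 2) can be illustrated with a toy stand-in for the learned mapping F: the per-pixel ReLU-and-scale below is purely illustrative (the paper's F is a convolutional unit), but the shortcut arithmetic is the same.

```python
import numpy as np

def residual_unit(r, weight):
    """One residual unit with an identity shortcut: R^u = F(R^{u-1}) + R^{u-1}.
    F is stood in for by ReLU followed by per-pixel scaling, purely to
    illustrate the chained structure; a trained unit would use convolutions."""
    f = np.maximum(r, 0.0) * weight  # toy stand-in for the conv-ReLU-conv branch
    return r + f

def residual_chain(x, weights):
    """Chain of residual units: the identity path lets the input signal reach
    every later unit directly, which eases gradient flow during training."""
    r = x
    for w in weights:
        r = residual_unit(r, w)
    return r
```

With all branch weights at zero the chain reduces to the identity, which is exactly why deeper stacks of such units are no harder to optimize than shallow ones.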

# 1.1 Denoising network

 $x = {x_{{\rm{high}}}} + {x_{{\rm{low}}}}$ (1)
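Eq. (1) splits the input into low- and high-frequency layers. A sketch of such a decomposition, using a simple mean filter as an illustrative low-pass (the choice of filter here is our assumption, not the paper's):

```python
import numpy as np

def decompose(x, k=3):
    """Split a 2-D image into low- and high-frequency layers per Eq. (1):
    x = x_high + x_low. A k x k mean filter (with edge padding) serves as the
    low-pass here; the residual after subtraction is the high band."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    low = np.zeros_like(x, dtype=float)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            low[i, j] = xp[i:i + k, j:j + k].mean()
    high = x - low
    return low, high
```

By construction `low + high` reconstructs the input exactly; the network only ever sees `high`, which strips the smooth background and shrinks the input's value range.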

 $y = x-n$ (2)

 $y = x-F({x_{{\rm{high}}}})$ (3)

 ${R^u} = F({R^{u-1}}) + {R^{u-1}}$ (4)

# 1.2 Loss function

 ${L_{{\rm{joint}}}} = {L_{{\rm{MSE}}}} + \lambda {L_{{\rm{SegNet}}}}$ (5)

# 1.2.2 Perceptual loss function

 ${L_{{\rm{SegNet}}}} = \frac{1}{{{W_i}{H_i}}}{\left\| {{\phi _i}(x-F({x_{{\rm{high}}}}))-{\phi _i}\left( y \right)} \right\|^2}$ (7)
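Eqs. (5) and (7) can be sketched together as follows. The callable `phi` stands in for a fixed feature layer of the pretrained loss network (SegNet in the paper); here any function returning a 2-D feature map works, and the weighting `lam` corresponds to λ in Eq. (5).

```python
import numpy as np

def mse_loss(pred, target):
    """Per-pixel Euclidean (MSE) term of the joint loss."""
    return np.mean((pred - target) ** 2)

def perceptual_loss(pred, target, phi):
    """Eq. (7): squared distance between feature maps phi(.) of the denoised
    output and the ground truth, normalized by the map size W_i * H_i."""
    fp, ft = phi(pred), phi(target)
    w, h = fp.shape
    return np.sum((fp - ft) ** 2) / (w * h)

def joint_loss(pred, target, phi, lam=1.0):
    """Eq. (5): L_joint = L_MSE + lambda * L_SegNet."""
    return mse_loss(pred, target) + lam * perceptual_loss(pred, target, phi)
```

Here `pred` plays the role of the restored image x − F(x_high); in training, gradients flow through both terms, so the network matches features and pixels simultaneously.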

# 2.3 Analysis of results

Table 1 Average PSNR (dB) for noise levels 15, 25, and 50 on datasets Set5, Set14, and BSD100

| Dataset | Noise | BM3D | TNRD | DnCNN | MSE-S | MSE-B | Joint |
|---|---|---|---|---|---|---|---|
| Set5 | 15 | 32.26 | 32.49 | 32.74 | 32.84 | 32.74 | 32.61 |
| Set5 | 25 | 29.84 | 30.10 | 30.39 | 30.51 | 30.41 | 30.09 |
| Set5 | 50 | 26.71 | 26.94 | 27.26 | 27.41 | 27.29 | 26.95 |
| Set14 | 15 | 32.37 | 32.50 | 32.86 | 33.01 | 32.89 | 32.41 |
| Set14 | 25 | 29.97 | 30.06 | 30.43 | 30.60 | 30.48 | 30.22 |
| Set14 | 50 | 26.72 | 26.81 | 27.18 | 27.30 | 27.20 | 26.98 |
| BSD100 | 15 | 31.08 | 31.42 | 31.73 | 31.89 | 31.70 | 31.28 |
| BSD100 | 25 | 28.57 | 28.89 | 29.23 | 29.38 | 29.20 | 28.97 |
| BSD100 | 50 | 25.62 | 26.01 | 26.23 | 26.41 | 26.29 | 26.03 |

Table 2 Running time of different methods

| Method | BM3D (CPU) | TNRD (CPU/GPU) | DnCNN (CPU/GPU) | Joint (CPU/GPU) |
|---|---|---|---|---|
| Time/(s/image) | 3.01 | 1.86/0.035 | 3.85/0.060 | 4.69/0.081 |

# References

• [1] Buades A, Coll B, Morel J M. A non-local algorithm for image denoising[C]//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA: IEEE, 2005: 60-65.[DOI: 10.1109/CVPR.2005.38]
• [2] Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries[J]. IEEE Transactions on Image Processing, 2006, 15(12): 3736–3745. [DOI:10.1109/TIP.2006.881969]
• [3] Dabov K, Foi A, Katkovnik V, et al. Image denoising by sparse 3-D transform-domain collaborative filtering[J]. IEEE Transactions on Image Processing, 2007, 16(8): 2080–2095. [DOI:10.1109/TIP.2007.901238]
• [4] Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1646-1654.[DOI: 10.1109/CVPR.2016.182]
• [5] Nah S, Kim T H, Lee K M. Deep multi-scale convolutional neural network for dynamic scene deblurring[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 257-265.[DOI: 10.1109/CVPR.2017.35]
• [6] Jain V, Seung H S. Natural image denoising with convolutional networks[C]//Proceedings of the 21st International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: Curran Associates Inc., 2008: 769-776.
• [7] Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders[C]//Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland: ACM, 2008: 1096-1103.[DOI: 10.1145/1390156.1390294]
• [8] Xie J Y, Xu L L, Chen E H. Image denoising and inpainting with deep neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012: 341-349.
• [9] Chen Y, Pock T. Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1256–1272. [DOI:10.1109/TPAMI.2016.2596743]
• [10] Mao X J, Shen C H, Yang Y B. Image restoration using convolutional auto-encoders with symmetric skip connections[J]. arXiv preprint arXiv:1606.08921, 2016.
• [11] Zhang K, Zuo W M, Chen Y J, et al. Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising[J]. IEEE Transactions on Image Processing, 2017, 26(7): 3142–3155. [DOI:10.1109/TIP.2017.2662206]
• [12] Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 694-711.[DOI: 10.1007/978-3-319-46475-6_43]
• [13] Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481–2495. [DOI:10.1109/TPAMI.2016.2644615]
• [14] He K M, Zhang X Y, Ren S Q, et al. Identity mappings in deep residual networks[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 630-645.[DOI: 10.1007/978-3-319-46493-0_38]
• [15] He K M, Sun J, Tang X O. Guided image filtering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(6): 1397–1409. [DOI:10.1109/TPAMI.2012.213]
• [16] Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 105-114.[DOI: 10.1109/CVPR.2017.19]
• [17] Jia Y Q, Shelhamer E, Donahue J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM international conference on Multimedia. Orlando, Florida, USA: ACM, 2014: 675-678.[DOI: 10.1145/2647868.2654889]