Published online: 2018-07-16. DOI: 10.11834/jig.170538. 2018, Volume 23, Number 7. Image Processing and Coding.

1. School of Electronics and Information Engineering, Tianjin Polytechnic University, Tianjin 300387, China;
2. Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin 300387, China

Received: 2017-10-17; revised: 2018-01-26. Supported by: Tianjin Research Program of Application Foundation and Advanced Technology (15JCYBJC16500). First author: Li Xianguo, born in 1981, male, associate professor. He received his Ph.D. in mechanical design and theory from Tianjin Polytechnic University in 2012. His research interests include intelligent information processing and optoelectronic detection technology and systems. E-mail: lixianguo@tjpu.edu.cn, yyl070805@163.com. Sun Yemei, female, M.S. candidate; her research interest is intelligent information processing. E-mail: sunyemei1216@163.com. Yang Yanli, male, associate professor; his research interests include intelligent information processing and equipment condition monitoring. E-mail: yangyanli@tjpu.edu.cn. Miao Changyun, male, professor; his research interests include communication networks and systems, and optoelectronic detection technology and systems. E-mail: miaochangyun@tjpu.edu.cn. CLC number: TP391. Document code: A. Article ID: 1006-8961(2018)07-0984-10.


Image super-resolution reconstruction based on intermediate supervision convolutional neural networks
Li Xianguo1,2, Sun Yemei1,2, Yang Yanli1,2, Miao Changyun1,2
1. School of Electronics and Information Engineering, Tianjin Polytechnic University, Tianjin 300387, China;
2. Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin 300387, China
Supported by: Tianjin Research Program of Application Foundation and Advanced Technology (15JCYBJC16500)

Abstract

Objective Learning-based image super-resolution reconstruction has recently become a research hotspot. A new image super-resolution reconstruction method based on an intermediate supervision convolutional neural network (CNN) is proposed to further improve reconstruction quality by addressing three problems of the original super-resolution CNN (SRCNN): few network layers, a small receptive field, and applicability to only a single scale. Method The method builds on the deep-learning CNN framework. First, when information about the input or gradient passes through many layers, it can vanish or be "washed out" by the time it reaches the end (or beginning) of the network. We therefore design a CNN structure with an intermediate supervision layer. The architecture has 16 weight layers, so the information used for reconstruction (the receptive field) is considerably larger (31×31 versus 13×13 for SRCNN). All layers are of the same type except the first, the seventh, and the last: 64 filters of size 3×3×64, where each filter operates on a 3×3 spatial region across 64 channels (feature maps). Each convolutional layer is followed by a rectified linear unit as the activation function. The first convolutional layer operates on the input image. The seventh layer is an intermediate supervision layer that guides the training of the preceding layers; this guidance can be regarded as implicit deep supervision that strengthens the learning capability during training. The last layer, a single filter of size 3×3×64, performs image reconstruction. Second, the supervision-layer and reconstruction loss functions are defined to address the vanishing-gradient problem of the deep CNN. The training procedure comprises three steps: image preprocessing, feature extraction, and image reconstruction.
In the first step, the network is trained on low-resolution images blurred by different upscaling factors (2, 3, and 4, possibly including fractional factors) so that it can reconstruct images with different degrees of blur well. In the second step, image features are extracted by convolution operations. In SRCNN, a center pixel is inferred from its surrounding pixels, so pixels near image borders are not fully utilized. We zero-pad by one pixel before each convolution to keep the sizes of all feature maps (including the output image) uniform, thereby making better use of the edge information of images and feature maps. In the last step, because the input and output (predicted) images are highly similar, the high-resolution image is reconstructed by residual learning; a smooth loss function with good generalization performance is then easily obtained by comprehensively using the shallow features. Result The proposed method is evaluated on the open challenge datasets Set5 and Set14, which are widely used in super-resolution studies. Experimental results show that the proposed method outperforms bicubic interpolation, A+, SelfEx, and SRCNN in both subjective visual effect and objective quantitative evaluation. Subjectively, the reconstructed images have superior clarity and edge sharpness. Objectively, the average peak signal-to-noise ratio (PSNR) achieved by this method is 2.26 dB, 0.28 dB, 0.28 dB, and 0.15 dB higher, respectively, than those of the other approaches. Meanwhile, when the trained network models are used to reconstruct images, the time consumed is less than half that of SRCNN. Conclusion Introducing intermediate supervision into the network lets information and gradients propagate smoothly throughout the entire network, enhancing both the reconstruction capability and the training efficiency.
Extensive experiments confirm that the proposed method with intermediate supervision improves both the quality and the efficiency of image super-resolution reconstruction. The approach generalizes well and can be used for super-resolution reconstruction of natural scene images.

Key words

super-resolution reconstruction; deep learning; intermediate supervision; convolutional neural network; vanishing gradients; residual learning

0 Introduction

The SRCNN method successfully applied deep learning to image super-resolution reconstruction, but it has several shortcomings. First, the network is shallow: SRCNN uses only three convolutional layers, which limits the reconstruction performance of the trained model. Studies have shown that deeper neural networks can extract deeper image features and thus improve reconstruction performance [18]. Second, the receptive field is small: in a CNN, local receptive fields establish contextual associations between neurons of adjacent layers, and SRCNN's three convolutional layers yield a receptive field of only 13×13, which cannot fully exploit the contextual information of the image. Third, the generalization ability is poor: the model is trained only on images blurred with a single scale factor, so it reconstructs well only low-resolution images within a certain blur range. For example, a model trained on images blurred with a scale factor of 2 may reconstruct images blurred with a scale factor of 3 even worse than bicubic interpolation does [19], so the model must be retrained. Several researchers have improved on these shortcomings of SRCNN. In [20], within the original three-layer CNN, the convolution kernel sizes were adjusted and a pooling layer was added to reduce dimensionality and computation, improving reconstruction accuracy; however, for super-resolution reconstruction, which demands fine image detail, pooling discards much detail information and degrades reconstruction accuracy. In [21], reconstruction performance was improved by deepening the network, combining seven convolutional layers with one deconvolutional layer, but the network is still shallow and its receptive field still small.
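The receptive-field sizes discussed above follow from the rule for stacked stride-1 convolutions, RF = 1 + Σ(k − 1). A small helper (illustrative only, not part of the paper's code) makes the arithmetic explicit:

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions:
    RF = 1 + sum(k - 1) over all layer kernel sizes k."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# SRCNN uses three layers with 9x9, 1x1 and 5x5 kernels.
print(receptive_field([9, 1, 5]))   # 13 -> the 13x13 field quoted above
# Fifteen stacked 3x3 layers already reach the 31x31 field of a deep network.
print(receptive_field([3] * 15))    # 31 -> 31x31
```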

1.3 Feature extraction

 $x_j^l = f(\sum\limits_{i \in {M_j}} {x_i^{l-1}} \times \mathit{\boldsymbol{W}}_{ij}^l + b_j^l)$ (1)
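Eq. (1), in which each output feature map is a sum of input maps convolved with learned kernels plus a bias, passed through the activation f, can be sketched directly in NumPy. This is a naive illustrative implementation with assumed array shapes, using the one-pixel zero-padding described in the abstract to keep feature-map sizes unchanged for 3×3 kernels, and ReLU as f:

```python
import numpy as np

def conv_layer(x, weights, bias):
    """One convolutional layer per Eq.(1).
    x: (C_in, H, W); weights: (C_out, C_in, k, k); bias: (C_out,).
    Zero-padding of k//2 pixels keeps the spatial size unchanged."""
    c_out, c_in, k, _ = weights.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for j in range(c_out):            # output map j
        for i in range(c_in):         # sum over input maps i in M_j
            for r in range(h):
                for c in range(w):
                    out[j, r, c] += np.sum(xp[i, r:r + k, c:c + k] * weights[j, i])
    return np.maximum(out + bias[:, None, None], 0.0)  # ReLU activation
```

With a kernel that is 1 at its center and 0 elsewhere, the layer reproduces its (non-negative) input, which is a convenient sanity check.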

 $\Delta \boldsymbol{W}^l = -\eta \frac{\partial E}{\partial \boldsymbol{W}^l} = -\eta \frac{\partial E}{\partial \mu^l}\frac{\partial \mu^l}{\partial \boldsymbol{W}^l} = -\eta (\boldsymbol{\delta}^l)^{\rm T} x^{l-1}$ (6)

 $\Delta {b^l} =-\eta \frac{{\partial E}}{{\partial {b^l}}} =-\eta \frac{{\partial E}}{{\partial {\mu ^l}}}\frac{{\partial {\mu ^l}}}{{\partial {b^l}}} =-\eta {\mathit{\boldsymbol{\delta }}^l}$ (7)
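Eqs. (6) and (7) are ordinary gradient-descent updates. A minimal sketch for one fully connected layer, with assumed vector shapes (the error term delta of output size, the previous-layer activation of input size), reads:

```python
import numpy as np

def sgd_step(W, b, x_prev, delta, lr=0.01):
    """Gradient-descent updates of Eqs.(6)-(7):
    W <- W - lr * delta^T x^{l-1},  b <- b - lr * delta."""
    W_new = W - lr * np.outer(delta, x_prev)  # Eq.(6)
    b_new = b - lr * delta                    # Eq.(7)
    return W_new, b_new
```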

The deviation of $x^l$ in layer $l$ caused by $b^l$ is

 $\Delta {x^l} \approx \frac{{\partial {x^l}}}{{\partial {b^l}}}\Delta {b^l} = f^\prime ({\mu ^l})\Delta {b^l}$ (8)

 $\Delta {\mu ^2} = \frac{{\partial {\mu ^2}}}{{\partial {x^l}}}\Delta {x^l} = {\mathit{\boldsymbol{W}}^2}f^\prime ({\mu ^l})\Delta {b^l}$ (9)

 $\Delta E = f^\prime(x^1)\boldsymbol{W}^2 f^\prime(x^2)\boldsymbol{W}^3 f^\prime(x^3) \cdots \boldsymbol{W}^n f^\prime(x^n)\frac{\partial E}{\partial x^n}\Delta b^l$ (10)

 $\frac{\Delta E}{\Delta b^l} = \frac{\partial E}{\partial b^l} = f^\prime(x^1)\boldsymbol{W}^2 f^\prime(x^2)\boldsymbol{W}^3 f^\prime(x^3) \cdots \boldsymbol{W}^n f^\prime(x^n)\frac{\partial E}{\partial x^n}$ (11)
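Eq. (11) exposes the vanishing-gradient mechanism: the gradient with respect to an early-layer bias is a product of n factors of the form f'(x)W, so if each factor has magnitude below 1 the gradient decays exponentially with depth. A numeric illustration with assumed values (a sigmoid-like derivative bound of 0.25 and weight magnitude 2.0):

```python
# Each layer contributes a factor |f'(x) * W|; sigmoid derivatives are
# at most 0.25, so with moderate weights the per-layer factor is below 1.
factor = 0.25 * 2.0                    # assumed f'(x) bound times |W|
grads = [factor ** n for n in (3, 8, 16)]
print(grads)                           # gradient scale after 3, 8, 16 layers
```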

 ${E_1}({\mathit{\boldsymbol{\theta }}_1}) = \frac{1}{N}\sum\limits_{i = 1}^N {{{\left\| {{\mathit{\boldsymbol{Y}}_i}-{F_1}({\mathit{\boldsymbol{X}}_i};{\mathit{\boldsymbol{\theta }}_1})} \right\|}^2}}$ (12)

1.4 Image reconstruction

 $E_2(\boldsymbol{\theta}_2) = \frac{1}{N}\sum\limits_{i=1}^{N}\left\| \boldsymbol{Y}_i - \boldsymbol{X}_i - F_2(\boldsymbol{X}_i;\boldsymbol{\theta}_2) \right\|^2 = \frac{1}{N}\sum\limits_{i=1}^{N}\left\| \boldsymbol{r} - F_2(\boldsymbol{X}_i;\boldsymbol{\theta}_2) \right\|^2$ (13)
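The two objectives, E1 of Eq. (12) for the supervision layer and the residual objective E2 of Eq. (13), can be sketched as follows (illustrative NumPy code; F1 and F2 stand for the network mappings and are passed in as callables):

```python
import numpy as np

def supervision_loss(Y, X, F1):
    """E1 of Eq.(12): mean squared error of the directly predicted image."""
    return np.mean([np.sum((y - F1(x)) ** 2) for x, y in zip(X, Y)])

def residual_loss(Y, X, F2):
    """E2 of Eq.(13): the network predicts only the residual r = Y - X,
    which is easier to learn because X and Y are highly similar."""
    return np.mean([np.sum((y - x - F2(x)) ** 2) for x, y in zip(X, Y)])
```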

2 Experimental results and analysis

 ${\rm{PSNR}} = 10{\rm{lg}}\frac{{\mathit{MN}}}{{{{\left\| {\mathit{\boldsymbol{Y}}-\mathit{\boldsymbol{\tilde Y}}} \right\|}^2}}}$ (14)

 ${\rm SSIM} = \frac{(2\mu_{\boldsymbol{Y}}\mu_{\boldsymbol{\tilde Y}} + C_1)(2\sigma_{\boldsymbol{Y\tilde Y}} + C_2)}{(\mu_{\boldsymbol{Y}}^2 + \mu_{\boldsymbol{\tilde Y}}^2 + C_1)(\sigma_{\boldsymbol{Y}}^2 + \sigma_{\boldsymbol{\tilde Y}}^2 + C_2)}$ (15)
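The evaluation metrics of Eqs. (14) and (15) can be computed as below. This sketch assumes intensities normalized to [0, 1] (so the peak term in Eq. (14) reduces to MN), uses the standard SSIM form evaluated over a single global window, and the constants C1 and C2 are assumed values, not taken from the paper:

```python
import numpy as np

def psnr(Y, Y_hat):
    """Eq.(14): PSNR = 10*lg(M*N / ||Y - Y_hat||^2) for [0, 1] images."""
    mse = np.mean((Y - Y_hat) ** 2)
    return 10.0 * np.log10(1.0 / mse)

def ssim(Y, Y_hat, C1=1e-4, C2=9e-4):  # assumed stabilizing constants
    """Global SSIM of Eq.(15), one window over the whole image."""
    mu_y, mu_h = Y.mean(), Y_hat.mean()
    var_y, var_h = Y.var(), Y_hat.var()
    cov = ((Y - mu_y) * (Y_hat - mu_h)).mean()
    return ((2 * mu_y * mu_h + C1) * (2 * cov + C2)) / \
           ((mu_y ** 2 + mu_h ** 2 + C1) * (var_y + var_h + C2))
```

A uniform error of 0.1 over a normalized image gives an MSE of 0.01 and hence a PSNR of 20 dB, and SSIM of an image with itself is 1, which serve as sanity checks.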

Table 1 PSNR/SSIM of different methods with upscaling factors 2, 3, and 4 on datasets Set5 and Set14. Each cell gives PSNR/dB followed by SSIM.

| Dataset | Scale | Bicubic | A+ | SelfEx | SRCNN | Proposed |
|---------|-------|---------|-----|--------|-------|----------|
| Set5 | 2 | 33.66/0.992 | 36.54/0.954 | 36.49/0.953 | 36.66/0.954 | 36.81/0.956 |
| Set5 | 3 | 30.39/0.868 | 32.58/0.908 | 32.58/0.909 | 32.75/0.909 | 32.97/0.914 |
| Set5 | 4 | 28.42/0.810 | 30.28/0.860 | 30.31/0.861 | 30.31/0.862 | 30.72/0.882 |
| Set14 | 2 | 30.24/0.868 | 32.28/0.905 | 32.22/0.903 | 32.42/0.906 | 32.46/0.907 |
| Set14 | 3 | 27.55/0.774 | 29.13/0.818 | 29.16/0.819 | 29.28/0.820 | 29.31/0.823 |
| Set14 | 4 | 26.00/0.702 | 27.32/0.749 | 27.40/0.751 | 27.49/0.750 | 27.56/0.756 |
| Average | | 29.38/0.836 | 31.36/0.866 | 31.36/0.866 | 31.49/0.867 | 31.64/0.873 |

Table 2 CPU time (s) of different methods with upscaling factors 2, 3, and 4 on datasets Set5 and Set14

| Dataset | Scale | Bicubic | A+ | SelfEx | SRCNN | Proposed |
|---------|-------|---------|-----|--------|-------|----------|
| Set5 | 2 | 0.00 | 0.52 | 59.38 | 5.31 | 2.50 |
| Set5 | 3 | 0.00 | 0.32 | 42.66 | 5.23 | 1.92 |
| Set5 | 4 | 0.00 | 0.24 | 35.96 | 5.22 | 2.51 |
| Set14 | 2 | 0.00 | 1.14 | 149.72 | 10.88 | 5.12 |
| Set14 | 3 | 0.00 | 0.65 | 106.05 | 10.63 | 4.07 |
| Set14 | 4 | 0.00 | 0.49 | 90.95 | 10.52 | 5.01 |

References

• [1] Zhang Y, Li J Z, Li D L, et al. Super-resolution reconstruction for UAV video[J]. Journal of Image and Graphics, 2016, 21(7): 967–976. [DOI:10.11834/jig.20160715]
• [2] Zhao J J, Fang Q, Liang Z C, et al. Sketch face recognition based on super-resolution reconstruction[J]. Journal of Image and Graphics, 2016, 21(2): 218–224. [DOI:10.11834/jig.20160211]
• [3] Hu C S, Zhan S, Wu C Z. Image super-resolution based on deep learning features[J]. Acta Automatica Sinica, 2017, 43(5): 814–821. [DOI:10.16383/j.aas.2017.c150634]
• [4] Su H, Zhou J, Zhang Z H. Survey of super-resolution image reconstruction methods[J]. Acta Automatica Sinica, 2013, 39(8): 1202–1213. [DOI:10.3724/SP.J.1004.2013.01202]
• [5] Chavez-Roman H, Ponomaryov V. Super resolution image generation using wavelet domain interpolation with edge extraction via a sparse representation[J]. IEEE Geoscience and Remote Sensing Letters, 2014, 11(10): 1777–1781. [DOI:10.1109/LGRS.2014.2308905]
• [6] Tai Y W, Liu S C, Brown M S, et al. Super resolution using edge prior and single image detail synthesis[C]//Proceedings of 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE, 2010: 2400-2407. [DOI:10.1109/CVPR.2010.5539933]
• [7] Zhang K B, Gao X B, Tao D C, et al. Single image super-resolution with non-local means and steering kernel regression[J]. IEEE Transactions on Image Processing, 2012, 21(11): 4544–4556. [DOI:10.1109/TIP.2012.2208977]
• [8] Xu J, Chang Z G, Fan J L. Image superresolution by midfrequency sparse representation and total variation regularization[J]. Journal of Electronic Imaging, 2015, 24(1): #013039. [DOI:10.1117/1.JEI.24.1.013039]
• [9] Zhang K B, Tao D C, Gao X B, et al. Learning multiple linear mappings for efficient single image super-resolution[J]. IEEE Transactions on Image Processing, 2015, 24(3): 846–861. [DOI:10.1109/TIP.2015.2389629]
• [10] Deng C Z, Tian W, Wang S Q, et al. Super-resolution reconstruction of approximate sparsity regularized infrared images[J]. Optics and Precision Engineering, 2014, 22(6): 1648–1654. [DOI:10.3788/OPE.20142206.1648]
• [11] Chang H, Yeung D Y, Xiong Y M. Super-resolution through neighbor embedding[C]//Proceedings of 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2004: Ⅰ. [DOI:10.1109/CVPR.2004.1315043]
• [12] Cao X, Chen X H, Pan R H. Fast image super-resolution algorithm based on sparse representation[J]. Computer Engineering, 2015, 41(6): 211–215, 220. [DOI:10.3969/j.issn.1000-3428.2015.06.038]
• [13] Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations[C]//Proceedings of the 7th International Conference on Curves and Surfaces. Avignon, France: Springer, 2012: 711-730. [DOI:10.1007/978-3-642-27413-8_47]
• [14] Timofte R, De V, Van Gool L. Anchored neighborhood regression for fast example-based super-resolution[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE, 2013: 1920-1927. [DOI:10.1109/ICCV.2013.241]
• [15] Timofte R, De Smet V, Van Gool L. A+: adjusted anchored neighborhood regression for fast super-resolution[C]//Proceedings of the 12th Asian Conference on Computer Vision. Singapore: Springer, 2014: 111-126. [DOI:10.1007/978-3-319-16817-3_8]
• [16] Schulter S, Leistner C, Bischof H. Fast and accurate image upscaling with super-resolution forests[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3791-3799. [DOI:10.1109/CVPR.2015.7299003]
• [17] Dong C, Loy C C, He K M, et al. Learning a deep convolutional network for image super-resolution[C]//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 184-199. [DOI:10.1007/978-3-319-10593-2]
• [18] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904–1916. [DOI:10.1109/TPAMI.2015.2389824]
• [19] Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1646-1654. [DOI:10.1109/CVPR.2016.182]
• [20] Xiao J S, Liu E Y, Zhu L, et al. Improved image super-resolution algorithm based on convolutional neural network[J]. Acta Optica Sinica, 2017, 37(3): #0318011. [DOI:10.3788/AOS201737.0318011]
• [21] Li S M, Lei G Q, Fan R. Depth map super-resolution reconstruction based on convolutional neural networks[J]. Acta Optica Sinica, 2017, 37(12): #1210002. [DOI:10.3788/AOS201737.1210002]
• [22] Xu K S, Wang H L, Tang P J. Image captioning with deep LSTM based on sequential residual[C]//Proceedings of 2017 IEEE International Conference on Multimedia and Expo. Hong Kong, China: IEEE, 2017: 361-366. [DOI:10.1109/ICME.2017.8019408]
• [23] Sun X, Li X G, Li J F, et al. Review on deep learning based image super-resolution restoration algorithms[J]. Acta Automatica Sinica, 2017, 43(5): 697–709. [DOI:10.16383/j.aas.2017.c160629]
• [24] Wei S E, Ramakrishna V, Kanade T, et al. Convolutional pose machines[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4724-4732. [DOI:10.1109/CVPR.2016.511]
• [25] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778. [DOI:10.1109/CVPR.2016.90]
• [26] Jia Y Q, Shelhamer E, Donahue J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM International Conference on Multimedia. Orlando, Florida: ACM, 2014: 675-678. [DOI:10.1145/2647868.2654889]
• [27] Sun Y W, Li L T, Cong P, et al. Super-resolution method for radiation image based on deep learning[J]. Atomic Energy Science and Technology, 2017, 51(5): 890–895. [DOI:10.7538/yzk.2017.51.05.0890]
• [28] Huang J B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 5197-5206. [DOI:10.1109/CVPR.2015.7299156]