
Published: 2021-04-16
DOI: 10.11834/jig.200168
2021 | Volume 26 | Number 4










Single image super-resolution reconstruction based on multi-level perceptual residual convolutional network
He Lei, Cheng Jiahao, Zhan Zhiyu, Yang Wenbo, Liu Peiran
Hefei University of Technology, Hefei 230601, China
Supported by: National Natural Science Foundation of China (61502141)

Abstract

Objective Single image super-resolution reconstruction (SISR) is a classic problem in computer vision. SISR aims to reconstruct a high-resolution image from one or many low-resolution (LR) images. Image super-resolution (SR) technology is currently widely used in medical imaging, satellite remote sensing, video surveillance, and other fields. However, the SR problem is essentially complex and ill-posed. Many SISR methods have been proposed to solve it, including interpolation-based and reconstruction-based methods, but at large amplification factors their restoration performance drops sharply and the reconstructed results are very poor. With the rise of deep learning, deep convolutional neural networks have also been used to solve this problem, and researchers have proposed a series of models and made significant progress. As the understanding of deep learning techniques deepened, researchers found that deep networks bring better results than shallow ones, but that an excessively deep network can cause gradients to explode or vanish, making the model untrainable and thus unable to reach its best results through training. In recent years, most deep-learning networks for single-image SR reconstruction have adopted single-scale convolution kernels; generally, a 3×3 kernel is used for feature extraction. Although single-scale kernels can also extract much detailed information, these algorithms usually ignore the different receptive field sizes caused by different convolution kernel sizes. Receptive fields of different sizes make the network attend to different features; using only a 3×3 kernel therefore causes the network to ignore the macroscopic relations between different feature maps. Considering these problems, this study proposes a multi-level perception network based on GoogLeNet, residual networks, and densely connected convolutional networks. Method First, a feature extraction module, consisting of two 3×3 convolutional layers, extracts low-frequency image features, which are fed into multiple densely connected multi-level perception modules. Each multi-level perception module is composed of 3×3 and 5×5 convolution kernels: the 3×3 kernels are responsible for extracting detailed feature information, and the 5×5 kernels for extracting global feature information. Second, each multi-level perception module is divided into a shallow multi-level feature extraction unit, a deep multi-level feature extraction unit, and a tandem compression unit. The shallow multi-level feature extraction unit is composed of a 3×3 chain convolution and a 5×5 chain convolution; the former extracts fine local feature information from shallow features, whereas the latter extracts global features from shallow features. The deep multi-level feature extraction unit is likewise composed of 3×3 and 5×5 chain convolutions, which extract fine local feature information and global feature information from deep features, respectively. In the tandem compression unit, the global feature information of the shallow features, the fine local feature information of the deep features, the global information of the deep features, and the initial input are concatenated together and then compressed to the same dimension as the module input.
In this way, the network attends to both the low-level and the high-level features of the image while preserving the macroscopic relations between features. Finally, a reconstruction module combines the upscaled image with the residual image to obtain the final output. This study adopts the DIV2K dataset, which consists of 800 high-definition images of roughly 2 million pixels each; to make full use of these data, each picture is randomly rotated by 90°, 180°, or 270° and horizontally flipped. Result The reconstructed results are evaluated with the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index and compared with several state-of-the-art SR reconstruction methods. At a scaling factor of 2, the PSNRs of the proposed algorithm on the four benchmark test sets (Set5, Set14, Urban100, and BSD100 (Berkeley Segmentation Dataset)) are 37.851 1 dB, 33.933 8 dB, 32.219 1 dB, and 32.148 9 dB, respectively, all higher than those of the other methods. Conclusion Compared with other algorithms, the proposed convolutional network better accounts for the receptive field and fully extracts the different levels of hierarchical features through multi-scale convolution. At the same time, the model uses the structural feature information of the LR image itself to complete the reconstruction, and good reconstructed results can be obtained with it.

Key words

deep learning; convolutional neural network (CNN); single image super-resolution (SISR); multi-level perception; residual network; dense connection; DIV2K

0 Introduction

Single image super-resolution (SISR) is a classic problem in low-level computer vision: it reconstructs a high-resolution (HR) image containing rich detail from a low-resolution (LR) image without changing the image semantics. Image super-resolution technology is therefore widely used in medical imaging, satellite remote sensing, video surveillance, and other fields. However, the super-resolution (SR) problem is inherently complex and ill-posed. Many SISR methods have been proposed to solve it, including interpolation-based and reconstruction-based methods, but their restoration performance generally drops sharply at large amplification factors, giving poor reconstructions.

With the rapid development of deep learning, convolutional neural networks (CNN) have also been applied to this problem. This paper first reviews CNN-based SR methods and then introduces the proposed method.

CNN-based SR methods have made remarkable progress. Dong et al. (2016a) first proposed the 3-layer super-resolution convolutional neural network (SRCNN), which jointly optimizes the three stages of feature extraction, nonlinear mapping, and image reconstruction in an end-to-end manner, demonstrating that a convolutional neural network can learn the LR-to-HR mapping end to end. To address its excessive computational cost, Dong et al. (2016b) constructed an improved method, the fast super-resolution convolutional neural network (FSRCNN), which takes the original LR image rather than an upsampled version as input and thus greatly improves computational efficiency. Shi et al. (2016) proposed the efficient sub-pixel convolutional neural network (ESPCNN), which replaces the upsampling operation with an efficient sub-pixel convolution, substituting simple arithmetic for upsampling and greatly increasing running speed.

Research has found that deep networks outperform shallow ones. Kim et al. (2016b) used a deeper, 20-layer network in VDSR (super-resolution using very deep convolutional networks) and accelerated convergence with learning strategies such as gradient clipping and skip connections. However, an overly deep model suffers from vanishing or exploding gradients, which makes it untrainable and prevents it from reaching its best performance. He et al. (2016) therefore proposed a residual framework to counter the vanishing or exploding gradients caused by excessive network depth. Huang et al. (2017) then proposed dense connections to promote information flow between layers. Hui et al. (2018) built a simple information distillation network (IDN) containing an enhancement unit and a compression unit, which runs in real time while maintaining good reconstruction accuracy. Shen et al. (2019) obtained the structural feature information of the LR image through an encoding network and extracted high-frequency features with a multi-path feed-forward network composed of staged feature-fusion units. Yang J et al. (2019) completed the reconstruction task of the generator module by progressively extracting high-frequency features of the image at different scales. Li L Y et al. (2018) exploited local image self-similarity to build a detail-supplementing network that complements image features and used one convolutional layer to fuse its features with those extracted by the feature extraction network, thereby reconstructing the HR image. For industrial applications, Ahn et al. (2018) proposed a new method, the cascading residual network (CARN), which integrates multi-layer features through a local and global cascading mechanism to speed up the model. To address the difficulty of training at large scale factors, Lai et al. (2017) proposed the Laplacian pyramid super-resolution network (LapSRN), which progressively reconstructs the sub-band residuals of the HR image with cascaded convolutional neural networks. To recover finer texture details when super-resolving at large magnification factors, Ledig et al. (2017) introduced generative adversarial network (GAN) techniques and proposed a new loss function composed of a perceptual loss and an adversarial loss. Fang et al. (2020) integrated soft-edge image priors into the model to assist reconstruction, thereby avoiding a blind increase in network depth. Because interpolation can make images smoother, especially at high SR factors, Yang X et al. (2019) proposed the deep recurrent fusion network (DRFN), which uses transposed convolution instead of bicubic interpolation for upsampling and integrates features of different levels extracted from recurrent residual-block memory to form the final HR image. Noting that existing networks rarely mine the correlations between inter-layer features, which limits the learning ability of convolutional neural networks, Dai et al. (2019) proposed the second-order attention network (SAN) for better feature expression and feature-correlation learning. Qiu et al. (2019) argued that the low- and high-frequency information in an image has different degrees of complexity and should be recovered by models with different representational capacities, and proposed the embedded block residual network (EBRN), which processes features of different frequencies hierarchically and thereby prevents low-frequency textures from degrading due to overfitting in a high-complexity network. Observing that existing deep-learning SR methods do not fully exploit the feedback mechanism ubiquitous in the human visual system, Li Z et al. (2019) proposed the super-resolution feedback network (SRFBN), which uses high-level information to refine low-level representations, implementing this feedback through the hidden state of a constrained recurrent neural network. To solve super-resolution for arbitrary scale factors (including non-integer ones) with a single model, Hu et al. (2019) took the scale factor as input to dynamically predict the weights of the upsampling filters and used these weights to generate HR images of arbitrary size. Zhang et al. (2019a) proposed the deep plug-and-play super-resolution (DPSR) method, which designs a new single-image super-resolution degradation model to replace the blur-kernel estimation of blind deblurring and introduces an energy function to optimize this degradation model.

Although the above network models achieve remarkable performance, most of them still have shortcomings, such as neglecting the receptive field. A small number of SR networks have considered this issue: Kim et al. (2016a) proposed the deeply-recursive convolutional network (DRCN), which applies the same convolution kernel recursively to enlarge the receptive field, and Zhang et al. (2019b) proposed the DCSR (dilated convolutions for single image super-resolution) model, which combines standard and dilated convolutions to enlarge the receptive field without increasing the number of parameters. Features produced under receptive fields of different sizes contain local and global information at different levels, including the macroscopic relations between different feature maps; such features can guide image reconstruction more effectively, enrich the image's information at different levels, and make the result closer to reality.

To exploit the effect of different convolution kernels on image reconstruction, this paper proposes a multi-level perception residual convolutional network (MLP-Net), shown in Fig. 1, which can extract information of different levels at different stages.

Fig. 1 Multi-level perceptual residual convolutional network

The main contributions of this paper are as follows:

1) The multi-level perception block (MLPB) is the core module of MLP-Net. It extracts shallow and deep features of different levels directly from the LR image, uses convolutions of different scales to generate local and global features, and uses multiple cascaded modules with dense connections to generate richer multi-level information.

2) The structure is simple and runs in real time.

3) The network has good extensibility and cascadability, as well as good reconstruction accuracy.

1 Proposed method

As shown in Fig. 1, the network structure consists of three parts:

1) a feature extraction module (FBlock) that extracts shallow features;

2) multi-level perception blocks (MLPB) that perform multi-level feature extraction;

3) a reconstruction module (RBlock) that generates the final image.

1.1 Feature extraction module (FBlock)

Let $ \mathit{\boldsymbol{x}}$ be the input of MLP-Net. The information first flows through a feature extraction module composed of two 3×3 convolutional layers, which extracts feature maps from the original LR image. This process is described as

$ {\mathit{\boldsymbol{B}}_0} = F(\mathit{\boldsymbol{x}}) $ (1)

where $ F$ is the feature extraction function and $ {\mathit{\boldsymbol{B}}_0}$ is the extracted feature; this shallow feature is the input of the multi-level perception blocks (MLPB).
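
As a concrete reference, the following is a minimal PyTorch sketch of FBlock under the structure stated above (two 3×3 convolutional layers). The 64-channel width is taken from Section 2.2.2; placing a PReLU after each convolution is our assumption, since the text does not specify the activations inside FBlock.

```python
import torch
import torch.nn as nn

class FBlock(nn.Module):
    """Feature extraction module: two 3x3 convolutions, Eq. (1)."""

    def __init__(self, in_channels=3, num_features=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, num_features, kernel_size=3, padding=1),
            nn.PReLU(num_features),  # assumed activation placement
            nn.Conv2d(num_features, num_features, kernel_size=3, padding=1),
            nn.PReLU(num_features),
        )

    def forward(self, x):
        return self.body(x)  # B0 = F(x)
```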

1.2 Multi-level perception block (MLPB)

As shown in Fig. 1, the MLPBs are densely connected. Suppose the current position is the $ k$-th MLPB; its input consists of the outputs of the previous ${k - 1} $ MLPBs and the output of FBlock, expressed as

$ \begin{array}{l} {\mathit{\boldsymbol{B}}_k} = {\mathit{\boldsymbol{M}}_k}\left({{\mathit{\boldsymbol{B}}_{k - 1}}, {\mathit{\boldsymbol{B}}_{k - 2}}, \cdots, {\mathit{\boldsymbol{B}}_1}, {\mathit{\boldsymbol{B}}_0}} \right)\\ \;\;\;\;\;\;\;\;\;\;k = 1, \cdots, n \end{array} $ (2)

where ${\mathit{\boldsymbol{M}}_k} $ is the function of the $ k$-th MLPB, $ {\mathit{\boldsymbol{B}}_k}$ is the output of the $ k$-th MLPB, ${{\mathit{\boldsymbol{B}}_{k - 1}}} $, $ {{\mathit{\boldsymbol{B}}_{k - 2}}}$, …, ${{\mathit{\boldsymbol{B}}_1}} $ are the outputs of the previous ${k - 1} $ MLPBs, and $ {\mathit{\boldsymbol{B}}_0}$ is the output of FBlock.

Each MLPB is divided into three parts, as shown in Fig. 2: the first is a shallow multi-level extraction unit formed by the first three convolutional layers, the second is a deep multi-level extraction unit formed by the last three convolutional layers, and the third is a tandem compression unit.

Fig. 2 Multi-level perception block

1.2.1 Shallow multi-level extraction unit

Let ${{\mathit{\boldsymbol{B}}_{k - 1}}} $ be the input of the $ k$-th MLPB. The first three-layer 3×3 convolution chain can be expressed as

$ \mathit{\boldsymbol{B}}_k^{13} = \mathit{\boldsymbol{C}}_{3 \times 3}^1\left({{\mathit{\boldsymbol{B}}_{k - 1}}} \right) $ (3)

where $ \mathit{\boldsymbol{B}}_k^{13}$ is the feature obtained by passing ${{\mathit{\boldsymbol{B}}_{k - 1}}} $ through the 3×3 chain convolution and $ \mathit{\boldsymbol{C}}_{3 \times 3}^1$ is the three-layer 3×3 chain convolution function, whose purpose is to extract shallow local features from the current-level features.

Meanwhile, to extract the global features of the current features, a 5×5 chain convolution branch is used, expressed as

$ \mathit{\boldsymbol{B}}_k^{15} = \mathit{\boldsymbol{C}}_{5 \times 5}^1\left({{\mathit{\boldsymbol{B}}_{k - 1}}} \right) $ (4)

where $ \mathit{\boldsymbol{B}}_k^{15}$ is the feature obtained by passing ${{\mathit{\boldsymbol{B}}_{k - 1}}} $ through the 5×5 chain convolution and $ \mathit{\boldsymbol{C}}_{5 \times 5}^1$ is the three-layer 5×5 chain convolution function. Concatenating $ \mathit{\boldsymbol{B}}_k^{15}$ with ${{\mathit{\boldsymbol{B}}_{k - 1}}} $ then yields the feature $ \mathit{\boldsymbol{B}}_k^\alpha $ (the "+" in Eqs. (5) and (8) denotes this channel-wise concatenation), i.e.

$ \mathit{\boldsymbol{B}}_k^\alpha = \mathit{\boldsymbol{B}}_k^{15} + {\mathit{\boldsymbol{B}}_{k - 1}} $ (5)

1.2.2 Deep multi-level extraction unit

To obtain finer local feature information, a further three-layer 3×3 chain convolution is applied after the shallow multi-level extraction unit, expressed as

$ \mathit{\boldsymbol{B}}_k^{23} = \mathit{\boldsymbol{C}}_{3 \times 3}^2\left({\mathit{\boldsymbol{B}}_k^{13}} \right) $ (6)

where $ \mathit{\boldsymbol{C}}_{3 \times 3}^2$ is the latter three-layer 3×3 chain convolution function and $ \mathit{\boldsymbol{B}}_k^{23}$ is the output feature.

Meanwhile, a further three-layer 5×5 chain convolution is used to extract more global information, expressed as

$ \mathit{\boldsymbol{B}}_k^{25} = \mathit{\boldsymbol{C}}_{5 \times 5}^2\left({\mathit{\boldsymbol{B}}_k^{13}} \right) $ (7)

where $ \mathit{\boldsymbol{C}}_{5 \times 5}^2$ is the latter three-layer 5×5 chain convolution function and $\mathit{\boldsymbol{B}}_k^{25} $ is the output feature.

1.2.3 Tandem compression unit

All the features obtained from Eqs. (5)-(7) are concatenated along the channel dimension, giving

$ \mathit{\boldsymbol{B}}_k^2 = \mathit{\boldsymbol{B}}_k^\alpha + \mathit{\boldsymbol{B}}_k^{23} + \mathit{\boldsymbol{B}}_k^{25} $ (8)

To further compress and extract the effective information, a 1×1 convolution implements the compression mechanism: specifically, all the outputs are fed into a 1×1 convolutional layer, which compresses the dimension to that of ${{\mathit{\boldsymbol{B}}_{k - 1}}} $, expressed as

$ {\mathit{\boldsymbol{B}}_k} = \mathit{\boldsymbol{f}}\left({\mathit{\boldsymbol{B}}_k^2} \right) = {\mathit{\boldsymbol{A}}_k}\left({{w_k}\left({\mathit{\boldsymbol{B}}_k^2} \right) + b} \right) $ (9)

where $\mathit{\boldsymbol{f}} $ is the 1×1 convolution function, $ {\mathit{\boldsymbol{A}}_k}$ is the activation function, ${{w_k}} $ is the weight, and $ b$ is the bias.
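
Eqs. (3)-(9) can be gathered into a hedged PyTorch sketch of one MLPB. The kernel counts (32, 48, 64 in the shallow chains and 64, 48, 32 in the deep chains) follow Section 2.2.2; the "+" of Eqs. (5) and (8) is read as channel-wise concatenation, as stated above; the activation placement is our assumption, and the paper's grouped convolutions are omitted for clarity.

```python
import torch
import torch.nn as nn

def chain(in_ch, widths, k):
    """A 3-layer chain of k x k convolutions, each followed by a PReLU."""
    layers, prev = [], in_ch
    for w in widths:
        layers += [nn.Conv2d(prev, w, k, padding=k // 2), nn.PReLU(w)]
        prev = w
    return nn.Sequential(*layers)

class MLPB(nn.Module):
    """Multi-level perception block, Eqs. (3)-(9)."""

    def __init__(self, channels=64):
        super().__init__()
        self.c1_3x3 = chain(channels, [32, 48, 64], 3)  # Eq. (3), shallow local
        self.c1_5x5 = chain(channels, [32, 48, 64], 5)  # Eq. (4), shallow global
        self.c2_3x3 = chain(64, [64, 48, 32], 3)        # Eq. (6), deep local
        self.c2_5x5 = chain(64, [64, 48, 32], 5)        # Eq. (7), deep global
        cat_ch = (64 + channels) + 32 + 32              # B_alpha, B23, B25
        # Tandem compression back to the input width, Eq. (9)
        self.compress = nn.Sequential(nn.Conv2d(cat_ch, channels, kernel_size=1),
                                      nn.PReLU(channels))

    def forward(self, b_prev):
        b13 = self.c1_3x3(b_prev)
        b15 = self.c1_5x5(b_prev)
        b_alpha = torch.cat([b15, b_prev], dim=1)       # Eq. (5)
        b23 = self.c2_3x3(b13)
        b25 = self.c2_5x5(b13)
        b2 = torch.cat([b_alpha, b23, b25], dim=1)      # Eq. (8)
        return self.compress(b2)                        # Eq. (9)
```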

1.3 Reconstruction module (RBlock)

A 3×3 transposed convolution without an activation function is used for the reconstruction module. Let $ \mathit{\boldsymbol{y}}$ be the output of MLP-Net; the final reconstruction result can be expressed as

$ \mathit{\boldsymbol{y}} = \mathit{\boldsymbol{R}}\left({{\mathit{\boldsymbol{M}}_n}\left({{\mathit{\boldsymbol{B}}_{n - 1}}, {\mathit{\boldsymbol{B}}_{n - 2}}, \cdots, {\mathit{\boldsymbol{B}}_0}} \right)} \right) + \mathit{\boldsymbol{D}}\left(\mathit{\boldsymbol{x}} \right) $ (10)

where $ \mathit{\boldsymbol{R}}$ and $\mathit{\boldsymbol{D}} $ denote the reconstruction function and the bicubic interpolation function, respectively, $ {{\mathit{\boldsymbol{M}}_n}}$ is the function of the $ n$-th MLPB, $ {{\mathit{\boldsymbol{B}}_{n - 1}}}$, …, ${{\mathit{\boldsymbol{B}}_1}} $ are the outputs of the previous $ n-1$ MLPBs, and $ {\mathit{\boldsymbol{B}}_0}$ is the output of FBlock.
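
Putting Eqs. (1), (2), and (10) together gives the following hedged end-to-end sketch, reusing the FBlock and MLPB sketches above. The 1×1 layers that squeeze each dense concatenation back to 64 channels are our assumption about how the dense inputs of Eq. (2) are fused; the activation-free 3×3 transposed convolution and the bicubic skip branch follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPNet(nn.Module):
    """MLP-Net: FBlock -> densely connected MLPBs -> RBlock, Eq. (10)."""

    def __init__(self, scale=2, channels=64, num_blocks=4):
        super().__init__()
        self.fblock = FBlock(3, channels)
        self.blocks = nn.ModuleList(MLPB(channels) for _ in range(num_blocks))
        # Assumed 1x1 fusion of the dense input (B_{k-1}, ..., B1, B0)
        self.squeeze = nn.ModuleList(
            nn.Conv2d(channels * (k + 1), channels, kernel_size=1)
            for k in range(num_blocks))
        # RBlock: 3x3 transposed convolution, no activation
        self.rblock = nn.ConvTranspose2d(channels, 3, kernel_size=3, stride=scale,
                                         padding=1, output_padding=scale - 1)
        self.scale = scale

    def forward(self, x):
        outs = [self.fblock(x)]                     # B0, Eq. (1)
        for sq, blk in zip(self.squeeze, self.blocks):
            dense = torch.cat(outs[::-1], dim=1)    # (B_{k-1}, ..., B1, B0)
            outs.append(blk(sq(dense)))             # Eq. (2)
        up = F.interpolate(x, scale_factor=self.scale,
                           mode='bicubic', align_corners=False)  # D(x)
        return self.rblock(outs[-1]) + up           # Eq. (10)
```

For instance, `MLPNet(scale=2)(torch.rand(1, 3, 48, 48))` yields a 1×3×96×96 output.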

2 Experimental results and analysis

Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are adopted as objective evaluation metrics, and the results are compared with those of several super-resolution reconstruction algorithms.
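
For reference, a minimal PSNR helper is sketched below; the formula 10 log10(MAX²/MSE) is standard, and comparing 8-bit intensities (MAX = 255) is a common SR convention, though the paper does not state its exact evaluation protocol (e.g., whether the metric is computed on the luminance channel).

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two same-sized images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```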

2.1 Datasets

2.1.1 Training set

The DIV2K (Diverse 2K) dataset is used; it consists of 800 high-definition images with roughly 2 million pixels each. To make full use of the data, each image is randomly rotated by 90°, 180°, or 270° and horizontally flipped.
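
A minimal sketch of this augmentation, assuming a uniformly sampled rotation (including no rotation) and a 50% flip probability, which the paper does not specify:

```python
import random
import numpy as np

def augment(img):
    """Randomly rotate an HxWxC image by a multiple of 90 degrees and flip it."""
    img = np.rot90(img, k=random.randint(0, 3))  # 0, 90, 180 or 270 degrees
    if random.random() < 0.5:
        img = np.fliplr(img)                     # horizontal flip
    return np.ascontiguousarray(img)
```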

2.1.2 Test sets

Four widely used datasets, Set5, Set14, BSD100 (Berkeley Segmentation Dataset), and Urban100, are chosen as test sets; they contain 5, 14, 100, and 100 images, respectively.

2.2 Training details

2.2.1 Training samples

Images from the DIV2K dataset are randomly cropped to 192×192 pixels as HR images. According to the magnification factor (×2, ×3, ×4), the HR images are downsampled to obtain LR images, giving LR/HR pairs of 96×96/192×192, 64×64/192×192, and 48×48/192×192 pixels. To ensure that the image size remains controllable under the convolution operations of the mapping process, images smaller than the crop size are zero-padded.
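
The pair construction can be sketched as follows under these settings; bicubic downsampling via PIL is our assumption, since the paper does not name its downsampling implementation.

```python
import random
import numpy as np
from PIL import Image

def make_pair(img, scale, crop=192):
    """img: HxWxC uint8 array -> (LR, HR) patch pair for one scale factor."""
    h, w = img.shape[:2]
    if h < crop or w < crop:                     # zero-pad undersized images
        padded = np.zeros((max(h, crop), max(w, crop), img.shape[2]), img.dtype)
        padded[:h, :w] = img
        img, h, w = padded, max(h, crop), max(w, crop)
    top = random.randint(0, h - crop)
    left = random.randint(0, w - crop)
    hr = img[top:top + crop, left:left + crop]
    lr = Image.fromarray(hr).resize((crop // scale, crop // scale), Image.BICUBIC)
    return np.asarray(lr), hr
```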

2.2.2 Parameter settings

Mean absolute error (MAE) is used as the loss function, and the Adam optimization algorithm is used to learn the network parameters. The optimal parameters are determined by comparing PSNR/SSIM and checking for gradient explosion (see Tables 1-3 and Figs. 3-5): the batch size is finally set to 16 and the initial learning rate to 0.000 6, halved every 2 000 iterations. Considering training efficiency together with overall performance, the number of MLPBs is set to 4. PReLU is adopted as the activation function, with its initial weight set to 0. Considering the number of network parameters and trainability, FBlock consists of two layers of 64 3×3 convolution kernels; in each MLPB, the shallow multi-level extraction unit uses 32, 48, and 64 kernels and the deep multi-level extraction unit uses 64, 48, and 32 kernels. Grouped convolutions are used to reduce the number of training parameters and avoid overfitting.
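
These settings translate into the short training-loop sketch below (MAE is PyTorch's L1 loss; Adam with an initial learning rate of 6e-4 halved every 2 000 iterations; batch size 16). The random tensors stand in for a real DIV2K pipeline, `MLPNet` refers to the sketch in Section 1.3, and reading "every 2 000" as iterations rather than epochs is our interpretation.

```python
import torch
import torch.nn as nn

model = MLPNet(scale=2)
criterion = nn.L1Loss()                          # mean absolute error (MAE)
optimizer = torch.optim.Adam(model.parameters(), lr=6e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.5)

lr_imgs = torch.rand(16, 3, 96, 96)              # stand-in x2 batch (96 -> 192)
hr_imgs = torch.rand(16, 3, 192, 192)
for step in range(3):                            # illustrative steps only
    optimizer.zero_grad()
    loss = criterion(model(lr_imgs), hr_imgs)
    loss.backward()
    optimizer.step()
    scheduler.step()                             # halves the lr every 2 000 steps
```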

Table 1 The comparisons of different learning rates on four datasets (PSNR(dB)/SSIM)

Learning rate | Set5 | Set14 | Urban100 | BSD100
0.000 1 | 37.734 3/0.958 8 | 33.788 4/0.919 8 | 32.006 0/0.927 1 | 32.095 6/0.898 3
0.000 2 | 37.811 8/0.959 1 | 33.901 5/0.920 7 | 32.180 5/0.928 7 | 32.140 0/0.899 0
0.000 4 | 37.834 0/0.959 1 | 33.935 7/0.921 3 | 32.196 9/0.929 0 | 32.147 5/0.898 9
0.000 5 | 37.831 1/0.959 2 | 33.884 7/0.920 8 | 32.212 4/0.929 1 | 32.145 4/0.899 0
0.000 6 | 37.851 1/0.959 2 | 33.933 8/0.921 0 | 32.219 1/0.929 1 | 32.148 9/0.898 9
0.000 7 | 37.802 7/0.959 0 | 33.891 9/0.920 7 | 32.194 7/0.928 9 | 32.146 3/0.898 8
0.000 9 | 37.809 2/0.959 0 | 33.852 6/0.920 5 | 32.153 7/0.928 3 | 32.143 6/0.898 9
Note: bold font indicates the best result in each column.

Table 2 The comparisons of different numbers of MLPBs on four datasets (PSNR(dB)/SSIM)

Number of MLPBs | Set5 | Set14 | Urban100 | BSD100
2 | 37.684 4/0.958 5 | 33.733 2/0.919 7 | 31.723 3/0.924 1 | 32.042 7/0.897 5
3 | 37.759 4/0.958 8 | 33.825/0.920 1 | 31.974 2/0.926 8 | 32.097 6/0.898 4
4 | 37.774 9/0.959 0 | 33.829 2/0.920 0 | 32.119 5/0.928 3 | 32.110 6/0.898 6
Note: bold font indicates the best result in each column.

Table 3 The comparisons of different batch sizes on four datasets (PSNR(dB)/SSIM)

Batch size | Set5 | Set14 | Urban100 | BSD100
10 | 37.819 4/0.959 0 | 33.881 8/0.920 4 | 32.155 1/0.928 5 | 32.148 3/0.898 8
12 | 37.779 6/0.959 0 | 33.898 6/0.920 7 | 32.163 4/0.928 4 | 32.142 8/0.898 8
16 | 37.851 1/0.959 2 | 33.933 8/0.921 0 | 32.219 1/0.929 1 | 32.148 9/0.898 9
18 | 37.819 6/0.959 0 | 33.862 8/0.920 3 | 32.199 7/0.929 1 | 32.148 3/0.898 8
Note: bold font indicates the best result in each column.
Fig. 3 The effect of different learning rates on Set5
Fig. 4 The effect of different batch sizes on Set5
Fig. 5 The effect of different numbers of MLPBs on Set5

2.3 Experimental results and analysis

To fully verify the superiority and effectiveness of the proposed algorithm, the experiments compare the reconstruction results of bicubic interpolation, FSRCNN (Dong et al., 2016b), VDSR (Kim et al., 2016b), LapSRN (Lai et al., 2017), IDN (Hui et al., 2018), CARN (Ahn et al., 2018), ESPCNN (Shi et al., 2016), and the proposed method, evaluated both subjectively and objectively at magnification factors of 2, 3, and 4 on the four standard test sets Set5, Set14, BSD100, and Urban100.

Tables 4-7 show the average PSNR and SSIM on the four test sets. The comparison shows that the average PSNR and SSIM of the proposed method are clearly higher than those of the other methods, which means the proposed method performs better. (Because the released ESPCNN code only magnifies by a factor of 3, Tables 4-7 contain only the ×3 results for it, to remain faithful to the original algorithm.) To further illustrate the relative advantages of the proposed algorithm, Figs. 6-10 show the ×2 visual results of the different reconstruction methods. The images restored by the proposed method have clearer edges and texture details, stronger sharpness, and are closer to the real images. For example, in Fig. 7 the proposed method renders the edges and details of the windows more clearly, whereas Fig. 7(b) is very blurry with no visible detail, the texture in Fig. 7(c) is not clear enough, and the frames in the other results are completely deformed. Extensive experimental results show that the images reconstructed by the proposed method achieve the best overall results in both subjective and objective evaluation.

Table 4 The comparisons of reconstructed results on Set5 dataset (PSNR(dB)/SSIM)

Algorithm | ×2 | ×3 | ×4
Bicubic | 33.66/0.929 9 | 30.39/0.868 2 | 28.42/0.810 4
FSRCNN | 37.00/0.955 8 | 33.16/0.914 0 | 30.71/0.865 7
VDSR | 37.53/0.958 7 | 33.66/0.921 3 | 31.35/0.883 8
LapSRN | 37.52/0.958 1 | 33.82/0.920 7 | 31.54/0.881 1
IDN | 37.76/0.958 9 | 34.21/0.925 1 | 31.97/0.892 0
CARN | 37.76/0.959 0 | 34.29/0.925 5 | 32.13/0.893 7
ESPCNN | - | 33.00/0.890 9 | -
MLP-Net (ours) | 37.85/0.959 2 | 34.29/0.925 8 | 32.05/0.892 9
Note: bold font indicates the best value in each column; "-" means no data.

Table 5 The comparisons of reconstructed results on Set14 dataset (PSNR(dB)/SSIM)

Algorithm | ×2 | ×3 | ×4
Bicubic | 30.24/0.868 8 | 27.55/0.774 2 | 26.00/0.702 7
FSRCNN | 32.63/0.908 8 | 29.43/0.824 2 | 27.59/0.753 5
VDSR | 33.03/0.912 4 | 29.77/0.831 4 | 28.01/0.767 4
LapSRN | 33.08/0.910 9 | 29.89/0.830 4 | 28.19/0.763 5
IDN | 33.86/0.920 8 | 30.51/0.847 5 | 28.75/0.753 1
CARN | 33.52/0.916 6 | 30.29/0.840 7 | 28.60/0.780 6
ESPCNN | - | 29.42/0.816 9 | -
MLP-Net (ours) | 33.93/0.921 0 | 30.52/0.847 9 | 28.80/0.790 8
Note: bold font indicates the best value in each column; "-" means no data.

Table 6 The comparisons of reconstructed results on Urban100 dataset (PSNR(dB)/SSIM)

Algorithm | ×2 | ×3 | ×4
Bicubic | 26.88/0.840 3 | 24.46/0.734 9 | 23.14/0.657 7
FSRCNN | 29.88/0.902 0 | 26.43/0.808 0 | 24.62/0.728 0
VDSR | 30.76/0.914 0 | 27.14/0.827 9 | 25.18/0.752 4
LapSRN | 30.41/0.911 2 | 27.07/0.829 8 | 25.21/0.756 4
IDN | 32.08/0.898 0 | 29.02/0.803 4 | 27.51/0.733 1
CARN | 32.09/0.897 8 | 29.06/0.803 4 | 27.58/0.734 9
ESPCNN | - | 25.98/0.787 1 | -
MLP-Net (ours) | 32.22/0.929 1 | 28.15/0.852 1 | 25.17/0.758 3
Note: bold font indicates the best value in each column; "-" means no data.

Table 7 The comparisons of reconstructed results on BSD100 dataset (PSNR(dB)/SSIM)

Algorithm | ×2 | ×3 | ×4
Bicubic | 29.56/0.843 1 | 27.21/0.738 5 | 25.96/0.667 5
FSRCNN | 31.53/0.892 0 | 28.53/0.791 0 | 26.98/0.715 0
VDSR | 31.90/0.896 0 | 28.82/0.797 6 | 27.29/0.725 1
LapSRN | 31.80/0.894 9 | 28.82/0.795 0 | 27.32/0.716 2
IDN | 31.61/0.930 6 | 27.28/0.838 7 | 25.05/0.753 1
CARN | 31.51/0.931 2 | 27.38/0.840 4 | 26.07/0.783 7
ESPCNN | - | 28.65/0.793 2 | -
MLP-Net (ours) | 32.15/0.898 9 | 29.07/0.804 6 | 27.53/0.734 2
Note: bold font indicates the best value in each column; "-" means no data.
Fig. 6 The comparisons of reconstruction results of Urban100_081 by different methods ((a) original; (b) bicubic; (c) CARN; (d) FSRCNN; (e) IDN; (f) LapSRN; (g) VDSR; (h) ours)
Fig. 7 The comparisons of reconstruction results of BSD100_092 by different methods ((a) original; (b) bicubic; (c) CARN; (d) FSRCNN; (e) IDN; (f) LapSRN; (g) VDSR; (h) ours)
Fig. 8 The comparisons of reconstruction results of Set5_01 by different methods ((a) original; (b) bicubic; (c) CARN; (d) FSRCNN; (e) IDN; (f) LapSRN; (g) VDSR; (h) ours)
Fig. 9 The comparisons of reconstruction results of BSD100_06 by different methods ((a) original; (b) bicubic; (c) CARN; (d) FSRCNN; (e) IDN; (f) LapSRN; (g) VDSR; (h) ours)
Fig. 10 The comparisons of reconstruction results of BSD100_079 by different methods ((a) original; (b) bicubic; (c) CARN; (d) FSRCNN; (e) IDN; (f) LapSRN; (g) VDSR; (h) ours)

3 Conclusion

This paper proposes a network structure to address the receptive-field problem neglected by many current algorithms: convolution kernels of different sizes extract local and global features at different levels, thereby reconstructing the HR image better. The model fully exploits the advantages of dense connections and residual networks, which not only strengthens the feature relations between network layers and makes full use of the features of each layer, but also gives both the feature extraction blocks and the overall network structure good cascadability and extensibility. In the experiments the proposed algorithm is compared with several popular super-resolution methods, and the results show that it outperforms them both in subjective visual quality and in the objective PSNR and SSIM metrics. Because the model adopts a multi-level convolution structure, it involves a relatively large number of parameters, and considerable computation is still required even with the dense connections and residual structure. Future work will further optimize the multi-level perception residual convolutional network structure and will also consider other deep learning models, such as generative adversarial networks, analyzing whether they can be combined with convolutional neural networks for image super-resolution reconstruction.

References

  • Ahn N, Kang B and Sohn K A. 2018. Fast, accurate, and lightweight super-resolution with cascading residual network//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 256-272[DOI: 10.1007/978-3-030-01249-6_16]
  • Dai T, Cai J R, Zhang Y B, Xia S T and Zhang L. 2019. Second-order attention network for single image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 11057-11066[DOI: 10.1109/CVPR.2019.01132]
  • Dong C, Loy C C, He K M, Tang X O. 2016a. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI:10.1109/TPAMI.2015.2439281]
  • Dong C, Loy C C and Tang X O. 2016b. Accelerating the super-resolution convolutional neural network//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 391-407[DOI: 10.1007/978-3-319-46475-6_25]
  • Fang F M, Li J C, Zeng T Y. 2020. Soft-edge assisted network for single image super-resolution. IEEE Transactions on Image Processing, 29: 4656-4668 [DOI:10.1109/TIP.2020.2973769]
  • He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
  • Hu X C, Mu H Y, Zhang X Y, Wang Z L, Tan T N and Sun J. 2019. Meta-SR: a magnification-arbitrary network for super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 1575-1584[DOI: 10.1109/CVPR.2019.00167]
  • Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/CVPR.2017.243]
  • Hui Z, Wang X M and Gao X B. 2018. Fast and accurate single image super-resolution via information distillation network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 723-731[DOI: 10.1109/CVPR.2018.00082]
  • Kim J, Lee J K and Lee K M. 2016a. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1637-1645[DOI: 10.1109/CVPR.2016.181]
  • Kim J, Lee J K and Lee K M. 2016b. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654[DOI: 10.1109/CVPR.2016.182]
  • Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5835-5843[DOI: 10.1109/CVPR.2017.618]
  • Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z H and Shi W Z. 2017. Photo-realistic single image super-resolution using a generative adversarial network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 105-114[DOI: 10.1109/CVPR.2017.19]
  • Li L Y, Su Z, Shi X H, Huang E B, Luo X N. 2018. Mutual-detail convolution model for image super-resolution reconstruction. Journal of Image and Graphics, 23(4): 572-582 (李浪宇, 苏卓, 石晓红, 黄恩博, 罗笑南. 2018. 图像超分辨率重建中的细节互补卷积模型. 中国图象图形学报, 23(4): 572-582) [DOI:10.11834/jig.170361]
  • Li Z, Yang J L, Liu Z, Yang X M, Jeon G and Wu W. 2019. Feedback network for image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3862-3871[DOI: 10.1109/CVPR.2019.00399]
  • Qiu Y J, Wang R X, Tao D P and Cheng J. 2019. Embedded block residual network: a recursive restoration model for single-image super-resolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 4179-4188[DOI: 10.1109/ICCV.2019.00428]
  • Shen M Y, Yu P F, Wang R G, Yang J, Xue L X. 2019. Image super-resolution reconstruction via deep network based on multi-staged fusion. Journal of Image and Graphics, 24(8): 1258-1269 (沈明玉, 俞鹏飞, 汪荣贵, 杨娟, 薛丽霞. 2019. 多阶段融合网络的图像超分辨率重建. 中国图象图形学报, 24(8): 1258-1269) [DOI:10.11834/jig.180619]
  • Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z H. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1874-1883[DOI: 10.1109/CVPR.2016.207]
  • Yang J, Li W J, Wang R G, Xue L X. 2019. Generative adversarial network for image super-resolution combining perceptual loss. Journal of Image and Graphics, 24(8): 1270-1282 (杨娟, 李文静, 汪荣贵, 薛丽霞. 2019. 融合感知损失的生成式对抗超分辨率算法. 中国图象图形学报, 24(8): 1270-1282) [DOI:10.11834/jig.180613]
  • Yang X, Mei H Y, Zhang J Q, Xu K, Yin B C, Zhang Q, Wei X P. 2019. DRFN: deep recurrent fusion network for single-image super-resolution with large factors. IEEE Transactions on Multimedia, 21(2): 328-337 [DOI:10.1109/tmm.2018.2863602]
  • Zhang K, Zuo W M and Zhang L. 2019a. Deep plug-and-play super-resolution for arbitrary blur kernels//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 1671-1681[DOI: 10.1109/CVPR.2019.00177]
  • Zhang Z D, Wang X R, Jung C. 2019b. DCSR: dilated convolutions for single image super-resolution. IEEE Transactions on Image Processing, 28(4): 1625-1635 [DOI:10.1109/TIP.2018.2877483]
  • Zhou F, Yang W M, Liao Q M. 2012. Interpolation-based image super-resolution using multisurface fitting. IEEE Transactions on Image Processing, 21(7): 3312-3318 [DOI:10.1109/TIP.2012.2189576]