Published: 2021-03-16
DOI: 10.11834/jig.200108
2021 | Volume 26 | Number 3

Image Understanding and Computer Vision

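Image super-resolution reconstruction from multi-channel recursive residual network

Cheng Deqiang¹, Guo Xin¹, Chen Liangliang¹, Kou Qiqi², Zhao Kai¹, Gao Rui¹

1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China;

2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China

Received: 2020-04-13; Revised: 2020-06-29; Preprint: 2020-07-06

Supported by: National Key Research and Development Program of China (2018YFC0808302); National Natural Science Foundation of China (51774281)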

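About the authors: Cheng Deqiang, born in 1979, male, professor. Research interests: intelligent image detection and pattern recognition, image processing, and video coding. E-mail: chengdq@cumt.edu.cn;
Guo Xin, male, master's student. Research interests: image recognition and image super-resolution reconstruction. E-mail: guo_xin@cumt.edu.cn;
Chen Liangliang, male, PhD student. Research interests: image recognition and image super-resolution reconstruction. E-mail: chenll01@cumt.edu.cn;
Kou Qiqi, corresponding author, male, lecturer. Research interests: video image processing and pattern recognition. E-mail: kouqiqi@cumt.edu.cn;
Zhao Kai, male, PhD student. Research interest: person re-identification. E-mail: 478771764@qq.com;
Gao Rui, female, PhD student. Research interest: image super-resolution reconstruction. E-mail: 1508654068@qq.com

CLC number: TP391

Document code: A

Article ID: 1006-8961(2021)03-0605-14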

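Abstract

Objective Neural-network-based image super-resolution reconstruction mainly relies on a single network that learns a non-linear mapping between high-resolution and low-resolution feature information. In this process, image feature information is easily lost in shallow networks, while deepening the network increases training time and training difficulty. To address the long training time and blurred detail of the reconstructed results, a multi-channel recursive residual learning mechanism is proposed to improve training efficiency and reconstruction quality. Method A multi-channel recursive residual network model is designed. First, residual network blocks are reused recursively to form a 32-layer recursive network, which reduces the number of parameters while increasing depth, accelerating convergence and capturing richer feature information. Second, feature information is extracted under different convolution kernels and fed into the recursive residual network of each channel; the outputs of all channels are then fed into a shared reconstruction network, which improves the reconstruction of detail. Finally, a cross-learning mechanism is introduced that cross-connects channels 1, 2, and 3 in pairs, further accelerating the fusion of feature information across channels, promoting parameter transfer, and improving reconstruction performance. Result The model is trained on the DIV2K (DIVerse 2K) dataset, tested on Set5, Set14, BSD100, and Urban100, and compared with Bicubic, SRCNN (super-resolution convolutional neural network), VDSR (super-resolution using very deep convolutional network), LapSRN (deep Laplacian pyramid networks for fast and accurate super-resolution), and EDSR_baseline (enhanced deep residual networks for single image super-resolution_baseline). The proposed model captures detail features better and produces images with clearer and richer detail. The objective metrics also improve markedly: on the detail-rich Urban100 dataset in particular, the average PSNR (peak signal-to-noise ratio) increases by 3.87 dB, 1.93 dB, 1.00 dB, 1.12 dB, and 0.48 dB, respectively, and training efficiency is 30% higher than that of the non-recursive residual network. Conclusion The proposed model achieves better visual quality and objective scores, trains faster than the non-recursive residual network, and can be used for super-resolution reconstruction of images of complex scenes.

Key words

super-resolution reconstruction; multi-channel; recursion; cross; residual network model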

Image super-resolution reconstruction from multi-channel recursive residual network

Cheng Deqiang¹, Guo Xin¹, Chen Liangliang¹, Kou Qiqi², Zhao Kai¹, Gao Rui¹

1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China;

2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China

Supported by: National Key Research and Development Program of China (2018YFC0808302); National Natural Science Foundation of China (51774281)

Abstract

Objective Limitations of the external environment, hardware conditions, and network resources mean that many images obtained in daily life have low resolution, which reduces the accuracy of applications that rely on them. Super-resolution reconstruction, which recovers high-resolution images from low-resolution ones, has therefore become an important research topic. The key to the technique is the correspondence between high-resolution and low-resolution images. Neural-network methods typically use a single-channel network to learn the relationship between high-resolution and low-resolution feature information. However, image feature information is easily lost in shallow networks, and the low utilization of feature information leads to unsatisfactory reconstruction at large magnification factors and poor restoration of image detail. Simply deepening the network increases training time and difficulty and wastes hardware resources. A multi-channel recursive residual network model is proposed to solve these problems. The model improves training efficiency by recursively reusing residual network blocks and strengthens the reconstruction of detail through multi-channel and cross-learning mechanisms.

Method A multi-channel recursive cross-residual network model is designed. Convolutional layers dominate training time, yet using fewer of them reduces reconstruction performance. Recursive residual network blocks are therefore used to deepen the network while speeding up training. First, residual network blocks are reused recursively to form a 32-layer recursive network, which reduces the number of network parameters while increasing depth; this speeds up training and yields richer information. Second, the amount of feature information gained by deepening the network alone is limited, and feature information, which strongly influences reconstruction performance, is easily lost inside the network. Multiple channels are therefore used to obtain richer feature information and reduce information loss, which improves the reconstruction of image detail. Finally, to increase the degree of information fusion, a multi-channel cross-learning mechanism is introduced; it speeds up the fusion of feature information across channels, promotes parameter transfer, and improves training efficiency.

Result Performance is measured by peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and network training time. Bicubic, A+, super-resolution convolutional neural network (SRCNN), super-resolution using very deep convolutional network (VDSR), deep Laplacian pyramid networks for fast and accurate super-resolution (LapSRN), and enhanced deep residual network for single image super-resolution-baseline (EDSR_baseline) are used for comparison on open datasets. Training is performed on the DIV2K (DIVerse 2K) dataset, with 800 images for training and 100 images for validation. Tests are then performed on the Set5, Set14, BSD100, and Urban100 datasets, which together contain 219 test images. Three reconstruction models, for ×2, ×3, and ×4 magnification, are built to allow comparison with common algorithms. Compared with conventional serial networks, the recursive network improves efficiency and reduces computing time. On the detail-rich Urban100 dataset in particular, average PSNR increases by 3.87 dB, 1.93 dB, 1.00 dB, 1.12 dB, and 0.48 dB over Bicubic, SRCNN, VDSR, LapSRN, and EDSR_baseline, respectively. The visual quality is also clearer than that of the earlier algorithms. Compared with the conventional serial (non-recursive) network, training efficiency is improved by 30%.

Conclusion The proposed network overcomes the shortcomings of single-channel deep networks and accelerates convergence and information fusion by adding recursive residual networks and cross-learning mechanisms. The recursive residual networks also ease gradient-related problems during training. Experimental results show that, compared with existing reconstruction methods, this method obtains higher PSNR and SSIM and improves substantially on images with rich detail. The method thus offers short training time, low information redundancy, and better reconstruction quality. Future work will continue to optimize the scale of the recursive network and the cross-learning mechanism.

Key words

super-resolution reconstruction; multi-channel; recursion; cross; residual network model

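0 Introduction

Digital image processing is an important research area of computer vision and has been applied in many fields, such as medical image analysis (Peled and Yeshurun, 2001), object tracking and detection (Li et al., 2019), scene classification and reconstruction (Tan et al., 2019), and surveillance and security systems (Li and Orchard, 2001; Unser et al., 1991). The quality of video images directly affects the accuracy and effectiveness of these applications. Image super-resolution reconstruction converts a low-resolution image into a high-resolution one without changing the hardware, thereby improving image quality, and has become a research hotspot in recent years.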

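Since the 1960s, many image super-resolution algorithms based on interpolation (Unser et al., 1991; Peleg et al., 1987) and reconstruction (Schultz and Stevenson, 1996; Sun et al., 2008; Chang et al., 2004) have been proposed. With the development of convolutional neural networks and deep learning, learning-based methods (Timofte et al., 2013; Huang et al., 2015; Yang et al., 2008) have clearly outperformed traditional interpolation and reconstruction methods and have attracted wide attention. For example, Yang et al. (2008) proposed a reconstruction method based on sparse coding theory, which reconstructs an image by building a mapping between high-resolution and low-resolution images. Many refinements followed (Cheng et al., 2018a, b). Building on Yang et al. (2008), Timofte et al. (2014) combined sparse dictionaries and regressors with neighborhood regression and simple functions to propose the A+ (adjusted anchored neighborhood regression for fast super-resolution) model. Generative adversarial networks (GAN), which build on game-theoretic ideas, are also widely used in super-resolution reconstruction (Ledig et al., 2016; Cao et al., 2018; Dou et al., 2020); the SRGAN (photo-realistic single image super-resolution using a generative adversarial network) model delivers excellent visual quality at a relatively low PSNR (peak signal-to-noise ratio). Dong et al. (2014) proposed the SRCNN (super-resolution convolutional neural network) model, which set off a wave of research on neural networks for super-resolution reconstruction.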

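However, SRCNN uses only three layers and reconstructs poorly when the reconstruction factor is large. Many models that tune the parameters and feature dimensions of SRCNN followed (Xiao et al., 2017). Dong et al. (2016) proposed FSRCNN (fast super-resolution convolutional neural network) to improve on SRCNN, using a deconvolution layer and changed feature dimensions to speed up training. In the same year, Shi et al. (2016) proposed ESPCN (efficient sub-pixel convolutional neural network), which upscales the image directly inside the network. Although these methods all improve reconstruction quality, the networks remain shallow and lose feature information, so the results are unsatisfactory at large reconstruction factors. Simply deepening the network, however, makes training difficult; the residual network ResNet (residual network) proposed by He et al. (2016) addresses the difficulty of training deeper networks. The VDSR (accurate image super-resolution using very deep convolutional network) model of Kim et al. (2016) uses residual learning to train on images of different scale factors together, so that one model can reconstruct images at several magnification factors. Chen et al. (2020) used a content-guided deep residual network that reconstructs images through guided residual blocks. Liu et al. (2019) built a symmetric residual CNN (convolutional neural network) for image super-resolution reconstruction. These algorithms deepen the network and improve reconstruction performance to some extent, but reconstructed detail is still blurred, feature information is insufficiently fused, and training takes a long time.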

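To address these problems, this paper proposes a multi-channel recursive residual network model (recursive multi-channel network cross residual super-resolution image reconstruction, MCSR). The specific contributions are as follows:

1) Following the idea of recursive networks, residual network blocks are reused, which reduces the amount of computation and improves training efficiency.

2) Because a single-channel recursive network may lose part of the feature information, a multi-channel network is proposed to capture richer feature information and improve the reconstruction of detail.

3) A cross-learning mechanism is added to the multi-channel network to better fuse the feature information of the channels and help it propagate through the deep network.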

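1 Network model

The CNN-based MCSR model works in two stages: feature extraction and image reconstruction. The feature extraction stage has three network channels, each containing four recursive residual networks and two convolutional layers. The four recursive residual networks in each channel are formed by applying recursive residual network No. 1 four times. The image reconstruction stage uses the sub-pixel convolution proposed by Shi et al. (2016) in ESPCN. The overall structure of the network is shown in Fig. 1.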

Fig. 1 Network structure diagram

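During training, an L1-norm loss function is used. The L1-norm loss responds to fluctuations in the data, effectively guides the update of the model parameters, and keeps the gradients stable, which yields higher-quality reconstructed images. The L1-norm loss function is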

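$ S_{\mathrm{L1}}=\frac{1}{L^{2} M N} \sum\limits_{x=0}^{L M-1} \sum\limits_{y=0}^{L N-1}\left|f_{\mathrm{H}}(x, y)-f_{\mathrm{L}}(x, y)\right| $

(1)

where $L$ is the magnification factor, $M$ and $N$ are the width and height of the input image, ${{f_{\rm{H}}}(x, y)}$ is the ground-truth high-resolution image, and ${{f_{\rm{L}}}(x, y)}$ is the high-resolution image reconstructed by the network. $x$ and $y$ are the integer coordinates of the current pixel along the width and height, with $x$ running from 0 to $LM-1$ and $y$ from 0 to $LN-1$, where $LM$ and $LN$ are the products of the magnification factor $L$ with the width $M$ and the height $N$. The sum therefore covers all $L^{2}MN$ pixels of the enlarged image, and $S_{\mathrm{L1}}$ is the mean absolute error per pixel.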
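As a minimal sketch of Eq. (1), the following standard-library Python computes the mean absolute error between a ground-truth image and a reconstruction. The function name `l1_loss` and the list-of-rows image representation are assumptions made for illustration; the paper's training code presumably uses a framework loss instead.

```python
def l1_loss(f_h, f_l):
    """Mean absolute error between two equally sized 2-D images (lists of rows)."""
    if len(f_h) != len(f_l) or any(len(a) != len(b) for a, b in zip(f_h, f_l)):
        raise ValueError("images must have the same shape")
    total = sum(abs(h - l)
                for row_h, row_l in zip(f_h, f_l)
                for h, l in zip(row_h, row_l))
    n_pixels = sum(len(row) for row in f_h)  # plays the role of L^2 * M * N in Eq. (1)
    return total / n_pixels


ground_truth = [[1.0, 2.0], [3.0, 4.0]]
reconstructed = [[1.0, 1.0], [1.0, 1.0]]
print(l1_loss(ground_truth, reconstructed))  # (0 + 1 + 2 + 3) / 4 = 1.5
```

Each pixel error contributes linearly, so a single large error does not dominate the loss the way it would under a squared-error criterion.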

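1.1 Feature extraction

In the feature extraction stage, the input image passes through three 3×3 convolution kernels, and each result is fed into the recursive residual network of its own channel for non-linear learning. The multi-channel recursive residual network model is shown in Fig. 2, and the recursive residual network inside each channel is shown in Fig. 3. The residual network block in Fig. 3 is the recursive network block of Fig. 2 and consists of four residual blocks connected in series. The structure of the residual block in Fig. 3 is shown in Fig. 4; all of its convolution kernels are 3×3 and the feature dimension is 64.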

Fig. 2 Multi-channel recursive cross-network model

Fig. 3 Recursive residual network diagram

Fig. 4 Residual block

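In a convolutional neural network, excessive depth can cause image feature information to vanish after many convolutional layers, which degrades reconstruction quality. To retain richer feature information, the non-linear mapping in this paper is a recursive residual network. Computing the convolutional layers is the main cost of a residual network. The model therefore reuses the residual network block recursively and shares parameters within each channel, which avoids the low training efficiency caused by computing distinct residual blocks repeatedly and accelerates the training of the deep network. The recursive network model is shown in Fig. 3 and can be expressed as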

$ G=\prod\limits_{t=1}^{4} H_{t} $

(2)

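where $G$ is the recursive network, $H_{t}$ is the recursive network block applied at step $t$, and $t$ takes the values 1, 2, 3, and 4; the product is read as successive application of the block. The recursive residual network provides the required depth and retains more feature information while improving training efficiency. Experiments show that its training time is about 30% shorter than that of the non-recursive residual network.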
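The recursion of Eq. (2) can be sketched as one block applied four times in succession. The snippet below is an illustration under stated assumptions: `apply_recursively` and `conv_params` are hypothetical helper names, the block is a toy function, and the parameter count assumes 64-channel 3×3 convolutions with bias terms, which the paper does not specify.

```python
def apply_recursively(block, x, times=4):
    """Apply the same block `times` times in succession (weights are shared)."""
    for _ in range(times):
        x = block(x)
    return x


def conv_params(channels=64, kernel=3, bias=True):
    """Parameter count of one channels-to-channels convolution (bias is an assumption)."""
    return channels * channels * kernel * kernel + (channels if bias else 0)


# One recursive block = 4 residual blocks, each with 2 convolutions.
shared = 4 * 2 * conv_params()
unshared = 4 * shared  # four distinct copies instead of one reused block

print(apply_recursively(lambda v: 2 * v + 1, 0))  # 0 -> 1 -> 3 -> 7 -> 15
print(shared, unshared)                           # 295424 1181696
```

Under these assumptions, reusing the block keeps the effective depth at 16 residual blocks per channel while storing the weights of only four.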

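The residual block used in this paper is derived from ResNet, but with the BN (batch normalization) layers removed (Nah et al., 2017). Removing the BN layers not only improves network performance but also shortens training time. The residual block is shown in Fig. 4, and the modified residual unit can be expressed as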

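$ \boldsymbol{R}_{n}=F\left(\boldsymbol{R}_{n-1}, W_{i}\right)+\boldsymbol{R}_{n-1} $

(3)

where $i$ indexes the two convolutional layers and takes the values 1 and 2, $\boldsymbol{R}_{n-1}$ is the input of the $n$-th residual block, $\boldsymbol{R}_{n}$ is the output of the $n$-th residual block, and $F$ is the learned non-linear mapping, that is,

$ F\left(\boldsymbol{R}_{n-1}, W_{i}\right)=W_{2} \cdot \lambda\left(W_{1} \cdot \boldsymbol{R}_{n-1}\right) $

(4)

where $W_{1}$ and $W_{2}$ are the weights of the first and second convolutional layers and $λ$ is the ReLU activation function.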
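A small numerical sketch may help with Eqs. (3) and (4), read here as two weight layers with a ReLU in between plus an identity shortcut. Plain matrix-vector products stand in for the 3×3 convolutions, and the names `relu`, `matvec`, and `residual_unit` are illustrative assumptions.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Matrix-vector product; a stand-in for a convolution in this sketch."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def residual_unit(r_prev, w1, w2):
    """R_n = W2 * relu(W1 * R_{n-1}) + R_{n-1}, with no batch normalization."""
    f = matvec(w2, relu(matvec(w1, r_prev)))
    return [a + b for a, b in zip(f, r_prev)]


identity = [[1.0, 0.0], [0.0, 1.0]]
print(residual_unit([1.0, -2.0], identity, identity))  # [2.0, -2.0]
```

With identity weights the ReLU zeroes the negative component of the residual branch, while the shortcut passes the input through unchanged, which is why the second output stays at -2.0.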

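Zhang et al. (2016) showed that multi-channel networks can capture different kinds of feature information, so the model uses recursive residual networks in several channels to avoid the information loss of a single channel. Lyu et al. (2018) and Jiang et al. (2020) proposed fusing the information of different branches, which effectively improves network performance. In the non-linear learning stage, a cross-learning mechanism is therefore used to fully fuse the otherwise independent feature information of the channels, which improves the network's ability to learn detail. Balancing network performance against hardware cost, the model uses three channels and cross-connects them in pairs, that is, channel 1 with channel 2, channel 2 with channel 3, and channel 1 with channel 3, as shown in Fig. 2. The cross-connected feature information is added to the final output of each channel and then passed to the next layer.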
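The pairwise cross-connection can be sketched as follows. This is only one possible reading: the text does not state which operation fuses a pair of channels or exactly how the fused features are distributed, so the element-wise sum and the function `cross_fuse` are assumptions made for illustration, not the paper's definition.

```python
from itertools import combinations


def add(u, v):
    """Element-wise sum of two feature vectors."""
    return [a + b for a, b in zip(u, v)]


def cross_fuse(channel_outputs):
    """Pairwise cross-connection of channel outputs (illustrative assumption)."""
    n = len(channel_outputs)
    fused = [list(out) for out in channel_outputs]
    for i, j in combinations(range(n), 2):  # pairs (0, 1), (0, 2), (1, 2)
        pair = add(channel_outputs[i], channel_outputs[j])
        fused[i] = add(fused[i], pair)      # add the pair feature to channel i
        fused[j] = add(fused[j], pair)      # and to channel j
    return fused


outputs = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(cross_fuse(outputs))  # [[6.0, 5.0], [4.0, 9.0], [10.0, 11.0]]
```

With three channels, `combinations` yields exactly the three pairs named in the text, so every channel receives information from both of the others.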

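1.2 Image reconstruction

Upsampling in the model requires no image pre-processing: the original image is the input, and sub-pixel convolution performs the upsampling, as shown in Fig. 5. The model with upsampling factor 2 serves as the base and is transferred to upsampling factors 3 and 4; that is, the networks for scale factors 3 and 4 are trained starting from the ×2 model (Lim et al., 2017), as shown in Fig. 6. Starting from a pre-trained network improves the training efficiency of the models with larger reconstruction factors and shortens the time needed to produce them.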

Fig. 5 Schematic diagram of sub-pixel reconstruction

Fig. 6 Upsampling network

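Take a 4×4-pixel image upsampled by a factor of 2 as an example. The image is zero-padded so that the output feature maps have the same size as the original image, and the convolution kernel is 3×3. The number of output feature maps is determined by the magnification factor: for a factor of $L$, there are $L$×$L$ output feature maps. Fig. 5 therefore has four output feature maps, and rearranging the pixels of these feature maps periodically enlarges the original image by a factor of 2.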
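The periodic rearrangement step can be illustrated in a few lines. The sketch below assumes that feature map k fills position (k // L, k % L) inside every L×L output cell, which is the common sub-pixel convolution convention; the name `pixel_shuffle` and the list-based layout are illustrative choices.

```python
def pixel_shuffle(maps, scale):
    """Rearrange scale*scale feature maps of size H x W into one (scale*H) x (scale*W) image."""
    if len(maps) != scale * scale:
        raise ValueError("expected scale*scale feature maps")
    height, width = len(maps[0]), len(maps[0][0])
    out = [[0] * (width * scale) for _ in range(height * scale)]
    for k, fmap in enumerate(maps):
        dy, dx = divmod(k, scale)  # position of map k inside each scale x scale cell
        for h in range(height):
            for w in range(width):
                out[h * scale + dy][w * scale + dx] = fmap[h][w]
    return out


# Four 2x2 feature maps, each filled with its own index, become one 4x4 image.
maps = [[[k, k], [k, k]] for k in range(4)]
for row in pixel_shuffle(maps, 2):
    print(row)
# [0, 1, 0, 1]
# [2, 3, 2, 3]
# [0, 1, 0, 1]
# [2, 3, 2, 3]
```

For the 4×4 example in the text, the same function maps four 4×4 feature maps to a single 8×8 image.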

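2 Experimental results and analysis

2.1 Training environment

The network was trained on Ubuntu 18.04 with PyTorch 1.0, an Intel(R) Core(TM) i7-7800X CPU @2.5 GHz×12, 32 GB of system memory, a GTX 1080 Ti graphics card with 11 GB of video memory, and CUDA 8.0.

The training dataset is DIV2K (DIVerse 2K), which contains 800 training images, 100 validation images, and 100 test images. The validation set is first used to check the performance of the trained model; Set5, Set14, BSD100, and Urban100 are then used for testing and for comparing the proposed network with other networks. The network is trained for 300 epochs with an initial learning rate of 0.000 1, which is changed to 0.000 05 after 200 epochs. The optimizer is Adam, and all convolution kernels in the network are 3×3.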
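The learning-rate schedule described above amounts to a single step. A minimal sketch, assuming 1-based epoch numbering and that the lower rate applies from epoch 201 onward (the exact boundary is not stated in the text); `learning_rate` is a hypothetical helper name:

```python
def learning_rate(epoch, base_lr=1e-4, reduced_lr=5e-5, switch_epoch=200):
    """Learning rate for a 1-based epoch index; the boundary handling is an assumption."""
    return base_lr if epoch <= switch_epoch else reduced_lr


schedule = [learning_rate(e) for e in range(1, 301)]
print(schedule[0], schedule[199], schedule[200], len(schedule))
# 0.0001 0.0001 5e-05 300
```

In a framework such as PyTorch, the same effect is usually obtained with a step-type scheduler attached to the Adam optimizer.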

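2.2 Performance metrics

Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used to evaluate reconstruction quality. PSNR, measured in dB, is currently the most widely used metric for image reconstruction; in general, the higher the PSNR, the closer the reconstructed image is to the original. Because PSNR alone has limitations as a measure of image quality, SSIM is also used. It expresses the similarity of two images, and the closer its value is to 1, the better the reconstruction.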
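For reference, PSNR follows the standard definition 10·log10(MAX²/MSE). The sketch below assumes 8-bit images (MAX = 255) stored as lists of rows; the paper does not state details such as the colour channel or border handling used in its evaluation, so none are modelled here.

```python
import math


def psnr(reference, test, max_value=255.0):
    """PSNR in dB between two equally sized 2-D images (lists of rows)."""
    diffs = [(r - t) ** 2
             for row_r, row_t in zip(reference, test)
             for r, t in zip(row_r, row_t)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ref = [[10, 20], [30, 40]]
off_by_one = [[11, 21], [31, 41]]
print(round(psnr(ref, off_by_one), 2))  # MSE = 1, so 10*log10(255^2) = 48.13
```

Because the scale is logarithmic, the gains of a few tenths of a dB reported in the tables correspond to reductions of several percent in mean squared error.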

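2.3 Performance comparison

The proposed model contains 16 residual blocks with a feature dimension of 64. To verify its effectiveness, it is compared with EDSR_baseline (enhanced deep residual networks for single image super-resolution-baseline), the baseline network of EDSR (enhanced deep super-resolution network) (Lim et al., 2017), which also has 16 residual blocks and a feature dimension of 64, and with a multi-channel network without cross connections built from EDSR_baseline (no cross network model, NOC). The validation-set PSNR over 300 training epochs is recorded for each network, as shown in Fig. 7. To show the effect of the recursive cross network clearly, part of the PSNR plot is enlarged in Fig. 7(b).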

Fig. 7 PSNR convergence curves of the three methods on the validation set ((a) PSNR distribution map; (b) PSNR partial enlargement)

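Fig. 7 shows that MCSR and NOC both outperform EDSR_baseline. Compared with NOC, MCSR fluctuates less, converges earlier, and reaches a higher PSNR after convergence.

To verify that recursively reusing residual blocks without BN layers improves training efficiency, the training times of MCSR, crossover mode 1 (Fig. 8), crossover mode 2 (Fig. 9), and NOC are compared in Table 1.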

Fig. 8 Crossover mode 1

Fig. 9 Crossover mode 2

Table 1 Network training time comparison

Training duration	NOC	Crossover mode 1	Crossover mode 2	MCSR (ours)
One epoch/s	169	157	154	134
Full training/h	approx. 14	approx. 13	approx. 13	approx. 11

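On the same hardware, the improved network MCSR saves about 3 h over 300 epochs compared with the serial non-recursive network NOC, a gain in training efficiency of 30%. The cross-learning mechanism of MCSR also has fewer network parameters than crossover modes 1 and 2 and trains more efficiently than either.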
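The per-epoch times in Table 1 can be converted into total training time and relative gains with a few lines of arithmetic. The sketch below uses only the NOC and MCSR values from the table; the two ratios shown are common ways to express a gain and are not definitions taken from the paper.

```python
EPOCHS = 300
seconds_per_epoch = {"NOC": 169, "MCSR": 134}  # from Table 1

hours = {name: s * EPOCHS / 3600 for name, s in seconds_per_epoch.items()}
time_saved = 1 - seconds_per_epoch["MCSR"] / seconds_per_epoch["NOC"]
speedup = seconds_per_epoch["NOC"] / seconds_per_epoch["MCSR"] - 1

print({k: round(v, 1) for k, v in hours.items()})           # {'NOC': 14.1, 'MCSR': 11.2}
print(round(100 * time_saved, 1), round(100 * speedup, 1))  # 20.7 26.1
```

The totals agree with the roughly 14 h and 11 h listed in the table. Which percentage corresponds to the reported efficiency gain depends on how efficiency is defined, which the text does not state.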

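To further demonstrate the effectiveness of the recursive cross-learning mechanism, the datasets are reconstructed at scale factor 2 under the different cross-learning mechanisms, and the resulting PSNR and SSIM values are compared in Table 2. Images reconstructed under the different crossover modes are compared visually in Fig. 10, with the PSNR and SSIM of each mode given below each image. Averaged over the four datasets, the MCSR cross connection improves PSNR by 0.40 dB and 0.14 dB and SSIM by 0.009 and 0.007 over crossover modes 1 and 2, respectively. Visually, in the first image of Fig. 10, MCSR separates the similar pixels where the petals overlap better than crossover modes 1 and 2; in the second image, the brim of the worker's safety helmet is sharper and shows less ringing.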

Table 2 Performance comparison of different crossover methods

Dataset	Crossover mode 1	Crossover mode 2	MCSR (ours)
Set5	37.88/0.959	37.93/0.959	38.03/0.981
Set14	33.40/0.916	33.45/0.916	33.58/0.917
BSD100	31.35/0.891	32.11/0.898	32.18/0.899
Urban100	31.49/0.922	31.67/0.923	31.94/0.926
Mean	33.53/0.922	33.79/0.924	33.93/0.931
Note: values are PSNR (dB)/SSIM.

Fig. 10 Comparison of reconstruction results in different crossover modes 2 ((a) original image; (b) crossover mode 1; (c) crossover mode 2; (d) MCSR(ours))

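The proposed MCSR network is compared with bicubic interpolation (Bicubic), A+, SRCNN, VDSR, LapSRN (deep Laplacian pyramid networks for fast and accurate super-resolution) (Lai et al., 2017), EDSR_baseline, and the multi-channel network without cross connections built from EDSR_baseline (NOC). The PSNR and SSIM of each model at the different scale factors are listed in Table 3. The data show that MCSR clearly outperforms the other networks on the objective metrics. Averaged over the scale factors, the PSNR of MCSR is higher than that of Bicubic, SRCNN, VDSR, LapSRN, EDSR_baseline, and NOC by 3.23 dB, 1.35 dB, 0.65 dB, 0.63 dB, 0.36 dB, and 0.03 dB, respectively; SSIM is higher by 0.074, 0.026, 0.011, 0.010, and 0.005 for the first five methods and on a par with NOC. On the detail-rich Urban100 dataset in particular, PSNR improves on average by 3.87 dB, 1.93 dB, 1.00 dB, 1.12 dB, 0.48 dB, and 0.03 dB, and SSIM by 0.109, 0.048, 0.022, 0.021, 0.010, and 0.001, respectively. Because NOC has more channels, it captures more feature information than the other networks and performs better, but it matches MCSR only in a few cases. Moreover, Table 1 shows that the proposed algorithm trains 30% more efficiently than NOC, which saves considerable hardware resources.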

Table 3 Comparison of indicators on the benchmark datasets at scale factors 2, 3, and 4

Dataset	Scale factor	Bicubic	A+	SRCNN	VDSR	LapSRN	EDSR_baseline	NOC	MCSR (ours)
Set5	×2	33.66/0.929	36.54/0.954	36.66/0.954	37.54/0.959	37.53/0.959	37.68/0.958	38.00/0.960	38.03/0.960
	×3	30.39/0.868	32.58/0.908	32.75/0.909	33.65/0.921	33.82/0.923	34.13/0.924	34.44/0.926	34.44/0.926
	×4	28.42/0.810	30.28/0.866	30.48/0.863	31.35/0.884	31.53/0.885	31.91/0.889	32.16/0.899	32.19/0.894
Set14	×2	30.24/0.868	32.28/0.905	32.43/0.907	33.03/0.912	32.99/0.912	33.28/0.914	33.54/0.916	33.58/0.917
	×3	27.55/0.774	29.13/0.774	29.30/0.822	29.78/0.831	29.87/0.832	30.14/0.838	30.33/0.842	30.37/0.842
	×4	26.00/0.703	27.32/0.749	27.50/0.750	28.01/0.767	28.19/0.772	28.44/0.777	28.59/0.781	28.63/0.782
BSD100	×2	29.56/0.843	31.21/0.886	31.34/0.888	31.90/0.896	31.80/0.895	31.29/0.897	32.16/0.899	32.18/0.899
	×3	27.21/0.738	28.29/0.784	28.41/0.786	28.82/0.798	28.83/0.798	28.98/0.801	29.10/0.804	29.11/0.805
	×4	25.96/0.668	26.82/0.709	26.90/0.711	27.30/0.725	27.33/0.727	27.45/0.731	27.58/0.735	27.58/0.736
Urban100	×2	26.88/0.840	-	29.50/0.895	30.76/0.914	30.45/0.913	31.29/0.919	31.91/0.925	31.94/0.926
	×3	24.46/0.735	-	26.25/0.799	27.14/0.828	27.08/0.828	27.69/0.841	28.07/0.850	28.10/0.851
	×4	23.14/0.657	-	24.53/0.721	25.18/0.751	25.20/0.756	25.65/0.771	26.01/0.782	26.04/0.783
Mean		27.79/0.786	-	29.67/0.834	30.37/0.849	30.39/0.850	30.66/0.855	30.99/0.860	31.02/0.860
Note: values are PSNR (dB)/SSIM; - means that the original authors did not run the method on that dataset.
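The averages quoted for Table 3 can be recomputed directly from its rows. The sketch below copies the MCSR column and the Urban100 entries of EDSR_baseline from the table; the helper `mean` and the variable names are illustrative.

```python
mcsr = [  # (PSNR in dB, SSIM) for Set5, Set14, BSD100, Urban100 at x2, x3, x4
    (38.03, 0.960), (34.44, 0.926), (32.19, 0.894),
    (33.58, 0.917), (30.37, 0.842), (28.63, 0.782),
    (32.18, 0.899), (29.11, 0.805), (27.58, 0.736),
    (31.94, 0.926), (28.10, 0.851), (26.04, 0.783),
]
edsr_urban100 = [31.29, 27.69, 25.65]  # EDSR_baseline PSNR on Urban100


def mean(values):
    values = list(values)
    return sum(values) / len(values)


mean_psnr = mean(p for p, _ in mcsr)
mean_ssim = mean(s for _, s in mcsr)
urban_gain = mean(p for p, _ in mcsr[9:]) - mean(edsr_urban100)
print(round(mean_psnr, 2), round(mean_ssim, 3), round(urban_gain, 2))
# 31.02 0.86 0.48
```

The last value reproduces the 0.48 dB average gain over EDSR_baseline on Urban100 that the text reports.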

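To further verify the gain of the proposed model over the other algorithms, the 19 images of Set5 and Set14 are reconstructed at scale factor 2, and the PSNR and SSIM of each image under each model are listed in Tables 4 and 5. On Set5, the average PSNR of MCSR exceeds that of Bicubic, A+, SRCNN, VDSR, LapSRN, EDSR_baseline, and NOC by 4.36 dB, 1.50 dB, 1.37 dB, 0.47 dB, 0.51 dB, 0.35 dB, and 0.03 dB, respectively; SSIM is higher by 0.03, 0.005, 0.006, 0.001, 0.002, and 0.002 for the first six methods and on a par with NOC. On Set14, the average PSNR of MCSR exceeds that of Bicubic, A+, SRCNN, VDSR, LapSRN, EDSR_baseline, and NOC by 3.35 dB, 1.16 dB, 1.13 dB, 0.56 dB, 0.59 dB, 0.30 dB, and 0.04 dB, respectively; SSIM is higher by 0.048, 0.017, 0.010, 0.004, 0.005, and 0.003 for the first six methods and on a par with NOC.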

Table 4 Comparison of experimental results on Set5

Image	Bicubic	A+	SRCNN	VDSR	LapSRN	EDSR_baseline	NOC	MCSR (ours)
Baby	37.09/0.953	38.48/0.967	38.54/0.966	38.78/0.967	38.78/0.968	38.82/0.966	38.91/0.967	38.92/0.967
Bird	36.81/0.972	41.10/0.986	40.91/0.986	42.49/0.989	42.66/0.990	42.77/0.989	43.35/0.990	43.37/0.990
Butterfly	27.43/0.915	32.02/0.963	32.75/0.964	34.50/0.975	34.23/0.974	34.67/0.975	35.31/0.977	35.39/0.977
Head	34.86/0.863	35.76/0.887	35.72/0.886	35.96/0.890	35.94/0.890	35.97/0.889	36.03/0.891	36.05/0.891
Woman	32.14/0.948	35.27/0.970	35.36/0.969	36.06/0.973	35.98/0.974	36.17/0.973	36.40/0.974	36.45/0.974
Mean	33.67/0.930	36.53/0.955	36.66/0.954	37.56/0.959	37.52/0.958	37.68/0.958	38.00/0.960	38.03/0.960
Note: values are PSNR (dB)/SSIM.

Table 5 Comparison of experimental results on Set14

Image	Bicubic	A+	SRCNN	VDSR	LapSRN	EDSR_baseline	NOC	MCSR (ours)
Baboon	24.86/0.695	25.57/0.763	25.74/0.767	25.94/0.778	25.89/0.775	26.15/0.785	26.40/0.795	26.41/0.795
Barbara	28.00/0.843	28.68/0.879	28.64/0.877	28.42/0.879	28.37/0.880	28.28/0.879	28.43/0.883	28.36/0.882
Bridge	26.58/0.791	27.76/0.848	27.83/0.850	28.04/0.858	28.01/0.856	29.46/0.866	29.53/0.868	29.55/0.869
Coastguard	29.12/0.793	30.56/0.849	30.83/0.856	30.98/0.856	30.90/0.856	31.14/0.858	31.23/0.858	31.36/0.861
Comic	26.02/0.853	28.31/0.916	28.52/0.917	29.40/0.933	29.28/0.933	29.61/0.937	30.01/0.943	30.02/0.942
Face	34.83/0.862	35.66/0.886	35.70/0.886	35.93/0.890	35.93/0.890	35.95/0.889	36.02/0.891	36.02/0.891
Flowers	30.37/0.900	33.00/0.936	33.32/0.937	34.34/0.946	34.19/0.945	34.49/0.946	34.92/0.950	35.00/0.950
Foreman	34.14/0.948	36.85/0.968	36.47/0.966	37.40/0.973	37.15/0.971	37.55/0.972	37.95/0.972	38.18/0.973
Lenna	34.70/0.912	36.56/0.930	36.64/0.930	37.06/0.933	37.02/0.933	37.19/0.932	37.40/0.934	37.40/0.934
Man	29.25/0.846	30.86/0.887	31.04/0.889	31.44/0.896	31.38/0.896	31.49/0.897	31.58/0.898	31.58/0.898
Monarch	32.94/0.961	37.00/0.977	37.74/0.977	39.43/0.981	39.18/0.981	39.54/0.980	40.06/0.981	40.09/0.981
Pepper	34.95/0.905	37.01/0.921	36.87/0.919	37.38/0.925	37.38/0.924	37.45/0.923	37.59/0.924	37.60/0.924
Ppt3	26.87/0.947	32.12/0.978	31.52/0.980	32.35/0.985	32.82/0.989	33.32/0.988	33.96/0.990	33.98/0.990
Zebra	30.63/0.908	33.60/0.941	33.49/0.942	34.19/0.945	34.34/0.945	34.33/0.945	34.55/0.947	34.58/0.947
Mean	30.23/0.869	32.42/0.900	32.45/0.907	33.02/0.913	32.99/0.912	33.28/0.914	33.54/0.917	33.58/0.917
Note: values are PSNR (dB)/SSIM.

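To verify that the proposed multi-channel recursive residual network improves the reconstruction of image detail, detail-rich images from Urban100 are used. The reconstructions produced by each algorithm are shown in Figs. 11 to 13. Bicubic and A+ perform poorly, with blurring, ringing, and severe jagged edges. SRCNN, VDSR, and LapSRN improve on this and largely remove the ringing, but detail remains blurred and edges are over-smoothed. EDSR_baseline, NOC, and the proposed algorithm reconstruct well: subjective visual quality improves markedly, texture detail is restored, and the edges of buildings become sharper. The PSNR and SSIM of each algorithm are given below each image. In terms of objective quality, on the line-rich images of Figs. 11 and 13, MCSR improves PSNR by 0.54 dB and 1.05 dB and SSIM by 0.013 and 0.010, respectively, over EDSR_baseline, the best of the existing networks compared. On Fig. 12, an exterior view of building windows with rich edge information, PSNR improves by 0.37 dB and SSIM by 0.011 over EDSR_baseline.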

Fig. 11 Comparison of img008 reconstruction in Urban100 ((a) original image; (b) Bicubic; (c) A+; (d) SRCNN; (e) VDSR; (f) LapSRN; (g) EDSR_baseline; (h) NOC; (i) MCSR(ours))

Fig. 12 Comparison of img034 reconstruction in Urban100 ((a) original image; (b) Bicubic; (c) A+; (d) SRCNN; (e) VDSR; (f) LapSRN; (g) EDSR_baseline; (h) NOC; (i) MCSR(ours))

Fig. 13 Comparison of img062 reconstruction in Urban100 ((a) original image; (b) Bicubic; (c) A+; (d) SRCNN; (e) VDSR; (f) LapSRN; (g) EDSR_baseline; (h) NOC; (i) MCSR(ours))

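3 Conclusion

Image super-resolution reconstruction is an important branch of digital image processing. To address the current challenges of CNN-based super-resolution, this paper proposes a multi-channel recursive residual network, MCSR, which targets the difficulty of training deep networks, the low utilization of feature information, and blurred reconstructed detail. The model consists of a three-channel convolutional network. Each channel builds 16 residual blocks through recursion for feature extraction, removes unnecessary network layers, and shares parameters, which effectively shortens training time. A multi-channel cross-learning mechanism fuses the feature information across channels and improves its utilization.

Experiments on standard datasets show that, compared with existing network models, the proposed network improves detail extraction, training efficiency, the objective metrics (PSNR and SSIM), the reconstruction of image detail, and the visual quality of the result. Although the proposed algorithm and NOC perform on a par on a few test samples, the proposed algorithm trains considerably more efficiently.

Because this paper focuses on the effect of the multi-channel and cross-learning mechanisms on network performance, the scale of the recursive network and the network depth were not studied. Future work will refine the recursive network scale and the cross-learning mechanism to further improve feature fusion and the efficiency of information propagation in the network.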

References

Cao Y J, Jia L L, Chen Y X, Lin N, Li X X. 2018. Review of computer vision based on generative adversarial networks. Journal of Image and Graphics, 23(10): 1433-1449 (曹仰杰, 贾丽丽, 陈永霞, 林楠, 李学相. 2018. 生成式对抗网络及其计算机视觉应用研究综述. 中国图象图形学报, 23(10): 1433-1449) [DOI:10.11834/jig.180103]

Chang H, Yeung D Y and Xiong Y M. 2004. Super-resolution through neighbor embedding//Proceedings of 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, DC, USA: IEEE: #1315043[DOI: 10.1109/CVPR.2004.1315043]

Chen L L, Kou Q Q, Cheng D Q, Yao J. 2020. Content-guided deep residual network for single image super-resolution. Optik, 202 [DOI:10.1016/j.ijleo.2019.163678]

Cheng D Q, Chen L L, Cai Y C, You D L, Tu Y L. 2018a. Image super-resolution reconstruction based on multi-dictionary and edge fusion. Journal of China Coal Society, 43(7): 2084-2090 (程德强, 陈亮亮, 蔡迎春, 游大磊, 屠屹磊. 2018a. 边缘融合的多字典超分辨率图像重建算法. 煤炭学报, 43(7): 2084-2090) [DOI:10.13225/j.cnki.jccs.2017.1263]

Cheng D Q, Liu W L, Shao L R, Chen L L. 2018b. Super resolution reconstruction algorithm based on kernel sparse representation and atomic correlation. Journal of Image and Graphics, 23(9): 1285-1292 (程德强, 刘威龙, 邵丽蓉, 陈亮亮. 2018b. 核稀疏表示和原子相关度的图像重建. 中国图象图形学报, 23(9): 1285-1292) [DOI:10.11834/jig.180011]

Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 184-199[DOI: 10.1007/978-3-319-10593-2_13]

Dong C, Loy C C and Tang X O. 2016. Accelerating the super-resolution convolutional neural network//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 391-407[DOI: 10.1007/978-3-319-46475-6_25]

Dou X Y, Li C Y, Shi Q, Liu M X. 2020. Super-resolution for hyperspectral remote sensing images based on the 3D attention-SRGAN network. Remote Sensing, 12(7): 1204 [DOI:10.3390/rs12071204]

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]

Huang J B, Singh A and Ahuja N. 2015. Single image super-resolution from transformed self-exemplars//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 5197-5206[DOI: 10.1109/CVPR.2015.7299156]

Jiang K, Wang Z Y, Yi P, Chen C, Huang B J, Luo Y M, Ma J Y and Jiang J J. 2020. Multi-scale progressive fusion network for single image deraining[EB/OL].[2020-03-24]. https://arxiv.org/pdf/2003.10985v2.pdf

Kim J, Lee J K and Lee K M. 2016. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1646-1654[DOI: 10.1109/CVPR.2016.182]

Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5838-5843[DOI: 10.1109/CVPR.2017.618]

Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z H and Shi W Z. 2016. Photo-realistic single image super-resolution using a generative adversarial network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 105-114[DOI: 10.1109/CVPR.2017.19]

Li H Y, Li C G, An J B, Ren J L. 2019. Attention mechanism improves CNN remote sensing image object detection. Journal of Image and Graphics, 24(8): 1400-1408 (李红艳, 李春庚, 安居白, 任俊丽. 2019. 注意力机制改进卷积神经网络的遥感图像目标检测. 中国图象图形学报, 24(8): 1400-1408) [DOI:10.11834/jig.180649]

Li X, Orchard M T. 2001. New edge-directed interpolation. IEEE Transactions on Image Processing, 10(10): 1521-1527 [DOI:10.1109/83.951537]

Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, USA: IEEE: 1132-1140[DOI: 10.1109/CVPRW.2017.151]

Liu S D, Wang X M, Zhang Y. 2019. Symmetric residual convolution neural networks for the image super-resolution reconstruction. Journal of Xidian University, 46(5): 15-23 (刘树东, 王晓敏, 张艳. 2019. 一种对称残差CNN的图像超分辨率重建方法. 西安电子科技大学学报, 46(5): 15-23) [DOI:10.19665/j.issn1001-2400.2019.05.003]

Lyu F F, Lu F, Wu J H and Lim C. 2018. MBLLEN: low-light image/video enhancement using CNNs[EB/OL].[2020-03-24].http://www.bmva.org/bmvc/2018/contents/papers/0700.pdf

Nah S, Kim T H and Lee K M. 2017. Deep multi-scale convolutional neural network for dynamic scene deblurring//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 257-265[DOI: 10.1109/CVPR.2017.35]

Peled S, Yeshurun Y. 2001. Superresolution in MRI: application to human white matter fiber tract visualization by diffusion tensor imaging. Magnetic Resonance in Medicine, 45(1): 29-35 [DOI:10.1002/1522-2594(200101)45:1<29::aid-mrm1005>3.0.co;2-z]

Peleg S, Keren D, Schweitzer L. 1987. Improving image resolution using subpixel motion. Pattern Recognition Letters, 5(3): 223-226 [DOI:10.1016/0167-8655(87)90067-5]

Schultz R R, Stevenson R L. 1996. Extraction of high-resolution frames from video sequences. IEEE Transactions on Image Processing, 5(6): 996-1011 [DOI:10.1109/83.503915]

Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z H. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1874-1883[DOI: 10.1109/CVPR.2016.207]

Sun J, Xu Z B and Shum H Y. 2008. Image super-resolution using gradient profile prior//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8[DOI: 10.1109/CVPR.2008.4587659]

Tan K, Wang X, Du P J. 2019. Research progress of the remote sensing classification combining deep learning and semi-supervised learning. Journal of Image and Graphics, 24(11): 1823-1841 (谭琨, 王雪, 杜培军. 2019. 结合深度学习和半监督学习的遥感影像分类进展. 中国图象图形学报, 24(11): 1823-1841) [DOI:10.11834/jig.190348]

Timofte R, De Smet V and Van Gool L. 2014. A+: adjusted anchored neighborhood regression for fast super-resolution//Proceedings of the 12th Asian Conference on Computer Vision. Singapore, Republic of Singapore: Springer: 111-126[DOI: 10.1007/978-3-319-16817-3_8]

Timofte R, De Smet V and Van Gool L. 2013. Anchored neighborhood regression for fast example-based super-resolution//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1920-1927[DOI: 10.1109/ICCV.2013.241]

Unser M, Aldroubi A, Eden M. 1991. Fast B-spline transforms for continuous image representation and interpolation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3): 277-285 [DOI:10.1109/34.75515]

Xiao J S, Liu E Y, Zhu L, Lei J F. 2017. Improved image super-resolution algorithm based on convolutional neural network. Acta Optica Sinica, 37(3): 0318011 (肖进胜, 刘恩雨, 朱力, 雷俊锋. 2017. 改进的基于卷积神经网络的图像超分辨率算法. 光学学报, 37(3): 0318011) [DOI:10.3788/aos201737.0318011]

Yang J C, Wright J, Huang T and Ma Y. 2008. Image super-resolution as sparse representation of raw image patches//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8[DOI: 10.1109/CVPR.2008.4587647]

Zhang W, Qu C F, Ma L, Guan J W, Huang R. 2016. Learning structure of stereoscopic image for no-reference quality assessment with convolutional neural network. Pattern Recognition, 59: 176-187 [DOI:10.1016/j.patcog.2016.01.034]