Published: 2021-12-16
DOI: 10.11834/jig.200555
2021 | Volume 26 | Number 12




Image Processing and Coding





Lightweight attention feature selection recursive network for super-resolution
Xu Wenjie, Song Huihui, Yuan Xiaotong, Liu Qingshan
1. Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China;
2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China
Supported by: National Major Project of China for New Generation of AI (2018AAA0100400); National Natural Science Foundation of China (61872189, 61532009)

Abstract

Objective Deep convolutional neural networks have shown strong reconstruction ability in the image super-resolution (SR) task, and efficient SR has great practical value owing to the popularity of intelligent edge devices such as mobile phones. We propose a very lightweight and efficient super-resolution network that, by combining a recursive feature selection module with a parameter sharing mechanism, greatly reduces the number of parameters and floating point operations (FLOPs) while achieving excellent reconstruction performance. Method The proposed lightweight attention feature selection recursive network (AFSNet) consists of three key components: low-level feature extraction, high-level feature extraction, and upsampling reconstruction. In the low-level feature extraction part, the input low-resolution image passes through a 3×3 convolutional layer. In the high-level feature extraction part, a recursive feature selection module (FSM) captures the high-level features. At the end of the network, a shared upsampling block super-resolves both the low-level and the high-level features to obtain the final high-resolution image. Specifically, the FSM contains a feature enhancement block and an efficient channel attention block. The feature enhancement block has four convolutional layers; unlike plainly cascaded convolutional layers, it retains part of the features at each layer and fuses them at the end of the module. Because features from different convolutional layers carry different levels of hierarchical information, the network preserves part of them step by step and aggregates them at the end of the module. An efficient channel attention (ECA) block follows the feature enhancement block. Unlike the channel attention (CA) in the residual channel attention networks (RCAN), which uses two 1×1 convolutional layers for non-linear mapping and cross-channel interaction, ECA avoids dimensionality reduction: it implements a local cross-channel interaction strategy via a one-dimensional (1D) convolution and adaptively selects the kernel size of this convolution to determine the coverage of the interaction. The ECA block improves reconstruction performance with almost no additional parameters. The network further employs a recursive mechanism to share parameters across applications of the feature selection module, which drastically reduces the parameter count. At the end of the high-level feature extraction part, the outputs of all FSM recursions are concatenated and fused; through this multi-stage feature fusion (MSFF) mechanism the network captures valuable contextual information. In the upsampling reconstruction part, a shared upsampling block consisting of a convolutional layer and a sub-pixel layer reconstructs the low-level and high-level features into a high-resolution image, fusing low- and high-frequency information without increasing the number of parameters. Result The DF2K dataset, combining 800 images from DIV2K and 2 650 images from Flickr2K, is adopted for training, with data augmentation by random horizontal flipping and 90° rotation. The corresponding low-resolution images are obtained from the high-resolution images by bicubic downsampling (scale factors ×2, ×3 and ×4). Evaluation uses five benchmark datasets: Set5, Set14, B100, Urban100 and Manga109, with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as the metrics. Following the evaluation protocol of the residual dense network (RDN), borders are cropped and the metrics are calculated on the luminance channel of the transformed YCbCr space. During training, 16 low-resolution patches of size 48×48 and their corresponding high-resolution patches are randomly cropped per batch. The high-level feature extraction stage uses six recursions of the feature selection module; the number of channels in each convolutional layer of the FSM is set to C = 64, and each channel split preserves the features of 16 channels while the remaining 48 channels continue to the next convolution. The network parameters are optimized with the Adam optimizer under the L1 loss; the initial learning rate is 2E-4 and is halved every 200 epochs. The network is implemented in the PyTorch framework with an NVIDIA 2080 Ti GPU for acceleration. We compare the proposed AFSNet with several state-of-the-art lightweight CNN-based single image super-resolution (SISR) methods. AFSNet achieves the best PSNR and SSIM among all compared methods on almost all benchmark datasets (except the ×2 results on Set5), while using far fewer parameters and much smaller FLOPs. For ×4 SR on the Set14 test dataset, the PSNR increases by 0.4 dB, 0.6 dB and 0.43 dB over SRFBN-S, IDN and CARN-M, respectively, while the parameter count of AFSNet is 47%, 53% and 38% lower; meanwhile, its 24.5 G FLOPs are well below the roughly 32 G of IDN and CARN-M. In addition, an ablation study on ×4 Set5 verifies the effectiveness of the ECA module and the MSFF mechanism: dropping them decreases PSNR by 0.09 dB and 0.19 dB, respectively. Conclusion We present a lightweight attention feature selection recursive network for super-resolution that improves reconstruction performance without large parameter counts or FLOPs. The network employs a 3×3 convolutional layer to extract low-level features from the low-resolution (LR) input and six recursions of the feature selection module to learn the non-linear mapping and exploit high-level features. The FSM preserves hierarchical features step by step and aggregates them according to the importance of the candidate features as evaluated by the proposed efficient channel attention module. Multi-stage feature fusion, which concatenates the outputs of all FSM recursions, effectively captures the contextual information of different stages, and the extracted low-level and high-level features are finally upsampled by a parameter-shared upsampling block.

Key words

image super-resolution; lightweight network; recursive mechanism; parameter sharing; feature enhancement; efficient channel attention

0 Introduction

Image super-resolution (SR) aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) one. As a fundamental task in computer vision, SR has broad application prospects in areas such as biometric recognition, image analysis and video surveillance. In recent years, thanks to the powerful feature representation and model fitting capability of deep neural networks, SR algorithms based on convolutional neural networks (CNNs) have made major performance breakthroughs and are gradually moving toward practical application scenarios, such as few-shot SR (Shocher et al., 2018; Soh et al., 2020), blind SR (Gu et al., 2019; Zhang et al., 2020) and efficient SR (Ahn et al., 2018; Wang et al., 2019). With the spread of intelligent edge devices, efficient SR is in great demand. SRCNN (super-resolution convolutional neural network), proposed by Dong et al. (2014), pioneered the use of CNNs for SR: with only three convolutional layers it effectively extracts internal image features and greatly improves reconstruction quality, far surpassing traditional methods. Deeper and wider networks then came into wide use. VDSR (super-resolution using very deep convolutional networks), proposed by Kim et al. (2016a) and inspired by very deep networks (Simonyan and Zisserman, 2014), introduced small 3×3 kernels to deepen the network to 20 layers. EDSR (enhanced deep residual networks for single image super-resolution), proposed by Lim et al. (2017), introduced a deeper and wider residual structure of more than 60 layers and further improved SR performance. Zhang et al. proposed the RDN (residual dense network) (2018b) and RCAN (residual channel attention networks) (2018a) algorithms, pushing the depth to 100 and 400 layers and ranking among the most advanced SR methods. Li et al. (2018) proposed the MSRN (multi-scale residual network) algorithm, which enlarges the receptive field by stacking convolution kernels of different sizes on multiple branches, improving performance by widening rather than deepening the network. Although increasing depth and width improves SR results, it incurs a huge computational cost and keeps such algorithms off mobile devices, so a series of strategies for reducing parameters or computation have been proposed. DRCN (deeply-recursive convolutional network), proposed by Kim et al. (2016b), and DRRN (deep recursive residual network), proposed by Tai et al. (2017a), both use recursion to reduce the number of network parameters. MemNet (memory network), proposed by Tai et al. (2017b), combines recursion with memory units to solve the long-range dependency problem of recursive structures. However, recursive structures carry heavy information redundancy and still consume considerable computational resources. To address this, Hui et al. (2018) proposed IDN (information distillation network), which uses information distillation to retain the useful features of each convolutional layer with few parameters and high speed, though at some cost in reconstruction quality. Hui et al. (2019) later improved on this for the AIM challenge with a multi-stage information distillation scheme that further trims the parameter count. Although most SR algorithms consider parameter reduction in their lightweight designs, much redundant computation remains and degrades reconstruction performance. How to design an efficient SR network with few parameters, low computational cost and good reconstruction quality is thus a pressing problem.

This paper proposes a super-resolution network based on a lightweight attention feature selection network that achieves excellent reconstruction while reducing both parameters and computation. Specifically, we design a feature selection module whose feature enhancement block retains part of the useful information of each convolutional layer, fully capturing the rich hierarchical information needed for reconstruction; the subsequent efficient channel attention block then aggregates the candidate features according to their importance, better highlighting the extracted contextual information without increasing parameters or computational cost. The main contributions are as follows: 1) A novel lightweight image SR network that achieves few-parameter, low-computation and high-performance reconstruction through a recursive feature selection module. 2) A feature selection module with an efficient attention mechanism, which fully fuses hierarchical contextual information and adaptively adjusts the strength of each channel's features; the outputs of every recursion are fused in a multi-stage manner, effectively mining the useful information of each stage of the network. 3) A shared upsampling module that exploits information from different network levels, capturing both low- and high-frequency image content without adding parameters. 4) On standard SR benchmarks, the peak signal to noise ratio (PSNR) and structural similarity (SSIM) far exceed those of existing state-of-the-art lightweight networks, while the parameter count and floating point operations (FLOPs) are far smaller than those of the compared algorithms.

1 Attention feature selection recursive network

Fig. 1 shows the structure of the proposed lightweight attention feature selection recursive network (AFSNet), which comprises three stages: shallow feature extraction, deep feature extraction and upsampling reconstruction. FSM (feature selection module) denotes the recursively reused feature selection module, and Upsample denotes the parameter-shared upsampling module.

Fig. 1 Structure of AFSNet

In the shallow feature extraction stage, the input low-resolution image $\boldsymbol{I}_{\rm LR}$ passes through the shallow feature extractor to produce the shallow features $\boldsymbol{F}_{\rm lf}$:

$\boldsymbol{F}_{\rm lf} = f_{\rm le}(\boldsymbol{I}_{\rm LR})$ (1)

where $f_{\rm le}$ denotes the shallow feature extraction operation, consisting of one convolutional layer with a 3×3 kernel.

In the deep feature extraction stage, the shallow features from the previous stage pass through the recursively applied feature selection module $h_{\rm FS}$ to learn a non-linear mapping and progressively extract the deep features $\boldsymbol{F}_1, \cdots, \boldsymbol{F}_n, \cdots, \boldsymbol{F}_N$ of each recursion:

$\boldsymbol{F}_n = h_{\rm FS}(\boldsymbol{F}_{n-1}), \quad n = 2, \cdots, N$ (2)

where $\boldsymbol{F}_1 = h_{\rm FS}(\boldsymbol{F}_{\rm lf})$, $n$ indexes the recursions of the feature selection module and $N$ is the total number of recursions. Inside the FSM, a feature enhancement block first retains a small part of the information of each convolutional layer and then fuses the retained parts; the fused features are finally fed into the efficient channel attention (ECA) block. As a lightweight version of the channel attention in RCAN (residual channel attention networks) (Zhang et al., 2018a), ECA adaptively adjusts the strength of each channel's features while using fewer parameters.

Notably, the recursive mechanism lets the FSM be applied multiple times with shared weights, greatly reducing the network's parameter count. At the end of this stage, a multi-stage feature fusion (MSFF) mechanism aggregates the outputs of every FSM recursion into the final deep features $\boldsymbol{F}_{\rm hf}$, which helps fully fuse and capture rich multi-stage contextual information:

$\boldsymbol{F}_{\rm hf} = f_{\rm MF}([\boldsymbol{F}_1, \boldsymbol{F}_2, \cdots, \boldsymbol{F}_N])$ (3)

where $f_{\rm MF}$ denotes the multi-stage feature fusion operation.

In the upsampling reconstruction stage, unlike other methods, we apply one parameter-shared upsampling module $f_{\rm up}$ to both the shallow and the deep features, fusing low- and high-frequency information to obtain the super-resolved image $\boldsymbol{I}_{\rm SR}$ with good reconstruction quality:

$\boldsymbol{I}_{\rm SR} = f_{\rm up}(\boldsymbol{F}_{\rm lf}) + f_{\rm up}(\boldsymbol{F}_{\rm hf})$ (4)
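To make the data flow of Eqs. (1)–(4) concrete, the following minimal PyTorch sketch wires the three stages together. This is our illustration rather than the authors' released code: the FSM and the shared upsample block are injected as submodules (they are sketched in Sections 1.1 and 1.2), and the 1×1 fusion convolution standing in for $f_{\rm MF}$ is an assumption.

import torch
import torch.nn as nn

class AFSNet(nn.Module):
    # Sketch of Eqs. (1)-(4); `fsm` and `upsample` can be any nn.Module
    # (e.g., the FSM and UpsampleBlock sketched in Secs. 1.1 and 1.2).
    def __init__(self, fsm, upsample, channels=64, n_recursions=6):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)          # f_le, Eq. (1)
        self.fsm = fsm                                               # one FSM, weights shared
        self.fuse = nn.Conv2d(channels * n_recursions, channels, 1)  # f_MF, Eq. (3) (assumed 1x1 conv)
        self.upsample = upsample                                     # shared f_up, Eq. (4)
        self.n = n_recursions

    def forward(self, lr):
        f_lf = self.shallow(lr)                           # shallow features
        feats, f = [], f_lf
        for _ in range(self.n):                           # N = 6 recursions, Eq. (2)
            f = self.fsm(f)
            feats.append(f)
        f_hf = self.fuse(torch.cat(feats, dim=1))         # multi-stage feature fusion
        return self.upsample(f_lf) + self.upsample(f_hf)  # Eq. (4)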

1.1 Feature selection module (FSM)

Fig. 2 shows the structure of the feature selection module. Inspired by the IDN algorithm (Hui et al., 2018), and considering that in each convolutional layer only a small fraction of channels carry directly useful information while most channels still need further non-linear mapping to learn stronger representations, we perform channel splitting across several convolutional layers and fuse the minority of features with stronger representational power from each layer.

Fig. 2 Feature selection module

As shown in Fig. 2, the feature selection module of the $n$-th recursion first applies a 3×3 convolution to the output features of the previous module; then, at each subsequent step, the obtained features are split along the channel dimension (Channel split in Fig. 2): one part is retained and the other continues to the next convolution. The whole process can be expressed as

$\begin{array}{c} \{\boldsymbol{F}_{{\rm re}\_1}^n, \boldsymbol{F}_{{\rm nt}\_1}^n\} = split_1(f_{{\rm conv}\_1}(\boldsymbol{F}_{n-1})) \\ \{\boldsymbol{F}_{{\rm re}\_2}^n, \boldsymbol{F}_{{\rm nt}\_2}^n\} = split_2(f_{{\rm conv}\_2}(\boldsymbol{F}_{{\rm nt}\_1}^n)) \\ \{\boldsymbol{F}_{{\rm re}\_3}^n, \boldsymbol{F}_{{\rm nt}\_3}^n\} = split_3(f_{{\rm conv}\_3}(\boldsymbol{F}_{{\rm nt}\_2}^n)) \\ \boldsymbol{F}_{{\rm re}\_4}^n = f_{{\rm conv}\_4}(\boldsymbol{F}_{{\rm nt}\_3}^n) \end{array}$ (5)

where the retained features are denoted $\boldsymbol{F}_{\rm re}^n$, the features passed on to the next convolution are denoted $\boldsymbol{F}_{\rm nt}^n$, $split_j$ is the $j$-th split operation, and $f_{{\rm conv}\_i}$ is the $i$-th convolution. The features retained at each stage are then concatenated along the channel dimension (Concat in Fig. 2) and fused by a convolution $f_{\rm conv}$:

$\boldsymbol{F}_{\rm re} = f_{\rm conv}([\boldsymbol{F}_{{\rm re}\_1}^n, \boldsymbol{F}_{{\rm re}\_2}^n, \boldsymbol{F}_{{\rm re}\_3}^n, \boldsymbol{F}_{{\rm re}\_4}^n])$ (6)

The fused features $\boldsymbol{F}_{\rm re}$ then pass through the efficient channel attention block (ECA), which adaptively rescales the channels, yielding the residual output features $\boldsymbol{F}_{\rm res}^n$ of the FSM:

$\boldsymbol{F}_{\rm res}^n = f_{\rm ECA}(\boldsymbol{F}_{\rm re})$ (7)

Finally, the residual features are added to the input features to form the output of the feature selection module at the $n$-th recursion:

$\boldsymbol{F}_n = \boldsymbol{F}_{\rm res}^n + \boldsymbol{F}_{n-1}$ (8)
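As a concrete illustration of Eqs. (5)–(8), here is a minimal PyTorch sketch of the FSM. The channel widths follow Section 2.1 (each convolution outputs 64 channels, each split retains 16 and forwards 48); that the fourth convolution outputs 16 channels so the concatenation returns to 64 channels, and the LeakyReLU activation, are our assumptions.

import torch
import torch.nn as nn

class FSM(nn.Module):
    def __init__(self, channels=64, retain=16, attention=None):
        super().__init__()
        remain = channels - retain                             # 48 channels carry on
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(remain, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(remain, channels, 3, padding=1)
        self.conv4 = nn.Conv2d(remain, retain, 3, padding=1)   # F_re_4 (assumed 16 channels)
        self.fuse = nn.Conv2d(4 * retain, channels, 1)         # f_conv, Eq. (6)
        # f_ECA, Eq. (7): pass the ECA block sketched after Fig. 3, else identity
        self.eca = attention if attention is not None else nn.Identity()
        self.act = nn.LeakyReLU(0.05, inplace=True)            # activation is an assumption
        self.sizes = [retain, remain]

    def forward(self, x):
        r1, n1 = torch.split(self.act(self.conv1(x)), self.sizes, dim=1)   # split_1
        r2, n2 = torch.split(self.act(self.conv2(n1)), self.sizes, dim=1)  # split_2
        r3, n3 = torch.split(self.act(self.conv3(n2)), self.sizes, dim=1)  # split_3
        r4 = self.conv4(n3)                                                # Eq. (5)
        res = self.eca(self.fuse(torch.cat([r1, r2, r3, r4], dim=1)))      # Eqs. (6)-(7)
        return res + x                                                     # Eq. (8)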

Fig. 3 shows the structure of the ECA block, which computes attention over the input feature space and adaptively adjusts the relative importance of channels. Hu et al. (2018) proposed the channel attention module in SENet (squeeze-and-excitation network): the feature maps are first squeezed into channel-wise global features; an excitation step then learns the relationships between channels through two fully connected layers, and a sigmoid finally produces per-channel weights that make the channels of the original feature maps more discriminative. The channel attention in RCAN (Zhang et al., 2018a) follows this idea but replaces the fully connected layers with two non-linear 1×1 convolutional layers to capture non-linear cross-channel interaction. Although 1×1 convolutions greatly reduce the parameter count, the non-linearity is learned through dimensionality reduction and expansion, and such dimensionality reduction is detrimental to channel weight prediction (Wang et al., 2020). Motivated by this, we adopt the efficient channel attention mechanism ECA, which avoids dimensionality reduction while efficiently capturing cross-channel interaction. ECA captures local cross-channel interaction by considering each channel together with its $k$ neighbours; as Fig. 3 shows, this needs only a one-dimensional convolution with kernel size $k$ (1Dconv-$k$), where $k$ is the coverage of local cross-channel interaction, i.e., how many neighbours participate in predicting a channel's attention, and $k$ scales with the channel number $C$. Finally, a sigmoid produces per-channel weights that multiply the input features element-wise. Compared with the same network without ECA, adding the ECA block introduces only a handful of extra parameters and almost negligible computation, yet greatly strengthens performance.

Fig. 3 Efficient channel attention module
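The ECA block itself reduces to global average pooling plus one 1-D convolution. The sketch below follows the ECA-Net formulation (Wang et al., 2020), including its adaptive rule for $k$ ($\gamma = 2$, $b = 1$, rounded to the nearest odd number); treat it as an illustration rather than the authors' exact implementation.

import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, channels=64, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) + b) / gamma)   # k grows with log2(C)
        k = t if t % 2 else t + 1                       # keep the kernel size odd
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        y = x.mean(dim=(2, 3))                    # (B, C): channel descriptor, no reduction
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # 1-D conv over channels: k-neighbour interaction
        w = torch.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
        return x * w                              # reweight each channel of the input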

1.2 Upsampling module

As shown in Fig. 1, we design a parameter-shared upsampling module (upsample block), consisting of one convolutional layer and one sub-pixel convolutional layer, that directly upsamples both the shallow and the deep features to reconstruct the high-resolution image; the generated image is rich in information from both the smooth regions and the high-frequency regions.
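A sketch of such a shared upsample block is given below; that the sub-pixel step is done in one shot (a single PixelShuffle rather than repeated ×2 stages) is our assumption, since the paper does not specify it. Calling the same instance on both the shallow and the deep features, as in Eq. (4), is what makes the parameters shared.

import torch.nn as nn

class UpsampleBlock(nn.Module):
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.conv = nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1)  # expand to 3*s^2 maps
        self.shuffle = nn.PixelShuffle(scale)                          # sub-pixel rearrangement

    def forward(self, x):
        return self.shuffle(self.conv(x))  # (B, 64, H, W) -> (B, 3, sH, sW)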

2 Experimental settings and results

2.1 Experimental settings

The training set is DF2K (Timofte et al., 2017), composed of the 800 images of DIV2K (Timofte et al., 2016) and 2 650 images of Flickr2K, with data augmentation by random horizontal flipping and 90° rotation; the corresponding low-resolution images are obtained by bicubic downsampling of the high-resolution images (scale factors ×2, ×3 and ×4). The test sets are the five benchmarks Set5 (Bevilacqua et al., 2012), Set14 (Yang et al., 2010), B100 (Martin et al., 2001), Urban100 (Huang et al., 2015) and Manga109 (Matsui et al., 2017). PSNR and SSIM serve as the evaluation metrics. Following the evaluation protocol of RDN (Zhang et al., 2018b), the metrics are measured on the Y channel of the converted YCbCr space, with 4 border pixels cropped.
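For reference, the evaluation protocol can be sketched as follows: PSNR on the Y channel with border cropping. The BT.601 conversion coefficients are the usual choice in SR evaluation and an assumption about the exact implementation.

import numpy as np

def psnr_y(sr, hr, crop=4):
    # sr, hr: uint8 RGB arrays of shape (H, W, 3); crop: border pixels to discard
    def to_y(img):
        img = img.astype(np.float64)
        return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                       + 24.966 * img[..., 2]) / 255.0   # BT.601 luminance
    y_sr, y_hr = to_y(sr), to_y(hr)
    if crop > 0:
        y_sr = y_sr[crop:-crop, crop:-crop]
        y_hr = y_hr[crop:-crop, crop:-crop]
    mse = np.mean((y_sr - y_hr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)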

The input patch size is 48×48 pixels and the mini-batch size is 16. In the deep feature extraction stage, six recursions of the feature selection module are used; each convolutional layer of the FSM has 64 channels, each channel split retains the features of 16 channels, and the remaining 48 channels continue to the next convolution. The network parameters are optimized with the Adam optimizer (Kingma and Ba, 2014) and the L1 loss function. The initial learning rate is set to 2E-4 and is halved every 200 epochs. The model is trained and tested under the PyTorch framework (Paszke et al., 2017) on one GeForce RTX 2080 Ti GPU.
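Putting the pieces together, the training configuration amounts to a few lines of PyTorch; `loader` (yielding batches of 16 random 48×48 LR patches with their HR counterparts) and the total epoch count are hypothetical placeholders, and the model composition reuses the sketches above.

import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

model = AFSNet(fsm=FSM(64, attention=ECA(64)),
               upsample=UpsampleBlock(64, scale=4)).cuda()
optimizer = Adam(model.parameters(), lr=2e-4)            # initial learning rate 2E-4
scheduler = StepLR(optimizer, step_size=200, gamma=0.5)  # halve every 200 epochs
criterion = torch.nn.L1Loss()                            # L1 loss

for epoch in range(1000):                 # total epochs: placeholder
    for lr_patch, hr_patch in loader:     # hypothetical DF2K patch loader
        optimizer.zero_grad()
        loss = criterion(model(lr_patch.cuda()), hr_patch.cuda())
        loss.backward()
        optimizer.step()
    scheduler.step()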

2.2 Results and analysis

We compare the proposed algorithm with state-of-the-art lightweight image SR algorithms: SRCNN (Dong et al., 2014), FSRCNN (fast super-resolution convolutional neural network) (Dong et al., 2016), VDSR (Kim et al., 2016a), DRCN (Kim et al., 2016b), LapSRN (deep Laplacian pyramid networks for fast and accurate super-resolution) (Lai et al., 2017), MemNet (Tai et al., 2017b), SRFBN-S (feedback network for image super-resolution) (Li et al., 2019), IDN (Hui et al., 2018) and CARN-M (cascading residual network) (Ahn et al., 2018). The ×2, ×3 and ×4 results are reported in Tables 1 to 3. Our method attains the best PSNR and SSIM (Wang et al., 2004) on almost all benchmarks, outperforming all other algorithms at ×3 and ×4 and falling slightly below IDN only on Set5 at ×2. Compared with the second-best algorithms, our network also has fewer parameters and lower computational cost. Taking ×4 SR as an example, our model has 255 K parameters, 47%, 53% and 38% fewer than SRFBN-S, IDN and CARN-M, respectively; its computation is 24.5 G FLOPs, nearly 10 G less than IDN and CARN-M, whereas SRFBN-S, despite its small parameter count, costs as much as 852.9 G FLOPs.

Table 1 Quantitative results of the evaluated methods on five datasets for scale factor ×2

Model | Params/K | FLOPs/G | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM
Bicubic | - | - | 33.66/0.9299 | 30.24/0.8688 | 29.56/0.8431 | 26.88/0.8403 | 30.80/0.9339
SRCNN | 57 | 57.2 | 36.66/0.9542 | 32.45/0.9067 | 31.36/0.8879 | 29.50/0.8946 | 35.60/0.9663
FSRCNN | 12 | 6.0 | 37.00/0.9558 | 32.63/0.9088 | 31.53/0.8920 | 29.88/0.9020 | 36.67/0.9710
VDSR | 665 | 612.6 | 37.53/0.9590 | 33.05/0.9130 | 31.90/0.8960 | 30.77/0.9140 | 37.22/0.9750
DRCN | 1 774 | 9 788.7 | 37.63/0.9588 | 33.04/0.9118 | 31.85/0.8942 | 30.75/0.9133 | 37.55/0.9732
LapSRN | 813 | 29.9 | 37.52/0.9591 | 33.08/0.9130 | 31.08/0.8950 | 30.41/0.9101 | 37.27/0.9740
MemNet | 677 | 623.9 | 37.78/0.9597 | 33.28/0.9142 | 32.08/0.8978 | 31.31/0.9195 | 37.72/0.9740
SRFBN-S | 282 | 574.4 | 37.78/0.9597 | 33.35/0.9156 | 32.00/0.8970 | 31.41/0.9207 | 38.06/0.9757
IDN | 553 | 127.7 | 37.83/0.9600 | 33.30/0.9148 | 32.08/0.8985 | 31.27/0.9196 | 38.01/0.9749
CARN-M | 412 | 91.2 | 37.53/0.9583 | 33.26/0.9141 | 31.92/0.8960 | 31.23/0.9193 | 37.76/0.9743
Ours | 234 | 68.3 | 37.78/0.9590 | 33.45/0.9158 | 32.08/0.8979 | 31.70/0.9237 | 38.45/0.9759
Note: bold indicates the best and underline the second-best result in each column; "-" means not available.

Table 2 Quantitative results of the evaluated methods on five datasets for scale factor ×3

Model | Params/K | FLOPs/G | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM
Bicubic | - | - | 30.39/0.8682 | 27.55/0.7742 | 27.21/0.7385 | 24.46/0.7349 | 26.95/0.8556
SRCNN | 57 | 52.7 | 32.75/0.9090 | 29.30/0.8215 | 28.41/0.7863 | 26.24/0.7989 | 30.48/0.9117
FSRCNN | 12 | 5.0 | 33.16/0.9140 | 29.43/0.8242 | 28.53/0.7910 | 26.43/0.8080 | 31.10/0.9210
VDSR | 665 | 612.6 | 33.67/0.9210 | 29.78/0.8320 | 28.83/0.7990 | 27.14/0.8290 | 32.01/0.9340
DRCN | 1 774 | 9 788.7 | 33.82/0.9226 | 29.76/0.8311 | 28.80/0.7963 | 27.14/0.8279 | 32.24/0.9343
MemNet | 677 | 623.9 | 34.09/0.9248 | 30.01/0.8350 | 28.96/0.8001 | 27.56/0.8376 | 32.51/0.9369
SRFBN-S | 375 | 586.4 | 34.20/0.9255 | 30.10/0.8372 | 28.96/0.8010 | 27.66/0.8415 | 33.02/0.9404
IDN | 553 | 57.0 | 34.11/0.9253 | 29.99/0.8354 | 28.95/0.8013 | 27.42/0.8359 | 32.71/0.9381
CARN-M | 412 | 46.1 | 33.99/0.9236 | 30.08/0.8367 | 28.91/0.8000 | 27.55/0.8385 | 32.86/0.9389
Ours | 243 | 31.3 | 34.16/0.9246 | 30.18/0.8383 | 28.99/0.8018 | 27.79/0.8441 | 33.29/0.9415
Note: bold indicates the best and underline the second-best result in each column; "-" means not available.

Table 3 Quantitative results of the evaluated methods on five datasets for scale factor ×4

Model | Params/K | FLOPs/G | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM
Bicubic | - | - | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577 | 24.89/0.7866
SRCNN | 57 | 52.7 | 30.48/0.8628 | 27.50/0.7513 | 26.90/0.7101 | 24.52/0.7221 | 27.58/0.8555
FSRCNN | 12 | 4.6 | 30.71/0.8657 | 27.59/0.7535 | 26.98/0.7150 | 24.62/0.7280 | 27.90/0.8610
VDSR | 665 | 612.6 | 31.35/0.8830 | 28.02/0.7680 | 27.29/0.7260 | 25.18/0.7540 | 28.83/0.8870
DRCN | 1 774 | 9 788.7 | 31.53/0.8854 | 28.02/0.7670 | 27.23/0.7233 | 25.18/0.7524 | 28.93/0.8854
LapSRN | 813 | 149.4 | 31.54/0.8850 | 28.19/0.7720 | 27.32/0.7270 | 25.21/0.7560 | 29.09/0.8900
MemNet | 677 | 623.9 | 31.74/0.8893 | 28.26/0.7723 | 27.40/0.7281 | 25.50/0.7630 | 29.42/0.8942
SRFBN-S | 483 | 852.9 | 31.98/0.8923 | 28.45/0.7779 | 27.44/0.7313 | 25.71/0.7719 | 29.91/0.9008
IDN | 553 | 32.3 | 31.82/0.8903 | 28.25/0.7730 | 27.41/0.7297 | 25.41/0.7632 | 29.41/0.8942
CARN-M | 412 | 32.5 | 31.92/0.8903 | 28.42/0.7762 | 27.44/0.7304 | 25.62/0.7694 | 29.86/0.8996
Ours | 255 | 24.5 | 32.04/0.8920 | 28.85/0.7927 | 27.50/0.7326 | 25.85/0.7773 | 30.24/0.9039
Note: bold indicates the best and underline the second-best result in each column; "-" means not available.

Fig. 4 shows visual results of the algorithms at ×4 SR; from top to bottom: barbara from Set14, 8023 from B100, img_092 from Urban100 and MukoukizuNoChonbo from Manga109. Clearly, our algorithm reconstructs high-resolution images that are visually sharper and richer in detail. On barbara from Set14, the other algorithms recover the lines of the books blurrily and incorrectly; only ours reconstructs them correctly and clearly. The same holds for 8023 from B100 and img_092 from Urban100, where our algorithm crisply reconstructs the texture of the bird's wing and the facade lines of the building. On MukoukizuNoChonbo from Manga109, the letters reconstructed by the other algorithms are too blurred to identify, whereas ours recovers them clearly.

Fig. 4 Qualitative results of the evaluated methods for ×4 super-resolution
((a) original image; (b) HR; (c) VDSR; (d) IDN; (e) CARN-M; (f) SRFBN-S; (g) ours)

2.3 Ablation study

To verify the role of each component of AFSNet, we run ablation experiments on the ECA module, the multi-stage feature fusion (MSFF) mechanism and the shared upsampling (Upsample) module. Table 4 reports the ×4 SR results of the different variants on the Set5 dataset.

Table 4 Ablation study of AFSNet on the Set5 dataset

Experiment | Method | Params | PSNR/dB | SSIM
1 | AFSNet+Upsample | 230 768 | 31.83 | 0.8895
2 | AFSNet+Upsample+ECA | 230 771 | 31.85 | 0.8899
3 | AFSNet+Upsample+MSFF | 255 408 | 31.95 | 0.8907
4 | AFSNet+ECA+MSFF | 283 107 | 31.97 | 0.8913
5 | AFSNet+Upsample+ECA+MSFF | 255 411 | 32.04 | 0.8920
Note: bold indicates the best result in each column.

Table 4 supports the following observations. 1) The output of every FSM recursion in Fig. 1 is fused at the end of the network; removing this mechanism (experiment 2, compared with the full model of experiment 5) lowers performance by about 0.2 dB, mainly because each stage of the network extracts features of a different level, and fusing the multi-stage hierarchical features at the end favours reconstructing high-quality images rich in both low- and high-frequency information. 2) Comparing experiment 3 with experiment 5, removing the ECA module saves only 3 parameters yet lowers PSNR by nearly 0.1 dB, showing that ECA substantially improves reconstruction without burdening the network. 3) Comparing experiment 4 with experiment 5, not sharing the upsampling module adds nearly 30 000 parameters yet scores 0.07 dB lower than the parameter-shared version; the sharing mechanism therefore helps keep the model lightweight.

3 Conclusion

This paper presents an extremely lightweight attention feature selection recursive network for image super-resolution that achieves accurate reconstruction with few parameters and little computation. A convolutional layer first extracts shallow features from the low-resolution image; multiple recursions of the feature selection module then extract deep features, retaining and fusing the strongly representative part of each convolution's features and re-aggregating the channels with the efficient channel attention block. Finally, the shallow and deep features are passed through the shared upsampling module to reconstruct a high-resolution image rich in low- and high-frequency information. Experiments show that the proposed algorithm surpasses current state-of-the-art lightweight SR algorithms on five benchmark datasets while greatly reducing parameters and computation.

However, many existing SR algorithms (Li X G et al., 2018; Li Y X et al., 2018; Shen et al., 2019; Tong et al., 2019) achieve good reconstruction only on synthetic datasets, whereas real low-resolution images usually suffer from blur, noise and other degradations; how to super-resolve such real images is a pressing open problem. In future work we will study super-resolution in real scenes and design solutions for the compounded blur and noise in real low-resolution images, further improving the algorithm.

References

  • Ahn N, Kang B and Sohn K A. 2018. Fast, accurate, and lightweight super-resolution with cascading residual network//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 256-272[DOI: 10.1007/978-3-030-01249-6_16]
  • Bevilacqua M, Roumy A, Guillemot C and Morel M L A. 2012. Low-complexity single-image super-resolution based on nonnegative neighbor embedding//Proceedings of 2012 British Machine Vision Conference. Guildford, UK: BMVC: 135[DOI: 10.5244/C.26.135]
  • Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer: 184-199[DOI: 10.1007/978-3-319-10593-2_13]
  • Dong C, Loy C C, He K M, Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI:10.1109/TPAMI.2015.2439281]
  • Gu J J, Lu H N, Zuo W M and Dong C. 2019. Blind super-resolution with iterative kernel correction//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 1604-1613[DOI: 10.1109/CVPR.2019.00170]
  • Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/CVPR.2018.00745]
  • Huang J B, Singh A and Ahuja N. 2015. Single image super-resolution from transformed self-exemplars//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 5197-5206[DOI: 10.1109/CVPR.2015.7299156]
  • Hui Z, Gao X B, Yang Y C and Wang X M. 2019. Lightweight image super-resolution with information multi-distillation network//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: ACM: 2024-2032[DOI: 10.1145/3343031.3351084]
  • Hui Z, Wang X M and Gao X B. 2018. Fast and accurate single image super-resolution via information distillation network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 723-731[DOI: 10.1109/cvpr.2018.00082]
  • Kim J, Lee J K and Lee K M. 2016a. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1646-1654[DOI: 10.1109/CVPR.2016.182]
  • Kim J, Lee J K and Lee K M. 2016b. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1637-1645[DOI: 10.1109/CVPR.2016.181]
  • Kingma D P and Ba J L. 2014. Adam: a method for stochastic optimization[EB/OL]. [2020-08-20]. https://arxiv.org/pdf/1412.6980v8.pdf
  • Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5835-5843[DOI: 10.1109/CVPR.2017.618]
  • Li J C, Fang F M, Mei K F and Zhang G X. 2018. Multi-scale residual network for image super-resolution//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 527-542[DOI: 10.1007/978-3-030-01237-3_32]
  • Li X G, Sun Y M, Yang Y L, Miao C Y. 2018. Image super-resolution reconstruction based on intermediate supervision convolutional neural networks. Journal of Image and Graphics, 23(7): 984-993 (李现国, 孙叶美, 杨彦利, 苗长云. 2018. 基于中间层监督卷积神经网络的图像超分辨率重建. 中国图象图形学报, 23(7): 984-993) [DOI:10.11834/jig.170538]
  • Li Y X, Deng H P, Xiang S, Wu J, Zhu L. 2018. Depth map super-resolution reconstruction based on the texture edge-guided approach. Journal of Image and Graphics, 23(10): 1508-1517 (李宇翔, 邓慧萍, 向森, 吴谨, 朱磊. 2018. 纹理边缘引导的深度图像超分辨率重建. 中国图象图形学报, 23(10): 1508-1517) [DOI:10.11834/jig.180127]
  • Li Z, Yang J L, Liu Z, Yang X M, Jeon G and Wu W. 2019. Feedback network for image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3862-3871[DOI: 10.1109/CVPR.2019.00399]
  • Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, USA: IEEE: 1132-1140[DOI: 10.1109/CVPRW.2017.151]
  • Martin D, Fowlkes C, Tal D and Malik J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, Canada: IEEE: 416-423[DOI: 10.1109/ICCV.2001.937655]
  • Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T, Aizawa K. 2017. Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications, 76(20): 21811-21838 [DOI:10.1007/s11042-016-4020-z]
  • Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z M, Desmaison A, Antiga L and Lerer A. 2017. Automatic differentiation in PyTorch[EB/OL]. [2020-08-21]. https://openreview.net/pdf?id=BJJsrmfCZ
  • Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2020-08-20]. https://arxiv.org/pdf/1409.1556.pdf
  • Shen M Y, Yu P F, Wang R G, Yang J, Xue L X. 2019. Image super-resolution reconstruction via deep network based on multi-staged fusion. Journal of Image and Graphics, 24(8): 1258-1269 (沈明玉, 俞鹏飞, 汪荣贵, 杨娟, 薛丽霞. 2019. 多阶段融合网络的图像超分辨率重建. 中国图象图形学报, 24(8): 1258-1269) [DOI:10.11834/jig.180619]
  • Shocher A, Cohen N and Irani M. 2018. Zero-shot super-resolution using deep Internal learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3118-3126[DOI: 10.1109/cvpr.2018.00329]
  • Soh J W, Cho S and Cho N I. 2020. Meta-transfer learning for zero-shot super-resolution[EB/OL]. [2020-02-27]. https://arxiv.org/pdf/2002.12213.pdf
  • Tai Y, Yang J and Liu X M. 2017a. Image super-resolution via deep recursive residual network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2790-2798[DOI: 10.1109/CVPR.2017.298]
  • Tai Y, Yang J, Liu X M and Xu C Y. 2017b. MemNet: a persistent memory network for image restoration//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 4549-4557[DOI: 10.1109/ICCV.2017.486]
  • Timofte R, Agustsson E, van Gool L, Yang M H, Zhang L, Lim B, Son S, Kim H, Nah S, Lee K M, Wang X T, Tian Y P, Yu K, Zhang Y L, Wu S X, Dong C, Lin L, Qiao Y, Loy C C, Bae W, Yoo J, Han Y, Ye J C, Choi J S, Kim M, Fan Y C, Yu J H, Han W, Liu D, Yu H C, Wang Z Y, Shi H H, Wang X C, Huang T S, Chen Y J, Zhang K, Zuo W M, Tang Z M, Luo L K, Li S H, Fu M, Cao L, Heng W, Bui G, Le T, Duan Y, Tao D C, Wang R X, Lin X, Pang J X, Xu J C, Zhao Y, Xu X Y, Pan J S, Sun D Q, Zhang Y J, Song X B, Dai Y C, Qin X Y, Huynh X P, Guo T T, Mousavi H S, Vu T H, Monga V, Cruz C, Egiazarian K, Katkovnik V, Mehta R, Jain A K, Agarwalla A, Praveen C V S, Zhou R F, Wen H D, Zhu C, Xia Z Q, Wang Z T and Guo Q. 2017. Ntire 2017 challenge on single image super-resolution: methods and results//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, USA: IEEE: 1110-1121[DOI: 10.1109/CVPRW.2017.149]
  • Timofte R, Rothe R and van Gool L. 2016. Seven ways to improve example-based single image super resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1865-1873[DOI: 10.1109/CVPR.2016.206]
  • Tong J C, Fei J L, Chen J S, Li H, Ding D D. 2019. Multi-level feature fusion image super-resolution algorithm with recursive neural network. Journal of Image and Graphics, 24(2): 302-312 (佟骏超, 费加罗, 陈靖森, 李恒, 丁丹丹. 2019. 递归式多阶特征融合图像超分辨率算法. 中国图象图形学报, 24(2): 302-312) [DOI:10.11834/jig.180410]
  • Wang C F, Li Z and Shi J. 2019. Lightweight image super-resolution with adaptive weighted learning network[EB/OL]. [2020-08-20]. https://arxiv.org/pdf/1904.02358.pdf
  • Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M and Hu Q H. 2020. ECA-Net: efficient channel attention for deep convolutional neural networks[EB/OL]. [2020-08-20]. https://arxiv.org/pdf/1910.03151.pdf
  • Wang Z, Bovik A C, Sheikh H R, Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI:10.1109/TIP.2003.819861]
  • Yang J C, Wright J, Huang T S, Ma Y. 2010. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11): 2861-2873 [DOI:10.1109/TIP.2010.2050625]
  • Zhang K, van Gool L and Timofte R. 2020. Deep unfolding network for image super-resolution[EB/OL]. [2020-08-20]. https://arxiv.org/pdf/2003.10428.pdf
  • Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018a. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 294-310[DOI: 10.1007/978-3-030-01234-2_18]
  • Zhang Y L, Tian Y P, Kong Y, Zhong B N and Fu Y. 2018b. Residual dense network for image super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2472-2481[DOI: 10.1109/cvpr.2018.00262]