Published: 2021-12-16
DOI: 10.11834/jig.200582
2021 | Volume 26 | Number 12

Image Processing and Coding

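Region-level channel attention combined with a high-frequency loss for image super-resolution reconstruction

Zhou Bo¹, Li Chenghua², Chen Wei¹

1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China;

2. Institute of Automation, Chinese Academy of Sciences, Beijing 100019, China

Received: 2020-10-13; Revised: 2020-12-09; Preprint: 2020-12-16

Funding: National Key R&D Program of China (2017YFB0202303); National Natural Science Foundation of China (61602213, 61772013)

About the authors: Zhou Bo, born in 1996, male, master's student; research interests: deep learning and image processing. E-mail: 6181611033@stu.jiangnan.edu.cn
Li Chenghua, male, assistant researcher; research interests: deep learning and computer vision. E-mail: lichenghua2014@ia.ac.cn
Chen Wei, corresponding author, male, associate professor; research interests: computer graphics and intelligent information processing. E-mail: 8313120015@jiangnan.edu.cn
*Corresponding author: Chen Wei 8313120015@jiangnan.edu.cn

CLC number: TP391.4

Document code: A

Article ID: 1006-8961(2021)12-2836-12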

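Abstract

Objective Channel attention is widely used in image super-resolution, but most current algorithms select feature maps of interest only at the channel level and ignore spatial information, so local spatial information within a feature map is not used effectively. To address this, an image super-resolution algorithm based on region-level channel attention is proposed. Method A non-local residual dense network is designed as the backbone, comprising a non-local module and residual dense attention modules. The non-local module extracts non-local similarity information and passes it to the subsequent network. The residual dense attention module adds a region-level channel attention mechanism to the residual dense block structure, assigning different attention to the channels in different spatial regions so that spatial information is also fully used. In addition, because the widely used L1 and L2 loss functions tend to produce over-smooth results, a high-frequency aware loss is proposed. It raises the weight of the loss at high-frequency detail positions so that the network attends more closely to high-frequency details during later fine-tuning. Result In ×4 experiments on four standard test sets, Set5, Set14, BSD100 (Berkeley segmentation dataset) and Urban100, the mean PSNR (peak signal to noise ratio) of the proposed method is about 3.15 dB and 1.58 dB higher than that of interpolation and of SRCNN (image super-resolution using deep convolutional networks), respectively. Conclusion The proposed algorithm uses region-level channel attention to adaptively adjust how much the network attends to channels in different spatial regions, and combines it with the high-frequency aware loss to strengthen attention to high-frequency image details, so that the generated high-resolution images have better visual quality.

Keywords

deep learning; convolutional neural network (CNN); super-resolution; attention mechanism; non-local neural network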

Region-level channel attention for single image super-resolution combining high frequency loss

Zhou Bo¹, Li Chenghua², Chen Wei¹

1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China;

2. Institute of Automation, Chinese Academy of Sciences, Beijing 100019, China

Supported by: National Key R&D Program of China (2017YFB0202303); National Natural Science Foundation of China (61602213, 61772013)

Abstract

Objective Image super-resolution is an important branch of image processing and has attracted wide attention. The attention mechanism was first applied to machine translation in deep learning, and its extension, the channel attention mechanism, is now widely used in image super-resolution. This paper proposes a single image super-resolution algorithm based on region-level channel attention. The region-level channel attention mechanism assigns different attention to different channels in different regions. In addition, because the commonly used L1 and L2 losses tend to produce over-smooth results, a high-frequency aware loss is proposed that increases the weight of the loss at high-frequency positions to promote the generation of high-frequency details. Method The network consists of three parts: low-level feature extraction, high-level feature extraction and image reconstruction. The low-level feature extraction part uses one 3×3 convolutional layer. The high-level feature extraction part contains a non-local module and several residual dense block attention modules. The non-local module extracts non-local similarity information through the non-local operation. A desubpixel (inverse sub-pixel) convolutional layer is applied before the non-local similarity is computed, so the computation runs at low resolution. The residual dense block attention modules use dense connections so that the network can adaptively accumulate features from different layers, and residual learning to further ease gradient propagation. The region-level channel attention mechanism is introduced to attend adaptively to information in different regions. The initial non-local similarity information is added to the last layer through a skip connection. In the image reconstruction part, sub-pixel convolution upsamples the features and a 3×3 convolutional layer produces the final reconstruction. The high-frequency aware loss is used to enhance the network's ability to reconstruct high-frequency details. Before training, the locations of high-frequency details in the image are extracted. During training, the losses at these locations receive more weight so that the reconstruction of high-frequency details is learned better. Training is divided into two stages. In the first stage, the network is trained with the L1 loss. In the second stage, the high-frequency aware loss and the L1 loss are used together to fine-tune the first-stage model. Result Region-level channel attention and the high-frequency aware loss are verified in an ablation study. The model with region-level channel attention achieves a higher peak signal to noise ratio (PSNR). Fine-tuning with the high-frequency aware loss together with the L1 loss gives a higher PSNR than fine-tuning with the L1 loss alone, and using both components together gives the best result. Set5, Set14, Berkeley segmentation dataset (BSD100) and Urban100 are used for comparison with other algorithms: Bicubic, image super-resolution using deep convolutional networks (SRCNN), accurate image super-resolution using very deep convolutional networks (VDSR), image super-resolution using very deep residual channel attention networks (RCAN), feedback network for image super-resolution (SRFBN) and single image super-resolution via a holistic attention network (HAN). For subjective comparison, selected results at a scale factor of 4 are displayed. The results generated by the proposed algorithm are richer in texture, without blurring or distortion. For objective comparison, PSNR and structural similarity (SSIM) are reported at scale factors of 2, 3 and 4. At a scale factor of 4, the PSNR of the model on the four standard test sets is 32.51 dB, 28.82 dB, 27.72 dB and 26.66 dB, respectively. Conclusion The proposed super-resolution algorithm applies the commonly used channel attention at the region level. With the high-frequency aware loss, the network reconstructs high-frequency details better because it pays more attention to their locations. The experimental results show that the proposed algorithm, using region-level channel attention and the high-frequency aware loss, has advantages in both objective indicators and subjective quality.

Key words

deep learning; convolutional neural network(CNN); super-resolution; attention mechanism; non-local neural network

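0 Introduction

Image super-resolution reconstruction aims to turn a low-resolution image into a high-resolution one and is widely used in computer vision, for example in medical image analysis (Shi et al., 2013) and object recognition (Sajjadi et al., 2017). It is, however, a classic inverse problem: any given low-resolution image corresponds to infinitely many high-resolution images. To address this, many learning-based methods have been proposed that learn the mapping from low-resolution to high-resolution images.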

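Deep-learning-based methods have improved markedly on traditional algorithms. Dong et al. (2015) were the first to apply a 3-layer convolutional neural network to image super-resolution. Kim et al. (2016a, 2016b) increased the network depth to 20 layers in VDSR (accurate image super-resolution using very deep convolutional networks) and DRCN (deeply-recursive convolutional network for image super-resolution), with clear gains in visual quality and metrics over SRCNN (image super-resolution using deep convolutional networks). In many vision tasks, network depth is critical to model performance. He et al. (2016) first used residual learning to train deep networks and achieved excellent performance. Lim et al. (2017), in EDSR (enhanced deep residual networks for single image super-resolution), modified the commonly used residual structure by removing its batch normalization (BN) layers (Ioffe and Szegedy, 2015), which greatly improved network performance. Inspired by EDSR, Zhang et al. (2018a) proposed the residual dense network (residual dense network for image super-resolution, RDN), which combines the residual structure with dense connections in a residual dense block that has since been widely used in super-resolution algorithms (Ma et al., 2019; Anwar and Barnes, 2020).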

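Attention mechanisms (Vaswani et al., 2017) are also increasingly used in computer vision tasks (Niu et al., 2020). Wang et al. (2018b) proposed non-local neural networks, which compute the response at a position as a weighted sum over the relations between that position and all other spatial positions, and Liu et al. (2018) applied this module to image restoration. Wang et al. (2018a) proposed a parameter-free spatial attention (SA) mechanism for person re-identification that focuses on the useful information at spatial positions. Many algorithms later applied it to super-resolution (Xing and Zhang, 2019; Soh and Cho, 2020). Other algorithms explore the correlation between channels. Hu et al. (2018) proposed SENet, whose SE (squeeze-and-excitation) module lets the network attend to channel-level relations and achieved notable results in classification. Zhang et al. (2018b) and Duan and Xiao (2019) further applied channel attention (CA) to super-resolution.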

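Channel attention already performs well in super-resolution, but the commonly used CA has limitations. Ordinary CA attends equally to all spatial regions of a channel, so it may overlook key high-frequency features in some regions of the channel or amplify minor low-frequency features, which hinders the reconstruction of images rich in high-frequency detail. A novel region-level channel attention (RCA) mechanism is therefore proposed, which lets the network assign different attention to the channels in different spatial regions. A high-frequency aware loss is also proposed to strengthen the network's focus on high-frequency details.

The main contributions of this paper are as follows. 1) A region-level channel attention mechanism is proposed. Unlike earlier channel attention mechanisms, RCA computes channel-wise attention separately for each spatial region instead of computing a single attention value for the whole channel, so the network can attend differently to the features in different spatial regions of a channel and adaptively select regions of interest. 2) A non-local residual dense structure is designed to build the deep network. The non-local module extracts non-local similarity information and passes it to the subsequent residual dense attention modules, which adaptively accumulate the high-level and low-level features extracted from it and further strengthen feature reuse. 3) A high-frequency aware loss is proposed. During training it makes the network focus on the loss at high-frequency detail positions, which enhances the network's ability to reconstruct high-frequency details.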

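1 Related work

1.1 Residual structure and dense connections

He et al. (2016) proposed the residual structure, which passes the information of the current layer to other layers through skip connections and overcomes the difficulty of gradient propagation in deep networks. Huang et al. (2017) proposed the dense connection structure in DenseNet (densely connected convolutional networks), in which each layer fuses the features of all preceding layers and passes its own features to all subsequent layers, which effectively alleviates vanishing gradients and strengthens feature propagation. Inspired by this, Wang et al. (2018c) proposed ESRGAN (enhanced super-resolution generative adversarial networks), whose generator applies the residual structure and dense connections to super-resolution with a marked gain in performance.

To improve network performance, the proposed algorithm also uses the residual structure and dense connections, and on this basis designs a non-local residual dense structure that strengthens the feature representation ability of the network.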

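1.2 Channel attention mechanism

Attention mechanisms help a deep neural network decide which regions to attend to and strengthen the use of information from regions of interest. In recent years they have become an important part of deep learning and are widely used in computer vision tasks. Channel attention, in particular, models the interdependence between feature channels and adaptively reassigns the weight of each channel. For example, Zhang et al. (2018b) implement channel attention by first applying global pooling over the spatial dimensions, that is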

$ {\mathit{\boldsymbol{z}}_c} = GP(\mathit{\boldsymbol{x}}) $

(1)

where $\mathit{\boldsymbol{x}} $ is the input feature map, $ GP(\cdot)$ is the global pooling operation, and ${\mathit{\boldsymbol{z}}_c} $ is the output of size 1 × 1 × $c $, with $c $ the number of channels. Two fully connected layers are then applied:

$ \mathit{\boldsymbol{z}} = {\rm{sigmoid}}({\mathit{\boldsymbol{F}}_{\rm{U}}}({\rm{ReLU}}({\mathit{\boldsymbol{F}}_{\rm{D}}}({\mathit{\boldsymbol{z}}_c})))) $

(2)

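where $ {\mathit{\boldsymbol{F}}_{\rm{D}}}$ and ${\mathit{\boldsymbol{F}}_{\rm{U}}} $ are fully connected layers that reduce and restore the feature dimension, respectively, which helps fit the complex dependencies between channels; ${\rm{ReLU}}(\cdot) $ is the rectified linear unit; and $\mathit{\boldsymbol{z}} $ is the attention assigned to each channel. Multiplying the attention $\mathit{\boldsymbol{z}} $ by the input feature $ \mathit{\boldsymbol{x}}$ gives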

$ {{\mathit{\boldsymbol{x'}}}_i} = {\mathit{\boldsymbol{z}}_i} \cdot {\mathit{\boldsymbol{x}}_i} $

(3)

where $ {\mathit{\boldsymbol{z}}_i}$ and ${\mathit{\boldsymbol{x}}_i} $ are the attention and the feature map of the $i $-th channel, respectively.
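Eqs. (1) to (3) can be illustrated with a minimal NumPy sketch. It assumes that the global pooling is average pooling and uses random matrices as stand-ins for the trained fully connected layers; the reduction ratio `r` and all variable names are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, w_down, w_up):
    """Apply Eqs. (1)-(3) to a feature map x of shape (H, W, C)."""
    z_c = x.mean(axis=(0, 1))                 # Eq. (1): global average pooling -> (C,)
    hidden = np.maximum(w_down @ z_c, 0.0)    # F_D followed by ReLU -> (C // r,)
    z = sigmoid(w_up @ hidden)                # F_U followed by sigmoid -> (C,)
    return x * z, z                           # Eq. (3): rescale every channel

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4                      # r: hypothetical reduction ratio
x = rng.standard_normal((H, W, C))
w_down = rng.standard_normal((C // r, C)) * 0.1   # random stand-in for F_D
w_up = rng.standard_normal((C, C // r)) * 0.1     # random stand-in for F_U
y, z = channel_attention(x, w_down, w_up)
```

Each channel receives a single scalar weight, so every spatial position within a channel is scaled identically.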

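Many super-resolution algorithms (Cao and Liu, 2019; Lee et al., 2019; Jin and Chen, 2020) use this form of channel attention, but it can only select feature maps of interest at the channel level, which limits network performance to some extent. The region-level channel attention mechanism proposed here therefore pools different regions of the feature map separately, so that the network can adaptively attend to information in different channels and in different spatial regions.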

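2 Algorithm design

2.1 Network structure

The overall network structure is shown in Fig. 1 and consists of three parts: a low-level feature extraction module, a high-level feature extraction module and an image reconstruction module. The low-level feature extraction module uses a single 3×3 convolutional layer. The high-level feature extraction part uses the non-local residual dense structure, which comprises one non-local module and several residual dense block attention modules (RDBAM). The non-local module (blue dashed box in the upper half of Fig. 1) extracts non-local similarity information with the non-local operation (Wang et al., 2018b). Within the non-local operation, as in Vu et al. (2018), a desubpixel (inverse sub-pixel) convolutional layer is used so that the spatial similarity is computed at low resolution, and a sub-pixel convolutional layer then restores the original size, which reduces the amount of computation. In Fig. 1, ${\mathit{\boldsymbol{W}}_{1 \times 1}} $ denotes a 1×1 convolution, $ \otimes $ denotes matrix multiplication and $ \oplus $ denotes feature addition. The RDBAM consists of the residual dense structure (Wang et al., 2018c) and the region-level channel attention mechanism. The image reconstruction part upsamples with sub-pixel convolution and uses a final 3×3 convolutional layer to reconstruct the result.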
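The rearrangement behind the desubpixel and sub-pixel layers can be sketched as follows. This is only an illustration of the space-to-depth and depth-to-space reshuffling; the convolutions that accompany these layers in the network are omitted, and the function names and the factor `r` are hypothetical.

```python
import numpy as np

def desubpixel(x, r):
    """Space-to-depth: (H, W, C) -> (H // r, W // r, C * r * r)."""
    h, w, c = x.shape
    x = x.reshape(h // r, r, w // r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/r, W/r, r, r, C)
    return x.reshape(h // r, w // r, r * r * c)

def subpixel(x, r):
    """Depth-to-space, the inverse rearrangement of desubpixel."""
    h, w, c = x.shape
    c_out = c // (r * r)
    x = x.reshape(h, w, r, r, c_out)
    x = x.transpose(0, 2, 1, 3, 4)            # (H, r, W, r, C_out)
    return x.reshape(h * r, w * r, c_out)

x = np.arange(8 * 8 * 3, dtype=np.float64).reshape(8, 8, 3)
low = desubpixel(x, 2)                        # 4 x 4 spatial grid, 12 channels
restored = subpixel(low, 2)                   # back to 8 x 8 x 3

# A non-local operation relates every position to every other position,
# so the number of pairwise relations scales with (H * W) ** 2.
pairs_full = (8 * 8) ** 2
pairs_low = (4 * 4) ** 2
```

Because no values are discarded, the rearrangement is lossless, while the number of position pairs handled by the non-local operation shrinks by a factor of r⁴.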

Fig. 1 Network architecture

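2.2 Residual dense block attention module

An RDBAM consists of 3 dense blocks and 1 region-level channel attention (RCA) module, as shown in Fig. 2. Similar to the structure proposed by Wang et al. (2018c), each dense block consists of 4 convolutional layers, and the output of each layer is passed to all subsequent layers. The output of the $i $-th convolutional layer of the $ c$-th dense block in each RDBAM is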

$ {\mathit{\boldsymbol{D}}_{c, i}}(\mathit{\boldsymbol{x}}) = {\mathit{\boldsymbol{W}}_{c, i}}([{\mathit{\boldsymbol{D}}_{c, 1}}(\mathit{\boldsymbol{x}}), {\mathit{\boldsymbol{D}}_{c, 2}}(\mathit{\boldsymbol{x}}), \cdots {\mathit{\boldsymbol{D}}_{c, i - 1}}(\mathit{\boldsymbol{x}})]) $

(4)

Fig. 2 Architecture of residual dense block attention module

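where $\mathit{\boldsymbol{x}} $ is the input feature, ${\mathit{\boldsymbol{W}}_{c, i}} $ is the $i $-th convolution kernel of the $c $-th dense block (all kernels are 3×3), and […] denotes feature concatenation. ${\mathit{\boldsymbol{D}}_{c, i}} $ is the output of the $i $-th convolutional layer of the $ c$-th dense block. As in Lim et al. (2017) and Wang et al. (2018c), residual scaling is used: during residual learning the output of each dense block is multiplied by a scaling factor $\alpha $ (see Fig. 2), which makes the training of large models more stable. The module ends with 1 region-level channel attention module.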
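The dense concatenation of Eq. (4) and the residual scaling by $\alpha$ can be sketched as below. Several simplifications are assumptions made for brevity: pointwise linear maps stand in for the 3×3 convolutions, the block input is included in the concatenation (as in RDN-style dense blocks), a ReLU is placed between layers, and the growth rate `G` is hypothetical.

```python
import numpy as np

def dense_block(x, weights, alpha=0.2):
    """Densely connected layers with a scaled residual connection.

    x: (H, W, C) features; weights[i]: (C_in_i, C_out_i) pointwise stand-ins
    for the 3x3 convolutions W_{c,i} of the paper.
    """
    feats = [x]
    for i, w in enumerate(weights):
        inp = np.concatenate(feats, axis=-1)   # [...] in Eq. (4): concatenate earlier outputs
        out = inp @ w
        if i < len(weights) - 1:
            out = np.maximum(out, 0.0)         # hypothetical ReLU between layers
        feats.append(out)
    return x + alpha * feats[-1]               # residual scaling by alpha

rng = np.random.default_rng(0)
C, G = 16, 8                                   # G: hypothetical growth rate
shapes = [(C, G), (C + G, G), (C + 2 * G, G), (C + 3 * G, C)]   # 4 layers per block
weights = [rng.standard_normal(s) * 0.05 for s in shapes]
x = rng.standard_normal((6, 6, C))
y = dense_block(x, weights)                    # alpha = 0.2, as in the training setup
```

The input width of each layer grows by `G` channels because every earlier output is concatenated, and the last layer maps back to `C` channels so that the scaled result can be added to the block input.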

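2.3 Region-level channel attention mechanism

Channel attention (CA) is used to focus on the high-frequency features in the channels by weighting the channels of a feature map differently. High-frequency features, however, usually occur at different spatial positions, so attending only at the channel level may miss much high-frequency information and yield results that lack high-frequency detail. To solve this problem, the interdependence along the channel dimension of the features in different regions of the feature map is studied, and the region-level channel attention mechanism is proposed, as shown in Fig. 3.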

Fig. 3 Architecture of region-level channel attention

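So that the network can take the information of different regions into account, the input feature is first divided along the spatial dimensions into $ {k \times k}$ regions before the attention operation, and attention is assigned to the channels of each region separately, so that different regions of the same channel receive different attention. A given input $\mathit{\boldsymbol{x}} $ of size $H \times W \times C $ first passes through two pooling operations: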

$ {\mathit{\boldsymbol{z}}_{\max }} = G{P_{\max }}(\mathit{\boldsymbol{x}}, k) $

(5)

$ {\mathit{\boldsymbol{z}}_{{\rm{mean}}}} = G{P_{{\rm{mean}}}}(\mathit{\boldsymbol{x}}, k) $

(6)

$ {\mathit{\boldsymbol{z}}_c} = {\mathit{\boldsymbol{z}}_{\max }} + {\mathit{\boldsymbol{z}}_{{\rm{mean}}}} $

(7)

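where $G{P_{\max }} $ and $ G{P_{{\rm{mean}}}}$ denote max pooling and mean pooling, respectively, and $k $ is the size after pooling (that is, each sub-region is pooled globally on its own). As in Woo et al. (2018), two different pooling methods are used to aggregate the regional information more completely, giving an output ${\mathit{\boldsymbol{z}}_c} $ of size $k \times k \times C $. This output then passes through two 1×1 convolutional layers and an activation layer: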

$ \mathit{\boldsymbol{z}} = f({\mathit{\boldsymbol{W}}_{\rm{U}}}({\mathit{\boldsymbol{W}}_{\rm{D}}}({\mathit{\boldsymbol{z}}_c}))) $

(8)

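where ${\mathit{\boldsymbol{W}}_{\rm{U}}} $ and ${\mathit{\boldsymbol{W}}_{\rm{D}}} $ are the 1 × 1 convolutions that restore and reduce the dimension, respectively, used to better fit the complex dependencies between channels, and $f $ is the sigmoid activation function. The resulting attention $\mathit{\boldsymbol{z}} $ is then multiplied with the input $\mathit{\boldsymbol{x}} $, channel by channel and region by region, to give the final output: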

$ \mathit{\boldsymbol{x}} = \mathit{\boldsymbol{x}} \odot (U(\mathit{\boldsymbol{z}})) $

(9)

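where $U $ is nearest-neighbour interpolation, which upsamples $\mathit{\boldsymbol{z}} $ to the size of the input $\mathit{\boldsymbol{x}} $ so that all positions within a region share the same attention, and $\odot $ denotes element-wise multiplication.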
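Eqs. (5) to (9) can be sketched in NumPy as follows. The sketch assumes that $H$ and $W$ are divisible by $k$, follows Eq. (8) as written (no activation between the two 1×1 convolutions), and uses random matrices as stand-ins for the trained weights; all names are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def region_pool(x, k):
    """Split (H, W, C) into k x k regions; return max and mean per region."""
    h, w, c = x.shape
    blocks = x.reshape(k, h // k, k, w // k, c)    # needs H and W divisible by k
    return blocks.max(axis=(1, 3)), blocks.mean(axis=(1, 3))

def region_channel_attention(x, w_down, w_up, k):
    h, w, _ = x.shape
    z_max, z_mean = region_pool(x, k)              # Eqs. (5), (6): (k, k, C) each
    z_c = z_max + z_mean                           # Eq. (7)
    z = sigmoid((z_c @ w_down) @ w_up)             # Eq. (8): two 1x1 convs, then sigmoid
    up = np.repeat(np.repeat(z, h // k, axis=0), w // k, axis=1)  # U: nearest neighbour
    return x * up, z                               # Eq. (9)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6, 8))
w_down = rng.standard_normal((8, 2)) * 0.1         # random stand-in for W_D
w_up = rng.standard_normal((2, 8)) * 0.1           # random stand-in for W_U
y, z = region_channel_attention(x, w_down, w_up, k=3)
```

With `k=1` the pooling covers the whole feature map and the sketch reduces to ordinary channel attention, which is how the two variants are compared in the ablation study.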

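2.4 High-frequency aware loss

Most super-resolution algorithms are trained with the L1 or L2 loss, which tends to produce smooth reconstructions, so the network needs to pay more attention to reconstructing high-frequency regions. Fritsche et al. (2019) extract the high-frequency details of an image with a high-pass filter and improve them by adversarial training (Goodfellow et al., 2014) between the details extracted from generated and real images. Inspired by this, the high-frequency aware loss is proposed. Unlike Fritsche et al. (2019), no adversarial loss is used. Instead, the positions of high-frequency details in the training images are extracted in advance and given extra weight, so that the network attends more to these regions during training.

First, the high-frequency detail map ${\mathit{\boldsymbol{I}}_{\rm{H}}} $ is computed: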

$ {\mathit{\boldsymbol{I}}_{\rm{H}}} = {\mathit{\boldsymbol{I}}_t} - {F_1}({\mathit{\boldsymbol{I}}_t}) $

(10)

where ${F_1} $ is a low-pass filter and ${\mathit{\boldsymbol{I}}_t} $ is the original high-resolution image.

Next, the small pixel values in ${\mathit{\boldsymbol{I}}_{\rm{H}}} $ are filtered out and the remaining positions are assigned weights to obtain $ {\mathit{\boldsymbol{H}}_t}$:

$ {H_t}(i) = \left\{ {\begin{array}{ll} {f({I_{\rm{H}}}(i)),}&{{I_{\rm{H}}}(i) \ge \mathit{\Delta} }\\ {0,}&{{I_{\rm{H}}}(i) < \mathit{\Delta} } \end{array}} \right. $

(11)

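where $f $ is the sigmoid activation function, $\mathit{\Delta} $ is a preset threshold and $i $ is the position of a pixel in the image.

Finally, the high-frequency aware loss ${L_{\rm{H}}} $ is defined as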

$ {L_{\rm{H}}} = {\mathit{\boldsymbol{H}}_t} \odot (\left\| {{\mathit{\boldsymbol{I}}_g} - {\mathit{\boldsymbol{I}}_t}} \right\|) $

(12)

where ${{\mathit{\boldsymbol{I}}_g}} $ is the result generated by the model.

The final loss function is

$ L = {L_1} + \lambda \times {L_{\rm{H}}} $

(13)

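where $ {L_1}$ is the ${\rm{L1}} $ loss function and $\lambda $ is the weight of ${L_{\rm{H}}} $. Training proceeds in two stages. In the first stage the network is trained with the ${L_1} $ loss alone. In the second stage the ${L_1} $ loss and the high-frequency aware loss ${L_{\rm{H}}} $ are used together to fine-tune the first-stage model, which strengthens the learning of high-frequency details.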
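Eqs. (10) to (13) can be sketched in NumPy as below. The sketch assumes a 3×3 mean filter as the low-pass filter $F_1$, single-channel images on a 0 to 255 scale, and mean reduction of the weighted error in Eq. (12); the filter size, the reduction and all function names are hypothetical.

```python
import numpy as np

def mean_filter(img, size=3):
    """Box low-pass filter F_1 on a 2-D image with edge replication."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)

def high_freq_weights(target, delta=5.0):
    i_h = target - mean_filter(target)                               # Eq. (10)
    return np.where(i_h >= delta, 1.0 / (1.0 + np.exp(-i_h)), 0.0)   # Eq. (11)

def total_loss(pred, target, lam=0.1, delta=5.0):
    abs_err = np.abs(pred - target)
    l1 = abs_err.mean()
    l_h = (high_freq_weights(target, delta) * abs_err).mean()        # Eq. (12), mean-reduced
    return l1 + lam * l_h, l1, l_h                                   # Eq. (13)

target = np.zeros((8, 8))
target[:, 4:] = 100.0                  # vertical step edge as a toy high-frequency detail
pred = target + 2.0                    # uniform error of 2 grey levels
weights = high_freq_weights(target)
loss, l1, l_h = total_loss(pred, target)
```

In this toy case only the bright side of the step edge receives a non-zero weight, so the same absolute error costs more there than in the flat areas.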

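3 Experimental results and analysis

3.1 Experimental setup

3.1.1 Data preparation

DIV2K (DIVerse 2K resolution image dataset) is used as the training set. It contains 800 high-quality images at 2K resolution, and 10 images from the DIV2K validation set are used for validation during training. For data augmentation, the training data are mirrored horizontally and rotated by different angles (0°, 90°, 180°, 270°), which yields 8 times the original amount of training data. Four widely used standard datasets serve as test sets: Set5, Set14, BSD100 (Berkeley segmentation dataset) and Urban100.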
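The eightfold augmentation described above (an optional horizontal mirror combined with four rotations) can be sketched as follows; the function name is hypothetical.

```python
import numpy as np

def augment_eightfold(img):
    """Return the 8 variants: 4 rotations of the image and of its mirror."""
    variants = []
    for base in (img, np.fliplr(img)):
        for quarter_turns in range(4):          # 0, 90, 180, 270 degrees
            variants.append(np.rot90(base, quarter_turns))
    return variants

patch = np.arange(12).reshape(3, 4)             # small asymmetric test patch
variants = augment_eightfold(patch)
```

The same transformation has to be applied to a low-resolution patch and its high-resolution counterpart so that the pair stays aligned.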

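3.1.2 Training settings

Three models are trained, for scale factors of 2, 3 and 4. For the network hyperparameters, 20 RDBAM blocks are used, the value of $k $ in the region-level channel attention is set to 3, the residual scaling factor $\alpha $ is set to 0.2, and the weight $\lambda $ of the high-frequency aware loss ${L_{\rm{H}}} $ is set to 0.1. Network weights are initialized with the method of He et al. (2015). The initial learning rate is 1E-4 and is halved at epochs 80, 150 and 220, chosen empirically. The Adam optimizer is used, and each training run lasts 300 epochs.

The first round of training uses only the ${\rm L1} $ loss so that the network learns to reconstruct low-frequency information. In the later fine-tuning stage, the ${L_1} $ loss and the high-frequency aware loss ${L_{\rm{H}}} $ are used together so that the network better captures the reconstruction of high-frequency details. Fine-tuning lasts 100 epochs with an initial learning rate of 1E-5, which is halved at epoch 50.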
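The step schedule described above can be written as a small helper. Whether epochs are counted from 0 or 1 is not stated in the text, so the convention here (halving takes effect once the epoch index reaches a milestone) is an assumption, as is the function name.

```python
def learning_rate(epoch, base_lr=1e-4, milestones=(80, 150, 220)):
    """Halve base_lr once for every milestone the epoch index has reached."""
    halvings = sum(1 for m in milestones if epoch >= m)
    return base_lr * 0.5 ** halvings

# First-stage schedule over 300 epochs and fine-tuning schedule over 100 epochs.
stage1 = [learning_rate(e) for e in range(300)]
stage2 = [learning_rate(e, base_lr=1e-5, milestones=(50,)) for e in range(100)]
```

Under this convention the first stage ends at one eighth of its initial learning rate and the fine-tuning stage at one half.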

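3.2 Ablation study

To verify the effect of the region-level channel attention mechanism and of the high-frequency aware loss, four experiments at a scale factor of 4 are carried out, with the peak signal to noise ratio (PSNR) computed on part of the DIV2K validation set, as shown in Table 1. To verify region-level channel attention, $k $ (the size after pooling) is set to 1 and to 3, which correspond to ordinary channel attention and to region-level channel attention, respectively. To verify the high-frequency aware loss, fine-tuning with the ${L_1} $ loss alone is compared with fine-tuning with the ${L_1} $ loss together with the high-frequency aware loss ${L_{\rm{H}}} $. Table 1 shows that, when only the ${L_1} $ loss is used, the model with region-level channel attention has a PSNR 0.026 dB higher than the model with ordinary channel attention. With the same attention mechanism, fine-tuning with the high-frequency aware loss gives a higher PSNR than fine-tuning with the ${L_1} $ loss alone (by 0.011 dB with global channel attention and by 0.005 dB with region-level channel attention), and using both components together gives the best result. Fig. 4 shows the visual results of the four cases of Table 1 on the DIV2K validation set: the result with RCA+${L_{\rm{H}}} $ has clearer and sharper texture and looks better than the results without them. The experiments support the effectiveness of the region-level channel attention mechanism and of the high-frequency aware loss.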

Table 1 Ablation study

Method	PSNR/dB
Global channel attention + ${L_1} $	29.643
Global channel attention + ${L_1} $ + $\lambda {L_{\rm{H}}} $	29.654
Region-level channel attention + ${L_1} $	29.669
Region-level channel attention + ${L_1} $ + $\lambda {L_{\rm{H}}} $	29.674

Fig. 4 Comparison results of DIV2K validation image experiment

((a)HR; (b)base; (c)${L_{\rm{H}}} $; (d)RCA; (e)RCA+${L_{\rm{H}}} $)
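For reference, the PSNR values reported in the tables follow the standard definition, which can be sketched as below. The text does not state the evaluation protocol (for example which colour channel is used or whether borders are cropped), so the sketch covers only the basic formula for 8-bit images.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal to noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                    # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 100, dtype=np.uint8)
est = ref + 1                                  # every pixel off by one grey level
value = psnr(ref, est)                         # equals 20 * log10(255)
```

Since PSNR is logarithmic in the mean squared error, differences of a few hundredths of a dB, as in Table 1, correspond to very small changes in average error.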

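3.3 Study of the value of $k $

High-frequency details are usually regional. When $k $ (the size after pooling) is too large, RCA attends to too many regions and each region becomes too small, so it may no longer capture local high-frequency details. The effect of different $k $ values in the region-level channel attention mechanism on network performance is therefore explored. To save time, only 8 RDBAM blocks are used in these experiments, each model is trained for 100 epochs with a different $k $, and the PSNR on Set14 is measured. The results are given in Table 2, and the training curves for four of the $k $ values are shown in Fig. 5. Table 2 shows that $k $=3 performs best on the Set14 test set. At the largest value tested, $k $=48, the PSNR drops to 28.106 dB and the attention mechanism no longer helps. Fig. 5 shows that training with $k $=3 is stable throughout and reaches the best performance, whereas performance falls when $k $ is too large. A suitable $k $ therefore improves network performance, while an excessively large $k $ reduces it.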

Table 2 PSNR for models with different $k $ values on the Set14 test set

$k $	PSNR/dB
1	28.223
2	28.231
3	28.246
4	28.233
6	28.236
8	28.214
16	28.237
48	28.106
Note: bold indicates the best result.

Fig. 5 PSNR convergence curves for different $k $ values on the Set14 test set
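The trade-off between the number of regions and their size can be made concrete with a little arithmetic. The feature-map side of 48 used below is purely a hypothetical choice (it is divisible by every $k$ in Table 2); the paper does not state the training patch size.

```python
def region_layout(feature_size, k):
    """Return (number of regions, side length of each region) for a k x k split."""
    if feature_size % k != 0:
        raise ValueError("feature_size must be divisible by k")
    return k * k, feature_size // k

FEATURE_SIZE = 48   # hypothetical feature-map side, not stated in the paper
layouts = {k: region_layout(FEATURE_SIZE, k) for k in (1, 2, 3, 4, 6, 8, 16, 48)}
```

Under this assumption, $k$=3 gives 9 regions of 16×16 positions each, whereas $k$=48 gives 2304 regions of a single position, at which point the region pooling no longer aggregates any neighbourhood.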

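3.4 Choice of low-pass filter and setting of the threshold $\mathit{\Delta} $

To examine how different low-pass filters and different thresholds $\mathit{\Delta} $ affect the extraction of the high-frequency weight map ${\mathit{\boldsymbol{H}}_t} $ in Eq. (11), four common low-pass filters (Gaussian, median, bilateral and mean) and five thresholds $\mathit{\Delta} $ (0, 3, 5, 10 and 15) are considered. Fig. 6 visualizes the high-frequency weight map ${\mathit{\boldsymbol{H}}_t} $ in each case, from bottom to top for $\mathit{\Delta} $=0, $\mathit{\Delta} $=3, $\mathit{\Delta} $=5, $\mathit{\Delta} $=10 and $\mathit{\Delta} $=15. For the same threshold $\mathit{\Delta} $, the Gaussian and bilateral filters extract high-frequency texture poorly, and the median filter is on the whole worse than the mean filter, so of the four filters the mean filter gives the best high-frequency weight map. As for the threshold $\mathit{\Delta} $, a small value leaves much low-frequency information in ${\mathit{\boldsymbol{H}}_t} $, while too large a value markedly reduces the high-frequency information in ${\mathit{\boldsymbol{H}}_t} $. Fig. 6 shows that the best high-frequency weight map is obtained with $\mathit{\Delta} $ near 5. The high-frequency weight map is therefore computed with a mean filter as the low-pass filter and with $\mathit{\Delta} $ set to 5.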

Fig. 6 ${\mathit{\boldsymbol{H}}_t} $ for different filters and $\mathit{\Delta} $ values

((a)original; (b)Gaussian filter; (c)median filter; (d)bilateral filter; (e)mean filter)
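The effect of the threshold on how many positions keep a non-zero weight can be sketched as follows. A random image is used as a synthetic stand-in for a real photograph and a 3×3 mean filter as the low-pass filter; the filter size and the function names are assumptions.

```python
import numpy as np

def box_blur(img, size=3):
    """Mean (box) filter with edge replication."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)

def kept_fraction(img, delta):
    """Fraction of pixels whose high-frequency response reaches the threshold."""
    i_h = img - box_blur(img)
    return float(np.mean(i_h >= delta))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)   # synthetic stand-in image
fractions = [kept_fraction(img, d) for d in (0, 3, 5, 10, 15)]
```

The kept fraction can only shrink as the threshold grows, which matches the observation that a large $\mathit{\Delta}$ removes high-frequency positions while a small one lets low-frequency positions through.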

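3.5 Setting of the weight $\lambda $

The high-frequency aware loss makes the network attend to the loss at high-frequency detail positions and strengthens its ability to reconstruct high-frequency details. Attending too much to the loss at high-frequency positions may, however, degrade the model, so a suitable $\lambda $ is needed that lets the network focus on these positions without reducing performance. Six values of $\lambda $ are tested: 0, 0.05, 0.1, 0.5, 1 and 10, with the PSNR computed on part of the DIV2K validation set. The results are given in Table 3. The PSNR is highest at $\lambda $= 0.1. When $\lambda $ is small, the network pays little attention to high-frequency positions and the PSNR is lower. When $\lambda $ is too large, the network focuses excessively on high-frequency information and neglects the reconstruction of the overall low-frequency content, which reduces performance.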

Table 3 PSNR for models with different $\lambda $ values on the DIV2K validation set

$\lambda $	PSNR/dB
0	29.643
0.05	29.647
0.1	29.654
0.5	29.641
1	29.639
10	29.567
Note: bold indicates the best result.

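3.6 Subjective quality and objective metrics

To evaluate the proposed algorithm thoroughly, it is compared with Bicubic (Keys, 1981), SRCNN (Dong et al., 2015), VDSR (accurate image super-resolution using very deep convolutional networks) (Kim et al., 2016a), RCAN (image super-resolution using very deep residual channel attention networks) (Zhang et al., 2018b), SRFBN (feedback network for image super-resolution) (Li et al., 2019) and HAN (single image super-resolution via a holistic attention network) (Niu et al., 2020) on the four common datasets Set5, Set14, BSD100 and Urban100. For fairness, all compared algorithms use the DIV2K training set under the same training conditions as the proposed algorithm. Tables 4 to 6 give the PSNR and SSIM (structural similarity) of the algorithms at scale factors of 2, 3 and 4. The proposed algorithm attains the best or tied-best value in most columns, while HAN is higher in a few cases, for example in Set14 PSNR at ×2 and in Set5 and Urban100 PSNR at ×4.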

Table 4 Average PSNR/SSIM of different algorithms for scale factor ×2

Method	Set5 PSNR/dB	Set5 SSIM	Set14 PSNR/dB	Set14 SSIM	BSD100 PSNR/dB	BSD100 SSIM	Urban100 PSNR/dB	Urban100 SSIM
Bicubic	33.66	0.929 9	30.24	0.868 8	29.56	0.843 1	26.88	0.840 3
SRCNN	36.66	0.954 2	32.45	0.906 7	31.36	0.887 9	29.50	0.894 6
VDSR	37.53	0.959 0	33.05	0.913 0	31.90	0.896 0	30.77	0.914 0
RCAN	38.08	0.959 3	33.87	0.913 5	32.13	0.899 7	32.90	0.933 9
SRFBN	38.11	0.957 4	33.79	0.913 3	32.12	0.895 5	32.78	0.928 0
HAN	38.13	0.960 5	34.04	0.916 5	32.20	0.898 0	32.93	0.934 6
Ours	38.15	0.961 1	33.95	0.919 4	32.30	0.900 9	32.95	0.934 9
Note: bold indicates the best result in each column.

Table 5 Average PSNR/SSIM of different algorithms for scale factor ×3

Method	Set5 PSNR/dB	Set5 SSIM	Set14 PSNR/dB	Set14 SSIM	BSD100 PSNR/dB	BSD100 SSIM	Urban100 PSNR/dB	Urban100 SSIM
Bicubic	30.39	0.868 2	27.55	0.774 2	25.96	0.667 5	24.46	0.734 9
SRCNN	32.75	0.909 0	29.30	0.821 5	28.41	0.786 3	26.24	0.798 9
VDSR	33.67	0.921 0	29.78	0.832 0	28.83	0.799 0	27.14	0.829 0
RCAN	34.67	0.927 6	30.49	0.845 5	29.20	0.808 8	28.73	0.866 9
SRFBN	34.63	0.926 3	30.48	0.845 7	29.13	0.808 5	28.69	0.865 3
HAN	34.65	0.926 9	30.55	0.844 4	29.26	0.803 5	28.91	0.859 9
Ours	34.72	0.929 0	30.63	0.846 1	29.28	0.809 7	28.93	0.867 8
Note: bold indicates the best result in each column.

Table 6 Average PSNR/SSIM of different algorithms for scale factor ×4

Method	Set5 PSNR/dB	Set5 SSIM	Set14 PSNR/dB	Set14 SSIM	BSD100 PSNR/dB	BSD100 SSIM	Urban100 PSNR/dB	Urban100 SSIM
Bicubic	28.42	0.810 4	26.00	0.702 7	25.96	0.667 5	23.14	0.657 7
SRCNN	30.48	0.862 8	27.50	0.751 3	26.90	0.710 1	24.52	0.722 1
VDSR	31.35	0.883 0	28.02	0.768 0	27.29	0.726 0	25.18	0.754 0
RCAN	32.55	0.897 8	28.80	0.787 6	27.71	0.743 0	26.61	0.803 2
SRFBN	32.35	0.896 9	28.75	0.779 9	27.65	0.743 3	26.53	0.796 5
HAN	32.60	0.893 0	28.82	0.783 3	27.72	0.749 1	26.70	0.798 6
Ours	32.51	0.898 0	28.82	0.787 6	27.72	0.742 4	26.66	0.803 6
Note: bold indicates the best result in each column.

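To further examine the proposed algorithm, its visual results are compared with those of Bicubic, SRCNN, VDSR, RCAN, SRFBN and HAN. Fig. 7 and Fig. 8 show the ×4 reconstructions of images img016 and img024 from the Urban100 dataset by the different algorithms. Compared with the other algorithms, the proposed algorithm reconstructs the building more clearly. For the floor in Fig. 8, the proposed algorithm reconstructs the texture more clearly and without distortion.

Fig. 7 Four times reconstruction results of img016 in Urban100 dataset by different algorithms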

((a)original; (b)HR; (c)Bicubic; (d)SRCNN; (e)VDSR; (f)RCAN; (g)SRFBN; (h)HAN; (i)ours)

Fig. 8 Four times reconstruction results of img024 in Urban100 dataset by different algorithms

((a)original; (b)HR; (c)Bicubic; (d)SRCNN; (e)VDSR; (f)RCAN; (g)SRFBN; (h)HAN; (i)ours)

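4 Conclusion

Because the commonly used channel attention mechanism cannot attend to region-level information, this paper proposes a super-resolution algorithm based on region-level channel attention. When channel attention is applied, each channel is divided into regions and attention is assigned per region, so that different regions of each channel in the feature map receive different attention. A high-frequency aware loss is also proposed, which makes the network concentrate on learning to reconstruct high-frequency information by raising its attention to high-frequency detail positions. The experimental results show that, with region-level channel attention and the high-frequency aware loss, the proposed algorithm compares favourably with the other algorithms in both objective metrics and subjective quality. The algorithm also has shortcomings. To obtain better performance it uses a very deep network, which makes inference slow and overall efficiency low, so it is not well suited to engineering applications. Future work will focus on making the network lightweight, improving the running efficiency of the model as far as possible without sacrificing too much performance.

References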

Anwar S, Barnes N. 2020. Densely residual Laplacian super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99: 1-12 [DOI:10.1109/TPAMI.2020.3021088]

Cao F L, Liu H. 2019. Single image super-resolution via multi-scale residual channel attention network. Neurocomputing, 358: 424-436 [DOI:10.1016/j.neucom.2019.05.066]

Dong C, Loy C C, He K M, Tang X O. 2015. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI:10.1109/TPAMI.2015.2439281]

Duan C Y, Xiao N F. 2019. Parallax-based spatial and channel attention for stereo image super-resolution. IEEE Access, 7: 183672-183679 [DOI:10.1109/ACCESS.2019.2960561]

Fritsche M, Gu S H and Timofte R. 2019. Frequency separation for real-world super-resolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 3599-3608[DOI: 10.1109/iccvw.2019.00445]

Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680[DOI: 10.5555/2969033.2969125]

He K M, Zhang X Y, Ren S and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/cvpr.2016.90]

He K M, Zhang X Y, Ren S Q and Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on imagenet classification//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1026-1034[DOI: 10.1109/iccv.2015.123]

Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/cvpr.2018.00745]

Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/cvpr.2017.243]

Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR: 448-456[DOI: 10.5555/3045118.3045167]

Jin W, Chen Y. 2020. Multi-scale residual channel attention network for face super-resolution. Journal of Computer-Aided Design and Computer Graphics, 32(6): 959-970 (金炜, 陈莹. 2020. 多尺度残差通道注意机制下的人脸超分辨率网络. 计算机辅助设计与图形学学报, 32(6): 959-970) [DOI:10.3724/SP.J.1089.2020.17995]

Keys R. 1981. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(6): 1153-1160 [DOI:10.1109/TASSP.1981.1163711]

Kim J, Lee J K and Lee K M. 2016a. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654[DOI: 10.1109/cvpr.2016.182]

Kim J, Lee J K and Lee K M. 2016b. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1637-1645[DOI: 10.1109/cvpr.2016.181]

Lee W Y, Chuang P Y and Wang Y C F. 2019. Perceptual quality preserving image super-resolution via channel attention//Proceedings of ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech, and Signal Processing. Brighton, UK: IEEE: 1737-1741[DOI: 10.1109/icassp.2019.8683507]

Li Z, Yang J L, Liu Z, Yang X M, Jeon G and Wu W. 2019. Feedback network for image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3862-3871[DOI: 10.1109/CVPR.2019.00399]

Lim B, Son S, Kim H, Nah S and Lee K. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140[DOI: 10.1109/cvprw.2017.151]

Liu D, Wen B H, Fan Y C, Loy C C and Huang T S. 2018. Non-local recurrent network for image restoration//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates: 1680-1689[DOI: 10.5555/3326943.3327097]

Ma W, Pan Z X, Yuan F, Lei B. 2019. Super-resolution of remote sensing images via a dense residual generative adversarial network. Remote Sensing, 11(21): #2578 [DOI:10.3390/rs11212578]

Niu B, Wen W L, Ren W Q, Zhang X D, Yang L P, Wang S Z, Zhang K H, Cao X C and Shen H F. 2020. Single image super-resolution via a holistic attention network//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 191-207[DOI: 10.1007/978-3-030-58610-2_12]

Sajjadi M S M, Schölkopf B and Hirsch M. 2017. EnhanceNet: single image super-resolution through automated texture synthesis//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4501-4510[DOI: 10.1109/iccv.2017.481]

Shi W Z, Caballero J, Ledig C, Zhuang X H, Bai W J, Bhatia K, De Marvao A M S M, Dawes T, O'Regan D and Rueckert D. 2013. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch//Proceedings of the 16th International Conference on Medical Image Computing and Computer-Assisted Intervention. Nagoya, Japan: Springer: 9-16[DOI: 10.1007/978-3-642-40760-4_2]

Soh J W, Cho N I. 2020. Lightweight single image super-resolution with multi-scale spatial attention networks. IEEE Access, 8: 35383-35391 [DOI:10.1109/access.2020.2974876]

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates: 6000-6010[DOI: 10.5555/3295222.3295349]

Vu T, Nguyen C V, Pham T X, Luu T M and Yoo C D. 2018. Fast and efficient image quality enhancement via desubpixel convolutional neural networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 243-259[DOI: 10.1007/978-3-030-11021-5_16]

Wang H R, Fan Y, Wang Z X, Jiao L C and Schiele B. 2018a. Parameter-free spatial attention network for person re-identification[EB/OL]. [2020-09-13]. https://arxiv.org/pdf/1811.12150.pdf

Wang X L, Girshick R, Gupta A and He K M. 2018b. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803[DOI: 10.1109/cvpr.2018.00813]

Wang X T, Yu K, Wu S X, Gu J J, Liu Y H, Dong C, Qiao Y and Loy C C. 2018c. ESRGAN: enhanced super-resolution generative adversarial networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 63-79[DOI: 10.1007/978-3-030-11021-5_5]

Woo S, Park J, Lee J Y and Kweon S I. 2018. CBAM: convolutional block attention module//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 3-19[DOI: 10.1007/978-3-030-01234-2_1]

Xing X R, Zhang D W. 2019. Image super-resolution using aggregated residual transformation networks with spatial attention. IEEE Access, 7: 92572-92585 [DOI:10.1109/access.2019.2927238]

Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018b. Image super-resolution using very deep residual channel attention networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 294-310[DOI: 10.1007/978-3-030-01234-2_18]

Zhang Y L, Tian Y P, Kong Y, Zhong B N and Fu Y. 2018a. Residual dense network for image super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2472-2481[DOI: 10.1109/CVPR.2018.00262]