Published: 2021-12-16
Image Processing and Coding
Received: 2020-10-13; revised: 2020-12-09; preprint: 2020-12-16
Supported by: National Key R&D Program of China (2017YFB0202303); National Natural Science Foundation of China (61602213, 61772013)
About the authors:
Zhou Bo, born in 1996, male, master's student; research interests: deep learning and image processing. E-mail: 6181611033@stu.jiangnan.edu.cn
Li Chenghua, male, assistant researcher; research interests: deep learning and computer vision. E-mail: lichenghua2014@ia.ac.cn
Chen Wei, corresponding author, male, associate professor; research interests: computer graphics and intelligent information processing. E-mail: 8313120015@jiangnan.edu.cn
CLC number: TP391.4
Document code: A
Article ID: 1006-8961(2021)12-2836-12
Abstract
Objective The channel attention mechanism has been widely used in image super-resolution, but most current algorithms can only select feature maps of interest at the channel level while ignoring spatial information, so local spatial information within feature maps is not exploited effectively. To address this problem, an image super-resolution algorithm based on region-level channel attention is proposed. Method A non-local residual dense network is designed as the backbone, consisting of a non-local module and residual dense attention modules. The non-local module extracts non-local similarity information and passes it to the subsequent network. The residual dense attention module adds a region-level channel attention mechanism on top of the residual dense block, assigning different attention to channels in different spatial regions so that spatial information can also be fully used. In addition, since the widely used L1 and L2 loss functions tend to produce overly smooth results, a high-frequency aware loss is proposed; it increases the weight of the loss at high-frequency detail locations so that the network pays more attention to high-frequency details during the later fine-tuning stage. Result In ×4 upscaling experiments on four standard test sets, Set5, Set14, BSD100 (Berkeley segmentation dataset) and Urban100, the average PSNR (peak signal to noise ratio) of the proposed method is about 3.15 dB and 1.58 dB higher than interpolation and SRCNN (image super-resolution using deep convolutional networks), respectively. Conclusion The proposed algorithm adaptively adjusts the attention paid to channels in different spatial regions through region-level channel attention, and strengthens the focus on high-frequency image details through the high-frequency aware loss, so the generated high-resolution images have better visual quality.
Key words
deep learning; convolutional neural network (CNN); super-resolution; attention mechanism; non-local neural network
Abstract
Objective As an important branch of image processing, image super-resolution has attracted extensive attention. The attention mechanism was first applied to machine translation in deep learning; as an extension of it, the channel attention mechanism has been widely used in image super-resolution. A single-image super-resolution method using region-level channel attention is proposed, in which a region-level channel attention mechanism assigns different attention to different channels in different regions. Meanwhile, since the commonly used L1 and L2 losses tend to produce very smooth results, a high-frequency aware loss is proposed; it strengthens the weight of the losses at high-frequency positions to promote the generation of high-frequency details. Method The network consists of three parts: low-level feature extraction, high-level feature extraction and image reconstruction. The low-level feature extraction part uses one 3×3 convolutional layer. The high-level feature extraction part contains a non-local module and several residual dense block attention modules. The non-local module extracts non-local similarity information via the non-local operation; a desubpixel convolutional layer is applied before computing the similarity, so the calculation is conducted at low resolution. Dense connections are used in the residual dense block attention modules so that the network can adaptively accumulate features from different layers, and residual learning further eases gradient propagation. A region-level channel attention mechanism is introduced to attend adaptively to information in different regions, and the initial non-local similarity information is added to the last layer through a skip connection.
In the image reconstruction part, sub-pixel convolution up-samples the features and a 3×3 convolutional layer produces the final reconstruction. For the loss function, the high-frequency aware loss enhances the network's ability to reconstruct high-frequency details: before training, the locations of high-frequency details in each image are extracted; during training, extra weight is placed on the losses at these locations so the reconstruction of high-frequency details is learned better. The whole training process is divided into two stages. In the first stage, the network is trained with the L1 loss; in the second stage, the high-frequency aware loss and the L1 loss are used together to fine-tune the model from the first stage. Result Region-level channel attention and the high-frequency aware loss are verified in an ablation study. The model using region-level channel attention achieves a clearly higher peak signal-to-noise ratio (PSNR), fine-tuning with the high-frequency aware loss together with the L1 loss outperforms fine-tuning with the L1 loss alone, and using both techniques at the same time is also shown to be effective. Set5, Set14, the Berkeley segmentation dataset (BSD100) and Urban100 are used for comparison with other algorithms, including Bicubic, image super-resolution using deep convolutional networks (SRCNN), accurate image super-resolution using very deep convolutional networks (VDSR), image super-resolution using very deep residual channel attention networks (RCAN), feedback network for image super-resolution (SRFBN) and single image super-resolution via a holistic attention network (HAN). For the subjective comparison, three ×4 results are selected for display.
The results generated by the proposed algorithm show richer textures without blurring or distortion. For the objective comparison, PSNR and structural similarity (SSIM) are reported under scale factors of 2, 3 and 4. The PSNR of the ×4 model on the four standard test sets is 32.51 dB, 28.82 dB, 27.72 dB and 26.66 dB, respectively. Conclusion The proposed super-resolution algorithm applies the commonly used channel attention at the region level. With the high-frequency aware loss, the network reconstructs high-frequency details better by paying more attention to high-frequency detail locations. The experimental results show that, using the region-level channel attention mechanism and the high-frequency aware loss, the proposed algorithm is superior in both objective indicators and subjective quality.
Key words
deep learning; convolutional neural network(CNN); super-resolution; attention mechanism; non-local neural network
0 Introduction
The goal of image super-resolution is to recover a high-resolution image from a low-resolution one. It has been widely applied in computer vision, e.g. medical image analysis (Shi et al., 2013) and object recognition (Sajjadi et al., 2017). However, super-resolution is a classic inverse problem: for any low-resolution image there exist infinitely many corresponding high-resolution images. To better address this, many learning-based methods have been proposed to learn the mapping from low-resolution to high-resolution images.
Deep-learning-based methods have achieved significant improvements over traditional algorithms. Dong et al. (2015) first introduced a 3-layer convolutional neural network for image super-resolution. Kim et al. (2016a, 2016b) increased the network depth to 20 layers in VDSR (accurate image super-resolution using very deep convolutional networks) and DRCN (deeply-recursive convolutional network for image super-resolution), clearly improving both visual quality and metrics over SRCNN (image super-resolution using deep convolutional networks). In many vision tasks, network depth is crucial to model performance. He et al. (2016) first trained deep networks with residual learning and achieved excellent performance. Lim et al. (2017) improved the widely used residual block in EDSR (enhanced deep residual networks for single image super-resolution) by removing the batch normalization (BN) layers (Ioffe and Szegedy, 2015), greatly boosting performance. Inspired by EDSR, Zhang et al. (2018a) proposed the residual dense network (RDN, residual dense network for image super-resolution), which combines residual structures with dense connections in the residual dense block and has since been widely adopted in super-resolution algorithms (Ma et al., 2019; Anwar and Barnes, 2020).
On the other hand, attention mechanisms (Vaswani et al., 2017) are increasingly used in computer vision tasks (Niu et al., 2020). Wang et al. (2018b) proposed the non-local neural network, which computes the response at a position as a weighted sum of the relations between that position and all other spatial positions; Liu et al. (2018) applied this module to image restoration. Wang et al. (2018a) proposed a parameter-free spatial attention (SA) mechanism for person re-identification that focuses on the useful information provided at spatial positions, and many subsequent algorithms applied it to super-resolution (Xing and Zhang, 2019; Soh and Cho, 2020). Other works explore the correlation between channels. Hu et al. (2018) proposed SENet with an SE (squeeze-and-excitation) module, enabling the network to model channel-level relations, with notable gains on classification. Zhang et al. (2018b) as well as Duan and Xiao (2019) further applied the channel attention (CA) mechanism to super-resolution.
Channel attention has performed well in super-resolution, but the commonly used CA has a limitation: it assigns the same attention to all spatial regions of a channel, so it may ignore key high-frequency features in some regions of a channel or amplify unimportant low-frequency ones, which hinders the reconstruction of images rich in high-frequency details. We therefore propose a novel region-level channel attention (RCA) mechanism that lets the network assign different attention to channels in different spatial regions, together with a high-frequency aware loss that strengthens the network's focus on high-frequency details.
The main contributions of this paper are as follows: 1) A region-level channel attention mechanism. Unlike previous channel attention, RCA computes channel-wise attention separately for different spatial regions instead of a single attention value per channel, so the network can give different attention to features in different regions of a channel and adaptively select regions of interest. 2) A non-local residual dense structure for building the deep network. After the non-local module extracts non-local similarity information, it is passed to the subsequent residual dense attention modules, which adaptively accumulate the high-level and low-level features extracted from it, further strengthening feature reuse. 3) A high-frequency aware loss, which makes the network focus on the losses at high-frequency detail locations during training and thus enhances its ability to reconstruct high-frequency details.
1 Related work
1.1 Residual structure and dense connections
He et al. (2016) proposed the residual structure, which passes information from the current layer to later layers through skip connections, successfully alleviating the difficulty of gradient propagation in deep networks. Huang et al. (2017) proposed dense connections in DenseNet (densely connected convolutional networks): each layer fuses the features of all preceding layers and feeds its own features to all subsequent layers, effectively mitigating vanishing gradients and strengthening feature propagation. Inspired by this, Wang et al. (2018c) proposed ESRGAN (enhanced super-resolution generative adversarial networks), successfully combining residual structures and dense connections in its generator and achieving significant gains in super-resolution.
To improve network performance, our algorithm also uses residual structures and dense connections, on top of which we design the non-local residual dense structure to strengthen the network's feature representation ability.
1.2 Channel attention mechanism
Attention mechanisms help deep neural networks decide where to focus and strengthen the use of information in regions of interest. In recent years they have become an important component of deep learning and are widely used in computer vision tasks. The channel attention mechanism models the interdependence between feature channels and adaptively re-weights each feature channel. For example, Zhang et al. (2018b) implement channel attention by first applying global pooling over the spatial dimensions:
$ \boldsymbol{z}_c = GP(\boldsymbol{x}) $  (1)
where $GP(\cdot)$ denotes global average pooling over the spatial dimensions and $\boldsymbol{x}$ is the input feature. The channel descriptor $\boldsymbol{z}_c$ is then passed through a gating function
$ \boldsymbol{z} = {\rm sigmoid}(\boldsymbol{F}_{\rm U}({\rm ReLU}(\boldsymbol{F}_{\rm D}(\boldsymbol{z}_c)))) $  (2)
where $\boldsymbol{F}_{\rm D}$ and $\boldsymbol{F}_{\rm U}$ are channel-downscaling and channel-upscaling convolutional layers and $\boldsymbol{z}$ is the resulting channel attention. Finally, each channel is rescaled as
$ \boldsymbol{x}'_i = \boldsymbol{z}_i \cdot \boldsymbol{x}_i $  (3)
where $\boldsymbol{z}_i$ and $\boldsymbol{x}_i$ are the attention weight and the feature map of the $i$-th channel, and $\boldsymbol{x}'_i$ is the rescaled feature.
Many super-resolution algorithms (Cao and Liu, 2019; Lee et al., 2019; Jin and Chen, 2020) use this form of channel attention, but it can only select feature maps of interest at the channel level, which limits network performance to some extent. The region-level channel attention proposed in this paper therefore pools different regions of a feature map separately during the pooling step, so the network can adaptively attend to information in different channels and different spatial regions.
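As a concrete illustration, the squeeze-and-excitation style channel attention of Eqs. (1)-(3) can be sketched in NumPy. This is an illustrative sketch only: the weight shapes and reduction ratio are our assumptions, and matrix multiplies over the channel axis stand in for the 1×1 convolutional layers.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, w_down, w_up):
    """Global channel attention (Eqs. (1)-(3)): squeeze by global average
    pooling, excite through a channel bottleneck, rescale each channel.
    x: (C, H, W); w_down: (C//r, C); w_up: (C, C//r)."""
    z_c = x.mean(axis=(1, 2))                        # Eq. (1): GP(x) -> (C,)
    z = sigmoid(w_up @ np.maximum(w_down @ z_c, 0))  # Eq. (2): gating
    return x * z[:, None, None]                      # Eq. (3): rescale

# toy example: 4 channels, reduction ratio r = 2 (illustrative values)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w_d = rng.standard_normal((2, 4)) * 0.1
w_u = rng.standard_normal((4, 2)) * 0.1
y = channel_attention(x, w_d, w_u)
```

Because the sigmoid output lies in (0, 1), every channel of `y` is a damped copy of the corresponding channel of `x`, with one shared weight per channel, which is exactly the limitation the region-level variant addresses.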
2 Algorithm design
2.1 Network structure
The overall network structure is shown in Fig. 1 and consists of three parts: a low-level feature extraction module, a high-level feature extraction module and an image reconstruction module. The low-level feature extraction module uses only one 3×3 convolutional layer. The high-level feature extraction part uses the non-local residual dense structure, comprising one non-local module and several residual dense block attention modules (RDBAM). The non-local module (blue dashed box in the upper part of Fig. 1) extracts non-local similarity information via the non-local operation (Wang et al., 2018b); as in Vu et al. (2018), a desubpixel convolutional layer is used inside the non-local operation so that the spatial similarity is computed at low resolution, after which a sub-pixel convolutional layer restores the original size, reducing the computational cost.
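The desubpixel/sub-pixel pair used inside the non-local module can be illustrated with plain array reshapes. This is a minimal sketch of space-to-depth and its inverse under our own shape conventions, not the authors' implementation:

```python
import numpy as np

def desubpixel(x, s):
    """Space-to-depth: fold each s*s spatial block into channels, so the
    non-local similarity can be computed at 1/s of the resolution."""
    c, h, w = x.shape
    x = x.reshape(c, h // s, s, w // s, s)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * s * s, h // s, w // s)

def subpixel(x, s):
    """Depth-to-space (inverse of desubpixel): restore the original size."""
    c, h, w = x.shape
    x = x.reshape(c // (s * s), s, s, h, w)
    return x.transpose(0, 3, 1, 4, 2).reshape(c // (s * s), h * s, w * s)

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
low = desubpixel(x, 2)            # (8, 2, 2): 4x channels, 1/2 resolution
restored = subpixel(low, 2)       # back to (2, 4, 4)
```

The round trip is lossless, which is why computing the expensive pairwise similarity on `low` and then restoring the size saves computation without discarding information.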
2.2 Residual dense attention module
An RDBAM consists of 3 densely connected blocks and 1 region-level channel attention (RCA) module, as shown in Fig. 2. Similar to the structure proposed by Wang et al. (2018c), each densely connected block consists of 4 convolutional layers, and the output of every layer is passed to all subsequent layers. The output of the $i$-th convolutional layer in the $c$-th dense block of an RDBAM is
$ \boldsymbol{D}_{c,i}(\boldsymbol{x}) = \boldsymbol{W}_{c,i}([\boldsymbol{D}_{c,1}(\boldsymbol{x}), \boldsymbol{D}_{c,2}(\boldsymbol{x}), \cdots, \boldsymbol{D}_{c,i-1}(\boldsymbol{x})]) $  (4)
where $\boldsymbol{W}_{c,i}$ denotes the convolution of the $i$-th layer in the $c$-th dense block and $[\cdot]$ denotes concatenation of the feature maps of the preceding layers along the channel dimension.
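A minimal sketch of the dense connections in Eq. (4): each layer consumes the channel-wise concatenation of the input and all previous layer outputs. Matrix multiplies over the channel axis stand in for the 3×3 convolutions, and the growth rate and layer count are illustrative assumptions:

```python
import numpy as np

def dense_block(x, weights):
    """Dense connections per Eq. (4): layer i takes the concatenation of
    the input and the outputs of layers 1..i-1. A 1x1 'convolution'
    (channel-axis matrix multiply) stands in for each 3x3 convolution."""
    feats = [x]                                    # list of (C_i, H, W)
    for w in weights:                              # w: (growth, sum of C_i)
        cat = np.concatenate(feats, axis=0)        # channel-wise concat
        out = np.maximum(np.einsum('oc,chw->ohw', w, cat), 0)  # conv + ReLU
        feats.append(out)
    return feats[-1]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
growth = 4
ws = [rng.standard_normal((growth, 8 + i * growth)) * 0.1 for i in range(4)]
y = dense_block(x, ws)
```

Note how each weight matrix grows wider by the growth rate: layer $i$ sees $8 + 4(i-1)$ input channels, mirroring the widening concatenation in Eq. (4).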
2.3 Region-level channel attention mechanism
Channel attention (CA) is used to focus on the high-frequency features within channels and thus to weight different channels of the feature map differently. However, high-frequency features usually appear at different spatial positions, and attending only at the channel level may ignore much high-frequency information, so the generated results lack high-frequency details. To address this, we study the interdependence, along the channel dimension, of features in different regions of the feature map, and propose the region-level channel attention mechanism shown in Fig. 3.
To let the network take the information of different regions into account, before the attention operation the input features are divided into $k \times k$ regions along the spatial dimensions, and max pooling and average pooling are applied within each region:
$ \boldsymbol{z}_{\max} = GP_{\max}(\boldsymbol{x}, k) $  (5)
$ \boldsymbol{z}_{\rm mean} = GP_{\rm mean}(\boldsymbol{x}, k) $  (6)
$ \boldsymbol{z}_c = \boldsymbol{z}_{\max} + \boldsymbol{z}_{\rm mean} $  (7)
where $GP_{\max}(\cdot, k)$ and $GP_{\rm mean}(\cdot, k)$ denote max pooling and average pooling over each of the $k \times k$ regions, and the two pooled statistics are summed to give $\boldsymbol{z}_c$. As in Eq. (2), $\boldsymbol{z}_c$ is then fed into a gating function:
$ \boldsymbol{z} = f(\boldsymbol{W}_{\rm U}(\boldsymbol{W}_{\rm D}(\boldsymbol{z}_c))) $  (8)
where $\boldsymbol{W}_{\rm D}$ and $\boldsymbol{W}_{\rm U}$ are channel-downscaling and channel-upscaling convolutional layers and $f(\cdot)$ is the sigmoid function. Finally, the attention map is upsampled to the input's spatial size and applied to the features:
$ \boldsymbol{x}' = \boldsymbol{x} \odot U(\boldsymbol{z}) $  (9)
where $U(\cdot)$ upsamples the $k \times k$ attention map $\boldsymbol{z}$ to the spatial size of $\boldsymbol{x}$, $\odot$ denotes element-wise multiplication, and $\boldsymbol{x}'$ is the re-weighted feature.
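The region-level channel attention of Eqs. (5)-(9) can be sketched as follows. This is illustrative only: the bottleneck weights are assumed to be shared across regions, a ReLU between the two layers is assumed by analogy with Eq. (2), and nearest-neighbour replication implements $U(\cdot)$:

```python
import numpy as np

def region_pool(x, k, op):
    """Pool each channel over a k x k grid of regions -> (C, k, k)."""
    c, h, w = x.shape
    r = x.reshape(c, k, h // k, k, w // k)
    return op(r, axis=(2, 4))

def region_channel_attention(x, k, w_down, w_up):
    """Region-level channel attention (Eqs. (5)-(9)): max + mean pooling
    per region, a channel bottleneck applied to each region's statistics,
    then nearest-neighbour upsampling of the k x k attention map."""
    z_c = region_pool(x, k, np.max) + region_pool(x, k, np.mean)   # Eq. (7)
    hidden = np.maximum(np.einsum('oc,ckl->okl', w_down, z_c), 0)
    z = 1.0 / (1.0 + np.exp(-np.einsum('oc,ckl->okl', w_up, hidden)))
    h, w = x.shape[1], x.shape[2]
    up = np.repeat(np.repeat(z, h // k, axis=1), w // k, axis=2)   # U(z)
    return x * up                                                  # Eq. (9)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 12, 12))
w_d = rng.standard_normal((4, 8)) * 0.1
w_u = rng.standard_normal((8, 4)) * 0.1
y = region_channel_attention(x, 3, w_d, w_u)
```

With $k = 1$ this reduces exactly to the global channel attention of Section 1.2, which is the baseline used in the ablation study; larger $k$ lets each channel receive a different weight in each of the $k \times k$ regions.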
2.4 High-frequency aware loss
Most super-resolution algorithms are trained with L1 or L2 loss functions, so the reconstructed results tend to be smooth; the network therefore needs to pay more attention to reconstructing high-frequency regions. Fritsche et al. (2019) used high-pass filtering to extract the high-frequency details of an image and improved them by adversarially matching (Goodfellow et al., 2014) the high-frequency details extracted from the generated and real images. Inspired by this, we propose the high-frequency aware loss. Unlike Fritsche et al. (2019), we do not use an adversarial loss; instead we pre-extract the locations of high-frequency details in the training images and give these locations extra weight, so the network focuses on them during training.
First, the high-frequency detail map $\boldsymbol{I}_{\rm H}$ is computed as
$ \boldsymbol{I}_{\rm H} = \boldsymbol{I}_t - F_{\rm l}(\boldsymbol{I}_t) $  (10)
where $\boldsymbol{I}_t$ is the ground-truth high-resolution image and $F_{\rm l}(\cdot)$ is a low-pass filter, so that $\boldsymbol{I}_{\rm H}$ keeps only the high-frequency components.
Then $\boldsymbol{I}_{\rm H}$ is thresholded to obtain the high-frequency weight map $\boldsymbol{H}_t$:
$ H_t(i) = \begin{cases} f(I_{\rm H}(i)) & I_{\rm H}(i) \ge \Delta \\ 0 & I_{\rm H}(i) < \Delta \end{cases} $  (11)
where $i$ indexes pixel positions, $\Delta$ is the threshold, and $f(\cdot)$ maps the retained high-frequency responses to weights.
Finally, the high-frequency aware loss $L_{\rm H}$ is defined as
$ L_{\rm H} = \boldsymbol{H}_t \odot (\left\| \boldsymbol{I}_g - \boldsymbol{I}_t \right\|) $  (12)
where $\boldsymbol{I}_g$ is the image generated by the network and $\odot$ denotes element-wise multiplication, so the reconstruction error at high-frequency locations receives extra weight.
The final loss function is
$ L = L_1 + \lambda \times L_{\rm H} $  (13)
where $L_1$ is the standard L1 reconstruction loss and $\lambda$ is the weight balancing the two terms.
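A minimal sketch of the high-frequency aware loss of Eqs. (10)-(13). The box filter as $F_{\rm l}$, the absolute value taken on the high-frequency map, the identity choice for $f$, and the numeric values of $\Delta$ and $\lambda$ are all illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def box_blur(img, size=3):
    """Simple mean filter standing in for the low-pass filter F_l."""
    pad = size // 2
    p = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(size):
        for dx in range(size):
            out += p[dy:dy + h, dx:dx + w]
    return out / (size * size)

def hf_weight_map(target, delta):
    """Eqs. (10)-(11): high-frequency map and thresholded weight map.
    Here f is taken as the identity on the retained magnitudes."""
    i_h = np.abs(target - box_blur(target))     # Eq. (10), magnitude
    return np.where(i_h >= delta, i_h, 0.0)     # Eq. (11)

def hf_aware_loss(generated, target, delta=0.05, lam=0.1):
    """Eq. (13): L1 loss plus the lambda-weighted high-frequency loss."""
    h_t = hf_weight_map(target, delta)
    l1 = np.abs(generated - target).mean()
    l_h = (h_t * np.abs(generated - target)).mean()   # Eq. (12)
    return l1 + lam * l_h

rng = np.random.default_rng(0)
target = rng.random((16, 16))
generated = target + rng.normal(0, 0.01, target.shape)
loss = hf_aware_loss(generated, target)
```

Since $\boldsymbol{H}_t$ depends only on the ground truth, it can be precomputed once per training image, matching the two-stage schedule in which this loss is only added during fine-tuning.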
3 Experimental results and analysis
3.1 Experimental setup
3.1.1 Data preparation
DIV2K (DIVerse 2K resolution image dataset) is used as the training set. It contains 800 high-definition images at 2K resolution, and 10 images from the DIV2K validation set are used for validation during training. The training data are augmented by horizontal mirroring and rotation (0°, 90°, 180°, 270°), yielding 8 times the original amount of training data. Four widely used standard datasets are used for testing: Set5, Set14, BSD100 (Berkeley segmentation dataset) and Urban100.
3.1.2 Training settings
Three models were trained, with scale factors of 2, 3 and 4. For the network hyperparameters, 20 RDBAM blocks are used, and the region parameter $k$ of the region-level channel attention is set to 3.
In the first training stage, only the $L_1$ loss is used; in the second stage, the high-frequency aware loss and the $L_1$ loss are used together, as in Eq. (13), to fine-tune the model from the first stage.
3.2 Ablation study
To verify the effect of the region-level channel attention mechanism and the high-frequency aware loss, four experiments with a scale factor of 4 were conducted, computing the peak signal to noise ratio (PSNR) on part of the DIV2K validation set, as shown in Table 1. To evaluate the region-level channel attention, setting $k = 1$ reduces it to the global channel attention baseline.
Table 1 Ablation study

Method | PSNR/dB
Global channel attention + $L_1$ | 29.643
Global channel attention + $L_1$ + $L_{\rm H}$ | 29.654
Region-level channel attention + $L_1$ | 29.669
Region-level channel attention + $L_1$ + $L_{\rm H}$ | 29.674
3.3 Study of the value of $k$
Considering that high-frequency details are usually region-level, the choice of $k$ determines how finely the feature map is divided into regions. Models with different values of $k$ were evaluated on the Set14 test set, as shown in Table 2; $k = 3$ gives the best PSNR.
Table 2 PSNR on the Set14 test set for models with different values of $k$

$k$ | PSNR/dB
1 | 28.223
2 | 28.231
3 | 28.246
4 | 28.233
6 | 28.236
8 | 28.214
16 | 28.237
48 | 28.106

Note: bold indicates the best result.
3.4 Choice of low-pass filter and setting of the threshold $\Delta$
To examine how different low-pass filters and different thresholds $\Delta$ affect the high-frequency aware loss, the corresponding comparison experiments were conducted.
3.5 Setting of the weight $\lambda$
The high-frequency aware loss makes the network attend to the losses at high-frequency detail locations, enhancing its ability to reconstruct high-frequency details. However, focusing too much on high-frequency locations can degrade model performance, so a suitable $\lambda$ must be chosen. Models trained with different values of $\lambda$ are compared in Table 3; $\lambda = 0.1$ gives the best PSNR.
Table 3 PSNR for models with different values of $\lambda$

$\lambda$ | PSNR/dB
0 | 29.643
0.05 | 29.647
0.1 | 29.654
0.5 | 29.641
1 | 29.639
10 | 29.567

Note: bold indicates the best result.
3.6 Subjective quality and objective metrics
To fully verify the performance of the proposed algorithm, it is compared with Bicubic (Keys, 1981), SRCNN (Dong et al., 2015), VDSR (accurate image super-resolution using very deep convolutional networks) (Kim et al., 2016a), RCAN (image super-resolution using very deep residual channel attention networks) (Zhang et al., 2018b), SRFBN (feedback network for image super-resolution) (Li et al., 2019) and HAN (single image super-resolution via a holistic attention network) (Niu et al., 2020) on the four common datasets Set5, Set14, BSD100 and Urban100. For fairness, all compared algorithms were trained on the DIV2K training set under the same conditions as ours. Tables 4-6 report the PSNR and SSIM (structural similarity) of the different algorithms at scale factors 2, 3 and 4. Compared with the other algorithms, ours shows clear performance improvements.
Table 4 Average PSNR/SSIM of different algorithms for scale factor ×2

Method | Set5 PSNR/dB | Set5 SSIM | Set14 PSNR/dB | Set14 SSIM | BSD100 PSNR/dB | BSD100 SSIM | Urban100 PSNR/dB | Urban100 SSIM
Bicubic | 33.66 | 0.9299 | 30.24 | 0.8688 | 29.56 | 0.8431 | 26.88 | 0.8403
SRCNN | 36.66 | 0.9542 | 32.45 | 0.9067 | 31.36 | 0.8879 | 29.50 | 0.8946
VDSR | 37.53 | 0.9590 | 33.05 | 0.9130 | 31.90 | 0.8960 | 30.77 | 0.9140
RCAN | 38.08 | 0.9593 | 33.87 | 0.9135 | 32.13 | 0.8997 | 32.90 | 0.9339
SRFBN | 38.11 | 0.9574 | 33.79 | 0.9133 | 32.12 | 0.8955 | 32.78 | 0.9280
HAN | 38.13 | 0.9605 | 34.04 | 0.9165 | 32.20 | 0.8980 | 32.93 | 0.9346
Ours | 38.15 | 0.9611 | 33.95 | 0.9194 | 32.30 | 0.9009 | 32.95 | 0.9349

Note: bold indicates the best result in each column.
Table 5 Average PSNR/SSIM of different algorithms for scale factor ×3

Method | Set5 PSNR/dB | Set5 SSIM | Set14 PSNR/dB | Set14 SSIM | BSD100 PSNR/dB | BSD100 SSIM | Urban100 PSNR/dB | Urban100 SSIM
Bicubic | 30.39 | 0.8682 | 27.55 | 0.7742 | 27.21 | 0.7385 | 24.46 | 0.7349
SRCNN | 32.75 | 0.9090 | 29.30 | 0.8215 | 28.41 | 0.7863 | 26.24 | 0.7989
VDSR | 33.67 | 0.9210 | 29.78 | 0.8320 | 28.83 | 0.7990 | 27.14 | 0.8290
RCAN | 34.67 | 0.9276 | 30.49 | 0.8455 | 29.20 | 0.8088 | 28.73 | 0.8669
SRFBN | 34.63 | 0.9263 | 30.48 | 0.8457 | 29.13 | 0.8085 | 28.69 | 0.8653
HAN | 34.65 | 0.9269 | 30.55 | 0.8444 | 29.26 | 0.8035 | 28.91 | 0.8599
Ours | 34.72 | 0.9290 | 30.63 | 0.8461 | 29.28 | 0.8097 | 28.93 | 0.8678

Note: bold indicates the best result in each column.
Table 6 Average PSNR/SSIM of different algorithms for scale factor ×4

Method | Set5 PSNR/dB | Set5 SSIM | Set14 PSNR/dB | Set14 SSIM | BSD100 PSNR/dB | BSD100 SSIM | Urban100 PSNR/dB | Urban100 SSIM
Bicubic | 28.42 | 0.8104 | 26.00 | 0.7027 | 25.96 | 0.6675 | 23.14 | 0.6577
SRCNN | 30.48 | 0.8628 | 27.50 | 0.7513 | 26.90 | 0.7101 | 24.52 | 0.7221
VDSR | 31.35 | 0.8830 | 28.02 | 0.7680 | 27.29 | 0.7260 | 25.18 | 0.7540
RCAN | 32.55 | 0.8978 | 28.80 | 0.7876 | 27.71 | 0.7430 | 26.61 | 0.8032
SRFBN | 32.35 | 0.8969 | 28.75 | 0.7799 | 27.65 | 0.7433 | 26.53 | 0.7965
HAN | 32.60 | 0.8930 | 28.82 | 0.7833 | 27.72 | 0.7491 | 26.70 | 0.7986
Ours | 32.51 | 0.8980 | 28.82 | 0.7876 | 27.72 | 0.7424 | 26.66 | 0.8036

Note: bold indicates the best result in each column.
To further verify the performance of the proposed algorithm, its reconstruction results are compared visually with those of Bicubic, SRCNN, VDSR, RCAN, SRFBN and HAN. Figs. 7 and 8 show the ×4 reconstructions of images img016 and img024 from the Urban100 dataset by the different algorithms. Compared with the other algorithms, ours reconstructs the buildings more sharply; for the floor in Fig. 8, our algorithm reconstructs the texture more clearly and without distortion.
4 Conclusion
To address the inability of the commonly used channel attention mechanism to capture region-level information, this paper proposes a super-resolution algorithm with region-level channel attention, which assigns channel attention per region so that different regions of each channel in the feature map receive different attention. A high-frequency aware loss is also proposed, which makes the network focus on learning the reconstruction of high-frequency information by increasing its attention to high-frequency detail locations. Experimental results show that with the region-level channel attention mechanism and the high-frequency aware loss, the proposed algorithm clearly outperforms the compared algorithms in both objective metrics and subjective quality. The algorithm also has limitations: to achieve better performance it uses a very deep network, so inference is slow and the overall efficiency is low, which hinders its use in engineering tasks. Future work will focus on lightweight network design, maximizing runtime efficiency without sacrificing much performance.
References
-
Anwar S, Barnes N. 2020. Densely residual Laplacian super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99: 1-12 [DOI:10.1109/TPAMI.2020.3021088]
-
Cao F L, Liu H. 2019. Single image super-resolution via multi-scale residual channel attention network. Neurocomputing, 358: 424-436 [DOI:10.1016/j.neucom.2019.05.066]
-
Dong C, Loy C C, He K M, Tang X O. 2015. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI:10.1109/TPAMI.2015.2439281]
-
Duan C Y, Xiao N F. 2019. Parallax-based spatial and channel attention for stereo image super-resolution. IEEE Access, 7: 183672-183679 [DOI:10.1109/ACCESS.2019.2960561]
-
Fritsche M, Gu S H and Timofte R. 2019. Frequency separation for real-world super-resolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 3599-3608[DOI: 10.1109/iccvw.2019.00445]
-
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680[DOI: 10.5555/2969033.2969125]
-
He K M, Zhang X Y, Ren S and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/cvpr.2016.90]
-
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on imagenet classification//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1026-1034[DOI: 10.1109/iccv.2015.123]
-
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/cvpr.2018.00745]
-
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/cvpr.2017.243]
-
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR: 448-456[DOI: 10.5555/3045118.3045167]
-
Jin W, Chen Y. 2020. Multi-scale residual channel attention network for face super-resolution. Journal of Computer-Aided Design and Computer Graphics, 32(6): 959-970 (金炜, 陈莹. 2020. 多尺度残差通道注意机制下的人脸超分辨率网络. 计算机辅助设计与图形学学报, 32(6): 959-970) [DOI:10.3724/SP.J.1089.2020.17995]
-
Keys R. 1981. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(6): 1153-1160 [DOI:10.1109/TASSP.1981.1163711]
-
Kim J, Lee J K and Lee K M. 2016a. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654[DOI: 10.1109/cvpr.2016.182]
-
Kim J, Lee J K and Lee K M. 2016b. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1637-1645[DOI: 10.1109/cvpr.2016.181]
-
Lee W Y, Chuang P Y and Wang Y C F. 2019. Perceptual quality preserving image super-resolution via channel attention//Proceedings of ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech, and Signal Processing. Brighton, UK: IEEE: 1737-1741[DOI: 10.1109/icassp.2019.8683507]
-
Li Z, Yang J L, Liu Z, Yang X M, Jeon G and Wu W. 2019. Feedback network for image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3862-3871[DOI: 10.1109/CVPR.2019.00399]
-
Lim B, Son S, Kim H, Nah S and Lee K. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140[DOI: 10.1109/cvprw.2017.151]
-
Liu D, Wen B H, Fan Y C, Loy C C and Huang T S. 2018. Non-local recurrent network for image restoration//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates: 1680-1689[DOI: 10.5555/3326943.3327097]
-
Ma W, Pan Z X, Yuan F, Lei B. 2019. Super-resolution of remote sensing images via a dense residual generative adversarial network. Remote Sensing, 11(21): #2578 [DOI:10.3390/rs11212578]
-
Niu B, Wen W L, Ren W Q, Zhang X D, Yang L P, Wang S Z, Zhang K H, Cao X C and Shen H F. 2020. Single image super-resolution via a holistic attention network//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 191-207[DOI: 10.1007/978-3-030-58610-2_12]
-
Sajjadi M S M, Schölkopf B and Hirsch M. 2017. EnhanceNet: single image super-resolution through automated texture synthesis//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4501-4510[DOI: 10.1109/iccv.2017.481]
-
Shi W Z, Caballero J, Ledig C, Zhuang X H, Bai W J, Bhatia K, De Marvao A M S M, Dawes T, O'Regan D and Rueckert D. 2013. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch//Proceedings of the 16th International Conference on Medical Image Computing and Computer-Assisted Intervention. Nagoya, Japan: Springer: 9-16[DOI: 10.1007/978-3-642-40760-4_2]
-
Soh J W, Cho N I. 2020. Lightweight single image super-resolution with multi-scale spatial attention networks. IEEE Access, 8: 35383-35391 [DOI:10.1109/access.2020.2974876]
-
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates: 6000-6010[DOI: 10.5555/3295222.3295349]
-
Vu T, Nguyen C V, Pham T X, Luu T M and Yoo C D. 2018. Fast and efficient image quality enhancement via desubpixel convolutional neural networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 243-259[DOI: 10.1007/978-3-030-11021-5_16]
-
Wang H R, Fan Y, Wang Z X, Jiao L C and Schiele B. 2018a. Parameter-free spatial attention network for person re-identification[EB/OL]. [2020-09-13]. https://arxiv.org/pdf/1811.12150.pdf
-
Wang X L, Girshick R, Gupta A and He K M. 2018b. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803[DOI: 10.1109/cvpr.2018.00813]
-
Wang X T, Yu K, Wu S X, Gu J J, Liu Y H, Dong C, Qiao Y and Loy C C. 2018c. ESRGAN: enhanced super-resolution generative adversarial networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 63-79[DOI: 10.1007/978-3-030-11021-5_5]
-
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 3-19[DOI: 10.1007/978-3-030-01234-2_1]
-
Xing X R, Zhang D W. 2019. Image super-resolution using aggregated residual transformation networks with spatial attention. IEEE Access, 7: 92572-92585 [DOI:10.1109/access.2019.2927238]
-
Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018b. Image super-resolution using very deep residual channel attention networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 294-310[DOI: 10.1007/978-3-030-01234-2_18]
-
Zhang Y L, Tian Y P, Kong Y, Zhong B N and Fu Y. 2018a. Residual dense network for image super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2472-2481[DOI: 10.1109/CVPR.2018.00262]