Region-level channel attention for single image super-resolution combining high frequency loss
Vol. 26, No. 12, 2021, pp. 2836-2847
Published in print: 2021-12-16; Accepted: 2020-12-16
DOI: 10.11834/jig.200582
Bo Zhou, Chenghua Li, Wei Chen. Region-level channel attention for single image super-resolution combining high frequency loss[J]. Journal of Image and Graphics, 2021,26(12):2836-2847.
Objective
The channel attention mechanism has been widely used in image super-resolution, but most current algorithms select feature maps of interest only at the channel level and ignore spatial-level information, so the information at local spatial positions within feature maps is not properly exploited. To address this problem, an image super-resolution algorithm with region-level channel attention is proposed.
Method
A non-local residual dense network is designed as the backbone, comprising a non-local module and residual dense attention modules. The non-local module extracts non-local similarity information and passes it to the subsequent layers. The residual dense attention module adds a region-level channel attention mechanism on top of the residual dense block, assigning different attention to channels in different spatial regions so that spatial information is also fully exploited. In addition, because the widely used L1 and L2 loss functions tend to produce overly smooth results, a high-frequency aware loss is proposed; it increases the loss weight at the positions of high-frequency image details so that, in the later fine-tuning stage, the network attends better to the high-frequency details of the image.
Result
In 4× upscaling experiments on four standard test sets, Set5, Set14, BSD100 (Berkeley segmentation dataset), and Urban100, the average PSNR (peak signal to noise ratio) of the proposed method is about 3.15 dB and 1.58 dB higher than that of interpolation and of SRCNN (image super-resolution using deep convolutional networks), respectively.
Conclusion
The proposed algorithm adaptively adjusts the network's attention to channels in different spatial regions through the region-level channel attention mechanism and, combined with the high-frequency aware loss, strengthens the attention paid to high-frequency image details, so that the generated high-resolution images have better visual quality.
Objective
As an important branch of image processing, image super-resolution has attracted extensive attention from many scholars. The attention mechanism was originally applied to machine translation in deep learning; as an extension of it, the channel attention mechanism has been widely used in image super-resolution. This paper proposes a single image super-resolution method using region-level channel attention. A region-level channel attention mechanism is introduced into the network, which can assign different attention to different channels in different regions. Meanwhile, a high-frequency aware loss is proposed to address the tendency of the commonly used L1 and L2 losses to produce very smooth results. This loss function increases the weight of the losses at high-frequency positions, which benefits the generation of high-frequency details.
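The high-frequency aware loss described above can be sketched as a weighted L1 loss whose weight map marks the pre-extracted high-frequency locations. The sketch below is illustrative, not the authors' exact formulation: the gradient-magnitude threshold, the weighting factor `alpha`, and the function names are assumptions.

```python
import numpy as np

def high_frequency_mask(img, thresh=0.1):
    """Mark high-frequency locations via gradient magnitude.
    A simple stand-in for the paper's pre-extracted detail map."""
    gy, gx = np.gradient(img)
    return (np.sqrt(gx ** 2 + gy ** 2) > thresh).astype(img.dtype)

def hf_aware_l1_loss(sr, hr, alpha=2.0, thresh=0.1):
    """Weighted L1 loss: errors at high-frequency positions of the
    ground truth get extra weight alpha; plain L1 elsewhere."""
    mask = high_frequency_mask(hr, thresh)
    weights = 1.0 + alpha * mask
    return np.mean(weights * np.abs(sr - hr))
```

With `alpha=0` this reduces to the ordinary L1 loss, matching the paper's two-stage scheme where the weighted variant only enters during fine-tuning.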
Method
The network consists of three parts: low-level feature extraction, high-level feature extraction, and image reconstruction. The low-level feature extraction part uses one 3×3 convolutional layer. The high-level feature extraction part contains a non-local module and several residual dense block attention modules. The non-local module extracts non-local similarity information via the non-local operation; a sub-pixel convolutional layer is applied beforehand so that the similarity is computed at low resolution. Dense connections in the residual dense block attention modules help the network adaptively accumulate features from different layers, and residual learning further eases gradient propagation. The region-level channel attention mechanism is introduced to attend adaptively to the information in different regions. The initial non-local similarity information is added to the last layer through a skip connection. In the image reconstruction part, sub-pixel convolution up-samples the features, and a 3×3 convolutional layer produces the final reconstruction result. Regarding the loss function, the high-frequency aware loss is used to enhance the network's ability to reconstruct high-frequency details. Before training, the locations of high-frequency details in the image are extracted; during training, more weight is given to the losses at these locations so that the reconstruction of high-frequency details is learned better. The whole training process is divided into two stages: in the first stage, the network is trained with the L1 loss; in the second stage, the high-frequency aware loss and the L1 loss together fine-tune the model from the first stage.
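The region-level channel attention described above can be sketched as a squeeze-and-excitation-style gate computed independently per spatial region, so the same channel can be weighted differently in different regions. This is a minimal NumPy sketch of the idea only: the region grid size, the hidden width, and the randomly seeded gating weights are illustrative assumptions, not the paper's learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def region_channel_attention(feat, grid=2, w1=None, w2=None):
    """Region-level channel attention sketch.

    feat: (C, H, W) feature map, split into a grid x grid set of
    spatial regions. Each region gets its own channel-attention
    vector (global average pool -> two-layer gate -> sigmoid), so
    channels are re-weighted per region rather than globally.
    """
    c, h, w = feat.shape
    rh, rw = h // grid, w // grid
    rng = np.random.default_rng(0)
    if w1 is None:  # illustrative gate weights (learned in the real network)
        w1 = rng.standard_normal((c // 2, c)) * 0.1
    if w2 is None:
        w2 = rng.standard_normal((c, c // 2)) * 0.1
    out = feat.copy()
    for i in range(grid):
        for j in range(grid):
            region = feat[:, i*rh:(i+1)*rh, j*rw:(j+1)*rw]
            squeeze = region.mean(axis=(1, 2))                    # per-channel pooling
            gate = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))    # excitation, in (0, 1)
            out[:, i*rh:(i+1)*rh, j*rw:(j+1)*rw] = region * gate[:, None, None]
    return out
```

With `grid=1` this collapses to ordinary channel attention over the whole feature map, which is the baseline the paper extends.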
Result
The region-level channel attention and the high-frequency aware loss are verified through an ablation study. The model using region-level channel attention is significantly better in peak signal to noise ratio (PSNR), and fine-tuning with the high-frequency aware loss together with the L1 loss yields a higher PSNR than fine-tuning with the L1 loss alone; the benefit of using both techniques at the same time is also verified. Set5, Set14, the Berkeley segmentation dataset (BSD100), and Urban100 are selected for comparison with other algorithms: Bicubic, image super-resolution using deep convolutional networks (SRCNN), accurate image super-resolution using very deep convolutional networks (VDSR), image super-resolution using very deep residual channel attention networks (RCAN), the feedback network for image super-resolution (SRFBN), and single image super-resolution via a holistic attention network (HAN). For the subjective comparison at a factor of 4, three of the results are selected for display; the results generated by the proposed algorithm are richer in texture, without blurring or distortion. For the objective comparison, PSNR and structural similarity (SSIM) are used as indicators under three different factors of 2, 3, and 4. With an amplification factor of 4, the PSNR of the model on the four standard test sets is 32.51 dB, 28.82 dB, 27.72 dB, and 26.66 dB, respectively.
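PSNR, the objective indicator reported above, is computed directly from the mean squared error between the reconstruction and the ground truth. This is the standard definition, not anything specific to this paper:

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a super-resolved
    image sr and the ground-truth high-resolution image hr."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better; a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.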
Conclusion
A super-resolution algorithm is proposed that applies the commonly used channel attention at the region level. With the high-frequency aware loss, the network reconstructs high-frequency details better by increasing its attention to the locations of those details. The experimental results show that, by using the region-level channel attention mechanism and the high-frequency aware loss, the proposed algorithm is superior in both objective indicators and subjective effects.
Keywords: deep learning; convolutional neural network (CNN); super-resolution; attention mechanism; non-local neural network
Anwar S and Barnes N. 2020. Densely residual Laplacian super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99: 1-12[DOI: 10.1109/TPAMI.2020.3021088]
Cao F L and Liu H. 2019. Single image super-resolution via multi-scale residual channel attention network. Neurocomputing, 358: 424-436[DOI: 10.1016/j.neucom.2019.05.066]
Dong C, Loy C C, He K M and Tang X O. 2015. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307[DOI: 10.1109/TPAMI.2015.2439281]
Duan C Y and Xiao N F. 2019. Parallax-based spatial and channel attention for stereo image super-resolution. IEEE Access, 7: 183672-183679[DOI: 10.1109/ACCESS.2019.2960561]
Fritsche M, Gu S H and Timofte R. 2019. Frequency separation for real-world super-resolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 3599-3608[DOI: 10.1109/iccvw.2019.00445]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680[DOI: 10.5555/2969033.2969125]
He K M, Zhang X Y, Ren S and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/cvpr.2016.90]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1026-1034[DOI: 10.1109/iccv.2015.123]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/cvpr.2018.00745]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/cvpr.2017.243]
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR: 448-456[DOI: 10.5555/3045118.3045167]
Jin W and Chen Y. 2020. Multi-scale residual channel attention network for face super-resolution. Journal of Computer-Aided Design and Computer Graphics, 32(6): 959-970[DOI: 10.3724/SP.J.1089.2020.17995]
Keys R. 1981. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(6): 1153-1160[DOI: 10.1109/TASSP.1981.1163711]
Kim J, Lee J K and Lee K M. 2016a. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654[DOI: 10.1109/cvpr.2016.182]
Kim J, Lee J K and Lee K M. 2016b. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1637-1645[DOI: 10.1109/cvpr.2016.181]
Lee W Y, Chuang P Y and Wang Y C F. 2019. Perceptual quality preserving image super-resolution via channel attention//Proceedings of ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech, and Signal Processing. Brighton, UK: IEEE: 1737-1741[DOI: 10.1109/icassp.2019.8683507]
Li Z, Yang J L, Liu Z, Yang X M, Jeon G and Wu W. 2019. Feedback network for image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3862-3871[DOI: 10.1109/CVPR.2019.00399]
Lim B, Son S, Kim H, Nah S and Lee K. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140[DOI: 10.1109/cvprw.2017.151]
Liu D, Wen B H, Fan Y C, Loy C C and Huang T S. 2018. Non-local recurrent network for image restoration//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates: 1680-1689[DOI: 10.5555/3326943.3327097]
Ma W, Pan Z X, Yuan F and Lei B. 2019. Super-resolution of remote sensing images via a dense residual generative adversarial network. Remote Sensing, 11(21): #2578[DOI: 10.3390/rs11212578]
Niu B, Wen W L, Ren W Q, Zhang X D, Yang L P, Wang S Z, Zhang K H, Cao X C and Shen H F. 2020. Single image super-resolution via a holistic attention network//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 191-207[DOI: 10.1007/978-3-030-58610-2_12]
Sajjadi M S M, Schölkopf B and Hirsch M. 2017. EnhanceNet: single image super-resolution through automated texture synthesis//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4501-4510[DOI: 10.1109/iccv.2017.481]
Shi W Z, Caballero J, Ledig C, Zhuang X H, Bai W J, Bhatia K, De Marvao A M S M, Dawes T, O'Regan D and Rueckert D. 2013. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch//Proceedings of the 16th International Conference on Medical Image Computing and Computer-Assisted Intervention. Nagoya, Japan: Springer: 9-16[DOI: 10.1007/978-3-642-40760-4_2]
Soh J W and Cho N I. 2020. Lightweight single image super-resolution with multi-scale spatial attention networks. IEEE Access, 8: 35383-35391[DOI: 10.1109/access.2020.2974876]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates: 6000-6010[DOI: 10.5555/3295222.3295349]
Vu T, Nguyen C V, Pham T X, Luu T M and Yoo C D. 2018. Fast and efficient image quality enhancement via desubpixel convolutional neural networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 243-259[DOI: 10.1007/978-3-030-11021-5_16]
Wang H R, Fan Y, Wang Z X, Jiao L C and Schiele B. 2018a. Parameter-free spatial attention network for person re-identification[EB/OL]. [2020-09-13]. https://arxiv.org/pdf/1811.12150.pdf
Wang X L, Girshick R, Gupta A and He K M. 2018b. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803[DOI: 10.1109/cvpr.2018.00813]
Wang X T, Yu K, Wu S X, Gu J J, Liu Y H, Dong C, Qiao Y and Loy C C. 2018c. ESRGAN: enhanced super-resolution generative adversarial networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 63-79[DOI: 10.1007/978-3-030-11021-5_5]
Woo S, Park J, Lee J Y and Kweon S I. 2018. CBAM: convolutional block attention module//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 3-19[DOI: 10.1007/978-3-030-01234-2_1]
Xing X R and Zhang D W. 2019. Image super-resolution using aggregated residual transformation networks with spatial attention. IEEE Access, 7: 92572-92585[DOI: 10.1109/access.2019.2927238]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018b. Image super-resolution using very deep residual channel attention networks//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 294-310[DOI: 10.1007/978-3-030-01234-2_18]
Zhang Y L, Tian Y P, Kong Y, Zhong B N and Fu Y. 2018a. Residual dense network for image super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2472-2481[DOI: 10.1109/CVPR.2018.00262]