Learning global attention-gated multi-scale memory residual networks for single-image super-resolution
2021, Vol. 26, No. 4, Pages: 766-775
Received: 2020-05-18; Revised: 2020-07-10; Accepted: 2020-07-17; Published in print: 2021-04-16
DOI: 10.11834/jig.200174
Objective
With the rise of deep convolutional neural networks, image super-resolution (SR) algorithms have made great progress in both accuracy and speed. However, most current SR methods require fairly deep networks to achieve good performance, which not only makes them difficult to train but also means that shallow-layer feature information is easily lost by the end of the network, so the high-frequency details crucial to SR reconstruction are hard to capture fully. To address this, this paper fuses multi-scale features to fully mine the high-frequency details required for SR and proposes a global attention-gated residual memory network.
Method
In the feature extraction part at the front of the network, a single convolution layer extracts shallow feature information. In the nonlinear mapping part that forms the network body, a group of recursive residual memory blocks is cascaded, each fusing several recursive multi-scale residual units with one global attention gate module to output feature representations carrying multi-level information. At the end of the network, multi-scale features are combined in parallel and a pixel-shuffle mechanism achieves high-quality image magnification.
Result
We evaluate the proposed method on five standard SR benchmark datasets (Set5, Set14, B100, Urban100, and Manga109). It outperforms current state-of-the-art models in terms of both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM); on the Manga109 test set in particular, it achieves a PSNR of 39.19 dB, 0.32 dB higher than the state-of-the-art lightweight AWSRN (adaptive weighted super-resolution network).
Conclusion
When super-resolving low-resolution images, the proposed network jointly learns multi-level, multi-scale features, fully mines high-frequency image information, and produces high-quality reconstruction results.
Objective
With the rapid development of deep convolutional neural networks (CNNs), great progress has been made in single-image super-resolution (SISR) in terms of accuracy and efficiency. However, existing methods often resort to deeper CNNs, which are not only difficult to train but also tend to lose shallow-layer features toward the network's end, limiting their ability to capture the rich high-frequency detail information that is essential for accurate SR prediction. To address these issues, this paper presents a global attention-gated multi-scale memory network (GAMMNet) for SISR.
Method
GAMMNet consists of three key components: feature extraction, nonlinear mapping (NM), and high-resolution (HR) reconstruction. In the feature extraction part, the input low-resolution (LR) image passes through a 3×3 convolution layer to learn low-level features. Then, we utilize a recursive structure to perform NM and learn deeper-layer features. At the end of the network, we arrange four kernels of different sizes, followed by a global attention gate (GAG), to achieve HR reconstruction. Specifically, the NM module consists of four recursive residual memory blocks (RMBs). Each RMB outputs a multi-level representation by fusing the output of the top multi-scale residual unit (MRU) with those of the previous MRUs, followed by a GAG module, which serves as a gate controlling how much of the previous states should be memorized and how much of the current state should be kept. The two novel modules (MRU and GAG) are designed as follows.
The MRU takes advantage of the wide activation architecture introduced by the wide activation super-resolution (WDSR) method. Because the nonlinear ReLU (rectified linear unit) layers in residual blocks hinder the flow of information from shallow to deeper layers, the wide activation architecture widens the feature-map channels before the ReLU layers to help information propagate. Moreover, as network depth increases, features tend to be underutilized and to gradually vanish during transmission, yet making full use of features is the key to reconstructing high-quality images. We therefore stack two convolution kernels of sizes 3×3 and 5×5 in parallel to extract multi-scale features, which are further enhanced by the proposed GAG module, yielding the residual output. Finally, the output of the MRU is the element-wise sum of its input and this residual output.
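To make the unit concrete, the following is a minimal PyTorch sketch of an MRU under the 32/128/32 channel configuration reported later; the 1×1 widening layer, the half-and-half channel split between the 3×3 and 5×5 branches, and the concatenation-based fusion are our assumptions, as the text does not fix these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MRU(nn.Module):
    """Multi-scale residual unit (sketch). Channels are widened before the
    ReLU (wide activation), then parallel 3x3 and 5x5 convolutions extract
    multi-scale features, a gate module (the GAG, sketched further below)
    refines them, and the result is added back to the input."""
    def __init__(self, channels=32, expand=128, gate=None):
        super().__init__()
        self.widen = nn.Conv2d(channels, expand, kernel_size=1)  # widen before ReLU
        self.branch3 = nn.Conv2d(expand, channels // 2, 3, padding=1)
        self.branch5 = nn.Conv2d(expand, channels // 2, 5, padding=2)
        self.gate = gate if gate is not None else nn.Identity()  # e.g., GAG(channels)

    def forward(self, x):
        y = F.relu(self.widen(x), inplace=True)                  # wide activation
        # Concatenating the two branches restores the input channel width;
        # the exact fusion scheme is our assumption.
        y = torch.cat([self.branch3(y), self.branch5(y)], dim=1)
        return x + self.gate(y)                                  # input + residual output
```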
The GAG serves as a gate that enhances the input feature channels with different weights. First, unlike the residual channel attention network (RCAN), which mainly considers the correlation between feature channels, our GAG takes into account the useful holistic spatial statistics of the feature maps. We use a special spatial pooling operation, which improves on the global average pooling of the original channel attention, to obtain global context features, and we take a weighted average over all location features to establish a more efficient long-range dependency; this global context information is then aggregated onto each location feature. Concretely, we first apply a 1×1 convolution to reduce the number of feature channels to 1, and then a Softmax function to capture the global context at each pixel, yielding a pixel-wise attention weight map. Afterward, we introduce a learnable parameter λ1 to adaptively rescale the weight map. The weight map is then correlated with the input feature maps, producing a channel-wise weight vector that encodes the global holistic statistics of the feature maps. Second, to reduce computation while still establishing effective long-range dependencies, we adopt the bottleneck layer used in the SE (squeeze-and-excitation) block to implement the feature transform, which not only significantly reduces computation but also captures the correlation between feature channels. We feed the channel-wise attention weight vector into a feature transform module consisting of a 1×1 convolution layer, a normalization layer, a ReLU layer, and another 1×1 convolution layer, and multiply the result by an adaptively learned parameter λ2, yielding enhanced channel-wise attention weights that capture channel-wise dependencies. Finally, we multiply the input feature maps channel-wise by these enhanced attention weights to aggregate the global context information onto each local feature.
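A minimal sketch of the GAG as described, assuming a LayerNorm for the normalization layer and a bottleneck reduction ratio of 8; neither value is specified above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAG(nn.Module):
    """Global attention gate (sketch). A 1x1 convolution plus Softmax builds
    a pixel-wise attention map, a weighted average over all locations yields
    a channel-wise global context vector, and an SE-style bottleneck produces
    the enhanced channel weights that gate the input."""
    def __init__(self, channels, reduction=8):    # reduction ratio is an assumption
        super().__init__()
        self.context = nn.Conv2d(channels, 1, kernel_size=1)  # channels -> 1
        self.lambda1 = nn.Parameter(torch.ones(1))            # rescales the weight map
        self.lambda2 = nn.Parameter(torch.ones(1))            # rescales the output weights
        self.transform = nn.Sequential(                       # bottleneck transform
            nn.Conv2d(channels, channels // reduction, 1),
            nn.LayerNorm([channels // reduction, 1, 1]),      # LayerNorm is an assumption
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Pixel-wise attention map over all spatial locations, rescaled by lambda1.
        attn = self.lambda1 * F.softmax(self.context(x).view(b, 1, h * w), dim=-1)
        # Weighted average of all location features -> channel-wise context vector.
        ctx = torch.bmm(x.view(b, c, h * w), attn.transpose(1, 2)).view(b, c, 1, 1)
        weights = self.lambda2 * self.transform(ctx)          # enhanced channel weights
        return x * weights                                    # channel-wise gating
```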
In the image magnification stage, we design an efficient reconstruction structure that combines local multi-scale features and global features. We first apply parallel convolutions with different kernel sizes, followed by a GAG that adaptively adjusts the reconstruction feature weights, making full use of the local multi-scale features of the reconstruction part. A pixel-shuffle module is then added behind each branch to perform image magnification. Finally, all reconstructed outputs are added together with the top network branch to combine local and global feature information, producing the final SR image.
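A sketch of this reconstruction head under the configuration reported below (kernel sizes 3, 5, 7, and 9). Whether each branch has its own GAG and how the top branch is formed are not fully specified, so this version places one GAG per branch and omits the global top branch.

```python
import torch.nn as nn

class ReconstructionHead(nn.Module):
    """Multi-scale reconstruction head (sketch). Each branch applies a
    convolution with a different kernel size, a GAG, and pixel shuffle;
    the branch outputs are summed into the SR image. The global top
    branch mentioned in the text is omitted here for brevity."""
    def __init__(self, channels=32, scale=2, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        out = 3 * scale ** 2                       # RGB channels after pixel shuffle
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, out, k, padding=k // 2),
                GAG(out),                          # GAG from the sketch above
                nn.PixelShuffle(scale),            # rearranges channels into pixels
            )
            for k in kernel_sizes
        )

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)
```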
Result
We adopt the DIV2K (DIVerse 2K resolution image) dataset, the most widely used training dataset for deep CNN-based SISR, to train our model. This dataset contains 1 000 HR images, 800 of which are used for training. We preprocess the HR images by bicubic down-sampling to obtain the LR images.
Then, we use several commonly used benchmark datasets, including Set5, Set14, B100, Urban100, and Manga109, to test our model. The evaluation metrics are the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), computed on the Y channel of the transformed YCbCr space.
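For reference, a sketch of the PSNR computation on the Y channel; the BT.601 conversion coefficients are standard, while the shaved border width is a common SR-evaluation convention we assume rather than a detail stated here.

```python
import numpy as np

def rgb_to_y(img):
    """Y channel of YCbCr (ITU-R BT.601) for an RGB array in [0, 255]."""
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                   + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, hr, shave=2):
    """PSNR between two RGB uint8 images, computed on the Y channel only,
    with a shaved border (shave width is an assumption)."""
    sr_y = rgb_to_y(sr.astype(np.float64))[shave:-shave, shave:-shave]
    hr_y = rgb_to_y(hr.astype(np.float64))[shave:-shave, shave:-shave]
    mse = np.mean((sr_y - hr_y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```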
The input image patches are cropped to 48×48 pixels, and the mini-batch size is set to 16. The input, internal, and output channel numbers of the MRUs are set to 32, 128, and 32, respectively. We arrange four RMBs in the nonlinear mapping module, each containing four MRUs. For the upscaling module, we use four different kernel sizes (3×3, 5×5, 7×7, and 9×9), followed by a GAG, to generate the HR outputs.
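Putting the reported configuration together, a skeleton of the full model, reusing the MRU, GAG, and ReconstructionHead sketches above; the concatenation-plus-1×1 fusion inside each RMB and the block-level residual connection are our assumptions.

```python
import torch
import torch.nn as nn

class RMB(nn.Module):
    """Residual memory block (sketch): the outputs of all MRUs are fused
    (here by concatenation and a 1x1 convolution) and passed through a GAG;
    the block-level skip connection is an assumption."""
    def __init__(self, channels=32, n_units=4):
        super().__init__()
        self.units = nn.ModuleList(
            MRU(channels, gate=GAG(channels)) for _ in range(n_units))
        self.fuse = nn.Conv2d(channels * n_units, channels, kernel_size=1)
        self.gag = GAG(channels)

    def forward(self, x):
        states, f = [], x
        for unit in self.units:
            f = unit(f)
            states.append(f)                 # memorize every unit's output
        return x + self.gag(self.fuse(torch.cat(states, dim=1)))

class GAMMNet(nn.Module):
    """GAMMNet skeleton: a 3x3 feature-extraction conv, four RMBs of four
    MRUs each (channels 32/128/32), and the multi-scale reconstruction head."""
    def __init__(self, channels=32, n_blocks=4, n_units=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.body = nn.Sequential(*[RMB(channels, n_units) for _ in range(n_blocks)])
        self.tail = ReconstructionHead(channels, scale)

    def forward(self, x):
        return self.tail(self.body(self.head(x)))
```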
followed by a GAG to generate the HR outputs. We compare our GAMMNet with several state-of-the-art deep CNN-based SISR methods
including super-resolution convolutional neural network(SRCNN)
deeply-recursive convolutional network(DRCN)
deep recursive residual network(DRRN)
memory network(MemNet)
cascading residual network(CARN)
multi-scale residual network(MSRN)
and adaptive weighted super-resolution network(AWSRN). Obviously
our GAMMNet achieves the best performance in terms of both PSNR and SSIM among all compared methods in almost all benchmark datasets
except for the SSIM on the Urban 100
where our GAMMNet achieves the second best performing SSIM of 0.792 6
which is slightly lower than the best AWSRN with SSIM of 0.793 0. Finally
Finally, we conduct ablation experiments on the important components of GAMMNet, using Set5 at scale ×2 as the test set. We first construct a baseline by replacing the MRUs in GAMMNet with the smallest residual unit in WDSR and removing the GAG. We then add the MRU and the GAG individually, and finally train the full GAMMNet with both proposed modules. The results show that the MRU and GAG improve PSNR by 0.1 dB and 0.08 dB, respectively, and that GAMMNet achieves the best PSNR and SSIM, demonstrating the effectiveness of both module designs.
Conclusion
In this study, the shallow features are first extracted by the feature extraction module. A nested recursive structure then performs the nonlinear mapping and learns deeper features. This structure combines features of different scales to effectively learn the context information of the feature maps at each level, and it alleviates the loss of features during information transmission by fusing the outputs of different levels. Finally, in the reconstruction part, features of different scales are processed in parallel, and pixel shuffle is used to achieve high-quality magnification of images.
Ahn N, Kang B and Sohn K A. 2018. Fast, accurate, and lightweight super-resolution with cascading residual network//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 256-272 [DOI: 10.1007/978-3-030-01249-6_16]
Bevilacqua M, Roumy A, Guillemot C and Alberi-Morel M L. 2012. Low-complexity single-image super-resolution based on nonnegative neighbor embedding//Proceedings of the British Machine Vision Conference. Surrey, UK: BMVC: 221-231 [DOI: 10.5244/C.26.135]
Cao Y, Xu J R, Lin S, Wei F Y and Hu H. 2019. GCNet: non-local networks meet squeeze-excitation networks and beyond//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 1971-1980 [DOI: 10.1109/ICCVW.2019.00246]
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 184-199 [DOI: 10.1007/978-3-319-10593-2_13]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Huang J B, Singh A and Ahuja N. 2015. Single image super-resolution from transformed self-exemplars//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 5197-5206 [DOI: 10.1109/CVPR.2015.7299156]
Kim J, Lee J K and Lee K M. 2016. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1637-1645 [DOI: 10.1109/CVPR.2016.181]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR: 1-15
Li J C, Fang F M, Mei K F and Zhang G X. 2018. Multi-scale residual network for image super-resolution//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 527-542 [DOI: 10.1007/978-3-030-01237-3_32]
Li X G, Sun Y M, Yang Y L and Miao C Y. 2018. Image super-resolution reconstruction based on intermediate supervision convolutional neural networks. Journal of Image and Graphics, 23(7): 984-993 [DOI: 10.11834/jig.170538]
Li Y X, Deng H P, Xiang S, Wu J and Zhu L. 2018. Depth map super-resolution reconstruction based on the texture edge-guided approach. Journal of Image and Graphics, 23(10): 1508-1517 [DOI: 10.11834/jig.180127]
Martin D, Fowlkes C, Tal D and Malik J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, Canada: IEEE: 416-423 [DOI: 10.1109/ICCV.2001.937655]
Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T and Aizawa K. 2017. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 76(20): 21811-21838[DOI:10.1007/s11042-016-4020-z]
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z M, Desmaison A, Antiga L and Lerer A. 2017. Automatic differentiation in PyTorch//Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, USA: NIPS: 1-4
Shen M Y, Yu P F, Wang R G, Yang J and Xue L X. 2019. Image super-resolution reconstruction via deep network based on multi-staged fusion. Journal of Image and Graphics, 24(8): 1258-1269 [DOI: 10.11834/jig.180619]
Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z H. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1874-1883 [DOI: 10.1109/CVPR.2016.207]
Song H H, Liu Q S, Wang G J, Hang R L and Huang B. 2018. Spatiotemporal satellite image fusion using deep convolutional neural networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(3): 821-829[DOI:10.1109/JSTARS.2018.2797894]
Tai Y, Yang J and Liu X M. 2017a. Image super-resolution via deep recursive residual network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2790-2798 [DOI: 10.1109/CVPR.2017.298]
Tai Y, Yang J, Liu X M and Xu C Y. 2017b. MemNet: a persistent memory network for image restoration//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4549-4557 [DOI: 10.1109/ICCV.2017.486]
Timofte R, Agustsson E, Van Gool L, Yang M H, Zhang L, Lim B, Son S, Kim H, Nah S, Lee K M, Wang X T, Tian Y P, Yu K, Zhang S X, Dong C, Lin L, Qiao Y, Loy C C, Bae W, Yoo J, Han Y, Ye J C, Choi J S, Kim M, Fan Y C, Yu J H, Han W, Liu D, Yu H C, Wang Z Y, Shi H H, Wang X C, Huang T S, Chen Y J, Zhang K, Zuo W M, Tang Z M, Luo L K, Li S H, Fu M, Cao L, Heng W, Bui G, Le T, Duan Y, Tao D C, Wang R X, Lin X, Pang J X, Xu J C, Zhao Y, Xu X Y, Pan J S, Sun D Q, Zhang Y J, Song X B, Dai Y C, Qin X Y, Huynh X P, Guo T T, Mousavi H S, Vu T H, Monga V, Cruz C, Egiazarian K, Katkovnik V, Mehta R, Jain A K, Agarwalla A, Praveen C V S, Zhou R F, Wen H D, Zhu C, Xia Z Q, Wang Z T and Guo Q. 2017. NTIRE 2017 challenge on single image super-resolution: methods and results//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1110-1121 [DOI: 10.1109/CVPRW.2017.149]
Timofte R, Rothe R and Van Gool L. 2016. Seven ways to improve example-based single image super resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1865-1873 [DOI: 10.1109/CVPR.2016.206]
Tong J C, Fei J L, Chen J S, Li H and Ding D D. 2019. Multi-level feature fusion image super-resolution algorithm with recursive neural network. Journal of Image and Graphics, 24(2): 302-312 [DOI: 10.11834/jig.180410]
Wang C F, Li Z and Shi J. 2019. Lightweight image super-resolution with adaptive weighted learning network [EB/OL]. [2020-04-18]. https://arxiv.org/pdf/1904.02358.pdf
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612[DOI:10.1109/TIP.2003.819861]
Yang J C, Wright J, Huang T S and Ma Y. 2010. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11): 2861-2873[DOI:10.1109/TIP.2010.2050625]
Yu J, Fan Y, Yang J, Xu N, Wang Z, Wang X and Huang T. 2018. Wide activation for efficient and accurate image super-resolution [EB/OL]. [2020-04-18]. https://arxiv.org/pdf/1808.08718v1.pdf
Zhang Y L, Li K P, Li K, Wang L C, Zhong B E and Fu Y. 2018. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 294-310 [DOI: 10.1007/978-3-030-01234-2_18]
Zou W W W and Yuen P C. 2012. Very low resolution face recognition problem. IEEE Transactions on Image Processing, 21(1): 327-340[DOI:10.1109/TIP.2011.2162423]