分层特征融合注意力网络图像超分辨率重建
Hierarchical feature fusion attention network for image super-resolution reconstruction
2020年25卷第9期 页码: 1773-1786
纸质出版日期: 2020-09-16
录用日期: 2020-03-27
DOI: 10.11834/jig.190607
雷鹏程, 刘丛, 唐坚刚, 彭敦陆. 分层特征融合注意力网络图像超分辨率重建[J]. 中国图象图形学报, 2020,25(9):1773-1786.
Pengcheng Lei, Cong Liu, Jiangang Tang, Dunlu Peng. Hierarchical feature fusion attention network for image super-resolution reconstruction[J]. Journal of Image and Graphics, 2020,25(9):1773-1786.
目的
深层卷积神经网络在单幅图像超分辨率任务中取得了巨大成功。从3个卷积层的超分辨率重建卷积神经网络(super-resolution convolutional neural network,SRCNN)到超过300层的残差注意力网络(residual channel attention network,RCAN),网络的深度和整体性能有了显著提高。然而,尽管深层网络方法提高了重建图像的质量,但因计算量大、实时性差等问题并不适合真实场景。针对该问题,本文提出轻量级的层次特征融合空间注意力网络来快速重建图像的高频细节。
方法
网络由浅层特征提取层、分层特征融合层、上采样层和重建层组成。浅层特征提取层使用1个卷积层提取浅层特征,并对特征通道进行扩充;分层特征融合层由局部特征融合和全局特征融合组成,整个网络包含9个残差注意力块(residual attention block,RAB),每3个构成一个残差注意力组,分别在组内和组间进行局部特征融合和全局特征融合。在每个残差注意力块内部,首先使用卷积层提取特征,再使用空间注意力模块对特征图的不同空间位置分配不同的权重,提高高频区域特征的注意力,以快速恢复高频细节信息;上采样层使用亚像素卷积对特征图进行上采样,将特征图放大到目标图像的尺寸;重建层使用1个卷积层进行重建,得到重建后的高分辨率图像。
结果
在Set5、Set14、BSD(Berkeley segmentation dataset)100、Urban100和Manga109测试数据集上进行测试。当放大因子为4时,峰值信噪比分别为31.98 dB、28.40 dB、27.45 dB、25.77 dB和29.37 dB。本文算法比其他同等规模的网络在测试结果上有明显提升。
结论
本文提出的多层特征融合注意力网络,通过结合空间注意力模块和分层特征融合结构的优势,可以快速恢复图像的高频细节并且具有较小的计算复杂度。
Objective
Single-image super-resolution (SISR) techniques aim to reconstruct a high-resolution image from a single low-resolution image. Given that high-resolution images contain substantial useful information, SISR technology has been widely used in medical imaging, face authentication, public relations, security monitoring, and other tasks. With the rapid development of deep learning, convolutional neural network (CNN)-based SISR methods have achieved remarkable success. From the super-resolution CNN (SRCNN) to the residual channel attention network (RCAN), the depth and performance of such networks have improved considerably. However, several problems remain. 1) Increasing the depth of a network can effectively improve reconstruction performance; however, it also increases the computational complexity of the network and leads to poor real-time performance. 2) An image contains a large amount of high- and low-frequency information, and the areas with high-frequency information should be more important than those with low-frequency information; however, most recent CNN-based methods treat these two kinds of areas equally and thus lack flexibility. 3) Feature maps at different depths carry receptive-field information at different scales, and integrating these feature maps can enhance the information flow across convolution layers; however, most current CNN-based methods only consider feature maps at a single scale. To solve these problems, we propose a lightweight hierarchical feature fusion spatial attention network to learn additional useful high-frequency information.
Method
The proposed network is mainly composed of four parts, namely, the shallow feature extraction, hierarchical feature fusion, up-sampling, and reconstruction parts. In the shallow feature extraction part, a convolution layer is used to extract shallow features and expand the number of channels. The hierarchical feature fusion part comprises nine residual attention blocks, which are evenly divided into three residual attention groups of three blocks each. The feature maps at different depths are fused by local and global feature fusion strategies: the local strategy fuses the feature maps produced by the three residual attention blocks within each group, while the global strategy fuses the feature maps produced by the three groups. The two strategies integrate feature maps with different receptive-field scales and enhance the information flow across different depths of the network, as sketched below.
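The following PyTorch-style sketch illustrates one possible implementation of the local and global fusion described above. The module names, the 1×1 fusion convolutions, and the residual connections around the group and the trunk are illustrative assumptions, not the authors' released code; `block_fn` stands for a builder of one residual attention block.

```python
import torch
import torch.nn as nn

class ResidualAttentionGroup(nn.Module):
    """Three residual attention blocks followed by local feature fusion."""
    def __init__(self, channels, block_fn):
        super().__init__()
        self.blocks = nn.ModuleList([block_fn(channels) for _ in range(3)])
        self.local_fusion = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        outs, feat = [], x
        for block in self.blocks:
            feat = block(feat)
            outs.append(feat)
        # Local feature fusion: concatenate the three block outputs and compress.
        return self.local_fusion(torch.cat(outs, dim=1)) + x

class HierarchicalFusionTrunk(nn.Module):
    """Three residual attention groups followed by global feature fusion."""
    def __init__(self, channels, block_fn):
        super().__init__()
        self.groups = nn.ModuleList(
            [ResidualAttentionGroup(channels, block_fn) for _ in range(3)])
        self.global_fusion = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        outs, feat = [], x
        for group in self.groups:
            feat = group(feat)
            outs.append(feat)
        # Global feature fusion across the three group outputs.
        return self.global_fusion(torch.cat(outs, dim=1)) + x
```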
This study focuses on the residual attention block, which is composed of a residual block module and a spatial attention module. In each residual attention block, two 3×3 convolution layers are first used to extract feature maps, and a spatial attention module then assigns different weights to different spatial positions of these feature maps. The core problem is how to obtain an appropriate set of weights. According to our analysis, pooling along the channel axis can effectively highlight the importance of areas containing high-frequency information. Hence, we first apply average and maximum pooling along the channel axis to generate two representative feature descriptors. Afterward, a 5×5 and a 1×1 convolution layer are used to fuse the information at each position with that of its neighboring positions, and the spatial attention value of each position is finally obtained with a sigmoid function.
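A minimal PyTorch sketch of this spatial attention module, and of the residual attention block that wraps it, is given below. The intermediate channel width of the 5×5 convolution and the ReLU between the two 3×3 convolutions are our assumptions; the paper only specifies the operations listed above. The block can be passed as `block_fn` to the fusion sketch shown earlier.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise average and max pooling, a 5x5 then a 1x1 convolution,
    and a sigmoid producing one attention weight per spatial position."""
    def __init__(self):
        super().__init__()
        self.conv5 = nn.Conv2d(2, 2, kernel_size=5, padding=2)
        self.conv1 = nn.Conv2d(2, 1, kernel_size=1)

    def forward(self, x):
        # Pool along the channel axis to obtain two H x W descriptors.
        avg_desc = torch.mean(x, dim=1, keepdim=True)
        max_desc, _ = torch.max(x, dim=1, keepdim=True)
        desc = torch.cat([avg_desc, max_desc], dim=1)       # N x 2 x H x W
        attn = torch.sigmoid(self.conv1(self.conv5(desc)))  # N x 1 x H x W
        return x * attn                                     # reweight every position

class ResidualAttentionBlock(nn.Module):
    """Two 3x3 convolutions followed by spatial attention and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1))
        self.attention = SpatialAttention()

    def forward(self, x):
        return x + self.attention(self.body(x))
```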
The third part is the up-sampling part, which uses subpixel convolution to upsample the low-resolution (LR) feature maps to the target scale. Lastly, in the reconstruction part, the number of channels is compressed to the target number by a 3×3 convolution layer, yielding the reconstructed high-resolution image. During the training stage, the DIVerse 2K (DIV2K) dataset is used to train the proposed network, and 32 000 image patches with a size of 48×48 pixels are randomly cropped as LR inputs. The L1 loss is used as the loss function and is optimized with the Adam algorithm.
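The up-sampling layer and the training step described above can be sketched as follows. The 48×48 LR patches, L1 loss, and Adam optimizer follow the text, while `model`, `loader`, and the learning rate are illustrative placeholders rather than the authors' settings.

```python
import torch
import torch.nn as nn

def make_upsampler(channels, scale):
    """Sub-pixel (pixel-shuffle) up-sampling: a convolution expands the channels
    by scale**2, then nn.PixelShuffle rearranges them into a larger feature map."""
    return nn.Sequential(
        nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
        nn.PixelShuffle(scale))

def train(model, loader, lr=1e-4, device="cuda"):
    """Illustrative training step: L1 loss on 48x48 LR crops, optimized with Adam."""
    model = model.to(device)
    criterion = nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for lr_patch, hr_patch in loader:   # 48x48 LR crops and matching HR crops
        lr_patch, hr_patch = lr_patch.to(device), hr_patch.to(device)
        optimizer.zero_grad()
        loss = criterion(model(lr_patch), hr_patch)
        loss.backward()
        optimizer.step()
```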
Result
We compare our network with bicubic interpolation and several CNN-based methods, including SRCNN, very deep super-resolution convolutional networks (VDSR), deep recursive residual networks (DRRN), residual dense networks (RDN), and RCAN. Five datasets, namely Set5, Set14, the Berkeley segmentation dataset (BSD100), Urban100, and Manga109, are used as test sets to evaluate the proposed method. Two indices, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM), are used to evaluate the reconstruction results of the proposed method and the compared methods. The average PSNR and SSIM values are reported for the different methods on the five test datasets with different scale factors, and four test images with different scales are used to show the reconstruction results of the different methods. In addition, the proposed method is compared with enhanced deep residual networks (EDSR) in terms of the convergence curve. Experiments show that the proposed method recovers more detailed information and clearer edges than most of the compared methods.
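For reference, the snippet below shows a common way to compute the PSNR and SSIM values reported above on the luminance (Y) channel with border cropping. It follows standard super-resolution evaluation practice and is a sketch under those assumptions, not the authors' evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def to_luminance(img):
    """Convert an RGB image in [0, 255] to the ITU-R BT.601 Y channel,
    the channel conventionally used when reporting SR results."""
    img = img.astype(np.float64) / 255.0
    return 16.0 + 65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]

def evaluate(sr_img, hr_img, scale):
    """PSNR/SSIM on the Y channel after cropping `scale` border pixels."""
    sr_y = to_luminance(sr_img)[scale:-scale, scale:-scale]
    hr_y = to_luminance(hr_img)[scale:-scale, scale:-scale]
    psnr = peak_signal_noise_ratio(hr_y, sr_y, data_range=255)
    ssim = structural_similarity(hr_y, sr_y, data_range=255)
    return psnr, ssim
```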
Conclusion
We propose a hierarchical feature fusion attention network in this study. With the help of the spatial attention module and the hierarchical feature fusion structure, the network can quickly recover high-frequency details and thus produce reconstructed results with more detailed textures.
超分辨率重建; 卷积神经网络; 分层特征融合; 残差学习; 注意力机制
super-resolution reconstruction; convolution neural network (CNN); hierarchical feature fusion; residual learning; attention mechanism
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 184-199 [DOI: 10.1007/978-3-319-10593-2]
Dong C, Loy C C and Tang X O. 2016. Accelerating the super-resolution convolutional neural network//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer: 391-407 [DOI: 10.1007/978-3-319-46475-6_25]
Fang B W, Huang Z Q, Li Y and Wang Y. 2017. υ-support vector machine based on discriminant sparse neighborhood preserving embedding. Pattern Analysis and Applications, 20(4): 1077-1089 [DOI: 10.1007/s10044-016-0547-x]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2019. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence: 2011-2023 [DOI: 10.1109/TPAMI.2019.2913372]
Huang J B, Singh A and Ahuja N. 2015. Single image super-resolution from transformed self-exemplars//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 5197-5206 [DOI: 10.1109/CVPR.2015.7299156]
Kim J, Lee J K and Lee K M. 2016a. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 1646-1654 [DOI: 10.1109/CVPR.2016.182]
Li J C, Fang F M, Mei K F and Zhang G X. 2018. Multi-scale residual network for image super-resolution//Proceedings of the 15th European Conference on Computer Vision. Munich: Springer: 527-542 [DOI: 10.1007/978-3-030-01237-3_32]
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE: 1132-1140 [DOI: 10.1109/CVPRW.2017.151]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 2117-2125 [DOI: 10.1109/CVPR.2017.106]
Martin D, Fowlkes C, Tal D and Malik J. 2002. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver: IEEE: 416-423 [DOI: 10.1109/ICCV.2001.937655]
Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T and Aizawa K. 2017. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 76(20): 21811-21838 [DOI: 10.1007/s11042-016-4020-z]
Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z H. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 1874-1883 [DOI: 10.1109/CVPR.2016.207]
Tai Y, Yang J and Liu X M. 2017. Image super-resolution via deep recursive residual network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 2790-2798 [DOI: 10.1109/CVPR.2017.298]
Yang J C, Wang Z W, Lin Z, Cohen S and Huang T. 2012. Coupled dictionary training for image super-resolution. IEEE Transactions on Image Processing, 21(8): 3467-3478 [DOI: 10.1109/TIP.2012.2192127]
Yang X, Zhang Y, Zhou D and Yang R G. 2015. An improved iterative back projection algorithm based on ringing artifacts suppression. Neurocomputing, 162: 171-179 [DOI: 10.1016/j.neucom.2015.03.055]
Ying Z L and Long X. 2019. Single-image super-resolution construction based on multi-scale dense residual network. Journal of Image and Graphics, 24(3): 410-419
(应自炉, 龙祥. 2019. 多尺度密集残差网络的单幅图像超分辨率重建. 中国图象图形学报, 24(3): 410-419) [DOI: 10.11834/jig.180431]
Zeyde R, Elad M and Protter M. 2010. On single image scale-up using sparse-representations//Proceedings of the 7th International Conference on Curves and Surfaces. Avignon: Springer: 711-730 [DOI: 10.1007/978-3-642-27413-8_47]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B E and Fu Y. 2018a. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich: Springer: 294-310
Zhang Y L, Tian Y P, Kong Y, Zhong B E and Fu Y. 2018b. Residual dense network for image super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 2472-2481 [DOI: 10.1109/CVPR.2018.00262]