Multiscale feature reuse mixed attention network for image reconstruction
2021, Vol. 26, No. 11, Pages 2645-2658
Received: 2020-09-15
Revised: 2020-12-21
Accepted: 2020-12-28
Published in print: 2021-11-16
DOI: 10.11834/jig.200549
Objective
Previous deep-learning-based image super-resolution methods suffer from problems such as simply deepening the network, information loss during upsampling, and difficulty in reconstructing high-frequency information. To address these problems, this paper proposes a multiscale feature reuse mixed attention network model for image super-resolution reconstruction.
Method
The network consists of five parts: a preprocessing module, a multiscale feature reuse mixed attention module, an upsampling module, a compensation reconstruction module, and a reconstruction module. The first part, the preprocessing module, uses one convolutional layer to extract shallow features and expand the number of channels of the feature maps. The second part, the multiscale feature reuse mixed attention module, introduces a multipath network, a mixed attention mechanism, and long and short skip connections to further enlarge the receptive field of the feature maps, improve the reuse of multiscale features, and strengthen the reconstruction of high-frequency information. The third part, the upsampling module, uses the subpixel method to upsample the feature maps to the target image size. The fourth part, the compensation reconstruction module, consists of a convolutional layer and a mixed attention mechanism and is used to compensate the features of the upsampled feature maps and stabilize model training. The fifth part, the reconstruction module, consists of one convolutional layer that restores the number of channels of the feature maps to the original number, yielding the reconstructed high-resolution image.
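As a brief illustration of the subpixel method used by the upsampling module, PyTorch's nn.PixelShuffle rearranges r² groups of channels into an image enlarged r times in each spatial dimension; the channel count (64) and patch size (48) below are illustrative choices, not the paper's settings:

```python
# Subpixel (pixel shuffle) upsampling demo: r^2 channel groups are
# rearranged into an image enlarged r times in each spatial dimension.
import torch
import torch.nn as nn

r = 3                                    # scale factor used in the experiments
x = torch.randn(1, 64 * r * r, 48, 48)   # feature map before upsampling
y = nn.PixelShuffle(r)(x)                # rearrange channels into space
print(y.shape)                           # torch.Size([1, 64, 144, 144])
```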
Result
In comparisons with models of similar scale, peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are used as evaluation metrics on the Set5, Set14, BSD100 (Berkeley segmentation dataset), and Urban100 benchmark test sets. When the scale factor is 3, the PSNR/SSIM values on these test sets are 34.40 dB/0.927 3, 30.35 dB/0.842 7, 29.11 dB/0.805 2, and 28.23 dB/0.854 0, respectively, an improvement over the other models.
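For reference, the two metrics follow their standard definitions (SSIM as given by Wang et al., 2004), where MAX is the peak pixel value (e.g., 255):

```latex
\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \qquad
\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
                           {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```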
Conclusion
Quantitative and visual experimental results show that the high-resolution images reconstructed by the proposed model not only improve the reconstruction of edge and texture information but also achieve higher scores on the objective PSNR and SSIM metrics.
Objective
Obtaining a high-resolution image directly is very difficult due to the interference of the external environment and hardware conditions. A low-resolution image is usually obtained first, and then one or more image super-resolution methods are employed to obtain the corresponding high-resolution image. In addition, the number of collected images is large. Therefore, how to reconstruct a high-resolution image from a low-resolution image at low cost has become a research hotspot in computer vision, with wide applications in medicine, remote sensing, and public safety. In recent years, many image super-resolution methods have been proposed; these techniques can be broadly categorized into interpolation-, projection-, and learning-based methods. Among them, the convolutional neural network, a typical learning-based approach, has attracted much attention in recent years but still has several problems. First, the reconstruction effect is often improved by simply deepening the network, which makes the network very complex and increases the difficulty of training. Second, the high-frequency information in an image is difficult to reconstruct. Attention mechanisms have been applied to overcome this problem, but the existing attention mechanisms are usually borrowed directly from high-level vision tasks without considering the particularities of super-resolution reconstruction. Third, the existing upsampling methods have limitations such as feature loss and training oscillations, which are difficult to resolve in super-resolution reconstruction. To address these problems, this paper proposes a mixed attention network model based on multiscale feature reuse for super-resolution reconstruction. The model improves network performance through several novel strategies: a multipath network, long and short skip connections, a compensation reconstruction block, and a mixed attention mechanism.
Method
The proposed network is composed of five parts: a preprocessing module, a multiscale feature reuse mixed attention module, an upsampling module, a compensation reconstruction module, and a reconstruction module. The first part, the preprocessing module, uses a convolutional layer to extract shallow features and expand the number of channels of the feature map. The second part, the multiscale feature reuse mixed attention module, contains three important components: a multipath network, a mixed attention mechanism, and skip connections. The multipath network enlarges the receptive fields of different feature maps and improves the reuse of multiscale features; the mixed attention mechanism better captures high-frequency information; and the skip connections reduce the degradation problem of deep networks and improve the learning ability. Moreover, the interdependence between shallow and deep features can be learned by both deepening and widening the network. The third part, the upsampling module, uses the subpixel method to upsample the feature map to the target size; the shallow and deep features are upsampled simultaneously and fused to compensate for the feature loss caused by the upsampling operation. The fourth part, the compensation reconstruction module, is composed of a convolutional layer and a mixed attention module; it performs a second round of feature compensation on the upsampled feature maps and stabilizes model training. The fifth part, the reconstruction module, uses a convolutional layer to restore the number of channels of the feature map to the original number and thus obtain the reconstructed high-resolution image. A sketch of this five-part layout follows.
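The abstract does not give layer-level details, so the following PyTorch sketch only mirrors the five-part layout described above: a CBAM-like channel-plus-spatial attention stands in for the paper's mixed attention, and a single conv-plus-attention stack stands in for the multipath body. All hyperparameters (64 channels, kernel sizes, reduction ratio) are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Channel + spatial attention. CBAM-like stand-in: the paper's exact
    mixed attention design is not specified in this abstract."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel = nn.Sequential(                 # channel attention branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(                 # spatial attention branch
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)                       # reweight channels
        return x * self.spatial(x)                    # reweight positions

class SRNet(nn.Module):
    """Five-part layout from the Method section; internals are placeholders."""
    def __init__(self, channels: int = 64, scale: int = 3, colors: int = 3):
        super().__init__()
        # 1) preprocessing: one conv extracts shallow features, widens channels
        self.head = nn.Conv2d(colors, channels, 3, padding=1)
        # 2) body: stand-in for the multiscale feature reuse mixed attention
        #    module (the real block is multipath with long/short skips)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            MixedAttention(channels),
        )
        # 3) subpixel upsampling to the target size
        self.up = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        # 4) compensation reconstruction: conv + mixed attention after upsampling
        self.compensate = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            MixedAttention(channels),
        )
        # 5) reconstruction: one conv restores the original channel count
        self.tail = nn.Conv2d(channels, colors, 3, padding=1)

    def forward(self, x):
        shallow = self.head(x)
        deep = self.body(shallow)
        up = self.up(deep + shallow)                  # long skip: fuse shallow/deep
        return self.tail(self.compensate(up))

if __name__ == "__main__":
    print(SRNet(scale=3)(torch.randn(1, 3, 48, 48)).shape)  # (1, 3, 144, 144)
```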
In the training phase, the DIV2K (DIVerse 2K) dataset is used as the training set, and each image is augmented by methods such as random rotation and horizontal flipping. Adaptive moment estimation (Adam) is used as the optimizer, and the L1 loss is used as the objective function. Each run uses 800 epochs.
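A minimal training-loop sketch under the stated settings (Adam, L1 loss, 800 epochs). The learning rate is an assumption, and the random-tensor `train_loader` is only a runnable stand-in for a real DIV2K patch loader:

```python
import torch
import torch.nn as nn

# stand-in for a DIV2K patch loader: random (LR, HR) pairs just to run the loop
train_loader = [(torch.randn(4, 3, 48, 48), torch.randn(4, 3, 144, 144))
                for _ in range(2)]

model = SRNet(scale=3)                                      # sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # rate: assumption
criterion = nn.L1Loss()                                     # L1 objective

for epoch in range(800):                  # each run uses 800 epochs
    for lr_patch, hr_patch in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(lr_patch), hr_patch)
        loss.backward()
        optimizer.step()
```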
Result
The proposed method is compared with several state-of-the-art methods, including the super-resolution convolutional neural network (SRCNN), super-resolution using very deep convolutional networks (VDSR), deep Laplacian pyramid super-resolution networks (LapSRN), the memory network for image restoration (MemNet), the super-resolution network for multiple degradations (SRMDNF), the cascading residual network (CARN), the multi-path adaptive modulation network (MAMNet), and a simplified version of the residual channel attention network (RCAN-mini). Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used to evaluate the performance of these algorithms on the widely used benchmark test sets Set5, Set14, BSD100 (Berkeley segmentation dataset), and Urban100. When the scale factor is 3, the PSNR/SSIM values obtained by the proposed model on these test sets are 34.40 dB/0.927 3, 30.35 dB/0.842 7, 29.11 dB/0.805 2, and 28.23 dB/0.854 0, respectively. In terms of PSNR, the proposed model outperforms RCAN-mini by 0.15 dB, 0.08 dB, 0.07 dB, and 0.24 dB on the four test sets, and the reconstruction results also improve on the other methods.
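For completeness, a minimal NumPy version of the PSNR computation used in such comparisons (published SR evaluations typically use the Y channel with border cropping; those details are not given in this abstract and are omitted here):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# example: a uniform error of 1 gray level at peak 255 gives about 48.13 dB
a = np.zeros((8, 8))
print(psnr(a, a + 1.0))
```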
Conclusion
A multiscale feature reuse mixed attention network, which applies a new network structure and attention mechanism to improve super-resolution performance, is proposed. The model is compared with other methods in quantitative and visual experiments. The results show that the proposed method achieves the best reconstruction of edge and texture information and obtains higher PSNR and SSIM values than the other methods.
References
Ahn N, Kang B and Sohn K A. 2018. Fast, accurate, and lightweight super-resolution with cascading residual network//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 256-272 [DOI: 10.1007/978-3-030-01249-6_16]
Bevilacqua M, Roumy A, Guillemot C and Morel M L A. 2012. Low-complexity single-image super-resolution based on nonnegative neighbor embedding//Proceedings of British Machine Vision Conference. Surrey, UK: BMVA: #135 [DOI: 10.5244/C.26.135]
Chollet F. 2017. Xception: deep learning with depthwise separable convolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1800-1807 [DOI: 10.1109/CVPR.2017.195]
Corbetta M and Shulman G L. 2002. Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3): 201-215 [DOI: 10.1038/nrn755]
Dai T, Cai J R, Zhang Y B, Xia S T and Zhang L. 2019. Second-order attention network for single image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 11057-11066 [DOI: 10.1109/CVPR.2019.01132]
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 184-199 [DOI: 10.1007/978-3-319-10593-2_13]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2020. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023 [DOI: 10.1109/TPAMI.2019.2913372]
Hu Y T, Li J, Huang Y F and Gao X B. 2019. Channel-wise and spatial feature modulation network for single image super-resolution. IEEE Transactions on Circuits and Systems for Video Technology, 30(11): 3911-3927 [DOI: 10.1109/TCSVT.2019.2915238]
Huang J B, Singh A and Ahuja N. 2015. Single image super-resolution from transformed self-exemplars//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 5197-5206 [DOI: 10.1109/CVPR.2015.7299156]
Jin Z, Iqbal M Z, Bobkov D, Zou W B, Li X and Steinbach E. 2020. A flexible deep CNN framework for image restoration. IEEE Transactions on Multimedia, 22(4): 1055-1068 [DOI: 10.1109/TMM.2019.2938340]
Kim J, Lee J K and Lee K M. 2016. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654 [DOI: 10.1109/CVPR.2016.182]
Kim J H, Choi J H, Cheon M and Lee J S. 2018. MAMNet: multi-path adaptive modulation network for image super-resolution [EB/OL]. [2020-08-31]. https://arxiv.org/pdf/1811.12043.pdf
Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5835-5843 [DOI: 10.1109/CVPR.2017.618]
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z H and Shi W Z. 2017. Photo-realistic single image super-resolution using a generative adversarial network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 105-114 [DOI: 10.1109/CVPR.2017.19]
Li Z Z. 2019. Image super-resolution using attention based densenet with residual deconvolution [EB/OL]. [2020-08-31]. https://arxiv.org/pdf/1907.05282.pdf
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140 [DOI: 10.1109/CVPRW.2017.151]
Liu D, Wen B H, Fan Y C, Loy C C and Huang T S. 2018. Non-local recurrent network for image restoration//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: NeurIPS: 1680-1689 [DOI: 10.5555/3326943.3327097]
Martin D, Fowlkes C, Tal D and Malik J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, Canada: IEEE: 416-423 [DOI: 10.1109/ICCV.2001.937655]
Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z H. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1874-1883 [DOI: 10.1109/CVPR.2016.207]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9 [DOI: 10.1109/CVPR.2015.7298594]
Tai Y, Yang J, Liu X M and Xu C Y. 2017. MemNet: a persistent memory network for image restoration//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4549-4557 [DOI: 10.1109/ICCV.2017.486]
Timofte R, Gu S H, Wu J Q, van Gool L, Zhang L, Yang M H, Haris M, Shakhnarovich G, Ukita N, Hu S J, Bei Y J, Hui Z, Jiang X, Gu Y N, Liu J, Wang Y F, Perazzi F, Mcwilliams B, Sorkine-Hornung A, Sorkine-Hornung O, Schroers C, Yu J H, Fan Y C, Yang J C, Xu N, Wang Z W, Wang X C, Huang T S, Wang X T, Yu K, Hui T W, Dong C, Lin L, Loy C C, Park D, Kim K, Chun S Y, Zhang K, Liu P, Zuo W M, Guo S, Liu J Y, Xu J C, Liu Y J, Xiong F Y, Dong Y, Bai H L, Damian A, Ravi N, Menon S, Rudin C, Seo J, Jeon T, Koo J, Jeon S, Kim S Y, Choi J S, Ki S, Seo S, Sim H, Kim S, Kim M, Chen R, Zeng K, Guo J K, Qu Y Y, Li C H, Ahn N, Kang B, Sohn K A, Yuan Y, Zhang J W, Pang J H, Xu X Y, Zhao Y, Deng W, Hussain S U, Aadil M, Rahim R, Cai X W, Huang F, Xu Y S, Michelini P N, Zhu D, Liu H W, Kim J H, Lee J S, Huang Y W, Qiu M, Jing L T, Zeng J H, Wang Y, Sharma M, Mukhopadhyay R, Upadhyay A, Koundinya S, Shukla A, Chaudhury S, Zhang Z, Hu Y H and Fu L Z. 2018. NTIRE 2018 challenge on single image super-resolution: methods and results//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 965.1-965.11 [DOI: 10.1109/CVPRW.2018.00130]
Tong T, Li G, Liu X J and Gao Q Q. 2017. Image super-resolution using dense skip connections//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4809-4817 [DOI: 10.1109/ICCV.2017.514]
Tu X G, Zhang H S, Xie M, Luo Y, Zhang Y F and Ma Z. 2019. Deep transfer across domains for face antispoofing. Journal of Electronic Imaging, 28(4): #043001 [DOI: 10.1117/1.JEI.28.4.043001]
Wang F, Jiang M Q, Qian C, Yang S, Li C, Zhang H G, Wang X G and Tang X O. 2017. Residual attention network for image classification//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6450-6458 [DOI: 10.1109/CVPR.2017.683]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Wang Y F, Wang L J, Wang H Y and Li P H. 2019. Resolution-aware network for image super-resolution. IEEE Transactions on Circuits and Systems for Video Technology, 29(5): 1259-1269 [DOI: 10.1109/TCSVT.2018.2839879]
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/TIP.2003.819861]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Zeyde R, Elad M and Protter M. 2010. On single image scale-up using sparse-representations//Proceedings of the 7th International Conference on Curves and Surfaces. Avignon, France: Springer: 711-730 [DOI: 10.1007/978-3-642-27413-8_47]
Zhang K, Zuo W M and Zhang L. 2018a. Learning a single convolutional super-resolution network for multiple degradations//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3262-3271 [DOI: 10.1109/CVPR.2018.00344]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018c. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 294-310 [DOI: 10.1007/978-3-030-01234-2_18]
Zhang Y L, Tian Y P, Kong Y, Zhong B N and Fu Y. 2018b. Residual dense network for image super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2472-2481 [DOI: 10.1109/CVPR.2018.00262]