Image super-resolution reconstruction via deep network based on multi-staged fusion
2019, Vol. 24, No. 8, pp. 1258-1269
Received: 2018-11-07; Revised: 2018-12-25; Published in print: 2019-08-16
DOI: 10.11834/jig.180619
Objective
In recent years, deep convolutional neural networks have become a research hotspot in single-image super-resolution reconstruction. However, most network structures are built by chained stacking, which weakens the relationships between network layers and leaves hierarchical features under-utilized. To address these problems and further improve reconstruction quality, an image super-resolution reconstruction method based on a multi-staged fusion network is proposed.
Method
First, a feature extraction network obtains the low-frequency features of the image, which serve as the input of two sub-networks. One sub-network is an encoding network that captures the structural feature information of the low-resolution image; the other is a multi-path feedforward network composed of staged feature fusion units that extracts high-frequency features, where each fusion unit fuses the features of several consecutive layers and selects effective features adaptively. The different fusion units are then linked by multi-path connections, which strengthens the relationships between units, extracts more effective features, and raises the utilization of hierarchical features. Finally, the features produced by the two sub-networks are fused, and the high-resolution image is reconstructed via residual learning.
Result
Experiments are conducted on four benchmark datasets: Set5, Set14, B100, and Urban100. At a scale factor of 4, the peak signal-to-noise ratios are 31.69 dB, 28.24 dB, 27.39 dB, and 25.46 dB, respectively, an improvement over the results of other methods.
Conclusion
The proposed network overcomes the drawbacks of the chained structure: it extracts more high-frequency information by fully exploiting hierarchical features and, at the same time, uses the structural feature information carried by the low-resolution image itself to complete the reconstruction, achieving good reconstruction results.
Objective
Image super-resolution is an important branch of digital image processing and computer vision, and has been widely used in recent years in video surveillance, medical imaging, and security imaging. Super-resolution aims to reconstruct a high-resolution image from an observed degraded low-resolution one. Early methods include interpolation, neighborhood embedding, and sparse coding. Deep convolutional neural networks have recently become a major research topic in single-image super-resolution reconstruction because they learn the mapping between low- and high-resolution images better than traditional learning-based methods. However, many deep learning-based methods have two evident drawbacks. First, most methods build the network by chained stacking, so each layer is related only to its previous layer, leading to weak inter-layer relationships. Second, the hierarchical features of the network are only partially utilized. These shortcomings can cause a loss of high-frequency components. To address them, a novel image super-resolution reconstruction method based on a multi-staged fusion network is proposed to improve the quality of image reconstruction.
Method
Numerous studies have shown that feature re-usage can improve the capability of a network to extract and express features. Our research is therefore based on the idea of feature re-usage, implemented through multipath connections in two forms: a global multipath mode and local fusion units. First, the proposed model takes an interpolated low-resolution image as input, and a feature extraction network extracts shallow features that serve as the input of the mixture network. The mixture network consists of two parts. The first is a pixel encoding network used to obtain the structural feature information of the image. It has four weight layers, each consisting of 64 filters of size 1×1, which preserves the distribution of the feature maps; the process is similar to encoding and decoding pixels.
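The abstract specifies only the layer configuration of this sub-network (four weight layers, 64 filters of size 1×1 each). A minimal PyTorch sketch under that specification follows; the class name and the use of ReLU between layers are our assumptions:

```python
import torch
import torch.nn as nn

class PixelEncodingNetwork(nn.Module):
    """Sketch of the pixel encoding sub-network: four 1x1 convolutional
    layers with 64 filters each. 1x1 convolutions mix channels per pixel
    without altering spatial structure, akin to encoding/decoding pixels."""
    def __init__(self, channels: int = 64):
        super().__init__()
        layers = []
        for _ in range(4):
            layers.append(nn.Conv2d(channels, channels, kernel_size=1))
            layers.append(nn.ReLU(inplace=True))  # activation choice is assumed
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```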
The second part of the mixture network is a multi-path feedforward network used to extract the high-frequency components needed for reconstruction. It is formed by staged feature fusion units connected in a multi-path mode. Each fusion unit is composed of a dense connection layer, a residual learning layer, and a feature selection layer. The dense connection layer consists of four weight layers with 32 filters of size 3×3; it improves the nonlinear mapping capability of the network and extracts substantial high-frequency information. The residual learning layer contains a 1×1 weight layer to alleviate the vanishing gradient problem. The feature selection layer uses a 1×1 weight layer to obtain effective features.
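Only the layer counts and filter sizes of a fusion unit are stated above; a sketch assuming a standard dense-connectivity wiring (each 3×3 layer sees the concatenation of the unit input and all earlier layer outputs) could look like this:

```python
import torch
import torch.nn as nn

class StagedFusionUnit(nn.Module):
    """Sketch of one fusion unit: a dense connection layer (four 3x3
    weight layers, 32 filters each), a 1x1 residual-learning layer, and
    a 1x1 feature selection layer. The exact wiring is an assumption."""
    def __init__(self, channels: int = 64, growth: int = 32):
        super().__init__()
        self.dense = nn.ModuleList()
        in_ch = channels
        for _ in range(4):
            # Each 3x3 layer consumes the concatenation of the unit input
            # and all previous layer outputs (dense connectivity).
            self.dense.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            in_ch += growth
        # 1x1 layer over the concatenated features; adding the unit input
        # back forms the local residual path.
        self.residual = nn.Conv2d(in_ch, channels, kernel_size=1)
        # 1x1 feature selection layer adaptively keeps effective features.
        self.select = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.dense:
            feats.append(layer(torch.cat(feats, dim=1)))
        out = self.residual(torch.cat(feats, dim=1)) + x
        return self.select(out)
```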
The multi-path mode is then used to connect the different units, which enhances the relationships between fusion units, extracts more effective features, and increases the utilization of hierarchical features. Both sub-networks output 64 feature maps; their outputs are fused and fed to the reconstruction network, which includes a 1×1 weight layer and produces the residual image between the low- and high-resolution images. Finally, the reconstructed image is obtained by adding the residual image to the original low-resolution input.
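A rough end-to-end assembly of the pieces sketched so far is shown below. The number of fusion units, concatenation as the fusion operation, a plain chain in place of the full multi-path wiring, and single-channel (luminance) input are all assumptions not fixed by the abstract:

```python
import torch
import torch.nn as nn

class MultiStagedFusionNet(nn.Module):
    """Sketch of the overall model: shallow feature extraction, two
    sub-networks, feature fusion, and global residual learning."""
    def __init__(self, channels: int = 64, num_units: int = 6):
        super().__init__()
        # Single-channel input (Y channel), per the Result section.
        self.extract = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.encoder = PixelEncodingNetwork(channels)
        self.units = nn.ModuleList(
            StagedFusionUnit(channels) for _ in range(num_units))
        # Fuse the two 64-channel outputs and map to the residual image.
        self.reconstruct = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        shallow = self.extract(lr)
        structural = self.encoder(shallow)
        feat = shallow
        for unit in self.units:
            # The paper additionally links units across stages (multi-path
            # mode); a plain chain is shown here for brevity.
            feat = unit(feat)
        residual = self.reconstruct(torch.cat([structural, feat], dim=1))
        return lr + residual  # global residual learning
```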
During training, we use the rectified linear unit (ReLU) as the activation function to accelerate training and avoid vanishing gradients. For weight layers with 3×3 filters, we pad one pixel so that all feature maps keep the same size, which improves the edge information of the reconstructed image. The initial learning rate is set to 0.1 and halved every 10 epochs, which accelerates network convergence. We train with mini-batch SGD, setting the momentum parameter to 0.9, and use 291 images as the training set.
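These hyperparameters map directly onto standard PyTorch utilities; the loss function, batch shape, and synthetic data below are placeholders, since the abstract does not state them:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = MultiStagedFusionNet()  # model sketched in the Method section above
criterion = nn.MSELoss()        # assumed loss; not stated in the abstract
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)  # halve lr every 10 epochs

# Synthetic stand-in for batches drawn from the 291-image training set.
train_loader = [(torch.rand(8, 1, 41, 41), torch.rand(8, 1, 41, 41))]

for epoch in range(50):  # epoch count is an assumption
    for lr_img, hr_img in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(lr_img), hr_img)
        loss.backward()
        optimizer.step()
    scheduler.step()
```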
In addition, we use data augmentation (rotations by 90°, 180°, and 270°, and vertical flipping) to enlarge the training set, which helps avoid overfitting and increases sample diversity; a sketch follows below. The network is trained with multiple scale factors (×2, ×3, and ×4) so that it can solve the reconstruction problem at different scale factors.
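The stated augmentation can be realized with basic tensor operations; whether the paper combines every rotation with the vertical flip, as this sketch does, is an assumption:

```python
import torch

def augment(patch: torch.Tensor) -> list:
    """Return rotated and flipped copies of a (C, H, W) training patch:
    rotations by 0/90/180/270 degrees, each also flipped vertically."""
    variants = []
    for k in range(4):
        rotated = torch.rot90(patch, k, dims=(1, 2))
        variants.append(rotated)
        variants.append(torch.flip(rotated, dims=(1,)))  # vertical flip
    return variants
```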
Result
All experiments are implemented under the PyTorch framework. We use four common benchmark sets (Set5, Set14, B100, and Urban100) to evaluate our model, with peak signal-to-noise ratio (PSNR) as the evaluation criterion. Images in RGB space are converted to YCbCr space, and the proposed algorithm reconstructs only the luminance channel Y, because human vision is most sensitive to luminance; the Cb and Cr channels are reconstructed by interpolation. On the four benchmark sets at a scale factor of 4, the results are 31.69 dB, 28.24 dB, 27.39 dB, and 25.46 dB, respectively. The proposed method shows better performance and visual effects than Bicubic, A+, SRCNN, VDSR, DRCN, and DRRN. In addition, we have validated the effectiveness of the proposed components, which include the multipath mode, the staged fusion units, and the pixel encoding network.
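For reference, Y-channel PSNR as described above is conventionally computed as in the sketch below; the ITU-R BT.601 luma coefficients and the border crop equal to the scale factor are common practice in SR evaluation, not details given by the abstract:

```python
import numpy as np

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """Luminance channel of an 8-bit RGB image (ITU-R BT.601 luma)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr: np.ndarray, hr: np.ndarray, border: int = 4) -> float:
    """PSNR between Y channels, cropping `border` pixels (commonly the
    scale factor) from each side before computing the error."""
    y_sr = rgb_to_y(sr.astype(np.float64))[border:-border, border:-border]
    y_hr = rgb_to_y(hr.astype(np.float64))[border:-border, border:-border]
    mse = np.mean((y_sr - y_hr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```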
Conclusion
The proposed network overcomes the shortcomings of the chained structure and extracts substantial high-frequency information by fully utilizing hierarchical features; at the same time, it uses the structural feature information carried by the low-resolution image itself to complete the reconstruction. Techniques including dense connections and residual learning are adopted to accelerate convergence and mitigate gradient problems during training. Extensive experiments show that the proposed method reconstructs images with more high-frequency details than other methods that use the same preprocessing step. In future work, we will consider recursive learning and a larger number of training samples to optimize the model further.
References
Li X, Orchard M T. New edge-directed interpolation[J]. IEEE Transactions on Image Processing, 2001, 10(10): 1521-1527. [DOI: 10.1109/83.951537]
Zhang L, Wu X L. An edge-guided image interpolation algorithm via directional filtering and data fusion[J]. IEEE Transactions on Image Processing, 2006, 15(8): 2226-2238. [DOI: 10.1109/TIP.2006.877407]
Dai S Y, Han M, Xu W, et al. SoftCuts: a soft edge smoothness prior for color image super-resolution[J]. IEEE Transactions on Image Processing, 2009, 18(5): 969-981. [DOI: 10.1109/TIP.2009.2012908]
Sun J, Xu Z B, Shum H Y. Image super-resolution using gradient profile prior[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-8. [DOI: 10.1109/CVPR.2008.4587659]
Yang J C, Wright J, Huang T S, et al. Image super-resolution as sparse representation of raw image patches[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-8. [DOI: 10.1109/CVPR.2008.4587647]
Yang J C, Wright J, Huang T S, et al. Image super-resolution via sparse representation[J]. IEEE Transactions on Image Processing, 2010, 19(11): 2861-2873. [DOI: 10.1109/TIP.2010.2050625]
Yang J C, Wang Z W, Lin Z, et al. Coupled dictionary training for image super-resolution[J]. IEEE Transactions on Image Processing, 2012, 21(8): 3467-3478. [DOI: 10.1109/TIP.2012.2192127]
Huang J B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 5197-5206. [DOI: 10.1109/CVPR.2015.7299156]
Timofte R, de Smet V, van Gool L. Anchored neighborhood regression for fast example-based super-resolution[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 1920-1927. [DOI: 10.1109/ICCV.2013.241]
Timofte R, de Smet V, van Gool L. A+: adjusted anchored neighborhood regression for fast super-resolution[M]//Cremers D, Reid I, Saito H, et al. Computer Vision - ACCV 2014. Cham: Springer, 2014: 111-126. [DOI: 10.1007/978-3-319-16817-3_8]
Dong C, Loy C C, He K M, et al. Learning a deep convolutional network for image super-resolution[C]//Proceedings of the European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 184-199. [DOI: 10.1007/978-3-319-10593-2_13]
Dong C, Loy C C, He K M, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295-307. [DOI: 10.1109/TPAMI.2015.2439281]
Dong C, Loy C C, Tang X O. Accelerating the super-resolution convolutional neural network[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016: 391-407. [DOI: 10.1007/978-3-319-46475-6_25]
Shi W Z, Caballero J, Huszár F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1874-1883. [DOI: 10.1109/CVPR.2016.207]
Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1646-1654. [DOI: 10.1109/CVPR.2016.182]
Kim J, Lee J K, Lee K M. Deeply-recursive convolutional network for image super-resolution[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1637-1645. [DOI: 10.1109/CVPR.2016.181]
Tai Y, Yang J, Liu X M. Image super-resolution via deep recursive residual network[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2790-2798. [DOI: 10.1109/CVPR.2017.298]
Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. [DOI: 10.1109/CVPR.2017.243]
He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. [DOI: 10.1109/CVPR.2016.90]
Qian N. On the momentum term in gradient descent learning algorithms[J]. Neural Networks, 1999, 12(1): 145-151. [DOI: 10.1016/S0893-6080(98)00116-6]
Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, BC, Canada: IEEE, 2001: 416-423. [DOI: 10.1109/ICCV.2001.937655]
Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C]//Proceedings of the British Machine Vision Conference. Guildford, UK: BMVA Press, 2012. [DOI: 10.5244/C.26.135]
Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations[C]//Proceedings of the 7th International Conference on Curves and Surfaces. Avignon, France: Springer, 2010: 711-730. [DOI: 10.1007/978-3-642-27413-8_47]
Timofte R, Rothe R, van Gool L. Seven ways to improve example-based single image super resolution[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. [DOI: 10.1109/CVPR.2016.206]
He K M, Zhang X Y, Ren S Q, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1026-1034. [DOI: 10.1109/ICCV.2015.123]