递归式多阶特征融合图像超分辨率算法
Multi-level feature fusion image super-resolution algorithm with recursive neural network
2019, Vol. 24, No. 2: 302-312
Received: 2018-06-22; Revised: 2018-07-19; Published in print: 2019-02-16
DOI: 10.11834/jig.180410
Objective
In recent years, convolutional neural networks have achieved great success in image super-resolution, and network models with different structures have been proposed successively. Through learning, these models abstract and combine features of the input image, thereby establishing an effective nonlinear mapping from the low-resolution input image to the high-resolution target image. In this process, both the low-level pixel features and the high-level abstract features of each layer play an important role in exploiting the correlation between pixels and affect the quality of the target high-resolution image. However, typical super-resolution network models, such as SRCNN (super-resolution convolutional neural network), VDSR (very deep convolutional networks for super-resolution), and LapSRN (Laplacian pyramid super-resolution networks), do not make full use of these multi-level features.
Method
We propose an image super-resolution algorithm that fully fuses the multi-level features of the network. The model is based on a recursive neural network and is composed of identical units connected in series, with parameters shared among the units. Within each unit, features from low level to high level are concatenated and fused to obtain richer information and strengthen the learning ability of the network. During training, a residual strategy is adopted: local residual learning is used inside each unit and global residual learning over the whole network, which accelerates training.
Result
On four widely used test sets, the proposed model achieves average gains of 0.24 dB, 0.23 dB, and 0.19 dB over the deep super-resolution network VDSR for upscaling factors of 2, 3, and 4, respectively.
Conclusion
Experimental results show that the proposed recursive multi-level feature fusion image super-resolution algorithm effectively improves performance. On the detail-rich Urban100 dataset in particular, the algorithm handles fine details especially well, and both the objective and subjective quality of the reconstructed images are significantly improved.
Objective
The recovery of a high-resolution (HR) image or video from its low-resolution (LR) counterpart, which is referred to as super resolution (SR), has attracted considerable attention in the computer vision community. The SR problem is inherently ill-posed because the HR image or video actually does not exist. Several methods have been proposed to address this issue. Typical methods, such as bilinear or bicubic interpolation, Lanczos resampling, and internal patch recurrence, have been used. Recently, learning-based methods, such as sparse coding, random forests, and convolutional neural networks (CNNs), have been utilized to create a mapping between LR and HR images. In particular, CNN-based schemes have achieved remarkable performance improvements, and different network models, such as SRCNN, VDSR, LapSRN, and DRRN, have been proposed. These models abstract and combine the features of the LR image to establish an effective nonlinear mapping from LR input images to HR target images. In this process, low- and high-level features play an important role in determining the correlation between pixels and in improving the quality of restored HR images. However, in the aforementioned typical SR network models, the features of the previous layer are fed directly into the next layer, so multi-level features are incompletely utilized. Inspired by the recent DenseNet, we concatenate and fuse multi-level features from multiple layers. Although multi-level features are utilized in this manner, the number of parameters becomes large, which costs long training time and large storage. Therefore, we employ a recursive network architecture for parameter sharing. The overall design yields an efficient CNN model that exploits the multi-level features of a CNN to improve SR performance while keeping the number of model parameters within an acceptable range.
Method
We propose an image SR model that fully utilizes multi-level features. The proposed multi-feature fusion recursive network (MFRN) is based on a recursive neural network built from identical units in series. Feature information is passed along the basic unit of MFRN, named the multi-feature fusion unit (MFU). Parameters are shared among these basic units, so the required number of parameters is effectively reduced. The input state of each MFU is obtained from the previous unit through a continuous memory mechanism. Then, low-level to high-level features are concatenated and fused to obtain abundant features for describing the image. Valuable features are extracted and enhanced, which allows the mapping between LR and HR to be described accurately. For training, a residual learning strategy, which involves local residual learning inside each unit and global residual learning through the entire network, is adopted to accelerate training: the global residual is learned over the overall MFRN, and a local residual is learned within each MFU. The training difficulty is efficiently reduced, and typical phenomena, such as network degradation and vanishing gradients, are avoided by combining these strategies. As the cost function, the mean square error averaged over the training set is minimized. Based on this cost function and training method, a single model is trained for multiple scales.
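The structure described above can be sketched in PyTorch as follows. This is a minimal illustration only, not the authors' released implementation: the class names (`MFU`, `MFRN`), channel width, kernel sizes, and number of fusion levels are assumptions, while the three ingredients it demonstrates (concatenate-and-fuse of multi-level features, a shared recursive unit, and local plus global residual learning) come from the text.

```python
import torch
import torch.nn as nn

class MFU(nn.Module):
    """Sketch of one multi-feature fusion unit: features from successive
    conv layers are concatenated, fused by a 1x1 conv, and added to the
    unit input (local residual learning)."""
    def __init__(self, n_feats=64, n_levels=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(n_feats, n_feats, 3, padding=1) for _ in range(n_levels)]
        )
        self.fuse = nn.Conv2d(n_feats * n_levels, n_feats, 1)  # 1x1 fusion conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        feats, h = [], x
        for conv in self.convs:
            h = self.relu(conv(h))          # low- to high-level features
            feats.append(h)
        fused = self.fuse(torch.cat(feats, dim=1))  # concatenate and fuse
        return x + fused                    # local residual learning

class MFRN(nn.Module):
    """Sketch of the recursive network: ONE shared MFU is applied several
    times, and the interpolated LR input is added back at the end
    (global residual learning)."""
    def __init__(self, n_feats=64, n_recursions=9):
        super().__init__()
        self.head = nn.Conv2d(1, n_feats, 3, padding=1)   # Y-channel input assumed
        self.unit = MFU(n_feats)    # parameters shared across all recursions
        self.tail = nn.Conv2d(n_feats, 1, 3, padding=1)
        self.n_recursions = n_recursions

    def forward(self, x):
        h = self.head(x)
        for _ in range(self.n_recursions):
            h = self.unit(h)        # same unit, shared weights
        return x + self.tail(h)     # global residual learning
```

Because the same `MFU` is reused at every recursion, increasing the recursion depth deepens the network without increasing the parameter count, which is the point of the parameter-sharing design.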
Result
We use 291 pictures from public databases as the training set, and data augmentation (rotation and flipping) is applied. Images with different scales (×2, ×3, and ×4) are included in the training set; therefore, only a single model is trained for all scales. During training, we adopt an adaptive learning rate and adjustable gradient clipping to accelerate convergence while suppressing exploding gradients. We evaluate four network models with different numbers of MFUs, corresponding to 29, 37, 53, and 81 layers. Comparing convergence rate and performance, the network with nine MFUs achieves the best results; hence, we adopt nine MFUs in the final CNN model. Although the proposed network has 37 layers, it converges smoothly within 230 epochs and obtains remarkable gains. The dominant evaluation criteria of image quality, namely PSNR, SSIM, and IFC, are employed to assess the restored images. Experimental results show that the proposed model achieves average PSNR gains of 0.24, 0.23, and 0.19 dB over the very deep convolutional networks for super-resolution (VDSR) on the four general test sets for the ×2, ×3, and ×4 scales, respectively. In particular, the proposed MFRN considerably improves the quality of restored images on the Urban100 dataset, which contains rich details. In addition, the subjective quality of the restored images is illustrated: MFRN produces sharper edges than the other methods.
Conclusion
A multi-level feature fusion image SR algorithm based on a recursive neural network, referred to as MFRN, is proposed in this study. The MFRN consists of multiple MFUs: several recursive units are stacked to learn the residual image between the HR and LR images. With the recursive learning scheme, parameters are shared among the units, thereby effectively reducing the number of network parameters. Within each unit, features of different levels are concatenated and fused to provide an intensive description of the images. In this way, the proposed MFRN can extract and adaptively enhance valuable features, which leads to an accurate mapping between LR and HR images. During training, we adopt local residual learning inside each unit and global residual learning through the entire network, and a single model is trained for different scales. Experimental results show that the proposed MFRN considerably improves performance. Specifically, on the Urban100 dataset, MFRN achieves a 0.4 dB PSNR gain over the classical VDSR model and a 0.14 dB PSNR improvement over the basic recursive network DRRN. In terms of subjective quality, MFRN handles image details well, and the visual perception of the restored images is remarkably improved.
Shi W Z, Caballero J, Ledig C, et al. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch[C]//Proceedings of the 16th International Conference. Berlin, Heidelberg: Springer, 2013: 9-16. [DOI: 10.1007/978-3-642-40760-4_2]
Thornton M W, Atkinson P M, Holland D A. Sub-pixel mapping of rural land cover objects from fine spatial resolution satellite sensor imagery using super-resolution pixel-swapping[J]. International Journal of Remote Sensing, 2006, 27(3): 473-491. [DOI: 10.1080/01431160500207088]
Wilman W W Z, Yuen P C. Very low resolution face recognition problem[C]//Proceedings of 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems. Washington, DC, USA: IEEE, 2010: 1-6. [DOI: 10.1109/BTAS.2010.5634490]
Huang J B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 5197-5206. [DOI: 10.1109/CVPR.2015.7299156]
Schulter S, Leistner C, Bischof H. Fast and accurate image upscaling with super-resolution forests[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3791-3799. [DOI: 10.1109/CVPR.2015.7299003]
Dong C, Loy C C, Tang X O. Accelerating the super-resolution convolutional neural network[C]//Proceedings of 2016 European Conference on Computer Vision. Cham: Springer, 2016: 391-407. [DOI: 10.1007/978-3-319-46475-6_25]
He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. [DOI: 10.1109/CVPR.2016.90]
Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1646-1654. [DOI: 10.1109/CVPR.2016.182]
Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, BC, Canada: IEEE, 2001: 416-423. [DOI: 10.1109/ICCV.2001.937655]
Shi W Z, Caballero J, Huszár F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1874-1883. [DOI: 10.1109/CVPR.2016.207]
Lai W S, Huang J B, Ahuja N, et al. Deep Laplacian pyramid networks for fast and accurate super-resolution[C]//Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017.
Kim J, Lee J K, Lee K M. Deeply-recursive convolutional network for image super-resolution[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1637-1645. [DOI: 10.1109/CVPR.2016.181]
Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C]//Proceedings of 2012 British Machine Vision Conference. Surrey: BMVC, 2012. http://eprints.imtlucca.it/2412/
Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations[C]//Proceedings of the 7th International Conference. Berlin, Heidelberg: Springer-Verlag, 2012: 711-730. [DOI: 10.1007/978-3-642-27413-8_47]
Tai Y, Yang J, Liu X M. Image super-resolution via deep recursive residual network[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2790-2798. [DOI: 10.1109/CVPR.2017.298]
Zhang Y L, Tian Y P, Kong, et al. Residual dense network for image super-resolution[C]//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 2472-2481. https://www.researchgate.net/publication/323410292_Residual_Dense_Network_for_Image_Super-Resolution
Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. [DOI: 10.1109/CVPR.2017.243]
Yang J C, Wright J, Huang T S, et al. Image super-resolution via sparse representation[J]. IEEE Transactions on Image Processing, 2010, 19(11): 2861-2873. [DOI: 10.1109/TIP.2010.2050625]
He K M, Zhang X Y, Ren S Q, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1026-1034. [DOI: 10.1109/ICCV.2015.123]
Sheikh H R, Bovik A C, de Veciana G. An information fidelity criterion for image quality assessment using natural scene statistics[J]. IEEE Transactions on Image Processing, 2005, 14(12): 2117-2128. [DOI: 10.1109/TIP.2005.859389]
Yang C Y, Ma C, Yang M H. Single-image super-resolution: a benchmark[C]//Proceedings of 2014 European Conference on Computer Vision. Cham: Springer, 2014: 372-386. [DOI: 10.1007/978-3-319-10593-2_25]
Pérez-Pellitero E, Salvador J, Ruiz-Hidalgo J, et al. PSyCo: manifold span reduction for super resolution[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1837-1845. [DOI: 10.1109/CVPR.2016.203]
Tong T, Li G, Liu X J, et al. Image super-resolution using dense skip connections[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 4809-4817. [DOI: 10.1109/ICCV.2017.514]