递归式多阶特征融合图像超分辨率算法
Multi-level feature fusion image super-resolution algorithm with recursive neural network
2019, Vol. 24, No. 2: 302-312
Received: 2018-06-22; Revised: 2018-07-19; Published in print: 2019-02-16
DOI: 10.11834/jig.180410
Objective
In recent years, convolutional neural networks have achieved great success in image super-resolution, and network models with different structures have been proposed successively. Through learning, these models abstract and combine features of the input image, thereby establishing an effective nonlinear mapping from the low-resolution input image to the high-resolution target image. In this process, both the low-level pixel features and the high-level abstract features of each layer play an important role in exploiting the correlation between pixels and affect the quality of the target high-resolution image. However, typical super-resolution network models, such as SRCNN (super-resolution convolutional neural network), VDSR (very deep convolutional networks for super-resolution), and LapSRN (Laplacian pyramid super-resolution networks), do not make full use of these multi-level features.
Method
We propose an image super-resolution algorithm that fully fuses the multi-level features of the network. The model is based on a recursive neural network and is composed of identical units connected in series, with parameters shared among the units. Within each unit, features from low level to high level are concatenated and fused to obtain richer information and strengthen the learning ability of the network. During training, a residual strategy is adopted: local residual learning is used inside each unit and global residual learning over the whole network, which accelerates training.
Result
On four widely used test sets, the proposed model achieves average gains of 0.24 dB, 0.23 dB, and 0.19 dB over the deep super-resolution network VDSR for upscaling factors of 2, 3, and 4, respectively.
Conclusion
Experimental results show that the proposed recursive multi-level feature fusion image super-resolution algorithm effectively improves performance. On the detail-rich Urban100 dataset in particular, the algorithm handles fine details especially well, and both the objective and subjective quality of the reconstructed images are significantly improved.
Objective
The recovery of a high-resolution (HR) image or video from its low-resolution (LR) counterpart, which is referred to as super resolution (SR), has attracted considerable attention in the computer vision community. The SR problem is inherently ill-posed because the HR image or video actually does not exist. Several methods have been proposed to address this issue. Typical methods, such as bilinear or bicubic interpolation, Lanczos resampling, and internal patch recurrence, have been used. Recently, learning-based methods, such as sparse coding, random forests, and convolutional neural networks (CNNs), have been utilized to create a mapping between LR and HR images. In particular, CNN-based schemes have achieved remarkable performance improvements, and different network models, such as SRCNN, VDSR, LapSRN, and DRRN, have been proposed. These models abstract and combine the features of the LR image to establish an effective nonlinear mapping from LR input images to HR target images. In this process, low- and high-level features play an important role in determining the correlation between pixels and in improving the quality of restored HR images. However, in the aforementioned typical SR network models, the features of the previous layer are fed directly into the next layer, so multi-level features are incompletely utilized. Inspired by the recent DenseNet, we concatenate and fuse multi-level features from multiple layers. Although multi-level features are utilized in this manner, the number of parameters becomes large, which costs long training time and large storage. Therefore, we employ a recursive network architecture for parameter sharing. The overall design yields an efficient CNN model that exploits the multi-level features of a CNN to improve SR performance while keeping the number of model parameters within an acceptable range.
Method
We propose an image SR model that fully utilizes multi-level features. The proposed multi-feature fusion recursive network (MFRN) is based on a recursive neural network built from identical units in series. Feature information is passed along the basic unit of MFRN, named the multi-feature fusion unit (MFU). Parameters are shared among these basic units, so the required number of parameters is effectively reduced. The input state of each MFU is obtained from the previous unit through a continuous memory mechanism. Then, low-level to high-level features are concatenated and fused to obtain abundant features for describing the image. Valuable features are extracted and enhanced, which allows the mapping between LR and HR to be described accurately. For training, a residual learning strategy, which involves local residual learning inside each unit and global residual learning through the entire network, is adopted to accelerate training: the global residual is learned over the overall MFRN, and a local residual is learned within each MFU. The training difficulty is efficiently reduced, and typical phenomena, such as network degradation and vanishing gradients, are avoided by combining these strategies. As the cost function, the mean square error averaged over the training set is minimized. Based on this cost function and training method, a single model is trained for multiple scales.
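The structure described above can be sketched in PyTorch as follows. This is a minimal illustration only, not the authors' released implementation: the class names (`MFU`, `MFRN`), channel width, kernel sizes, and number of fusion levels are assumptions, while the three ingredients it demonstrates (concatenate-and-fuse of multi-level features, a shared recursive unit, and local plus global residual learning) come from the text.

```python
import torch
import torch.nn as nn

class MFU(nn.Module):
    """Sketch of one multi-feature fusion unit: features from successive
    conv layers are concatenated, fused by a 1x1 conv, and added to the
    unit input (local residual learning)."""
    def __init__(self, n_feats=64, n_levels=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(n_feats, n_feats, 3, padding=1) for _ in range(n_levels)]
        )
        self.fuse = nn.Conv2d(n_feats * n_levels, n_feats, 1)  # 1x1 fusion conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        feats, h = [], x
        for conv in self.convs:
            h = self.relu(conv(h))          # low- to high-level features
            feats.append(h)
        fused = self.fuse(torch.cat(feats, dim=1))  # concatenate and fuse
        return x + fused                    # local residual learning

class MFRN(nn.Module):
    """Sketch of the recursive network: ONE shared MFU is applied several
    times, and the interpolated LR input is added back at the end
    (global residual learning)."""
    def __init__(self, n_feats=64, n_recursions=9):
        super().__init__()
        self.head = nn.Conv2d(1, n_feats, 3, padding=1)   # Y-channel input assumed
        self.unit = MFU(n_feats)    # parameters shared across all recursions
        self.tail = nn.Conv2d(n_feats, 1, 3, padding=1)
        self.n_recursions = n_recursions

    def forward(self, x):
        h = self.head(x)
        for _ in range(self.n_recursions):
            h = self.unit(h)        # same unit, shared weights
        return x + self.tail(h)     # global residual learning
```

Because the same `MFU` is reused at every recursion, increasing the recursion depth deepens the network without increasing the parameter count, which is the point of the parameter-sharing design.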
Result
We use 291 pictures from public databases as the training set, and data augmentation (rotation and flipping) is applied. Images with different scales (×2, ×3, and ×4) are included in the training set; therefore, only a single model is trained for all scales. During training, we adopt an adaptive learning rate and adjustable gradient clipping to accelerate convergence while suppressing exploding gradients. We evaluate four network models with different numbers of MFUs, corresponding to 29, 37, 53, and 81 layers. Comparing convergence rate and performance, the network with nine MFUs achieves the best results; hence, we adopt nine MFUs in the final CNN model. Although the proposed network has 37 layers, it converges smoothly within 230 epochs and obtains remarkable gains. The dominant evaluation criteria of image quality, namely PSNR, SSIM, and IFC, are employed to assess the restored images. Experimental results show that the proposed model achieves average PSNR gains of 0.24, 0.23, and 0.19 dB over the very deep convolutional networks for super-resolution (VDSR) on the four general test sets for the ×2, ×3, and ×4 scales, respectively. In particular, the proposed MFRN considerably improves the quality of restored images on the Urban100 dataset, which contains rich details. In addition, the subjective quality of the restored images is illustrated: MFRN produces sharper edges than the other methods.
Conclusion
A multi-level feature fusion image SR algorithm based on a recursive neural network, referred to as MFRN, is proposed in this study. The MFRN consists of multiple MFUs: several recursive units are stacked to learn the residual image between the HR and LR images. With the recursive learning scheme, parameters are shared among the units, thereby effectively reducing the number of network parameters. Within each unit, features of different levels are concatenated and fused to provide an intensive description of the images. In this way, the proposed MFRN can extract and adaptively enhance valuable features, which leads to an accurate mapping between LR and HR images. During training, we adopt local residual learning inside each unit and global residual learning through the entire network, and a single model is trained for different scales. Experimental results show that the proposed MFRN considerably improves performance. Specifically, on the Urban100 dataset, MFRN achieves a 0.4 dB PSNR gain over the classical VDSR model and a 0.14 dB PSNR improvement over the basic recursive network DRRN. In terms of subjective quality, MFRN handles image details well, and the visual perception of the restored images is remarkably improved.
Shi W Z, Caballero J, Ledig C, et al. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch[C]//Proceedings of the 16th International Conference. Berlin, Heidelberg: Springer, 2013: 9-16. [DOI: 10.1007/978-3-642-40760-4_2]
Thornton M W, Atkinson P M, Holland D A. Sub-pixel mapping of rural land cover objects from fine spatial resolution satellite sensor imagery using super-resolution pixel-swapping[J]. International Journal of Remote Sensing, 2006, 27(3): 473-491. [DOI: 10.1080/01431160500207088]
Wilman W W Z, Yuen P C. Very low resolution face recognition problem[C]//Proceedings of 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems. Washington, DC, USA: IEEE, 2010: 1-6. [DOI: 10.1109/BTAS.2010.5634490]
Huang J B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 5197-5206. [DOI: 10.1109/CVPR.2015.7299156]
Schulter S, Leistner C, Bischof H. Fast and accurate image upscaling with super-resolution forests[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3791-3799. [DOI: 10.1109/CVPR.2015.7299003]
Dong C, Loy C C, Tang X O. Accelerating the super-resolution convolutional neural network[C]//Proceedings of 2016 European Conference on Computer Vision. Cham: Springer, 2016: 391-407. [DOI: 10.1007/978-3-319-46475-6_25]
He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. [DOI: 10.1109/CVPR.2016.90]
Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1646-1654. [DOI: 10.1109/CVPR.2016.182]
Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, BC, Canada: IEEE, 2001: 416-423. [DOI: 10.1109/ICCV.2001.937655]
Shi W Z, Caballero J, Huszár F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1874-1883. [DOI: 10.1109/CVPR.2016.207]
Lai W S, Huang J B, Ahuja N, et al. Deep Laplacian pyramid networks for fast and accurate super-resolution[C]//Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017.
Kim J, Lee J K, Lee K M. Deeply-recursive convolutional network for image super-resolution[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1637-1645. [DOI: 10.1109/CVPR.2016.181]
Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C]//Proceedings of 2012 British Machine Vision Conference. Surrey: BMVC, 2012. http://eprints.imtlucca.it/2412/
Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations[C]//Proceedings of the 7th International Conference. Berlin, Heidelberg: Springer-Verlag, 2012: 711-730. [DOI: 10.1007/978-3-642-27413-8_47]
Tai Y, Yang J, Liu X M. Image super-resolution via deep recursive residual network[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2790-2798. [DOI: 10.1109/CVPR.2017.298]
Zhang Y L, Tian Y P, Kong, et al. Residual dense network for image super-resolution[C]//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 2472-2481. https://www.researchgate.net/publication/323410292_Residual_Dense_Network_for_Image_Super-Resolution
Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. [DOI: 10.1109/CVPR.2017.243]
Yang J C, Wright J, Huang T S, et al. Image super-resolution via sparse representation[J]. IEEE Transactions on Image Processing, 2010, 19(11): 2861-2873. [DOI: 10.1109/TIP.2010.2050625]
He K M, Zhang X Y, Ren S Q, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1026-1034. [DOI: 10.1109/ICCV.2015.123]
Sheikh H R, Bovik A C, de Veciana G. An information fidelity criterion for image quality assessment using natural scene statistics[J]. IEEE Transactions on Image Processing, 2005, 14(12): 2117-2128. [DOI: 10.1109/TIP.2005.859389]
Yang C Y, Ma C, Yang M H. Single-image super-resolution: a benchmark[C]//Proceedings of 2014 European Conference on Computer Vision. Cham: Springer, 2014: 372-386. [DOI: 10.1007/978-3-319-10593-2_25]
Pérez-Pellitero E, Salvador J, Ruiz-Hidalgo J, et al. PSyCo: manifold span reduction for super resolution[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1837-1845. [DOI: 10.1109/CVPR.2016.203]
Tong T, Li G, Liu X J, et al. Image super-resolution using dense skip connections[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 4809-4817. [DOI: 10.1109/ICCV.2017.514]