Objective In recent years, convolutional neural networks have achieved great success in image super-resolution, and network models with different structures have been proposed in succession. Through learning, these models abstract and combine the features of the input image, thereby establishing an effective nonlinear mapping from the low-resolution input image to the high-resolution target image. In this process, both the low-level pixel features and the high-level abstract features of each layer play an important role in exploiting the correlation between pixels and determine the quality of the restored high-resolution image. However, typical super-resolution network models such as SRCNN, VDSR, and LapSRN do not fully utilize these multi-level features. Method This paper proposes an image super-resolution algorithm that fully fuses the multi-level features of the network. The model is based on a recursive neural network composed of identical units in series, with parameters shared among the units. Within each unit, features from low level to high level are concatenated and fused to obtain richer information and strengthen the learning capability of the network. During training, a residual-based strategy is adopted: local residual learning within each unit and global residual learning across the whole network, which accelerates training. Results Compared with the very deep super-resolution network VDSR on four common test sets, the proposed model achieves average gains of 0.24 dB, 0.23 dB, and 0.19 dB for upscaling factors of 2, 3, and 4, respectively. Conclusion Experimental results show that the proposed recursive multi-level feature fusion super-resolution algorithm effectively improves performance. In particular, on the detail-rich Urban100 dataset, the algorithm restores fine details especially well, and both the objective and subjective quality of the images are significantly improved.
Objective The recovery of a high-resolution (HR) image or video from its low-resolution (LR) counterpart, referred to as super-resolution (SR), has attracted extensive study in the computer vision community. The SR problem is inherently ill-posed, since many different HR images can produce the same LR observation. Numerous methods have been proposed to address this issue. Classical approaches include bilinear and bicubic interpolation, Lanczos resampling, and internal patch recurrence. Recently, learning-based methods, such as sparse coding, random forests, and convolutional neural networks (CNNs), have been exploited to learn a mapping between LR and HR images. In particular, CNN-based schemes have achieved great performance improvements, and different network models have been proposed, such as SRCNN, VDSR, LapSRN, and DRRN. These models abstract and combine the features of the LR image to establish an effective nonlinear mapping from LR input images to HR target images. In this process, both low-level and high-level features play a significant role in exploring the correlation between pixels and in improving the quality of the restored HR images. However, in the typical SR network models mentioned above, the features of each layer are fed directly into the next layer, so the multi-level features are not fully utilized. Inspired by the recent DenseNet, we propose to concatenate and fuse multi-level features from multiple layers. Although multi-level features are exploited in this way, the number of parameters becomes huge, which costs long training time and large storage. Therefore, we further employ a recursive network architecture for parameter sharing. The overall aim is to develop an efficient CNN model that utilizes the multi-level features of a CNN to improve SR performance while keeping the number of model parameters within an acceptable range. Method We propose an image super-resolution model that fully makes use of multi-level features.
The proposed multi-feature fusion recursive network (MFRN) is based on a recursive neural network and consists of identical units in series. Feature information is passed along the basic unit of MFRN, named the multi-feature fusion unit (MFU). Parameters are shared among these basic units, so the required number of parameters is reduced effectively. Within each MFU, the input state is obtained from the previous unit through a continuous memory mechanism. Then, the features from low level to high level are concatenated and fused to obtain abundant features that describe the image. Finally, the useful features are extracted and enhanced, so that the mapping between LR and HR images can be described accurately. For training, a residual learning strategy is adopted to accelerate convergence: local residual learning inside each unit and global residual learning through the whole network. Specifically, global residual learning is employed when training the overall MFRN, while local residual learning is applied within each MFU. By combining these strategies, the training difficulty is reduced efficiently, and typical problems such as network degradation and vanishing gradients are avoided. As the cost function, the mean squared error averaged over the training set is minimized. With this cost function and training method, we train a single model for multiple scales. Result We use 291 images from public databases as the training set, with data augmentation (rotation and flipping). Images at all scales (×2, ×3, and ×4) are included in the training set, so only a single model is trained for all scales. During training, we adopt an adaptive learning rate and adjustable gradient clipping to boost the convergence rate while suppressing exploding gradients.
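The structure described above can be illustrated with a toy sketch. This is not the paper's actual implementation (which uses convolutional layers on images); here, features are plain vectors and each "layer" is a small matrix product, with the dimension `d`, the number of sub-layers `levels`, and the unit count `units` chosen arbitrarily. It shows the three key ideas: one set of weights reused by every unit (recursive parameter sharing), concatenation and fusion of low- to high-level features inside a unit, and local plus global residual connections.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # hypothetical feature dimension (a stand-in for conv feature maps)
levels = 3     # sub-layers per unit whose outputs are concatenated
units = 9      # recursive units; all of them reuse the same weights

# One shared set of weights: the parameter count does not grow with `units`.
W_sub = [rng.standard_normal((d, d)) * 0.1 for _ in range(levels)]
W_fuse = rng.standard_normal((d, levels * d)) * 0.1  # 1x1-conv-like fusion

def mfu(x):
    """One multi-feature fusion unit: collect features from low to high level,
    concatenate and fuse them, then add a local residual connection."""
    feats, h = [], x
    for W in W_sub:
        h = np.maximum(W @ h, 0.0)      # ReLU sub-layer -> next feature level
        feats.append(h)
    fused = W_fuse @ np.concatenate(feats)  # fuse all levels at once
    return x + fused                    # local residual learning

def mfrn(x_lr):
    """Stack the same unit recursively, then add the global residual."""
    h = x_lr
    for _ in range(units):              # same unit (same weights) applied repeatedly
        h = mfu(h)
    return x_lr + h                     # global residual learning
```

Because the units share `W_sub` and `W_fuse`, stacking more of them deepens the network without adding parameters, which is exactly the trade-off the recursive design targets.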
We test four network models with different numbers of MFUs, corresponding to 29, 37, 53, and 81 layers, respectively. By comparing the convergence rate and performance, we find that the network with 9 MFUs (37 layers) achieves the best results; hence we adopt 9 MFUs in the final model. Although the proposed network is 37 layers deep, it converges elegantly within 230 epochs and obtains significant gains. The dominant image-quality criteria, including PSNR, SSIM, and IFC, are employed to assess the restored images. The experimental results show that, compared with the very deep convolutional network VDSR, the proposed model achieves average PSNR gains of 0.24 dB, 0.23 dB, and 0.19 dB on the four common test sets for ×2, ×3, and ×4 upscaling, respectively. Especially on the Urban100 dataset, which contains rich details, the proposed MFRN significantly improves the quality of the restored images. The subjective quality of the restored images is also illustrated: MFRN produces noticeably sharper edges than the other methods. Conclusion A multi-level feature fusion image super-resolution algorithm based on a recursive neural network, referred to as MFRN, is proposed in this paper. MFRN consists of multiple multi-feature fusion units; several recursive units are stacked to learn the residual image between the HR and LR images. With the recursive learning scheme, parameters are shared among units, which effectively reduces the number of network parameters. Within each unit, features of different levels are concatenated and fused to provide an intensive description of the image. In this way, the proposed MFRN can adaptively extract and enhance useful features, leading to an accurate mapping between LR and HR images. During training, we adopt local residual learning inside each unit and global residual learning through the whole network.
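The PSNR figures quoted above follow the standard definition, 10·log10(peak²/MSE), where the peak is the maximum pixel value (255 for 8-bit images). A minimal sketch, with a synthetic reference image chosen purely for illustration:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a restored image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 128.0)          # toy reference image
noisy = ref + 4.0                     # constant error of 4 gray levels -> MSE = 16
print(round(psnr(ref, noisy), 2))     # prints 36.09
```

Since PSNR is logarithmic in the MSE, a gain of a few tenths of a dB (such as the reported 0.24 dB over VDSR) corresponds to a measurable reduction in mean squared reconstruction error across the test set.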
As a result, a single model is trained for all scales. The experimental results show that the proposed MFRN greatly improves performance. Especially on the Urban100 dataset, MFRN achieves PSNR gains of up to 0.4 dB over the classical VDSR model, and up to 0.14 dB over the basic recursive network DRRN. Regarding subjective quality, MFRN excels at restoring image details, and the visual quality of the restored images is significantly improved.