Recursive multi-level feature fusion algorithm for image super-resolution

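Tong Junchao, Fei Jialuo, Chen Jingsen, Li Heng, Ding Dandan (School of Information Science and Engineering, Hangzhou Normal University, Hangzhou 311121, China)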

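Abstract
Objective In recent years, convolutional neural networks have achieved great success in image super-resolution, and network models with different structures have been proposed in succession. Through learning, these models abstract and combine the features of the input image and thereby establish an effective nonlinear mapping from the low-resolution input image to the high-resolution target image. In this process, both the low-level pixel-wise features and the high-level abstract features of each layer play an important role in mining the correlation between pixels and affect the quality of the target high-resolution image. However, typical super-resolution network models, such as SRCNN (super-resolution convolutional neural network), VDSR (very deep convolutional networks for super-resolution), and LapSRN (Laplacian pyramid super-resolution networks), do not make full use of these multi-level features. Method An image super-resolution algorithm that fully fuses the multi-level features of the network is proposed. The model is based on a recursive neural network and consists of identical units connected in series, with parameters shared among the units. Within each unit, the features from low level to high level are concatenated and fused to obtain richer information and strengthen the learning ability of the network. In training, a residual-based strategy is adopted to accelerate training: local residual learning is used within each unit, and global residual learning is used for the overall network. Result On four commonly used test sets, for upscaling factors of 2, 3, and 4, the proposed network model obtains average gains of 0.24 dB, 0.23 dB, and 0.19 dB, respectively, compared with the deep super-resolution network VDSR. Conclusion Experimental results show that the proposed recursive multi-level feature fusion image super-resolution algorithm effectively improves performance. In particular, on the Urban100 dataset, which is very rich in detail, the algorithm handles details especially well, and both the objective and subjective quality of the images are significantly improved.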
Keywords
Multi-level feature fusion image super-resolution algorithm with recursive neural network

Tong Junchao, Fei Jialuo, Chen Jingsen, Li Heng, Ding Dandan (School of Information Science and Engineering, Hangzhou Normal University, Hangzhou 311121, China)

Abstract
Objective The recovery of a high-resolution (HR) image or video from its low-resolution (LR) counterpart, which is referred to as super-resolution (SR), has attracted considerable attention in the computer vision community. The SR problem is inherently ill-posed because the original HR image or video is not available. Many methods have been proposed to address this issue. Typical methods include bilinear or bicubic interpolation, Lanczos resampling, and internal patch recurrence. Recently, learning-based methods, such as sparse coding, random forests, and convolutional neural networks (CNNs), have been utilized to create a mapping between LR and HR images. In particular, CNN-based schemes have achieved remarkable performance improvements. Different network models, such as SRCNN, VDSR, LapSRN, and DRRN, have been proposed. These models abstract and combine the features of the LR image to establish an effective nonlinear mapping from LR input images to HR target images. In this process, both low- and high-level features play an important role in capturing the correlation between pixels and in improving the quality of restored HR images. However, in the aforementioned typical SR network models, the features of each layer are fed directly into the next layer only, so multi-level features are not fully utilized. Inspired by the recent DenseNet, we concatenate and fuse multi-level features from multiple layers. Although multi-level features are utilized in this manner, the number of parameters becomes large, which leads to long training time and large storage requirements. Therefore, we employ a recursive network architecture for parameter sharing. The aim is an efficient CNN model that utilizes multi-level features to improve SR performance while keeping the number of model parameters within an acceptable range. Method We propose an image SR model that utilizes multi-level features.
The proposed multi-feature fusion recursive network (MFRN) is based on a recursive neural network built from identical units connected in series. Feature information is passed along the basic units of MFRN, which are called multi-feature fusion units (MFUs). The parameters are shared among these basic units, which effectively reduces the required number of parameters. The input state of each MFU is obtained from the previous unit through a continuous memory mechanism. Then, the low-level to high-level features are concatenated and fused to obtain abundant features for describing the image. Valuable features are extracted and enhanced so that the mapping between LR and HR images can be described accurately. With regard to the training process, a residual learning strategy is adopted to accelerate training: global residual learning is employed for the overall MFRN, and local residual learning is applied inside each MFU. By combining these strategies, the training difficulty is reduced, and typical problems, such as network degradation and vanishing gradients, can be avoided. In terms of the cost function, the mean square error averaged over the training set is minimized. We train a single model for multiple scales based on the proposed cost function and training methods. Result We use 291 pictures from public databases as the training set. In addition, data augmentation (rotation or flip) is applied. Images with different scales (×2, ×3, and ×4) are included in the training set. Therefore, only a single model is trained for all scales. During the training process, we adopt an adaptive learning rate and adjustable gradient clipping to accelerate convergence while suppressing exploding gradients.
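The unit structure described in the Method part can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_params`, `fusion_unit`, `mfrn_forward`), the channel and level counts, and the use of 1x1 convolutions in place of spatial convolutions are all assumptions made to keep the example short and dependency-free. The input is also assumed to be already interpolated to the target size, which the abstract does not state. The sketch only shows how one shared parameter set is reused by every unit, how features of all levels are concatenated and fused, and where the local and global residual connections sit.

```python
import random


def conv1x1(pixels, weight):
    """1x1 convolution: the same linear map over channels at every pixel."""
    return [[sum(w * v for w, v in zip(row, px)) for row in weight]
            for px in pixels]


def relu(pixels):
    return [[max(0.0, v) for v in px] for px in pixels]


def concat(maps):
    """Concatenate feature maps along the channel axis, pixel by pixel."""
    return [[v for px in group for v in px] for group in zip(*maps)]


def add(a, b):
    return [[x + y for x, y in zip(pa, pb)] for pa, pb in zip(a, b)]


def make_params(channels=4, levels=3, seed=0):
    """One parameter set, reused by every unit (sizes are hypothetical)."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "head": init(channels, 1),                        # 1 input channel
        "levels": [init(channels, channels) for _ in range(levels)],
        "fuse": init(channels, channels * (levels + 1)),  # fusion layer
        "tail": init(1, channels),                        # back to 1 channel
    }


def fusion_unit(x, params):
    """Collect low- to high-level features, fuse them, add local residual."""
    feats, h = [x], x
    for w in params["levels"]:
        h = relu(conv1x1(h, w))
        feats.append(h)
    fused = conv1x1(concat(feats), params["fuse"])
    return add(x, fused)  # local residual learning


def mfrn_forward(image, params, num_units=9):
    """Apply the same unit num_units times, then add the global residual."""
    h = conv1x1(image, params["head"])
    for _ in range(num_units):
        h = fusion_unit(h, params)  # same params every time: sharing
    residual = conv1x1(h, params["tail"])
    return add(image, residual)  # global residual learning


def count_params(params):
    mats = [params["head"], params["fuse"], params["tail"]] + params["levels"]
    return sum(len(row) for m in mats for row in m)
```

Here an image is a flat list of pixels, each a one-element list, for example `mfrn_forward([[0.2], [0.5], [0.8]], make_params())`. Because `params` is created once and passed to every unit, `count_params` does not change when `num_units` grows, which is the point of the recursive design.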
We evaluate four network models with different numbers of MFUs, which correspond to 29, 37, 53, and 81 layers. Comparing convergence rate and performance, the network with nine MFUs achieves the best results. Hence, we adopt nine MFUs in the final CNN model. Although the proposed network has 37 layers, it converges at 230 epochs and obtains remarkable gains. Common image quality criteria, such as PSNR, SSIM, and IFC, are employed to assess the restored images. Experimental results show that the proposed model achieves average PSNR gains of 0.24, 0.23, and 0.19 dB over the very deep convolutional network for super-resolution (VDSR) on four commonly used test sets for the ×2, ×3, and ×4 scales, respectively. Specifically, the proposed MFRN considerably improves the quality of restored images on the Urban100 dataset, which contains rich details. In addition, the subjective quality of restored images is illustrated. The MFRN produces sharper edges than other methods. Conclusion A multi-level feature fusion image SR algorithm based on a recursive neural network, referred to as MFRN, is proposed in this study. The MFRN consists of multiple MFUs. Several recursive units are stacked to learn the residual image between the HR and LR images. With the recursive learning scheme, the parameters are shared among the units, thereby effectively reducing the number of network parameters. The features of different levels within each unit are concatenated and fused to provide a rich description of the images. In this way, the proposed MFRN can extract and adaptively enhance valuable features, which leads to an accurate mapping between LR and HR images. During the training procedure, we adopt local residual learning inside each unit and global residual learning through the entire network. A single model is trained for different scales.
Experimental results show that the proposed MFRN considerably improves performance. Specifically, on the Urban100 dataset, MFRN achieves a 0.4 dB PSNR gain over the classical VDSR model. Compared with the basic recursive network DRRN, a 0.14 dB PSNR improvement is obtained. With regard to subjective quality, MFRN handles image details well, and the visual quality of the images is remarkably improved.
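For readers unfamiliar with the dB figures quoted above, the following sketch shows the standard PSNR definition, 10·log10(peak² / MSE). It is a generic illustration rather than the authors' evaluation code: the peak value of 255 assumes 8-bit images, and the abstract does not say which color channel is evaluated or whether image borders are cropped, so neither is modeled here.

```python
import math


def mse(ref, test):
    """Mean square error between two images given as lists of pixel rows."""
    a = [v for row in ref for v in row]
    b = [v for row in test for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    err = mse(ref, test)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)
```

A reported gain such as 0.24 dB is then the difference between the PSNR of two restored images measured against the same HR reference, averaged over a test set.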
Keywords
