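李现国 (Li Xianguo), 孙叶美 (Sun Yemei), 杨彦利 (Yang Yanli), 苗长云 (Miao Changyun) (School of Electronics and Information Engineering, Tianjin Polytechnic University, Tianjin 300387; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin 300387)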
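Objective Learning-based image super-resolution reconstruction has become a research hotspot in recent years. To address the shortcomings of the convolutional-neural-network-based super-resolution method (SRCNN), namely few network layers, a small receptive field, and poor generalization, an image super-resolution reconstruction method based on an intermediate-supervision convolutional neural network is proposed to further improve reconstruction quality. Method A convolutional neural network with intermediate supervision is designed; it has 16 convolutional layers, of which the 7th is the intermediate supervision layer. A supervision-layer error function and a reconstruction error function are defined to mitigate vanishing gradients in the deep network. Training comprises three steps: image preprocessing, feature extraction, and image reconstruction. The network is trained jointly on low-resolution images blurred with different scale factors (2, 3, and 4) so that it can reconstruct images with different degrees of blur; the pad parameter is set to 1 when convolutions extract image features, which improves the use of edge information in images and feature maps; and the high-resolution image is reconstructed by residual learning. Result Experiments were conducted on the Set5 and Set14 datasets, and the results were compared with those of bicubic interpolation, A+, SelfEx, and SRCNN. In subjective visual evaluation, images reconstructed by the proposed method have better clarity and edge sharpness. In objective evaluation, the peak signal-to-noise ratio (PSNR) of the proposed method is higher on average by 2.26 dB, 0.28 dB, 0.28 dB, and 0.15 dB, respectively, and reconstructing an image with the trained model takes less than half the time required by SRCNN. Conclusion The experimental results show that the proposed method achieves better subjective visual and objective quantitative evaluations, improves the quality of image super-resolution reconstruction, generalizes well, and reconstructs images in less time; it can be used for the super-resolution reconstruction of natural scene images.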
Image super-resolution reconstruction based on intermediate supervision convolutional neural networks
Li Xianguo, Sun Yemei, Yang Yanli, Miao Changyun (School of Electronics and Information Engineering, Tianjin Polytechnic University, Tianjin 300387, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin 300387, China)
Objective Learning-based image super-resolution reconstruction has recently become a research hotspot. A new image super-resolution reconstruction method based on an intermediate-supervision convolutional neural network (CNN) is proposed. It addresses the few network layers, small receptive field, and single-scale functionality of the original super-resolution CNN (SRCNN) and further improves the quality of image reconstruction. Method The method builds on deep CNNs. First, when information about the input or the gradient passes through many layers, it can vanish and be "washed out" by the time it reaches the end (or beginning) of the network. Therefore, we design a CNN structure that has an intermediate supervision layer. The architecture has 16 weight layers, and the receptive field, that is, the information used for reconstruction, is considerably larger than that of SRCNN (31×31 versus 13×13). Except for the first, seventh, and last layers, all layers are of the same type: 64 filters of size 3×3×64, where each filter operates on a 3×3 spatial region across 64 channels (feature maps). Each convolutional layer is followed by a rectified linear unit as the activation function. The first convolutional layer operates on the input image. The seventh layer is an intermediate supervision layer that guides the training of the preceding layers; this guidance can be regarded as an implicit deep supervision that strengthens learning during training. The last layer, which uses a single filter of size 3×3×64, performs image reconstruction. Second, supervision-layer and reconstruction loss functions are defined to mitigate the vanishing gradient problem of the deep CNN. The training procedure includes three steps: image preprocessing, feature extraction, and image reconstruction.
In the first step, the network is trained on low-resolution images blurred with different upscaling factors (2, 3, and 4, possibly including fractional factors) so that it can reconstruct images with different degrees of blur. In the second step, image features are extracted using convolution operations. Center pixels are inferred from their surrounding pixels, and in SRCNN the surrounding pixels near image borders are not fully utilized. We therefore set the padding to 1 before each convolution so that all feature maps (including the output image) keep the same size, thereby increasing the use of edge information in images and feature maps. In the last step, the high-resolution image is reconstructed by residual learning. Because the input and output (predicted) images are highly similar, a smooth loss function with good generalization performance is easily achieved through comprehensive use of features of shallow complexity. Result The proposed method is evaluated on the public benchmark datasets Set5 and Set14, which are widely used to evaluate super-resolution methods. Experimental results show that the proposed method achieves better subjective visual quality and objective quantitative scores than bicubic interpolation, A+, SelfEx, and SRCNN. In subjective visual evaluation, the proposed method produces reconstructed images with superior clarity and edge sharpness. In objective evaluation, the average peak signal-to-noise ratio (PSNR) achieved by this method is 2.26 dB, 0.28 dB, 0.28 dB, and 0.15 dB higher, respectively, than those of the other approaches. Meanwhile, reconstructing images with the trained network model takes less than half the time required by the SRCNN method. Conclusion Introducing intermediate supervision allows information and gradients to propagate smoothly through the entire network, thereby enhancing the network's reconstruction capability and training efficiency.
Extensive experiments confirm that the proposed method, which has intermediate supervision, improves the quality and efficiency of image super-resolution reconstruction. This approach has good generalization capability and can be used for the super-resolution reconstruction of natural scene images.
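The arithmetic behind three points in the abstract (size-preserving padding, residual reconstruction, and the PSNR metric) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 41×41 patch size, and the 8-bit peak value of 255 are assumptions chosen for illustration, and images are plain nested lists rather than tensors.

```python
import math


def conv_output_size(size, kernel=3, pad=1, stride=1):
    """Spatial size after one convolution (standard output-size formula)."""
    return (size + 2 * pad - kernel) // stride + 1


def size_after_layers(size, layers, kernel=3, pad=1):
    """Spatial size after stacking `layers` identical stride-1 convolutions."""
    for _ in range(layers):
        size = conv_output_size(size, kernel, pad)
    return size


def add_residual(interpolated, residual, peak=255.0):
    """Residual reconstruction: output = interpolated input + predicted residual.

    Both arguments are 2D lists of equal shape; values are clipped to [0, peak].
    """
    return [
        [min(peak, max(0.0, p + r)) for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(interpolated, residual)
    ]


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D lists."""
    squared_error = 0.0
    count = 0
    for row_ref, row_est in zip(reference, estimate):
        for a, b in zip(row_ref, row_est):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical 41x41 patch passed through 16 stacked 3x3 convolutions.
    print(size_after_layers(41, 16, pad=1))  # padding of 1 keeps the size at 41
    print(size_after_layers(41, 16, pad=0))  # without padding it shrinks to 9
```

With a 3×3 kernel and a padding of 1, every layer returns a feature map of the input size, which is why border pixels remain available to all 16 layers; without padding, each layer trims two pixels per dimension.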