Image super-resolution method using multi-channel convolution

Li Yunfei, Fu Randi, Jin Wei, Ji Nian (Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211)

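Abstract
Objective Super-resolution technology has a wide range of practical applications. The classic super-resolution method based on convolutional neural networks (SRCNN) suffers from blurry texture structures in the reconstructed images and from slow convergence of network model training. To address these two problems, an image super-resolution algorithm using multi-channel convolution (MCSR) is proposed on the basis of SRCNN. Method Residual connections are added, and the network weights are initialized with the MSRA initialization method to accelerate model convergence. Multi-channel mapping is introduced to extract richer features, and multiple layers of small kernels such as 3×3 replace single layers of large kernels such as 9×9, which makes more effective use of the features and enhances the super-resolution reconstruction. Result MCSR converges after only 4×10⁶ iterations. With an upscaling factor of 3 in each dimension, its average peak signal-to-noise ratios on the Set5 and Set14 datasets are 32.84 dB and 29.28 dB, respectively, a significant improvement over SRCNN. Conclusion MCSR converges faster and generates high-resolution images with sharp contours, yielding better super-resolution results.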
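The trade of one large kernel for a stack of small ones can be checked with simple arithmetic. The Python sketch below compares the receptive field and weight count of a single 9×9 layer with four stacked 3×3 layers. The uniform width `C = 64` and the choice of four layers are illustrative assumptions, not figures from the paper (the abstract states only that MCSR has seven convolutional layers in total); all layers are assumed to be stride 1, and biases are ignored.

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Side length of the receptive field of stacked stride-1 k x k convolutions."""
    # Each additional stride-1 layer widens the field by (k - 1) pixels.
    return num_layers * (kernel_size - 1) + 1


def conv_weights(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in one k x k convolution layer, biases excluded."""
    return kernel_size * kernel_size * c_in * c_out


C = 64  # hypothetical number of feature maps entering and leaving every layer

single_9x9 = conv_weights(9, C, C)
stacked_3x3 = 4 * conv_weights(3, C, C)

print(stacked_receptive_field(9, 1), stacked_receptive_field(3, 4))  # 9 9
print(single_9x9, stacked_3x3)                                      # 331776 147456
```

With matched receptive fields, the stack uses fewer than half the weights and places a nonlinearity after every layer, which is one common argument for this substitution.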
Image super-resolution using multi-channel convolution

Li Yunfei, Fu Randi, Jin Wei, Ji Nian (Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China)

Abstract
Objective Super-resolution (SR) technology addresses the demand for high-quality images. First proposed in the 1960s, it aims to obtain one or a series of high-resolution (HR) images from one or a sequence of low-resolution (LR) images. SR not only improves the visual quality of images but also benefits their subsequent analysis and processing, including object recognition, image retrieval, and object detection. SR technology is widely used in practice, for example in video surveillance systems, medical image processing, and remote sensing image processing. Traditional interpolation-, reconstruction-, and learning-based algorithms cannot achieve both desirable SR results and short processing times. In recent years, a convolutional neural network (CNN)-based method called super-resolution CNN (SRCNN) has been proposed. SRCNN is a deep learning method for single-image SR that directly learns an end-to-end mapping between LR and HR images. It outperforms the traditional methods in both SR quality and speed but still has several limitations. SRCNN uses a plain stacked CNN structure and Gaussian initialization, which lead to slow convergence and time-consuming model training. Furthermore, because it comprises only three convolutional layers, SRCNN has a weak nonlinear mapping capability and extracts only simple features, so it generates unclear HR images with blurry textures. To resolve these issues, an image SR method based on a multi-channel CNN (MCSR) is proposed. Method MCSR adopts two strategies to accelerate the convergence of model training: a residual CNN model and the MSRA initialization method. Because the residual CNN contains an identity mapping from input to output, training explicitly models the residual image, i.e., the difference between the HR and LR images.
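The residual formulation can be illustrated with a minimal Python sketch. It assumes, as in SRCNN, that the LR input has already been upscaled to the HR size (for example, by bicubic interpolation); the abstract does not state this detail for MCSR. The branch functions are hypothetical stand-ins for the CNN body, and the pixel values are made up.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale patch as rows of pixel values


def residual_target(hr: Image, lr_up: Image) -> Image:
    """What the network learns to predict: HR minus the upscaled LR input."""
    return [[h - l for h, l in zip(hr_row, lr_row)] for hr_row, lr_row in zip(hr, lr_up)]


def residual_forward(lr_up: Image, predict_residual: Callable[[Image], Image]) -> Image:
    """Identity skip connection: reconstruction = input + predicted residual."""
    res = predict_residual(lr_up)
    return [[x + d for x, d in zip(x_row, d_row)] for x_row, d_row in zip(lr_up, res)]


hr = [[0.50, 0.60], [0.70, 0.80]]
lr_up = [[0.48, 0.61], [0.69, 0.75]]  # hypothetical interpolated LR patch
target = residual_target(hr, lr_up)   # small values: LR and HR share most content


def zero_branch(img: Image) -> Image:
    """Hypothetical untrained residual branch that outputs zeros."""
    return [[0.0] * len(row) for row in img]


def oracle_branch(img: Image) -> Image:
    """Hypothetical perfect branch that returns the true residual (illustration only)."""
    return target


print(residual_forward(lr_up, zero_branch) == lr_up)  # True: identity mapping
print(residual_forward(lr_up, oracle_branch))          # approximately hr
```

Because the skip connection already carries the interpolated input, the layers only have to learn the small correction `target`, which is the intuition behind the faster convergence claimed for the residual model.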
This is advantageous because LR and HR images share most of their information. The MSRA initialization method keeps the variances of activations and back-propagated gradients stable as signals move up or down the network. Together, the two strategies yield substantially faster convergence. In addition, two further schemes are proposed to improve SR performance. Because deeper CNN structures generally perform better, MCSR replaces the large convolution kernels used by SRCNN (e.g., 9×9) with several layers of small kernels (e.g., 3×3). As a result, MCSR has seven convolutional layers and a stronger nonlinear mapping capability. Besides being deepened, MCSR is widened into multiple channels in its nonlinear mapping stage. Specifically, the basic MCSR has four channels: one layer of 3×3 kernels, two stacked layers of 3×3 kernels, one layer of 1×5 kernels, and one layer of 5×1 kernels. Experimental results show that the channels produce dissimilar feature maps: the 3×3 channel produces local feature maps, the 2×3×3 channel produces relatively global feature maps, the 1×5 channel extracts horizontal texture features, and the 5×1 channel extracts vertical texture features. MCSR also adds one layer of 1×1 convolution kernels that compresses the dimension of the feature maps and further strengthens the nonlinear capability of the method. A powerful nonlinear mapping capability and diverse feature maps together lead to good SR performance. Result Like SRCNN, MCSR is trained on the Image91 dataset; it is tested on the Set5, Set14, and BSD200 datasets. Experimental results demonstrate that MCSR converges within 4×10⁶ backpropagations, whereas SRCNN needs at least 1.5×10⁷.
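The four-channel mapping stage and MSRA initialization can be sketched in pure Python as follows. This is not the authors' implementation: the branch width `c_branch`, the output width `c_out`, zero "same" padding, omitted biases, and the ReLU after the 1×1 compression layer are all assumptions, and the helper names are hypothetical. The weights are random and untrained, so only the shapes and wiring are meaningful. MSRA (He) initialization draws each weight from a zero-mean Gaussian with standard deviation sqrt(2/n), where n = kh·kw·c_in is the layer's fan-in.

```python
import math
import random
from typing import List, Tuple

Maps = List[List[List[float]]]          # [channels][rows][cols]
Kernel = List[List[List[List[float]]]]  # [c_out][c_in][kh][kw]


def he_normal(c_out: int, c_in: int, kh: int, kw: int, rng: random.Random) -> Kernel:
    """MSRA (He) init: zero-mean Gaussian weights with std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / (kh * kw * c_in))
    return [[[[rng.gauss(0.0, std) for _ in range(kw)] for _ in range(kh)]
             for _ in range(c_in)] for _ in range(c_out)]


def conv2d_same(x: Maps, w: Kernel) -> Maps:
    """Stride-1 convolution (cross-correlation) with zero padding; odd kernels, no bias."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    ph, pw = kh // 2, kw // 2
    out: Maps = []
    for filt in w:
        fmap = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                s = 0.0
                for c in range(c_in):
                    for u in range(kh):
                        for v in range(kw):
                            r, q = i + u - ph, j + v - pw
                            if 0 <= r < h and 0 <= q < wd:
                                s += filt[c][u][v] * x[c][r][q]
                fmap[i][j] = s
        out.append(fmap)
    return out


def relu(x: Maps) -> Maps:
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


# Hypothetical encoding of the four branches described above (kernel shape per layer).
BRANCHES: List[List[Tuple[int, int]]] = [
    [(3, 3)],          # local features
    [(3, 3), (3, 3)],  # two stacked 3x3 layers: wider context
    [(1, 5)],          # horizontal texture
    [(5, 1)],          # vertical texture
]


def multi_channel_block(x: Maps, c_branch: int, c_out: int, seed: int = 0) -> Maps:
    """Run every branch on x, concatenate along channels, compress with a 1x1 conv."""
    rng = random.Random(seed)
    concat: Maps = []
    for layers in BRANCHES:
        y, c = x, len(x)
        for kh, kw in layers:
            y = relu(conv2d_same(y, he_normal(c_branch, c, kh, kw, rng)))
            c = c_branch
        concat.extend(y)
    w1x1 = he_normal(c_out, len(concat), 1, 1, rng)  # compresses 4 * c_branch maps
    return relu(conv2d_same(concat, w1x1))


x = [[[float((3 * i + 7 * j) % 5) for j in range(6)] for i in range(6)]]  # one 6x6 map
y = multi_channel_block(x, c_branch=2, c_out=3)
print(len(y), len(y[0]), len(y[0][0]))  # 3 6 6
```

Appending a `[(3, 3), (3, 3), (3, 3)]` entry to the hypothetical `BRANCHES` list would give a five-branch block in the spirit of the MCSR-Ex variant. A practical model would use a deep learning framework's convolution layers instead of these explicit loops.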
With an upscaling factor of 3, the average peak signal-to-noise ratios (PSNRs) on Set5, Set14, and BSD200 are 32.84 dB, 29.28 dB, and 29.03 dB, respectively, improvements of 0.45 dB, 0.27 dB, and 0.38 dB over SRCNN. The structural similarity index (SSIM) also improves considerably. In terms of subjective quality, MCSR produces high-quality HR images with clear textures and barely any shadow or ripple artifacts. These findings indicate that MCSR achieves good SR performance. We also propose an extended variant, MCSR-Ex, which expands MCSR to five channels; the additional channel consists of three stacked layers of 3×3 kernels and improves the average PSNR on the Set5 dataset by approximately 0.1 dB. Conclusion In this study, a new SR method called MCSR is proposed. On the one hand, combining the residual model with the MSRA initialization method significantly accelerates the convergence of model training. On the other hand, the two proposed schemes, widening the CNN model into multiple channels and deepening it to seven layers, considerably improve SR performance. In other words, the good SR performance is attributed to extracting diverse feature maps and making effective use of them.
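PSNR, the metric behind the reported averages, is 10·log10(MAX²/MSE). A minimal Python sketch follows, assuming single-channel images with an 8-bit peak value of 255. The abstract does not specify the color channel, border cropping, or averaging protocol, so `average_psnr` simply takes the mean of per-image scores as an assumption.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]


def psnr(reference: Image, estimate: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diffs = [(r - e) ** 2 for rr, er in zip(reference, estimate) for r, e in zip(rr, er)]
    mse = sum(diffs) / len(diffs)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_psnr(pairs: Sequence[Tuple[Image, Image]], max_value: float = 255.0) -> float:
    """Mean of per-image PSNRs over a test set (assumed averaging protocol)."""
    scores = [psnr(ref, est, max_value) for ref, est in pairs]
    return sum(scores) / len(scores)


ref = [[100.0, 100.0], [100.0, 100.0]]
est = [[101.0, 99.0], [101.0, 99.0]]  # every pixel off by 1, so MSE = 1
print(round(psnr(ref, est), 2))       # 48.13
```

Because the metric is logarithmic, a fixed gain in dB corresponds to a fixed ratio of MSE rather than a fixed difference, which is why sub-decibel improvements such as those above can still be meaningful.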
