Image super-resolution using multi-channel convolution

Li Yunfei; Fu Randi; Jin Wei; Ji Nian

doi:10.11834/jig.170325

Image Understanding and Computer Vision

2017, Vol. 22, No. 12, pp. 1690-1700
Published online: 2017-12-08

Published in print: 2017
DOI: 10.11834/jig.170325

Li Yunfei, Fu Randi, Jin Wei, Ji Nian. Image super-resolution using multi-channel convolution[J]. Journal of Image and Graphics (中国图象图形学报), 2017, 22(12): 1690-1700. DOI: 10.11834/jig.170325.

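Abstract (translated from the Chinese original)

Super-resolution technology is widely used in everyday applications. The classic convolutional-neural-network-based super-resolution method (SRCNN) suffers from blurred texture structure in the reconstructed images and from slow convergence during network training. To address these two problems, a multi-channel convolution image super-resolution (MCSR) algorithm is proposed on the basis of SRCNN. Residual connections are added and the MSRA initialization method is used to initialize the network weights, which accelerates model convergence. Multi-channel mapping is introduced to extract richer features, and multiple layers of small convolution kernels such as 3×3 replace a single layer of large kernels such as 9×9, which uses the features more effectively and improves super-resolution reconstruction. MCSR converges after 4×10 iterations. With the side length upscaled by a factor of 3, the average peak signal-to-noise ratios on the Set5 and Set14 datasets are 32.84 dB and 29.28 dB, respectively, a marked improvement over SRCNN. MCSR converges faster and generates high-resolution images with clear contours, giving better super-resolution results.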
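The results above are reported as peak signal-to-noise ratio (PSNR) in dB. As a reminder of what that number measures, here is a minimal sketch of the standard PSNR formula, 10·log10(peak² / MSE). The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration. The abstract does not state evaluation details such as which color channel is used or whether image borders are cropped, so this sketch does not try to reproduce them.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Return the PSNR in dB between two equally sized grayscale images.

    Both images are nested lists of pixel rows (an illustrative choice).
    `peak` is the largest possible pixel value; 255 assumes 8-bit images.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of rows")

    squared_error = 0.0
    pixel_count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        if len(ref_row) != len(rec_row):
            raise ValueError("rows must have the same length")
        for ref_pixel, rec_pixel in zip(ref_row, rec_row):
            squared_error += (ref_pixel - rec_pixel) ** 2
            pixel_count += 1

    mse = squared_error / pixel_count
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)
```

Because the scale is logarithmic, a gain of a few tenths of a dB corresponds to a mean squared error that is several percent lower.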

Abstract

Super-resolution (SR) technology is a method for satisfying the demand for high-quality images. The method was first proposed in the 1960s, and its goal is to obtain one or a series of high-resolution (HR) images from one or a sequence of low-resolution (LR) images. SR technology not only improves the visual quality of images but also helps subsequent image analysis and processing, including object recognition, image retrieval, and object detection. SR technology is widely used in real life, for example in video surveillance systems, medical image processing, and remote sensing image processing. Traditional methods, such as interpolation-, reconstruction-, and learning-based algorithms, cannot achieve both desirable SR results and short SR times. In recent years, a convolutional neural network (CNN)-based method called super-resolution CNN (SRCNN) has been proposed. SRCNN is a deep learning method for single-image SR that directly learns an end-to-end mapping between LR and HR images. It achieves better SR results and shorter SR times than the traditional methods but still has several limitations. SRCNN uses a plain stacked CNN structure and Gaussian initialization, which results in slow convergence and time-consuming model training. Furthermore, SRCNN has poor nonlinear mapping capability and extracts only simple features because it comprises only three layers of convolution kernels. As a result, it generates HR images with blurry texture.

An image SR method based on multi-channel CNN (MCSR) is proposed to resolve these issues. MCSR adopts two strategies, namely a residual CNN model and the MSRA initialization method, to accelerate the convergence of model training. Because the residual CNN has an identity mapping from input to output, training explicitly models the residual image, which is the difference between the HR and LR images. This change is advantageous because LR and HR images share the same information to a large extent. The MSRA initialization method maintains the variances of the activations and of the back-propagated gradients as they move up or down the network. Both schemes result in substantially faster convergence. At the same time, two further schemes, deepening and widening the network, are proposed to improve the performance of image SR. The deeper the CNN structure, the better the performance of the CNN. MCSR replaces the large convolution kernels, such as 9×9, chosen by SRCNN with several layers of small convolution kernels, such as 3×3. As a result, MCSR has seven layers of convolution kernels and exhibits enhanced nonlinear mapping capability. In addition to being deepened, MCSR is widened to multiple channels in the nonlinear mapping part. Specifically, the basic MCSR has four channels: one layer of 3×3 convolution kernels, two layers of stacked 3×3 convolution kernels, one layer of 1×5 convolution kernels, and one layer of 5×1 convolution kernels. Experimental results show that different channels produce dissimilar feature maps. In particular, the 3×3 channel produces local feature maps, the 2×3×3 channel produces relatively global feature maps, the 1×5 channel extracts horizontal textural features, and the 5×1 channel extracts vertical textural features. Furthermore, MCSR has one extra layer of 1×1 convolution kernels for compressing the dimension of the feature maps, which gives the method strong nonlinear capability. Strong nonlinear mapping capability and diverse feature maps lead to good SR performance.

The proposed MCSR is trained on the Image91 dataset, the same as SRCNN, and tested on the Set5, Set14, and BSD200 datasets. Experimental results demonstrate that MCSR converges within 4×10 backprops, whereas SRCNN needs at least 1.5×10 backprops to converge. The average peak signal-to-noise ratios (PSNRs) with an upscaling factor of 3 on Set5, Set14, and BSD200 are 32.84 dB, 29.28 dB, and 29.03 dB, which are 0.45 dB, 0.27 dB, and 0.38 dB higher, respectively, than those of SRCNN. The structural similarity index (SSIM) also improves considerably. In terms of subjective quality, MCSR produces high-quality HR images with clear texture that barely show shadow and ripple artifacts. These findings indicate that MCSR achieves good SR performance. Notably, we also propose an extended method called MCSR-Ex, which extends MCSR to five channels. The additional channel consists of three layers of 3×3 convolution kernels and improves the PSNR by approximately 0.1 dB on average on the Set5 dataset.

In this study, a new SR method called MCSR is proposed. On the one hand, the combination of the residual model and the MSRA initialization method significantly accelerates the convergence of model training. On the other hand, the two proposed schemes, widening the CNN model to multiple channels and deepening it to seven layers, considerably improve the performance of image SR. In other words, the good SR performance is attributed to extracting diverse feature maps and using them more effectively.
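To make the structure described in the abstract more concrete, the following toy sketch wires together its four named ingredients: MSRA initialization (standard deviation sqrt(2 / fan_in)), the four parallel channels (3×3, two stacked 3×3, 1×5, 5×1), a 1×1 fusion layer, and an identity skip so that the convolutions only model a residual. It is not the authors' network. It uses a single feature map per layer, applies ReLU after every branch convolution, and wraps the skip connection around this one block; all of these, as well as the function names, are assumptions made for illustration, since the abstract does not give feature-map counts, activation choices, or the exact position of the skip connection. The full seven-layer model is not reproduced.

```python
import math
import random


def msra_kernel(height, width, in_maps, rng):
    """Draw one kernel with MSRA (He) initialization: std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / (height * width * in_maps))
    return [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(height)]


def conv_same(image, kernel):
    """Zero-padded 2-D cross-correlation that keeps the input size (odd kernel sizes)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += image[y][x] * kernel[u][v]
            out[i][j] = acc
    return out


def relu(feature_map):
    return [[max(0.0, value) for value in row] for row in feature_map]


def receptive_field(kernel_shapes):
    """Input region (height, width) seen by one output pixel of stacked stride-1 layers."""
    height = 1 + sum(kh - 1 for kh, _ in kernel_shapes)
    width = 1 + sum(kw - 1 for _, kw in kernel_shapes)
    return height, width


# The four channels named in the abstract, as lists of kernel shapes per branch.
BRANCHES = [[(3, 3)], [(3, 3), (3, 3)], [(1, 5)], [(5, 1)]]


def multi_channel_block(image, rng):
    """Toy block: four parallel branches, 1x1 fusion, identity skip (assumed layout)."""
    branch_outputs = []
    for shapes in BRANCHES:
        feature = image
        for kh, kw in shapes:
            feature = relu(conv_same(feature, msra_kernel(kh, kw, 1, rng)))
        branch_outputs.append(feature)

    # A 1x1 convolution over the concatenated branches is a per-pixel weighted sum.
    fuse_std = math.sqrt(2.0 / len(branch_outputs))
    fuse = [rng.gauss(0.0, fuse_std) for _ in branch_outputs]

    rows, cols = len(image), len(image[0])
    return [
        [
            # Identity skip: the branches contribute only a residual on top of the input.
            image[i][j] + sum(w * b[i][j] for w, b in zip(fuse, branch_outputs))
            for j in range(cols)
        ]
        for i in range(rows)
    ]
```

Calling `receptive_field([(3, 3), (3, 3)])` gives `(5, 5)`, which is one way to read the abstract's remark that the stacked 3×3 channel yields relatively global features compared with the single 3×3 channel, while the 1×5 and 5×1 branches each look along one axis only.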