Image super-resolution using multi-channel convolution
Vol. 22, No. 12, 2017, pp. 1690-1700
Published online: 2017-12-08
Published in print: 2017
DOI: 10.11834/jig.170325
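Super-resolution technology is widely applied in practice. The classic convolutional-neural-network-based super-resolution method (SRCNN) suffers from blurry texture in the reconstructed images and from slow convergence during model training. To address these two problems, a multi-channel convolution image super-resolution (MCSR) algorithm is proposed on the basis of SRCNN. Residual connections are added and the MSRA initialization method is used to initialize the network weights, which accelerates convergence. Multi-channel mapping is introduced to extract richer features, and multiple layers of small convolution kernels such as 3×3 replace a single layer of large kernels such as 9×9, which uses the features more effectively and strengthens the super-resolution reconstruction. MCSR converges after 4×10 iterations. On the Set5 and Set14 datasets, the average peak signal-to-noise ratio (PSNR) at a side-length magnification of 3 is 32.84 dB and 29.28 dB respectively, a marked improvement over SRCNN. MCSR converges faster and generates high-resolution images with clear contours, giving better super-resolution results.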
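As an illustration of the PSNR metric quoted above, the following is a minimal sketch in plain Python. It assumes 8-bit grayscale images stored as nested lists and a peak value of 255. The function name `psnr` and its parameters are chosen here for illustration and are not taken from the paper. SR evaluations usually also fix details such as which color channel is used and whether borders are cropped; those are not covered here.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are nested lists of pixel values (an assumption made
    for this sketch); `peak` is the maximum possible pixel value.
    """
    squared_errors = [
        (a - b) ** 2
        for ref_row, rec_row in zip(reference, reconstructed)
        for a, b in zip(ref_row, rec_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a gain of a few tenths of a dB corresponds to a reduction of several percent in mean squared error.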
Super-resolution (SR) technology is a means of satisfying the demand for high-quality images. First proposed in the 1960s, it aims to obtain one or a series of high-resolution (HR) images from one or a sequence of low-resolution (LR) images. SR not only improves the visual quality of images but also aids their analysis and processing, including object recognition, image retrieval, and object detection. It is widely used in practice, for example in video surveillance systems, medical image processing, and remote sensing image processing. Traditional methods, such as interpolation-, reconstruction-, and learning-based algorithms, cannot deliver both desirable SR results and short SR times. In recent years, a convolutional neural network (CNN)-based method called super-resolution CNN (SRCNN) has been proposed. SRCNN is a deep learning method for single-image SR that directly learns an end-to-end mapping between LR and HR images. It outperforms the traditional methods in both SR results and SR time but still has several limitations. SRCNN uses a stacked CNN structure and Gaussian initialization, which results in slow convergence and time-consuming model training. Furthermore, SRCNN exhibits poor nonlinear mapping capability and simple feature extraction because it comprises only three layers of convolution kernels, and it generates HR images with blurry texture.

An image SR method based on a multi-channel CNN (MCSR) is proposed to resolve these issues. MCSR adopts two strategies, namely a residual CNN model and the MSRA initialization method, to accelerate the convergence of model training. Because a residual CNN has an identity mapping from input to output, training explicitly models the residual image, which is the difference between the HR and LR images. This change is advantageous because LR and HR images share the same information to a large extent. The MSRA initialization method maintains the variances of the activations and of the back-propagated gradients when moving up or down the network. Both strategies result in substantially faster convergence. At the same time, two schemes are proposed to improve SR performance. The deeper the CNN structure, the better its performance. MCSR therefore replaces the large convolution kernels used by SRCNN, such as 9×9, with several layers of small convolution kernels, such as 3×3. As a result, MCSR has seven layers of convolution kernels and an enhanced nonlinear mapping capability. In addition to being deepened, MCSR is widened to multiple channels in the nonlinear mapping part. Specifically, the basic MCSR has four channels: one layer of 3×3 convolution kernels, two layers of stacked 3×3 convolution kernels, one layer of 1×5 convolution kernels, and one layer of 5×1 convolution kernels. Experimental results show that different channels produce dissimilar feature maps. In particular, the 3×3 channel produces local feature maps, the 2×3×3 channel produces relatively global feature maps, the 1×5 channel extracts horizontal textural features, and the 5×1 channel extracts vertical textural features. Furthermore, MCSR has one extra layer of 1×1 convolution kernels that compresses the dimension of the feature maps, which gives the method powerful nonlinear capability. Powerful nonlinear mapping capability and diverse feature maps can result in good SR performance.

The proposed MCSR is trained on the Image91 dataset, the same as SRCNN, and tested on the Set5, Set14, and BSD200 datasets. Experimental results demonstrate that MCSR converges within 4×10 backprops, whereas SRCNN requires at least 1.5×10 backprops. The average peak signal-to-noise ratios (PSNRs) with an upscaling factor of 3 on Set5, Set14, and BSD200 are 32.84 dB, 29.28 dB, and 29.03 dB, which are 0.45 dB, 0.27 dB, and 0.38 dB higher, respectively, than those of SRCNN. The structural similarity measure also improves considerably. In terms of subjective quality, MCSR produces high-quality HR images with clear texture, and the produced images barely show shadow and ripple effects. These findings indicate that MCSR achieves good SR performance. Notably, we also propose an extended method called MCSR-Ex, which extends MCSR to five channels. The additional channel consists of three layers of 3×3 convolution kernels and improves the average PSNR on the Set5 dataset by approximately 0.1 dB.

In this study, a new SR method called MCSR is proposed. On the one hand, the combination of the residual model and the MSRA initialization method significantly accelerates the convergence of model training. On the other hand, the two proposed schemes, widening the CNN model to multiple channels and deepening it to seven layers, considerably improve the performance of image SR. In other words, the good SR performance is attributed to extracting diverse feature maps and using them effectively.
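The multi-channel mapping, MSRA initialization, and residual connection described above can be sketched as a toy in plain Python. This is not the authors' network: it works on a single feature map, uses one filter per channel, applies the block directly to the input image, and assumes zero padding and ReLU activations, none of which the abstract specifies. All function names are hypothetical. MSRA initialization is taken here to mean a zero-mean Gaussian with standard deviation sqrt(2 / fan_in).

```python
import math
import random


def msra_init(kh, kw, fan_in_maps=1, rng=random):
    """MSRA-style kernel: zero-mean Gaussian, std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / (kh * kw * fan_in_maps))
    return [[rng.gauss(0.0, std) for _ in range(kw)] for _ in range(kh)]


def conv2d_same(fmap, kernel):
    """Single-map 2-D cross-correlation with zero padding (same output size)."""
    h, w = len(fmap), len(fmap[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += fmap[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def relu(fmap):
    """Element-wise ReLU (the activation choice is an assumption)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def multi_channel_block(fmap, rng=random):
    """Four parallel channels followed by a 1x1 compression layer."""
    channels = [
        [(3, 3)],          # one 3x3 layer: local features
        [(3, 3), (3, 3)],  # two stacked 3x3 layers: wider context
        [(1, 5)],          # 1x5 layer: horizontal texture
        [(5, 1)],          # 5x1 layer: vertical texture
    ]
    outputs = []
    for kernel_shapes in channels:
        x = fmap
        for kh, kw in kernel_shapes:
            x = relu(conv2d_same(x, msra_init(kh, kw, rng=rng)))
        outputs.append(x)
    # A 1x1 convolution over the stacked channel outputs is a per-pixel
    # weighted sum; fan_in equals the number of channels being mixed.
    std = math.sqrt(2.0 / len(outputs))
    mix = [rng.gauss(0.0, std) for _ in outputs]
    h, w = len(fmap), len(fmap[0])
    return [
        [sum(m * o[y][x] for m, o in zip(mix, outputs)) for x in range(w)]
        for y in range(h)
    ]


def mcsr_like_forward(lr_upscaled, rng=random):
    """Residual connection: output = input + predicted residual image."""
    residual = multi_channel_block(lr_upscaled, rng)
    return [
        [a + b for a, b in zip(in_row, res_row)]
        for in_row, res_row in zip(lr_upscaled, residual)
    ]
```

With untrained random weights the output is meaningless as an image. The sketch only shows how the parallel channels, the 1×1 compression, and the identity shortcut fit together, and that every channel preserves the spatial size so their outputs can be combined pixel by pixel.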