Single image super-resolution reconstruction based on multi-level perceptual residual convolutional network

He Lei, Cheng Jiahao, Zhan Zhiyu, Yang Wenbo, Liu Peiran (Hefei University of Technology, Hefei 230601, China)

Abstract
Objective In deep learning algorithms for single image super-resolution reconstruction, most networks extract features with convolution kernels of a single scale (e.g., 3×3 kernels), overlooking the fact that different kernel sizes produce receptive fields of different sizes. Receptive fields of different sizes make the network attend to features at different levels, so using only single-scale kernels causes the network to ignore the macro-level relationships between feature maps. To address this problem, this paper proposes a multi-level perception residual convolutional network (MLP-Net) for single image super-resolution reconstruction. Method A feature extraction module extracts low-frequency image features as the input. This input is processed by multiple densely connected multi-level perception modules, each of which is divided into shallow multi-level feature extraction and deep multi-level feature extraction, ensuring that the network attends to both low-level and high-level image features while preserving the macro-level relationships between features. Result The experimental results are evaluated with two objective metrics, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), and the proposed algorithm is compared with other super-resolution algorithms. On four benchmark test sets (Set5, Set14, Urban100, and BSD100 (Berkeley Segmentation Dataset)), the average PSNR of the proposed algorithm at a scale factor of 2 is 37.851 1 dB, 33.933 8 dB, 32.219 1 dB, and 32.148 9 dB, respectively, all higher than the results of the other algorithms. Conclusion The proposed convolutional network uses multi-scale convolution to fully extract features at different levels from the hierarchical features, exploits the structural information of the low-resolution image itself to complete the reconstruction, and achieves good reconstruction results.
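The results above are average PSNR and SSIM values over each benchmark set. For reference, a minimal evaluation sketch follows; the helper names, the full-RGB evaluation, and the use of NumPy and scikit-image are assumptions made for illustration, not details taken from the paper.

# Minimal PSNR/SSIM evaluation sketch (illustrative only; the paper does not
# specify this exact procedure or these helper names).
import numpy as np
from skimage.metrics import structural_similarity  # scikit-image >= 0.19

def psnr(sr, hr, max_val=255.0):
    # Peak signal-to-noise ratio between a reconstructed and a reference image.
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def evaluate_pair(sr, hr):
    # Return (PSNR, SSIM) for one image pair; in practice these values are
    # averaged over all images of a benchmark set (Set5, Set14, BSD100, Urban100).
    ssim = structural_similarity(sr, hr, channel_axis=-1, data_range=255)
    return psnr(sr, hr), ssim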
Keywords
Single image super-resolution reconstruction based on multi-level perceptual residual convolutional network

He Lei, Cheng Jiahao, Zhan Zhiyu, Yang Wenbo, Liu Peiran (Hefei University of Technology, Hefei 230601, China)

Abstract
Objective Single image super-resolution reconstruction (SISR) is a classic problem in computer vision. SISR aims to reconstruct a high-resolution image from a single low-resolution (LR) image. Image super-resolution (SR) technology is now widely used in medical imaging, satellite remote sensing, video surveillance, and other fields. However, SR is an inherently complex and ill-posed problem. Many SISR methods have been proposed to solve it, including interpolation-based and reconstruction-based methods, but at large upscaling factors their restoration performance drops sharply and the reconstructed results are poor. With the rise of deep learning, deep convolutional neural networks have also been applied to this problem, and researchers have proposed a series of models and made significant progress. As understanding of deep learning techniques has deepened, researchers have found that deeper networks bring better results than shallow ones, but excessively deep networks can suffer from exploding or vanishing gradients, which make the model untrainable and thus unable to reach its best results through training. In recent years, most deep learning based networks for single-image SR reconstruction have adopted single-scale convolution kernels, generally a 3×3 kernel, for feature extraction. Although single-scale kernels can extract a great deal of detailed information, these algorithms usually ignore the different receptive field sizes caused by different kernel sizes. Receptive fields of different sizes make the network attend to different features; therefore, using only a 3×3 kernel causes the network to ignore the macroscopic relationships between different feature maps. Considering these problems, this study proposes a multi-level perception network that builds on GoogLeNet, the residual network, and the dense convolutional network. Method First, a feature extraction module, consisting of two 3×3 convolution layers, extracts low-frequency image features; its output is fed into multiple densely connected multi-level perception modules. Each multi-level perception module is built from 3×3 and 5×5 convolution kernels: the 3×3 kernels are responsible for extracting detailed feature information, and the 5×5 kernels are responsible for extracting global feature information. Second, the multi-level perception module is divided into shallow multi-level feature extraction, deep multi-level feature extraction, and a tandem compression unit. The shallow multi-level feature extraction consists of a 3×3 chain convolution and a 5×5 chain convolution; the former extracts fine local feature information from the shallow features, whereas the latter extracts global features from the shallow features. The deep multi-level feature extraction is likewise composed of a 3×3 chain convolution and a 5×5 chain convolution, which extract fine local and global feature information, respectively, from the deep features. In the tandem compression unit, the global feature information from the shallow features, the fine local and global feature information from the deep features, and the initial input are concatenated and then compressed to the same dimension as the input.
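To make the module structure described above concrete, a minimal PyTorch sketch of one possible arrangement is given below. The channel count, the depth of each chain convolution, the ReLU activations, and the wiring of the deep branches are assumptions made for illustration; they are not specified in the abstract.

# A minimal PyTorch sketch of one possible multi-level perception module.
import torch
import torch.nn as nn

class ChainConv(nn.Module):
    # Two chained convolutions with a single kernel size (3x3 or 5x5).
    def __init__(self, channels, kernel_size):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class MultiLevelPerceptionBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Shallow multi-level extraction: local (3x3) and global (5x5) branches.
        self.shallow_local = ChainConv(channels, 3)
        self.shallow_global = ChainConv(channels, 5)
        # Deep multi-level extraction; here it is applied to the shallow local
        # features (an assumption -- the abstract does not spell out this wiring).
        self.deep_local = ChainConv(channels, 3)
        self.deep_global = ChainConv(channels, 5)
        # Tandem compression unit: concatenate and squeeze back to `channels`.
        self.compress = nn.Conv2d(channels * 4, channels, kernel_size=1)

    def forward(self, x):
        s_local = self.shallow_local(x)
        s_global = self.shallow_global(x)
        d_local = self.deep_local(s_local)
        d_global = self.deep_global(s_local)
        # Concatenate shallow global, deep local, and deep global features with
        # the initial input, then compress to the input dimension.
        fused = torch.cat([s_global, d_local, d_global, x], dim=1)
        return self.compress(fused)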
In this way, the network captures both the low-level and high-level features of the image while preserving the macro-level relationships between them. Finally, a reconstruction module produces the final output by combining the upscaled image with the residual image. This study adopts the DIV2K dataset, which consists of 800 high-definition images of roughly 2 million pixels each. To make full use of these data, the training images are augmented by random rotations of 90°, 180°, and 270° and by horizontal flipping. Result The reconstructed results are evaluated with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) and compared with several state-of-the-art SR reconstruction methods. At a scaling factor of 2, the PSNRs of the proposed algorithm on four benchmark test sets (Set5, Set14, Berkeley Segmentation Dataset (BSD100), and Urban100) are 37.851 1 dB, 33.933 8 dB, 32.219 1 dB, and 32.148 9 dB, respectively, all higher than those of the other methods. Conclusion Compared with other algorithms, the proposed convolutional network better accounts for the receptive field and fully extracts different levels of hierarchical features through multi-scale convolution. At the same time, the model uses the structural feature information of the LR image itself to complete the reconstruction, and good reconstructed results can be obtained with it.
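Under the same caveats, the following rough sketch shows how the pieces described above could fit together end to end. It reuses the MultiLevelPerceptionBlock class from the previous sketch; the 1×1 fusion convolutions for the dense connections, the PixelShuffle upsampling of the residual branch, the bicubic interpolation of the base image, and the small augmentation helper are illustrative assumptions rather than the paper's exact design.

# Rough end-to-end sketch (reuses MultiLevelPerceptionBlock from the previous sketch).
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPNetSketch(nn.Module):
    def __init__(self, channels=64, num_blocks=4, scale=2):
        super().__init__()
        self.scale = scale
        # Feature extraction module: two 3x3 convolutions, as described in the abstract.
        self.head = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.blocks = nn.ModuleList(
            [MultiLevelPerceptionBlock(channels) for _ in range(num_blocks)]
        )
        # Dense connections: each block receives all earlier outputs, fused by a
        # 1x1 convolution (one possible reading of "densely connected" here).
        self.fuse = nn.ModuleList(
            [nn.Conv2d(channels * (i + 1), channels, 1) for i in range(num_blocks)]
        )
        # Reconstruction module: predict a residual image and upscale it (assumed
        # PixelShuffle; the abstract only says the upscaled and residual images are combined).
        self.tail = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        feats = [self.head(lr)]
        for block, fuse in zip(self.blocks, self.fuse):
            feats.append(block(fuse(torch.cat(feats, dim=1))))
        residual = self.tail(feats[-1])
        upscaled = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                                 align_corners=False)
        return upscaled + residual

def augment(patch):
    # Random 90/180/270 degree rotation and horizontal flip, as used on DIV2K
    # training images in the abstract.
    patch = torch.rot90(patch, k=random.randint(0, 3), dims=(-2, -1))
    if random.random() < 0.5:
        patch = torch.flip(patch, dims=(-1,))
    return patch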
Keywords
