Lu Zhenghao, Liu Cong (School of Optoelectronic Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Objective To address problems in previous deep-learning-based image super-resolution methods, such as simply deepening the network, information loss during upsampling, and the difficulty of reconstructing high-frequency information, this paper proposes a mixed attention network based on multiscale feature reuse for image super-resolution reconstruction. Method The network consists of five parts: a preprocessing module, a multiscale feature reuse mixed attention module, an upsampling module, a compensation reconstruction module, and a reconstruction module. The first part, the preprocessing module, uses one convolutional layer to extract shallow features and expand the number of channels of the feature map. The second part, the multiscale feature reuse mixed attention module, introduces a multipath network, a mixed attention mechanism, and long and short skip connections to further enlarge the receptive field of the feature maps, improve the reuse of multiscale features, and strengthen the reconstruction of high-frequency information. The third part, the upsampling module, uses the subpixel method to upsample the feature maps to the target image size. The fourth part, the compensation reconstruction module, is composed of a convolutional layer and a mixed attention mechanism; it performs feature compensation on the upsampled feature maps and stabilizes model training. The fifth part, the reconstruction module, consists of one convolutional layer that restores the number of channels of the feature map to the original number, yielding the reconstructed high-resolution image. Result In comparisons with models of the same scale, peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are used as evaluation metrics on the Set5, Set14, BSD100 (Berkeley segmentation dataset), and Urban100 benchmark test sets. When the scale factor is 3, the PSNR/SSIM values on these test sets are 34.40 dB/0.927 3, 30.35 dB/0.842 7, 29.11 dB/0.805 2, and 28.23 dB/0.854 0, respectively, an improvement over the other models. Conclusion Quantitative and visual experimental results show that the high-resolution images reconstructed by the proposed model not only show clear improvement in edge and texture reconstruction but also achieve higher scores on the objective PSNR and SSIM metrics.
Multiscale feature reuse mixed attention network for image reconstruction
Lu Zhenghao, Liu Cong (School of Optoelectronic Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Objective Obtaining a high-resolution image directly is often difficult because of interference from the external environment and limitations of hardware. A low-resolution image is therefore usually acquired first, and then one or more image super-resolution methods are employed to obtain the corresponding high-resolution image. Moreover, the number of collected images is typically large. Consequently, reconstructing a high-resolution image from a low-resolution image at low cost has become a research hotspot in computer vision, with wide applications in medicine, remote sensing, and public safety. In recent years, many image super-resolution methods have been proposed; these techniques can be broadly categorized into interpolation-based, projection-based, and learning-based methods. Among them, the convolutional neural network, a typical learning-based approach, has attracted much attention in recent years but still has several problems. First, the reconstruction quality is often improved by simply deepening the network, which makes the network very complex and increases the difficulty of training. Second, the high-frequency information in an image is difficult to reconstruct. Attention mechanisms have been applied to overcome this problem, but existing attention mechanisms are usually borrowed directly from high-level vision tasks without considering the particularities of super-resolution reconstruction. Third, existing upsampling methods have several limitations, such as feature loss and training oscillations, which remain difficult to solve in the field of super-resolution reconstruction. To address these problems, this paper proposes a mixed attention network model based on multiscale feature reuse for super-resolution reconstruction.
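The extended abstract does not spell out how the mixed attention mechanism is constructed. As a rough, purely illustrative sketch of one common ingredient of such mechanisms, a squeeze-and-excitation-style channel attention can be written in a few lines of NumPy; the function name, weight shapes, and reduction ratio below are assumptions for the example, not the paper's actual design:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """SE-style channel attention: global average pooling of each
    channel, a two-layer bottleneck, then sigmoid gating per channel.
    feat: (C, H, W); w1: (C//r, C); w2: (C, C//r) for reduction ratio r."""
    pooled = feat.mean(axis=(1, 2))                  # (C,) channel statistics
    hidden = np.maximum(w1 @ pooled, 0.0)            # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1)
    return feat * weights[:, None, None]             # rescale each channel

# toy example: 8 channels, 4x4 spatial size, reduction ratio 4
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1, w2 = rng.standard_normal((2, 8)), rng.standard_normal((8, 2))
out = channel_attention(x, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the gates lie strictly in (0, 1), each channel is attenuated in proportion to its learned importance; a mixed attention mechanism would typically combine such channel gating with a spatial branch.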
The model improves network performance through several novel strategies, including a multipath network, long and short skip connections, a compensation reconstruction block, and a mixed attention mechanism. Method The proposed network is mainly composed of five parts: the preprocessing module, the multiscale feature reuse mixed attention module, the upsampling module, the compensation reconstruction module, and the reconstruction module. The first part is the preprocessing module, which uses a convolutional layer to extract shallow features and expand the number of channels of the feature map. The second part is the multiscale feature reuse mixed attention module, which contains three important subparts: a multipath network, a mixed attention mechanism, and long and short skip connections. The multipath network enlarges the receptive fields of different feature maps and improves the reuse of multiscale features. The mixed attention mechanism better captures high-frequency information, and the skip connections alleviate the degradation problem of deep networks and improve the learning ability. Moreover, the interdependence between shallow and deep features can be learned by combining the deepening and widening strategies. The third part is the upsampling module, which uses the subpixel method to upsample the feature map to the target size. The shallow and deep features are upsampled simultaneously and fused to compensate for the feature loss caused by the upsampling operation. The fourth part is the compensation reconstruction module, which is composed of a convolutional layer and a mixed attention module. It performs a second round of feature compensation on the upsampled feature maps and stabilizes model training.
The fifth part is the reconstruction module, which uses a convolutional layer to restore the number of channels of the feature map to the original number, yielding the reconstructed high-resolution image. In the training phase, the DIV2K (DIVerse 2K) dataset is taken as the training set, and each image is augmented by methods such as random rotation and horizontal flipping. Adaptive momentum estimation (Adam) is used as the optimizer, the L1 loss is used as the objective function, and each run uses 800 epochs. Result The proposed method is compared with several state-of-the-art methods, including the super-resolution convolutional neural network (SRCNN), super-resolution using very deep convolutional networks (VDSR), deep Laplacian pyramid super-resolution networks (LapSRN), the memory network for image restoration (MemNet), the super-resolution network for multiple degradations (SRMDNF), the cascading residual network (CARN), the multi-path adaptive modulation network (MAMNet), and a simplified version of the residual channel attention network (RCAN-mini). Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are applied to evaluate the performance of these algorithms on the widely used benchmark test sets Set5, Set14, BSD100 (Berkeley segmentation dataset), and Urban100. When the scale factor is 3, the PSNR/SSIM values obtained by this model on these test sets are 34.40 dB/0.927 3, 30.35 dB/0.842 7, 29.11 dB/0.805 2, and 28.23 dB/0.854 0, respectively. In terms of PSNR, the proposed model outperforms RCAN-mini by 0.15 dB, 0.08 dB, 0.07 dB, and 0.24 dB on the four test sets, and the reconstruction results are also improved compared with the other methods. Conclusion A multiscale feature reuse mixed attention network, which applies a new network structure and attention mechanism to improve super-resolution performance, is proposed. This model is compared with other methods through quantitative and visual experiments.
Experimental results show that the proposed method achieves the best reconstruction of edge and texture information and obtains higher values on the PSNR and SSIM evaluation metrics than the other methods.
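For reference, the PSNR metric used throughout the evaluation follows its standard definition, which can be computed in a few lines of plain Python; the toy images below are invented for the example, not data from the paper:

```python
import math

def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given as flat sequences of pixel values in [0, max_val]."""
    mse = sum((a - b) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# two 2x2 grayscale images differing by 5 at every pixel: MSE = 25
ref = [0, 50, 100, 150]
rec = [5, 55, 105, 155]
print(round(psnr(ref, rec), 2))  # 10*log10(255^2/25) -> 34.15
```

Because PSNR is a log-scale measure of mean squared error, the gains of 0.07 dB to 0.24 dB reported over RCAN-mini correspond to a small but consistent reduction in reconstruction error.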