Continuous-scale image super-resolution with cross-scale coupling

Wu Hanlin, Li Wanyu, Zhang Libao (School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China)

Abstract
Objective Although deep learning has greatly improved the performance of image super-resolution, most existing methods consider only specific integer scale factors and cannot flexibly perform super-resolution at continuous scale factors. Existing methods typically train one model per scale factor, which costs a long training time and occupies excessive model storage space. To address these problems, this paper proposes a continuous-scale super-resolution method based on a cross-scale coupling network. Method We propose a cross-scale coupled upsampling module that replaces the traditional upsampling layer and realizes continuous-scale upsampling. In addition, we propose a cross-scale convolutional layer that extracts features at multiple scales in parallel and mines cross-scale contextual information by dynamically activating and aggregating features of different scales, effectively improving performance on the continuous-scale super-resolution task. Result Compared with state-of-the-art super-resolution methods on three datasets, in the continuous-scale task the proposed method improves peak signal-to-noise ratio (PSNR) by up to 0.13 dB over the second-best comparison algorithm Meta-SR (meta super-resolution) while using 73% fewer parameters. In the integer-scale task, it improves PSNR by up to 0.24 dB over SRFBN (super-resolution feedback network), a lightweight network with a similar number of parameters. The proposed algorithm also produces visually more realistic results with sharper textures. Ablation experiments verify the effectiveness of each module. Conclusion The proposed continuous-scale super-resolution model needs to be trained only once to obtain excellent super-resolution results at arbitrary scale factors. Moreover, the cross-scale coupled upsampling module can replace the commonly used sub-pixel or deconvolutional layers, realizing continuous-scale upsampling while maintaining model performance.
Cross-scale coupling network for continuous-scale image super-resolution

Wu Hanlin, Li Wanyu, Zhang Libao(School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China)

Abstract
Objective Single image super-resolution (SISR) aims to restore a high-resolution (HR) image by adding high-frequency details to its low-resolution (LR) counterpart. Application scenarios such as medical imaging and remote sensing require super-resolving LR images to multiple scales to meet different accuracy requirements. Moreover, these scales should not be restricted to integers but may be arbitrary positive numbers. However, training a separate super-resolution (SR) model for every scale factor incurs high computational and storage costs. Hence, it is of great significance to construct a single SR model that can process arbitrary scale factors. Deep learning technology has greatly improved the performance of SISR, but most existing models are designed for specific integer scale factors. Early pre-upsampling methods such as the super-resolution convolutional neural network (SRCNN) can achieve continuous-scale upsampling but have low computational efficiency. Post-upsampling methods instead perform upsampling with a deconvolutional or sub-pixel layer at the final step of the network. However, the structures of the sub-pixel and deconvolutional layers are tied to the scale factor, so such an SR network can only be trained for a single scale factor at a time. The multi-scale deep super-resolution (MDSR) model uses multiple upsampling branches to process different scale factors, but it can only super-resolve the integer scales it was trained on. Meta super-resolution (Meta-SR) is the first scale-arbitrary SR network; it builds a meta-upsampling module that uses a fully connected network to dynamically predict the weights of the feature mapping between the LR and HR spaces. However, the Meta-SR upsampling module has high computational complexity, and the number of parameters in its feature extraction part is extremely large. Method We propose a cross-scale coupling network (CSCN) for continuous-scale image SR.
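The scale dependence of the sub-pixel layer noted above can be made concrete: a sub-pixel (pixel-shuffle) layer for scale factor r rearranges a C·r²-channel feature map into a C-channel map that is r times larger in each spatial dimension, so the layer and the convolutions feeding it are bound to one fixed r. The following minimal NumPy sketch of this rearrangement (channel-first layout, as in PyTorch's PixelShuffle) is illustrative and not code from the paper:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) -> (C, H*r, W*r), as in a sub-pixel layer."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    # split the channel axis into (C, r, r), then interleave into space
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# the input channel count must equal C * r**2, so the network is tied to r
feat = np.arange(16).reshape(4, 2, 2)   # C = 1, r = 2 feature map
hr = pixel_shuffle(feat, 2)             # shape (1, 4, 4)
```

Changing r changes the required channel count, which is why a trained post-upsampling network cannot simply be reused for another scale factor.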
First, we devise a fully convolutional cross-scale coupled upsampling (CSC-Up) module that achieves efficient, end-to-end upsampling at arbitrary decimal scale factors. Our strategy is to construct a continuous-scale upsampling module by coupling features of multiple scales. The CSC-Up module first maps LR features to multiple HR spaces through multiple upsampling branches with different scales. Then, the features of the multiple HR spaces are adaptively fused to obtain the SR image at the target scale. The CSC-Up module can be easily plugged into existing SR networks: we only need to replace the original upsampling module with our CSC-Up module to obtain a continuous-scale SR network. Second, because multi-scale feature extraction is beneficial to SR tasks, we design a novel cross-scale convolutional (CS-Conv) layer, which can adaptively extract and couple features from multiple scales and exploit cross-scale contextual information. In addition, we utilize a feedback mechanism in the cross-scale feature learning part, using high-level features to refine low-level ones. Such a recurrent structure can increase the capacity of the network without increasing the number of parameters. We train our model on the 800 training images of the DIVerse 2K resolution (DIV2K) dataset. In the training step, we randomly crop LR patches of size 32×32 as inputs. The input LR patches are generated with the bicubic downsampling model and randomly rotated or flipped for data augmentation. Our model is trained for 400 epochs with a mini-batch size of 16, and each epoch contains 1 000 iterations. The initial learning rate is 1×10^-3 and is halved at 1.5×10^5, 2.5×10^5, and 3.5×10^5 iterations. Our model is implemented in the PyTorch framework and trained on one Tesla V100 GPU. Result We compare our method with two state-of-the-art (SotA) continuous-scale SR methods, Meta-EDSR (enhanced deep super-resolution) and Meta-RDN.
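The coupling idea behind CSC-Up can be sketched in NumPy: several fixed-scale branches each map the input to their own HR space, all branch outputs are resampled to the target size, and a weighted fusion produces the continuous-scale result. In the real module the branches and fusion weights are learned convolutions; the nearest-neighbour resizing, the branch scales (2, 3, 4), and the distance-based fusion weights below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def nn_resize(img, out_h, out_w):
    """Nearest-neighbour resize (stand-in for the learned HR mapping)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[np.ix_(rows, cols)]

def csc_up(lr, target_scale, branch_scales=(2, 3, 4)):
    """Couple fixed-scale branches into one continuous-scale output."""
    out_h = round(lr.shape[0] * target_scale)
    out_w = round(lr.shape[1] * target_scale)
    # 1) each branch maps the LR input to its own HR space
    branches = [np.kron(lr, np.ones((s, s))) for s in branch_scales]
    # 2) resample every branch output to the target size
    aligned = [nn_resize(b, out_h, out_w) for b in branches]
    # 3) fuse with weights favouring branches near the target scale
    d = np.array([abs(target_scale - s) for s in branch_scales], dtype=float)
    w = np.exp(-d) / np.exp(-d).sum()
    return sum(wi * ai for wi, ai in zip(w, aligned))

lr = np.ones((8, 8))
sr = csc_up(lr, 2.5)   # decimal scale factor: output is 20x20
```

Because only the final resampling and fusion depend on the target scale, the same trained branches serve every scale factor, which is what makes one-pass training for arbitrary scales possible.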
Meanwhile, we define a new baseline by bicubically resampling the output of a residual dense network (RDN) to the target size, named Bi-RDN. To verify the generality of our CSC-Up module, we replace the original upsampling layer of the RDN with a CSC-Up module and construct a continuous-scale RDN (CS-RDN). We also use the self-ensemble method to further improve the performance of CSCN, and name the result CSCN+. The quantitative evaluation metrics are peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). Our CSCN obtains comparable results with Meta-RDN, and CSCN+ obtains the best results on all scale factors. CS-RDN also obtains satisfactory results, demonstrating that the proposed CSC-Up module can be well adapted to existing SR methods and achieve satisfactory non-integer SR results. We also compare our CSCN with six SotA methods on integer scale factors: the super-resolution convolutional neural network (SRCNN), coarse-to-fine SRCNN (CFSRCNN), RDN, the SR feedback network (SRFBN), the iterative SR network (ISRN), and Meta-RDN. Comparing the results of RDN and CS-RDN, we find that our CSC-Up module achieves results comparable to or better than those of a single-scale upsampling module. Meanwhile, our proposed CSCN and CS-RDN need to be trained only once and released as a single model. Our proposed CSCN uses a simpler and more efficient continuous-scale upsampling module and obtains results comparable with Meta-SR. CSCN+ achieves the best performance on all datasets and scales. Moreover, the number of parameters in our model is 6 M, which is only 27% of that of Meta-RDN (22 M). Benefiting from the feedback structure, our method can well balance the number of network parameters and model performance. Thus, our proposed CSCN and CSCN+ are superior to the compared SotA methods. Conclusion We propose a novel CSC-Up module that can be easily plugged into existing SR networks to enable continuous-scale SR.
We also introduce a CS-Conv layer to learn scale-robust features and adopt feedback connections to design a lightweight CSCN. Compared with previous single-scale SR networks, the proposed CSCN saves training time and model storage space.
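PSNR, the metric behind the reported gains, is defined from the mean squared error (MSE) between the SR output and the ground-truth HR image. The standard definition (independent of the paper) is:

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 10.0)
print(psnr(a, b))                        # MSE = 100 -> ~28.13 dB
```

Because the scale is logarithmic, the reported 0.13 dB gain over Meta-SR corresponds to roughly a 3% reduction in MSE.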