Blueprint Separable Convolution Transformer Network for Lightweight Image Super-Resolution

Bi Xiuping, Chen Shi, Zhang Lefei (School of Computer Science, Wuhan University)

Abstract
Objective Image super-resolution reconstruction aims to recover a high-resolution image with richer detail from a low-resolution image. In recent years, Transformer-based deep neural networks have achieved remarkable performance in image super-resolution, but these networks typically carry a huge number of parameters and a high computational cost. To address this problem, this study designs a lightweight image super-resolution network. Method This paper proposes a blueprint separable convolution Transformer network (BSTN) for lightweight image super-resolution. Based on blueprint separable convolution (BSConv), a blueprint feed-forward neural network and a blueprint multi-head self-attention module are designed. A shift channel attention block (SCAB) is then designed to strengthen important channel information; it consists of shift convolution, contrast-aware channel attention, and the blueprint feed-forward neural network. Finally, a blueprint multi-head self-attention block (BMSAB) is designed, which performs the self-attention process at low computational cost through blueprint multi-head self-attention and the blueprint feed-forward neural network. Result The proposed method is compared with 10 state-of-the-art lightweight super-resolution methods on four datasets. Quantitatively, it leads by varying margins on the different datasets while keeping both the parameter count and the number of floating-point operations at a low level. At upscaling factors of 2, 3, and 4, the peak signal-to-noise ratio (PSNR) on Set5 is improved by 0.11 dB, 0.16 dB, and 0.17 dB, respectively. Qualitatively, the images reconstructed by the proposed method are sharp, with small blurred regions and rich details. Conclusion The proposed blueprint separable convolution Transformer network, BSTN, reaches the state of the art with a small number of parameters and floating-point operations and produces high-quality super-resolution results.
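Blueprint separable convolution, the basic building block referred to above, factorizes a standard convolution into a 1×1 pointwise convolution followed by a depthwise convolution. The minimal PyTorch sketch below illustrates this factorization; the class name and default arguments are illustrative and not taken from the paper's released code.

```python
import torch
import torch.nn as nn


class BSConvU(nn.Module):
    """Blueprint separable convolution: a 1x1 pointwise convolution that mixes
    channels, followed by a KxK depthwise convolution that filters each channel
    spatially. A standard KxK convolution needs C_in*C_out*K*K weights, whereas
    this factorization needs only C_in*C_out + C_out*K*K."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_channels, out_channels, kernel_size,
                                   stride=stride, padding=padding,
                                   groups=out_channels, bias=True)

    def forward(self, x):
        return self.depthwise(self.pointwise(x))


if __name__ == "__main__":
    print(BSConvU(48, 48)(torch.randn(1, 48, 64, 64)).shape)  # torch.Size([1, 48, 64, 64])
```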
Keywords
Blueprint Separable Convolution Transformer Network for Lightweight Image Super-Resolution

Bi Xiuping, Chen Shi, Zhang Lefei (School of Computer Science, Wuhan University)

Abstract
Objective Image super-resolution aims to enhance the resolution and quality of low-resolution images, making them more visually appealing and better suited to human or machine recognition. Starting from degraded low-resolution images with coarse details, the goal is to reconstruct high-resolution images with finer details. Super-resolution algorithms have wide-ranging applications, including object detection, medical pathological analysis, remote sensing satellite imagery, and security monitoring, and these promising prospects have drawn increasing attention from researchers. With the advancement of deep learning in computer vision, it has been successfully applied to image super-resolution and has led to significant achievements. However, the substantial number of parameters and heavy computation of super-resolution models result in slow running speeds, limiting their practicality in real-world deployment, especially on mobile and edge devices. To address this issue, several lightweight super-resolution models have been proposed. Among them, Transformer-based approaches stand out because they recover rich detail in the reconstructed images, but they still suffer from computational redundancy and large model sizes. To overcome these challenges, this paper presents a novel lightweight super-resolution network based on the Transformer architecture. Method A blueprint separable convolution Transformer network (BSTN) is proposed for lightweight image super-resolution. BSTN is divided into three parts: shallow feature extraction, deep feature extraction, and image reconstruction. In the shallow feature extraction stage, a 3×3 standard convolution extracts low-level features from the input image. This initial step captures basic image information, which is passed directly to the tail of the network through a long skip connection to provide residual information. The deep feature extraction component is composed of four successive residual attention transformer groups (RATGs). The key elements within this stage are the shift channel attention block (SCAB) and the blueprint multi-head self-attention block (BMSAB). The SCAB and BMSAB are combined to form the hybrid attention transformer block (HATB); two HATBs are cascaded with a residual connection and followed by a standard convolution to construct an RATG. The blueprint feed-forward neural network is first designed to effectively suppress low-information features and retain only relevant and useful information. It is then introduced into the two attention modules above to efficiently extract the deep features that matter for super-resolution. The SCAB consists of three main components: shift convolution, contrast-aware channel attention, and the blueprint feed-forward neural network. The shift convolution reduces the number of network parameters and performs spatial information aggregation, enabling effective information fusion across different regions of the image. The contrast-aware channel attention mechanism focuses on important channel information, enhancing the representation of crucial features. The BMSAB consists of a blueprint multi-head self-attention and a blueprint feed-forward neural network.
This module performs self-attention with reduced computational complexity while also suppressing low-information features through the blueprint feed-forward neural network. Finally, the shallow features extracted in the earlier stage and the deep features obtained from the RATGs are added together, and the combined features are processed by pixel shuffle, which rearranges them to increase their spatial resolution and generates the reconstructed high-resolution image with improved quality and detail. With this architecture and these components, the proposed lightweight super-resolution network achieves effective feature extraction, self-attention computation, and image reconstruction, addressing the parameter redundancy and large model sizes commonly found in Transformer-based super-resolution models. Our method is implemented in PyTorch on an NVIDIA RTX 3090 GPU. The training datasets are DIV2K and Flickr2K, which consist of 800 and 1,000 images, respectively. The batch size is set to 32, and the patch size of the training data is set to 48×48. The initial learning rate is set to 5×10⁻⁴ and updated by the Adam optimizer with a cosine descent strategy, and the total number of iterations is 10⁶. Result The proposed method is compared with 11 state-of-the-art approaches on four datasets. Quantitatively, across different magnification factors and datasets, the proposed method achieves varying degrees of improvement while keeping the parameter count and floating-point operations at a low level. When the magnification factor is 2, its PSNR ranks first on Set5, Set14, BSD100, and Urban100; on Set5 and Set14 it surpasses the second-best model by 0.11 dB and 0.08 dB. When the magnification factor is 3, the PSNR again ranks first, surpassing the second-best model by 0.16 dB on Set5 and 0.06 dB on Urban100. When the magnification factor is 4, it still ranks first and outperforms the second-place models by 0.17 dB, 0.05 dB, and 0.04 dB on Set5, BSD100, and Urban100. Qualitatively, the images reconstructed by the proposed method are clear, with small blurred regions and rich details. Conclusion Extensive comparative experiments and ablation studies demonstrate that the proposed BSTN not only achieves state-of-the-art super-resolution results with excellent quantitative and visual performance but also uses fewer parameters and floating-point operations. Specifically, the proposed blueprint multi-head self-attention effectively performs self-attention in Transformer blocks through a concise structure. The proposed blueprint feed-forward neural network focuses on helpful information and filters out information that is useless for super-resolution, resulting in high efficiency and low cost, and it can be seamlessly integrated into other modules. Although our method performs well, its advantage in model lightweightness is not yet pronounced and needs to be further enhanced.
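The abstract describes the blueprint feed-forward neural network only functionally, as a module that suppresses low-information features. The sketch below is one plausible realization, assuming a BSConv-based expansion whose two halves gate each other through GELU before a pointwise projection back to the input width; the expansion ratio and the gating design are assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn


def bsconv(in_ch, out_ch, kernel_size=3, padding=1):
    """Blueprint separable convolution: 1x1 pointwise followed by depthwise KxK."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.Conv2d(out_ch, out_ch, kernel_size, padding=padding,
                  groups=out_ch, bias=True),
    )


class BlueprintFeedForward(nn.Module):
    """Hypothetical gated feed-forward block: BSConv expands the features,
    one half gates the other through GELU (intended to damp low-information
    responses), and a 1x1 convolution projects back to the input width."""

    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.expand = bsconv(channels, hidden * 2)
        self.act = nn.GELU()
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        value, gate = self.expand(x).chunk(2, dim=1)
        return self.project(value * self.act(gate))


if __name__ == "__main__":
    print(BlueprintFeedForward(48)(torch.randn(1, 48, 32, 32)).shape)  # (1, 48, 32, 32)
```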
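Contrast-aware channel attention, one of the three components of SCAB, is commonly implemented by replacing global average pooling with a per-channel contrast statistic (standard deviation plus mean) before a small bottleneck and a sigmoid gate. The following is a sketch under that assumption; the reduction ratio is illustrative.

```python
import torch
import torch.nn as nn


def channel_contrast(x, eps=1e-5):
    """Per-channel contrast: standard deviation plus mean over the spatial dims."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = (x - mean).pow(2).mean(dim=(2, 3), keepdim=True).add(eps).sqrt()
    return std + mean


class ContrastAwareChannelAttention(nn.Module):
    """Reweights channels with a sigmoid gate driven by contrast pooling, so
    that channels carrying more structural information are emphasized."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(channel_contrast(x))


if __name__ == "__main__":
    print(ContrastAwareChannelAttention(48)(torch.randn(1, 48, 32, 32)).shape)
```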
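The blueprint multi-head self-attention inside BMSAB is likewise described only at a high level. The sketch below assumes that the query, key, and value projections are blueprint separable convolutions and that attention is computed over all spatial positions; the paper's actual partitioning scheme (for example, window attention) may differ.

```python
import torch
import torch.nn as nn


class BlueprintMultiHeadSelfAttention(nn.Module):
    """Hypothetical multi-head self-attention whose Q/K/V projection is a
    blueprint separable convolution (1x1 pointwise then 3x3 depthwise)
    instead of a dense linear layer; attention is taken over all spatial
    positions of the feature map."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        self.scale = (channels // num_heads) ** -0.5
        self.qkv = nn.Sequential(
            nn.Conv2d(channels, channels * 3, kernel_size=1, bias=False),
            nn.Conv2d(channels * 3, channels * 3, kernel_size=3, padding=1,
                      groups=channels * 3, bias=True),
        )
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        def heads(t):  # (b, c, h, w) -> (b, heads, h*w tokens, head_dim)
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w).transpose(-2, -1)

        q, k, v = heads(q), heads(k), heads(v)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (b, heads, tokens, tokens)
        out = attn.softmax(dim=-1) @ v                  # weighted sum of value tokens
        out = out.transpose(-2, -1).reshape(b, c, h, w)
        return self.proj(out)


if __name__ == "__main__":
    print(BlueprintMultiHeadSelfAttention(48)(torch.randn(1, 48, 24, 24)).shape)
```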
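The reported training configuration (Adam, an initial learning rate of 5×10⁻⁴ decayed with a cosine schedule over 10⁶ iterations, batch size 32, 48×48 patches) corresponds roughly to the PyTorch setup sketched below; the network, the data, and the L1 loss are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for BSTN; only the optimizer, learning-rate
# schedule, batch size, patch size, and iteration count follow the reported
# training configuration.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)               # initial lr 5e-4
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,       # cosine descent
                                                       T_max=1_000_000)
criterion = nn.L1Loss()                                                 # loss choice is an assumption

for step in range(1_000_000):                                           # 1e6 iterations in total
    lr_patch = torch.randn(32, 3, 48, 48)   # dummy stand-in: batch of 32 LR patches, 48x48
    hr_patch = torch.randn(32, 3, 48, 48)   # dummy target (real HR patches are upscaled)
    optimizer.zero_grad()
    loss = criterion(model(lr_patch), hr_patch)
    loss.backward()
    optimizer.step()
    scheduler.step()
```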
Keywords
