Blueprint separable convolution Transformer network for lightweight image super-resolution

Bi Xiuping1, Chen Shi1, Zhang Lefei1,2 (1. National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China; 2. Hubei Luojia Laboratory, Wuhan 430079, China)

Abstract
Objective Image super-resolution reconstruction aims to recover, from a low-resolution image, a high-resolution image with richer detail. In recent years, Transformer-based deep neural networks have achieved remarkable performance in image super-resolution, but these networks typically carry a huge number of parameters and a high computational cost. To address this problem, a lightweight image super-resolution network is designed. Method A blueprint separable convolution Transformer network (BSTN) for lightweight image super-resolution is proposed. Based on blueprint separable convolution (BSConv), a blueprint feed-forward neural network and a blueprint multi-head self-attention block are designed. A shift channel attention block (SCAB) is then designed to strengthen salient channel information; it consists of shift convolution, contrast-aware channel attention, and the blueprint feed-forward neural network. Finally, a blueprint multi-head self-attention block (BMSAB) is designed, which realizes the self-attention process at low computational cost through blueprint multi-head self-attention and the blueprint feed-forward neural network. Result The proposed method is compared with 10 advanced lightweight super-resolution methods on four datasets. Quantitatively, it leads by varying margins across the datasets while keeping both the parameter count and the number of floating-point operations low. At scale factors of 2, 3, and 4, it improves the peak signal-to-noise ratio (PSNR) on Set5 over state-of-the-art (SOTA) methods by 0.11 dB, 0.16 dB, and 0.17 dB, respectively. Qualitatively, the images reconstructed by the proposed method are clear, with small blurred regions and rich detail. Conclusion The proposed blueprint separable convolution Transformer network, BSTN, reaches state-of-the-art performance with few parameters and floating-point operations and produces high-quality super-resolution results.
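The core primitive named above, BSConv, factorizes a standard convolution into a 1 × 1 pointwise convolution followed by a K × K depthwise convolution, so every output channel reuses a single spatial "blueprint". The following minimal PyTorch sketch of the unconstrained variant (BSConv-U) illustrates the idea; class and argument names are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class BSConvU(nn.Module):
    """Unconstrained blueprint separable convolution (BSConv-U):
    a 1x1 pointwise convolution followed by a KxK depthwise one.
    This reverses the order used by a standard depthwise separable
    convolution and shares one spatial blueprint per output channel."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_ch, out_ch, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=out_ch, bias=False)

    def forward(self, x):
        return self.depthwise(self.pointwise(x))

x = torch.randn(1, 64, 48, 48)   # e.g. one 48 x 48 training patch
y = BSConvU(64, 64)(x)           # same spatial size, 64 channels
```

For a 64-channel 3 × 3 layer, this drops the weight count from 64 × 64 × 9 = 36 864 to 64 × 64 + 64 × 9 = 4 672, which is where much of the parameter saving claimed for BSTN originates.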
Keywords
Blueprint separable convolution Transformer network for lightweight image super-resolution

Bi Xiuping1, Chen Shi1, Zhang Lefei1,2 (1. National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China; 2. Hubei Luojia Laboratory, Wuhan 430079, China)

Abstract
Objective Image super-resolution aims to enhance the resolution and quality of low-resolution images, making them more visually appealing and suitable for human or machine recognition. By utilizing a series of degraded low-resolution images with coarse details, the objective is to reconstruct high-resolution images with finer details. Super-resolution algorithms have broad applications in areas such as object detection, medical pathological analysis, remote sensing satellite imagery, and security monitoring. These promising applications have drawn increasing attention from researchers to image super-resolution. With the advancement of deep learning in computer vision, deep models have been successfully applied to image super-resolution and have achieved significant results. However, the substantial number of parameters and the computational requirements of super-resolution models result in slow running speeds, limiting their practicality in real-world deployment, particularly on mobile and edge devices. To address this issue, several lightweight super-resolution models have been proposed. Among these models, Transformer-based approaches stand out because they provide rich detail information in reconstructed images, yet they still suffer from computational redundancy and large model size. To overcome these challenges, this study presents a novel lightweight super-resolution network based on the Transformer architecture.

Method A blueprint separable convolution Transformer network (BSTN) is proposed for lightweight image super-resolution. BSTN is divided into three parts: shallow feature extraction, deep feature extraction, and image reconstruction. In the shallow feature extraction stage, a 3 × 3 standard convolution extracts low-level features from the input image; this initial step captures basic image information, which is also transmitted directly to the tail of the network to provide residual information via a long skip connection. The deep feature extraction component is composed of four successive residual attention Transformer groups (RATGs). The key elements within this stage are the shift channel attention block (SCAB) and the blueprint multi-head self-attention block (BMSAB). SCAB and BMSAB are combined to form the hybrid attention Transformer block (HATB). Two HATBs are connected with a residual connection and followed by a standard convolution to construct an RATG. The blueprint feed-forward neural network is first designed to suppress low-information features and retain only relevant, useful information; it is then introduced into the two attention blocks above to efficiently extract the deep features that matter for super-resolution. SCAB consists of three major components: shift convolution, contrast-aware channel attention, and the blueprint feed-forward neural network. Shift convolution reduces the number of network parameters and aggregates spatial information, enabling effective information fusion across different regions of the image. The contrast-aware channel attention mechanism focuses on important channel information, enhancing the representation of crucial features. BMSAB consists of a blueprint multi-head self-attention and a blueprint feed-forward neural network; it computes self-attention at reduced computational complexity while suppressing low-information features through the blueprint feed-forward neural network.
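Of the SCAB components just described, contrast-aware channel attention is the most self-contained. The sketch below follows the familiar IMDN-style formulation, in which channel weights are computed from per-channel contrast (standard deviation plus mean) rather than from average pooling alone; the reduction ratio and exact layer layout are illustrative assumptions, not details given in the paper.

```python
import torch
import torch.nn as nn

class CCALayer(nn.Module):
    """Contrast-aware channel attention: channel weights are derived
    from per-channel contrast (standard deviation + mean) instead of
    plain average pooling. Reduction ratio is an assumed default."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # contrast statistic: per-channel std + mean, kept as 1x1 maps
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True)
        return x * self.attn(std + mean)

feats = torch.randn(1, 64, 48, 48)
out = CCALayer(64)(feats)   # reweighted features, same shape as input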
Finally, the shallow features extracted in the earlier stage and the deep features obtained from the RATGs are added together, and the combined features are processed with pixel shuffle, a technique that rearranges channels into space to increase spatial resolution; this final step generates the reconstructed high-resolution image with improved quality and detail (a minimal sketch of this tail follows the abstract). Through this architecture and its components, the proposed lightweight network achieves effective feature extraction, self-attention calculation, and image reconstruction, addressing the parameter redundancy and large model size commonly encountered in Transformer-based super-resolution models. Our method is implemented in PyTorch on an NVIDIA RTX 3090 GPU. The training datasets are DIV2K and Flickr2K, which consist of 800 and 1 000 images, respectively. The batch size is set to 32, and the patch size of the training data is 48 × 48 pixels. The initial learning rate is set to 5 × 10⁻⁴ and updated by an Adam optimizer with a cosine descent strategy, for a total of 10⁶ iterations.

Result The proposed method is compared with 11 state-of-the-art approaches on four datasets. According to the quantitative results, the proposed method achieves varying degrees of improvement across magnifications and datasets while keeping parameter size and floating-point operations at low levels. At a magnification factor of 2, the peak signal-to-noise ratio (PSNR) of this model ranks first on Set5, Set14, BSD100, and Urban100, surpassing the second-best model on Set5 and Set14 by 0.11 dB and 0.08 dB, respectively. At a magnification factor of 3, its PSNR again ranks first, surpassing the second-best results on Set5 and Urban100 by 0.16 dB and 0.06 dB, respectively. At a magnification factor of 4, it still ranks first and outperforms the second-place models by 0.17 dB, 0.05 dB, and 0.04 dB on Set5, BSD100, and Urban100, respectively. According to the qualitative results, the images reconstructed by the proposed method are clear, with small blurred regions and rich details.

Conclusion Extensive comparative experiments and ablation studies demonstrate that the proposed BSTN not only achieves state-of-the-art super-resolution results with excellent quantitative and visual performance but also has fewer parameters and floating-point operations. In particular, the proposed blueprint multi-head self-attention can effectively perform self-attention in Transformer blocks through a concise structure, and the proposed blueprint feed-forward neural network can focus on helpful information and filter out information useless for super-resolution, resulting in high efficiency and low cost; it can be seamlessly integrated into other modules. Although our method performs well, its lightweight advantage is not yet pronounced and should be further enhanced.
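For concreteness, the reconstruction tail described in the Method section, a long skip connection followed by a convolution and pixel shuffle, can be sketched as below. The channel width, the RGB output, and the module name are illustrative assumptions; only the skip-plus-pixel-shuffle structure comes from the abstract.

```python
import torch
import torch.nn as nn

class Upsampler(nn.Module):
    """Reconstruction tail: add shallow features back via the long
    skip connection, expand channels with a conv, then rearrange
    them into space with PixelShuffle to reach the target scale."""
    def __init__(self, channels, scale, out_ch=3):
        super().__init__()
        self.tail = nn.Sequential(
            nn.Conv2d(channels, out_ch * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, shallow, deep):
        return self.tail(shallow + deep)

shallow = torch.randn(1, 64, 48, 48)          # from the 3x3 head conv
deep = torch.randn(1, 64, 48, 48)             # from the four RATGs
sr = Upsampler(64, scale=4)(shallow, deep)    # -> (1, 3, 192, 192)
```

PixelShuffle trades channels for resolution (C·s² × H × W becomes C × sH × sW), so the tail adds no interpolation and only one convolution, consistent with the lightweight design goal stated above.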
Keywords
