Image super-resolution reconstruction via a global attention-gated residual memory network

Wang Jing, Song Huihui, Zhang Kaihua, Liu Qingshan (Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China)

Abstract
Objective With the rise of deep convolutional neural networks, image super-resolution (SR) reconstruction algorithms have made great strides in both accuracy and speed. However, most existing SR methods require very deep networks to achieve good performance; such networks are not only hard to train, but shallow-layer feature information is also easily lost by the end of the network, making it difficult to fully capture the high-frequency details that are crucial to SR reconstruction. To address this, this paper fuses multi-scale features to fully mine the high-frequency detail information needed for SR and proposes a global attention-gated residual memory network. Method In the feature-extraction stage at the front of the network, a single convolution layer extracts shallow features. In the nonlinear-mapping body of the network, a set of recursive residual memory blocks is cascaded; each block fuses several recursive multi-scale residual units with a global attention gate to output feature representations carrying multi-level information. At the end of the network, multi-scale features are combined in parallel and a pixel-shuffle mechanism produces high-quality image upscaling. Result The proposed method is evaluated on five standard SR benchmark datasets (Set5, Set14, B100, Urban100, and Manga109). It outperforms current state-of-the-art models in peak signal-to-noise ratio (PSNR) and structural similarity (SSIM); on Manga109 in particular, it reaches a PSNR of 39.19 dB, 0.32 dB higher than the state-of-the-art lightweight method AWSRN (adaptive weighted super-resolution network). Conclusion When super-resolving low-resolution images, the proposed network jointly learns multi-level, multi-scale features, fully mines the high-frequency information of the image, and obtains high-quality reconstruction results.
Keywords
Learning global attention-gated multi-scale memory residual networks for single-image super-resolution

Wang Jing, Song Huihui, Zhang Kaihua, Liu Qingshan(Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China)

Abstract
Objective With the rapid development of deep convolutional neural networks (CNNs), great progress has been made in single-image super-resolution (SISR) in terms of accuracy and efficiency. However, existing methods often resort to deeper CNNs, which are not only difficult to train but also have limited feature resolution for capturing the rich high-frequency detail information that is essential for accurate SR prediction. To address these issues, this paper presents a global attention-gated multi-scale memory network (GAMMNet) for SISR. Method GAMMNet mainly consists of three key components: feature extraction, nonlinear mapping (NM), and high-resolution (HR) reconstruction. In the feature extraction part, the input low-resolution (LR) image passes through a 3×3 convolution layer to learn low-level features. Then, we utilize a recursive structure to perform NM and learn deeper-layer features. At the end of the network, we arrange four kernels with different sizes, followed by a global attention gate (GAG), to achieve the HR reconstruction. Specifically, the NM module consists of four recursive residual memory blocks (RMBs). Each RMB outputs a multi-level representation by fusing the output of the top multi-scale residual unit (MRU) with those of the previous MRUs, followed by a GAG module, which serves as a gate that controls how much of the previous states should be memorized and decides how much of the current state should be kept. The designs of the two novel modules (MRU and GAG) are as follows. The MRU takes advantage of the wide-activation architecture inspired by the wide activation super-resolution (WDSR) method. Because the nonlinear ReLU (rectified linear unit) layers in residual blocks hinder the transmission of information from shallow layers to deeper layers, the wide-activation architecture increases the number of feature-map channels before the ReLU layers to help transmit information flows.
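As a rough illustration, the wide-activation idea above can be sketched in NumPy, with 1×1 convolutions standing in for the network's larger kernels; the expansion factor, weight shapes, and random values are illustrative assumptions, not the trained model.

```python
import numpy as np

def conv1x1(x, w):
    """Per-pixel channel mixing: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))  # -> (C_out, H, W)

def wide_activation_unit(x, w_expand, w_reduce):
    """Sketch of a WDSR-style wide-activation residual unit: widen the
    channels BEFORE the ReLU so the nonlinearity discards less
    information, then narrow back and add the identity shortcut."""
    r = conv1x1(x, w_expand)   # widen: C -> k*C
    r = np.maximum(r, 0.0)     # ReLU applied on the widened features
    r = conv1x1(r, w_reduce)   # narrow back: k*C -> C
    return x + r               # residual connection

rng = np.random.default_rng(0)
c, k = 8, 4                               # base channels, expansion factor
x = rng.standard_normal((c, 6, 6))
w_up = rng.standard_normal((k * c, c)) * 0.1
w_down = rng.standard_normal((c, k * c)) * 0.1
y = wide_activation_unit(x, w_up, w_down)
print(y.shape)  # (8, 6, 6)
```

The key point is only the ordering: expansion happens before the ReLU, so the activation operates on a k-times wider representation while the block's input/output width stays fixed.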
Moreover, as the depth of the network increases, problems such as the underutilization of features and the gradual disappearance of features during transmission occur; making full use of features is key to reconstructing high-quality images. We therefore stack two convolution kernels with sizes 3×3 and 5×5 in parallel to extract multi-scale features, which are further enhanced by the proposed GAG module, yielding the residual output. Finally, the output of the MRU is the sum of its input and the residual output. The GAG serves as a gate that enhances the input feature channels with different weights. First, unlike the residual channel attention network (RCAN), which mainly considers the correlation between feature channels, our GAG also exploits the useful holistic spatial statistics of the feature maps. We use a special spatial pooling operation to improve on the global average pooling of the original channel attention: we take a weighted average of the features at all locations to establish a more efficient long-range dependency, and then aggregate the global context information at each location. Concretely, we first apply a 1×1 convolution to reduce the number of feature channels to 1, and then use a Softmax function to capture the global context for each pixel, yielding a pixel-wise attention weight map. Afterward, we introduce a learnable parameter λ1 to adaptively rescale the weight map. The weight map is then correlated with the input feature maps, outputting a channel-wise weight vector that encodes the global holistic statistics of the feature maps. Second, to reduce the amount of computation while still establishing effective long-range dependencies, we adopt the bottleneck layer used in the SE block to implement the feature transform, which not only significantly reduces computation but also captures the correlation between feature channels.
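The two-stage gating just described (a spatial softmax pooling that produces a global context vector, followed by an SE-style bottleneck) can be sketched in NumPy as follows; the weight shapes, the sigmoid gating at the end, and the exact placement of λ1/λ2 are illustrative assumptions for the sketch, not the paper's exact parameterization.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def gag(x, w_key, w1, w2, lam1=1.0, lam2=1.0):
    """Sketch of a global attention gate:
    1) a 1x1 conv (w_key) maps C channels to 1; softmax over all H*W
       positions gives a pixel-wise attention map;
    2) a weighted average of all location features yields a global
       context vector (a cheap long-range dependency);
    3) a bottleneck (w1, w2) transforms that vector into channel-wise
       gates that rescale the input feature maps."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                     # (C, HW)
    attn = softmax(lam1 * (w_key @ flat).ravel())  # (HW,) pixel weights
    context = flat @ attn                          # (C,) global context
    gate = w2 @ np.maximum(w1 @ context, 0.0)      # bottleneck transform
    gate = 1.0 / (1.0 + np.exp(-lam2 * gate))      # gates in (0, 1)
    return x * gate[:, None, None]                 # channel-wise rescale

rng = np.random.default_rng(1)
c = 8
x = rng.standard_normal((c, 5, 5))
w_key = rng.standard_normal((1, c)) * 0.1
w1 = rng.standard_normal((c // 2, c)) * 0.1   # reduce: C -> C/2
w2 = rng.standard_normal((c, c // 2)) * 0.1   # restore: C/2 -> C
y = gag(x, w_key, w1, w2)
print(y.shape)  # (8, 5, 5)
```

Note how the softmax pooling costs only one scalar per spatial position, which is why it scales better than pairwise attention while still summarizing the whole feature map.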
We feed the channel-wise attention weight vector into a feature transform module that consists of one 1×1 convolution layer, one normalization layer, one ReLU layer, and one 1×1 convolution layer, and multiply by an adaptively learned parameter λ2, yielding enhanced channel-wise attention weights that capture channel-wise dependencies. Finally, we channel-wisely multiply the enhanced attention weights with the input feature maps to obtain the gated output of the GAG. Each RMB fuses the multi-scale features of its recursive MRUs. For the upscale module, we use four different kernel sizes (3×3, 5×5, 7×7, and 9×9), followed by a GAG and a pixel-shuffle operation to generate the HR outputs. Result We train the network on the DIV2K (DIVerse 2K resolution) dataset and evaluate it on five standard benchmark datasets (Set5, Set14, B100, Urban100, and Manga109) in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), where GAMMNet compares favorably against state-of-the-art methods; on Manga109 in particular, it achieves a PSNR of 39.19 dB, surpassing the advanced lightweight AWSRN (adaptive weighted super-resolution network) by 0.32 dB. Ablation studies demonstrate the effectiveness of both proposed modules. We compare our GAMMNet with several state-of-the-art deep CNN-based SISR methods, including the super-resolution convolutional neural network (SRCNN), the deeply-recursive convolutional network (DRCN), deep recu
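The pixel-shuffle (sub-pixel) rearrangement used in the HR reconstruction stage can be sketched in a few lines of NumPy; the convolutions, kernel sizes, and GAG fusion around it are omitted, so this shows only the rearrangement itself.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r): the
    sub-pixel step that turns channel depth into spatial resolution."""
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)       # split out the r*r factor
    x = x.transpose(0, 3, 1, 4, 2)     # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)  # interleave into the HR grid

x = np.arange(16.0).reshape(4, 2, 2)   # C=1, r=2 toy example
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4)
```

Each 2×2 output patch draws one value from each of the four input channels at the same spatial location, so a convolution that widens the channels by r*r effectively predicts the r× larger image.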
Keywords
