Lightweight image super-resolution network via two-stage information distillation

Li Minghong 1, Chang Kan 1,2, Li Hengxin 1, Tan Yufei 1, Qin Tuanfa 1,2 (1. School of Computer and Electronic Information, Guangxi University, Nanning 530004, China; 2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China)

Abstract
Objective Given a low-resolution image, the task of single image super-resolution (SR) is to reconstruct the corresponding high-resolution image. Because the problem is ill-posed, it is challenging to recover the lost details while preserving the image structures. To deal with this problem, many methods have been proposed over the past two decades, including interpolation-based, learning-based, and reconstruction-based methods. Recently, convolutional neural network (CNN)-based SR methods have achieved great success and received much attention. Several CNNs have been proposed for the SR task, including the residual dense network (RDN), the enhanced deep residual network for super-resolution (EDSR), and the residual channel attention network (RCAN). Although superior performance has been achieved, many of these methods employ very large networks, which inevitably leads to a large number of parameters and heavy computational complexity. For example, RDN costs 22.3 million (M) parameters, and the number of parameters of EDSR even reaches 43 M. As a result, such methods might not be suitable for applications with limited memory and computing resources. To solve this problem, this study proposes a lightweight CNN model based on a two-stage information distillation strategy.

Method The proposed lightweight CNN model is called the two-stage feature-compensated information distillation network (TFIDN). TFIDN has three main characteristics. First, a highly efficient module, called the two-stage feature-compensated information distillation block (TFIDB), is proposed as the basic building block of TFIDN. In each TFIDB, the features are accurately divided into different parts and then progressively refined by two stages of information distillation. To this end, 1×1 convolution layers are applied in TFIDB to implicitly learn the packing strategy, which selects the suitable components of the target features for further refinement. Compared with the existing information distillation network (IDN), where only one stage of information distillation is carried out, the proposed two-stage strategy extracts the features much more precisely. Besides information distillation, TFIDB introduces a feature compensation mechanism, which guarantees the completeness of the features and enforces the consistency of the local memory. More specifically, feature compensation is performed by concatenating and fusing the cross-layer transferred features with the refined features. Unlike IDN, there is no need to manually adjust the output feature dimensions of the different convolution layers in TFIDB; thus, the structure of TFIDB is more flexible. Second, to further strengthen feature extraction and discrimination learning, the wide-activation super-resolution (WDSR) unit and the channel attention (CA) mechanism are both introduced into TFIDB. To improve on the normal residual learning block, the WDSR unit expands the features before applying the activation function. To keep the same number of parameters as a normal residual learning block, the input feature dimension of the WDSR unit is set to 32 in this study. Although the CA unit can effectively improve the discrimination learning ability of the network, applying too many CA units would significantly increase the depth of the network. Therefore, only one CA unit is attached at the end of each TFIDB so as to maintain the efficiency of the network. Because the CA operation is carried out on the precisely refined features, the effectiveness of the network is ensured.
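To make the structure of TFIDB concrete, the following PyTorch sketch implements the two-stage, feature-compensated distillation pattern described above. It is an illustration under assumptions rather than the authors' implementation: the 64-channel block width, the 32-channel distilled branch, the ×4 expansion in the wide-activation unit, the reduction ratio of the CA unit, and the residual connection around the block are all assumed for the sake of a runnable example (only the 32-dimensional WDSR input is stated in the abstract).

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (CA) unit."""
    def __init__(self, channels, reduction=16):  # reduction ratio is assumed
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # reweight channels by importance


class WideActivationUnit(nn.Module):
    """WDSR-style residual unit: expand the features before the activation."""
    def __init__(self, channels=32, expansion=4):  # expansion factor is assumed
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels * expansion, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class TFIDB(nn.Module):
    """Two-stage feature-compensated information distillation block (sketch).

    Each stage "packs" a subset of the features with a 1x1 convolution,
    refines it, and then compensates by concatenating the refined part
    with the cross-layer transferred features and fusing them.
    """
    def __init__(self, channels=64, distilled=32):
        super().__init__()
        self.pack1 = nn.Conv2d(channels, distilled, 1)   # learned packing, stage 1
        self.refine1 = WideActivationUnit(distilled)
        self.fuse1 = nn.Conv2d(channels + distilled, channels, 1)
        self.pack2 = nn.Conv2d(channels, distilled, 1)   # learned packing, stage 2
        self.refine2 = WideActivationUnit(distilled)
        self.fuse2 = nn.Conv2d(channels + distilled, channels, 1)
        self.ca = ChannelAttention(channels)             # single CA at the block end

    def forward(self, x):
        r1 = self.refine1(self.pack1(x))
        y = self.fuse1(torch.cat([x, r1], dim=1))        # feature compensation
        r2 = self.refine2(self.pack2(y))
        y = self.fuse2(torch.cat([y, r2], dim=1))        # feature compensation
        return self.ca(y) + x                            # block residual (assumed)
```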
Finally, to build the full TFIDN, a number of TFIDBs are cascaded. To keep a balance between model complexity and performance, the number of TFIDBs is set to 3. To fully exploit different levels of information, an information fusion unit (IFU) is designed to fuse the outputs of the different TFIDBs, so that accurate and rich hierarchical information is provided for the final reconstruction stage. In the existing cascading residual network (CARN), dense connections are used among the building blocks, which leads to a relatively large number of parameters. Unlike CARN, the IFU introduces only one 1×1 convolution layer, which costs merely 3 kilo (K) parameters (a sketch of the full assembly is given after the abstract).

Result The proposed TFIDN is trained on the DIV2K dataset. Five widely used datasets, namely Set5, Set14, BSD100, Urban100, and Manga109, are used for testing. The ablation study shows that both the proposed building block TFIDB and the IFU contribute to the superior performance of the network. Compared with six well-known lightweight models, namely the fast super-resolution convolutional neural network (FSRCNN), the very deep network for super-resolution (VDSR), the Laplacian pyramid super-resolution network (LapSRN), the persistent memory network (MemNet), IDN, and CARN, the proposed TFIDN achieves the highest peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index values. Specifically, with a scale factor of 2 on the five testing datasets, the PSNR improvements of TFIDN over the second-best method, CARN, are 0.29 dB, 0.08 dB, 0.08 dB, 0.27 dB, and 0.42 dB, respectively, whereas the SSIM improvements are 0.0016, 0.0009, 0.0017, 0.0030, and 0.0009, respectively. These significant PSNR and SSIM improvements indicate that TFIDN is more effective than CARN. Meanwhile, the numbers of parameters and mult-adds required by TFIDN are 933 K and 53.5 giga (G), respectively, both smaller than those of CARN, which suggests that TFIDN is also more efficient. Although TFIDN consumes more parameters and mult-adds than IDN, it achieves significantly higher PSNR and SSIM performance.

Conclusion The proposed two-stage feature-compensated information distillation mechanism is efficient and effective. By cascading a number of TFIDBs and introducing the IFU, the proposed lightweight network TFIDN achieves a better trade-off among model size, computational complexity, and performance.
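As a complement, the sketch below (reusing the TFIDB class from the earlier sketch) assembles the overall pipeline in the way the Method section describes it: three cascaded TFIDBs, an IFU realized as a single 1×1 convolution over the concatenated block outputs, and a reconstruction tail. The head and tail convolutions, the global residual path, and the pixel-shuffle upsampler are assumptions; note also that with the illustrative 64-channel width used here, the IFU convolution costs more than the 3 K parameters quoted above, a figure that evidently corresponds to the paper's own narrower feature dimensions.

```python
class TFIDN(nn.Module):
    """Top-level assembly (sketch): shallow feature extraction, cascaded
    TFIDBs, the information fusion unit (IFU), and pixel-shuffle upsampling."""
    def __init__(self, channels=64, n_blocks=3, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList([TFIDB(channels) for _ in range(n_blocks)])
        # IFU: one 1x1 convolution fuses the outputs of all blocks, feeding
        # rich hierarchical information to the reconstruction stage.
        self.ifu = nn.Conv2d(channels * n_blocks, channels, 1)
        self.tail = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        feat = self.head(x)
        outs, y = [], feat
        for block in self.blocks:
            y = block(y)
            outs.append(y)                        # keep every hierarchical level
        fused = self.ifu(torch.cat(outs, dim=1))  # information fusion unit
        return self.tail(fused + feat)            # global residual path (assumed)


# A quick shape check for a x2 model on a dummy 48x48 low-resolution patch:
model = TFIDN(scale=2)
sr = model(torch.rand(1, 3, 48, 48))              # -> torch.Size([1, 3, 96, 96])
```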