A Transformer network for high-resolution visible image guided infrared image super-resolution

Qiu Defen1, Jiang Junjun1, Hu Xingyu1, Liu Xianming1, Ma Jiayi2 (1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; 2. Electronic Information School, Wuhan University, Wuhan 430072)

Abstract
Objective Infrared images play an important role in industry. However, for technical reasons their resolution is generally low, which limits their broad applicability. Many low-resolution infrared sensors are deployed together with high-resolution visible-light sensors, so a feasible idea is to use the high-resolution images captured by the visible sensor to guide the super-resolution reconstruction of the infrared images. Method We propose a neural network that performs infrared image super-resolution under the guidance of a high-resolution visible image. It consists of two modules: a guided Transformer module and a super-resolution reconstruction module. Since infrared and visible image pairs usually exhibit some parallax and are therefore not perfectly aligned, we use a guided-Transformer-based information guidance and fusion method that searches the high-resolution visible image for relevant texture information and fuses it with the information of the low-resolution infrared image to obtain synthetic features. These synthetic features are then fed into the subsequent super-resolution reconstruction sub-network to produce the final super-resolved infrared image. In the reconstruction module, a channel-splitting strategy is used to eliminate redundant features in the deep model, reducing computation and improving performance. Result We compare our method with representative image super-resolution methods on the FLIR-aligned dataset. The experiments show that our method outperforms the compared methods. Quantitatively, it surpasses the other guided infrared super-resolution methods by 0.75 dB in peak signal-to-noise ratio (PSNR); qualitatively, it produces super-resolved images with more realistic visual quality and sharper textures. Ablation studies verify the effectiveness of each module. Conclusion The proposed guided super-resolution algorithm can fully exploit the correlation between infrared and visible images and obtain high-quality super-resolution reconstructions of infrared images.
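As a rough illustration of the guided-Transformer fusion described above, the PyTorch sketch below shows how a hard-attention index and a soft-attention weight can transfer visible-image texture into infrared features. This is a minimal sketch under assumed shapes and names (GuidedAttention, fuse, and the position-wise attention are illustrative, not the authors' code; a full implementation might attend over unfolded patches instead of single positions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedAttention(nn.Module):
    """Hypothetical sketch of hard/soft attention texture transfer."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, q, k, v):
        # q: LR infrared features (query); k, v: visible features (key, value).
        # For simplicity, q, k and v are assumed to share one spatial size.
        b, c, h, w = q.shape
        q_flat = q.view(b, c, -1)                        # B x C x N_q
        k_flat = k.view(b, c, -1)                        # B x C x N_k
        # Cosine similarity between every query and key position.
        rel = torch.bmm(F.normalize(q_flat, dim=1).transpose(1, 2),
                        F.normalize(k_flat, dim=1))      # B x N_q x N_k
        # Hard attention: index of the most relevant visible position;
        # soft attention: how relevant that position actually is.
        soft, hard = rel.max(dim=2)                      # both B x N_q
        v_flat = v.view(b, c, -1)
        idx = hard.unsqueeze(1).expand(-1, c, -1)        # B x C x N_q
        t = v_flat.gather(2, idx).view(b, c, h, w)       # transferred texture
        s = soft.view(b, 1, h, w)
        # Fuse transferred texture with the infrared features, weighted by s,
        # so visible-only textures with no infrared counterpart are suppressed.
        return self.fuse(torch.cat([q, t], dim=1)) * s + q
```

In the setting of the paper, the query would be extracted from the (upsampled) low-resolution infrared image and the key/value from the visible image, so that relevant visible textures can be found despite the parallax between the two views.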
Keywords
Guided transformer for high-resolution visible image guided infrared image super-resolution

Qiu Defen1, Jiang Junjun1, Hu Xingyu1, Liu Xianming1, Ma Jiayi2 (1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; 2. Electronic Information School, Wuhan University, Wuhan 430072, China)

Abstract
Objective Infrared sensors remain usable under poor visibility and extreme weather conditions such as fog or sleet. However, infrared imaging suffers from poor spatial resolution compared with visible-range RGB cameras, and this resolution constraint limits the applicability of commonly used infrared imaging systems. Since many infrared sensors are equipped alongside high-resolution visible-range RGB cameras, a natural remedy is to let the higher-resolution visible modality guide the super-resolution of the lower-resolution infrared modality, transferring the finer details available in the visible images. Two issues make this challenging. First, the result must stay consistent with the features of the target (infrared) modality, without importing artifacts or textures that are present only in the visible modality. Second, infrared and visible images are captured as stereo pairs, and the difference in their spectral ranges makes pixel-wise alignment of the two images difficult, whereas most guided super-resolution methods assume aligned image pairs.

Method We propose a guided transformer super-resolution network (GTSR) for infrared image super-resolution, in which the infrared and visible images serve as the query and key of a transformer. The network consists of two modules: 1) a guided transformer module that transfers accurate texture features, and 2) a super-resolution reconstruction module that generates the high-resolution result. Because infrared and visible image pairs are misaligned, with a certain parallax between them, the guided transformer performs information guidance and fusion: it searches the high-resolution visible image for relevant texture information and fuses that information with the low-resolution infrared features to obtain synthetic features. The guided transformer module comprises four parts: a) a texture extractor, b) relevance calculation, c) hard-attention-based feature transfer, and d) soft-attention-based feature synthesis. First, the texture extractor extracts features from the infrared and visible images. Second, the extracted infrared and visible features are formulated as the query and key of a transformer, and their relevance is calculated to obtain a hard-attention map and a soft-attention map. Finally, the two attention maps are used to transfer high-resolution features from the visible image and fuse them into the infrared features, yielding the synthetic features, which are fed into the subsequent super-resolution reconstruction module to generate the final high-resolution infrared image. Deep networks tend to extract highly redundant features, since different layers of a very deep network often learn similar features. The reconstruction module therefore adopts a channel-splitting strategy to eliminate such redundancy: the feature maps produced by each residual group are split into two streams of C channels each; one stream is passed to the following residual groups to extract richer information, while the other is routed directly to the later stages. By extracting diversified features from the low-resolution infrared image, channel splitting helps preserve high-frequency details in the super-resolved images.
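The channel-splitting strategy can be sketched as follows; this is an assumed layout (module names, block and group counts, and the 1x1 expansion are illustrative), not the published implementation. Each residual group emits two C-channel streams: one feeds the next group, the other bypasses directly to the reconstruction tail, so later layers are not forced to re-learn features that earlier groups already extracted.

```python
import torch
import torch.nn as nn

class SplitResidualGroup(nn.Module):
    """One residual group whose output is split into a 'deep' stream
    (fed to the next group) and a 'skip' stream (sent to the tail)."""
    def __init__(self, c, num_blocks=2):
        super().__init__()
        layers = []
        for _ in range(num_blocks):
            layers += [nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)
        self.expand = nn.Conv2d(c, 2 * c, 1)  # produce the two C-channel streams

    def forward(self, x):
        feat = self.body(x) + x               # residual learning inside the group
        deep, skip = self.expand(feat).chunk(2, dim=1)
        return deep, skip

class SplitReconstruction(nn.Module):
    """Trunk of residual groups; skip streams are gathered and upsampled."""
    def __init__(self, c=64, groups=3, scale=4):
        super().__init__()
        self.groups = nn.ModuleList(SplitResidualGroup(c) for _ in range(groups))
        self.tail = nn.Sequential(
            nn.Conv2d(c * (groups + 1), scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))           # one-channel HR infrared output

    def forward(self, x):                     # x: fused features, B x C x H x W
        streams, cur = [x], x
        for g in self.groups:
            cur, skip = g(cur)
            streams.append(skip)
        return self.tail(torch.cat(streams, dim=1))
```

For example, SplitReconstruction()(torch.randn(1, 64, 40, 32)) yields a 1 x 1 x 160 x 128 output; because every group contributes its own skip stream to the tail, the diversified features from all depths reach the reconstruction stage directly.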
Result We train and test our model on the FLIR-aligned dataset, whose training set contains 1 480 pairs of infrared and visible images and whose test set contains 126 pairs. We compare our method with guided and single-image super-resolution methods proposed for either visible or infrared images. The guided baselines are two deep-learning methods: 1) guided super-resolution with pyramidal edge-maps and attention (PAGSR) and 2) unaligned guided thermal image super-resolution (UGSR). Among single-image methods, we compare with the channel split convolutional neural network (ChasNet), an infrared image super-resolution method, and with several state-of-the-art visible image super-resolution networks: the enhanced deep super-resolution network (EDSR), the residual channel attention network (RCAN), the information multi-distillation network (IMDN), the holistic attention network (HAN), and image restoration using Swin Transformer (SwinIR). The super-resolution results are evaluated with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Our network achieves the best average PSNR and SSIM over the 126 images of the FLIR-aligned test set. Specifically: 1) against the guided super-resolution method UGSR proposed in 2021, our PSNR is 0.75 dB higher and our SSIM is 0.041 higher; 2) against the infrared image super-resolution method ChasNet proposed in 2021, PSNR and SSIM improve by 1.106 dB and 0.06, respectively; 3) against the advanced visible image super-resolution method RCAN, PSNR improves by 0.763 dB and SSIM by 0.049.

Conclusion The proposed guided transformer super-resolution model extracts high-frequency information from the high-resolution visible image and provides detailed textures for generating the high-resolution infrared image. The correlation between the infrared and visible images is shown to benefit image super-resolution, and the PSNR and SSIM results indicate that our model reconstructs high-frequency details while preserving object structure.
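For reference, PSNR and SSIM are standard full-reference quality metrics; the sketch below shows how such scores are typically computed (the SSIM call uses scikit-image, which the paper does not necessarily use, and data_range=255 assumes 8-bit images).

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB between two images of equal shape.
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# SSIM over a grayscale (infrared) image pair:
# score = structural_similarity(ref, test, data_range=255)
```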
Keywords
