Guided transformer for high-resolution visible image guided infrared image super-resolution
Vol. 28, Issue 1, Pages: 196-206 (2023)
Published: 16 January 2023
Accepted: 28 October 2022
DOI: 10.11834/jig.220604
Defen Qiu, Junjun Jiang, Xingyu Hu, Xianming Liu, Jiayi Ma. 2023. Guided transformer for high-resolution visible image guided infrared image super-resolution [J]. Journal of Image and Graphics, 28(1): 196-206
Objective
Infrared images play an important role in industry. However, for technical reasons, infrared images generally have low resolution, which limits their broad applicability. Many low-resolution infrared sensors are deployed together with high-resolution visible-light sensors, so a feasible approach is to use the high-resolution images captured by the visible-light sensor to guide super-resolution reconstruction of the infrared image.
Method
This paper proposes a neural network model that uses a high-resolution visible image to guide infrared image super-resolution. The model contains two modules: a guided Transformer module and a super-resolution reconstruction module. Since infrared and visible image pairs generally exhibit a certain parallax and are therefore not fully aligned, we use a guided-Transformer-based information guidance and fusion method to search the high-resolution visible image for relevant texture information and fuse it with the information of the low-resolution infrared image, obtaining synthesized features. These synthesized features then pass through a super-resolution reconstruction sub-network to produce the final super-resolved infrared image. In the super-resolution reconstruction module, a channel-splitting strategy is used to eliminate redundant features in the deep model, reducing computation and improving performance. A high-level sketch of this two-module design follows.
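As a high-level illustration only, the following minimal PyTorch sketch shows the two-module data flow; the class name, layer sizes, and the pooling stand-in for the Transformer-guided texture search are our own assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedSRPipeline(nn.Module):
    """Minimal two-module skeleton: guided fusion, then reconstruction.
    All layer shapes are illustrative, not the paper's configuration."""

    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.ir_encoder = nn.Conv2d(1, channels, 3, padding=1)   # low-resolution infrared branch
        self.vis_encoder = nn.Conv2d(3, channels, 3, padding=1)  # high-resolution visible branch
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.reconstruct = nn.Sequential(                        # super-resolution reconstruction module
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, ir_lr, vis_hr):
        f_ir = self.ir_encoder(ir_lr)                            # features on the LR grid
        f_vis = self.vis_encoder(vis_hr)                         # features on the HR grid
        # Stand-in for the guided-Transformer texture search: align the visible
        # features to the infrared grid before fusion.
        f_vis = F.adaptive_avg_pool2d(f_vis, f_ir.shape[-2:])
        fused = self.fuse(torch.cat([f_ir, f_vis], dim=1))       # synthesized features
        return self.reconstruct(fused)

# Example: a 32x32 infrared input and a 128x128 visible guide yield a 128x128 result.
net = GuidedSRPipeline()
sr = net(torch.randn(1, 1, 32, 32), torch.randn(1, 3, 128, 128))
```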
Result
Our method is compared with other representative image super-resolution methods on the FLIR-aligned dataset. Experimental results show that our method achieves better super-resolution performance than the compared methods. Quantitatively, our method outperforms other guided infrared image super-resolution methods by 0.75 dB in peak signal-to-noise ratio (PSNR); qualitatively, it generates super-resolved images with more realistic visual effects and sharper textures. Ablation experiments verify the effectiveness of each module of the proposed algorithm.
Conclusion
The proposed guided super-resolution algorithm can fully exploit the correlated information between infrared and visible images and obtain high-quality super-resolution reconstructions of infrared images.
Objective
Infrared sensors can cope with poor visibility and extreme weather conditions such as fog or sleet. However, infrared imaging suffers from poor spatial resolution compared with similar visible-range RGB cameras, so the applicability of commonly used infrared imaging systems is limited by this spatial resolution constraint. To address low-resolution infrared images, many infrared sensors are paired with high-resolution visible-range RGB cameras; the idea is to use the higher-resolution visible modality to guide super-resolution of the lower-resolution infrared modality and recover finer detail. One challenging issue is to keep the result consistent with the target (infrared) modality while suppressing artifacts or textures that appear only in the visible modality. The other challenging problem concerns stereo-paired infrared and visible images: the difference in their spectral ranges makes pixel-wise alignment of the two images difficult, whereas most guided super-resolution methods are based on aligned image pairs.
Method
We propose a guided transformer super-resolution network (GTSR) for infrared image super-resolution, in which the infrared and visible images serve as the queries and keys of a transformer. The network consists of two modules: 1) a guided transformer module that transfers accurate texture features, and 2) a super-resolution reconstruction module that generates the high-resolution result. Because infrared and visible image pairs are not aligned, a certain parallax exists between them. The guided transformer therefore performs information guidance and fusion: it searches the high-resolution visible image for relevant texture information and fuses it with the low-resolution infrared features. The guided transformer module has four parts: a) texture extractor, b) relevance calculation, c) hard-attention-based feature transfer, and d) soft-attention-based feature synthesis. First, the texture extractor extracts features from the infrared and visible images. Second, the extracted infrared and visible features are formulated as the query and key of a transformer, and their relevance is calculated to obtain a hard-attention map and a soft-attention map. Finally, the two attention maps are used to transfer high-resolution features from the visible image and fuse them with the extracted infrared features, yielding a set of synthesized features that are fed into the following super-resolution reconstruction module to generate the final high-resolution infrared image. A sketch of this attention-based transfer follows.
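The following minimal PyTorch sketch illustrates the relevance calculation and the hard/soft attention transfer described above; the function name, tensor shapes, and the cosine-similarity relevance are illustrative assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def guided_attention_transfer(q_ir, k_vis, v_vis):
    """Hard/soft attention transfer between unaligned features (illustrative).

    q_ir:  (B, C, H, W) query features from the low-resolution infrared image
    k_vis: (B, C, H, W) key features from the visible image
    v_vis: (B, C, H, W) value features carrying visible textures
    """
    B, C, H, W = q_ir.shape
    q = F.normalize(q_ir.flatten(2), dim=1)            # (B, C, HW)
    k = F.normalize(k_vis.flatten(2), dim=1)           # (B, C, HW)
    rel = torch.bmm(q.transpose(1, 2), k)              # (B, HW, HW) cosine relevance

    conf, idx = rel.max(dim=2)                         # soft map (confidence), hard map (index)
    v = v_vis.flatten(2)                               # (B, C, HW)
    # Hard attention: fetch, for every infrared position, the single most
    # relevant visible feature.
    transferred = torch.gather(v, 2, idx.unsqueeze(1).expand(-1, C, -1)).view(B, C, H, W)
    # Soft attention: weight the transferred texture by its relevance
    # before fusing it with the infrared features.
    return q_ir + conf.view(B, 1, H, W) * transferred
```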
Most deep networks extract highly redundant features: because of their depth, similar features are extracted by different layers. In the super-resolution reconstruction module, a channel-splitting strategy is therefore implemented to eliminate redundant features. At each scale, the feature maps extracted by the residual groups are split into two streams of $C$ channels: one stream is passed to the following residual group to extract richer information, while the other stream is connected directly to later residual groups. Channel splitting thus extracts diversified features from the low-resolution infrared image and preserves high-frequency details in the super-resolved images (see the sketch below).
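A minimal PyTorch sketch of the channel-splitting idea, assuming a 64-channel feature map split into two 32-channel streams (the class name and layer sizes are hypothetical):

```python
import torch
import torch.nn as nn

class SplitResidualGroup(nn.Module):
    """Illustrative channel splitting: one stream passes through the next
    residual body, the other bypasses it and is re-joined afterwards."""

    def __init__(self, channels=64, split=32):
        super().__init__()
        self.split = split
        self.body = nn.Sequential(                     # stand-in for a residual group
            nn.Conv2d(split, split, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(split, split, 3, padding=1),
        )
        self.merge = nn.Conv2d(channels, channels, 1)  # recombine the two streams

    def forward(self, x):
        deep, skip = torch.split(x, [self.split, x.size(1) - self.split], dim=1)
        deep = deep + self.body(deep)                  # refined stream: richer information
        return self.merge(torch.cat([deep, skip], dim=1))  # bypass keeps diverse features
```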
Result
To evaluate the proposed method, our model is trained and tested on the FLIR-aligned dataset. The FLIR-aligned training set contains 1 480 pairs, each composed of an infrared image and a visible image, and the test set contains 126 image pairs. We compare our method with guided and single-image super-resolution methods proposed for visible or infrared images. Two deep-learning-based guided super-resolution methods are compared: 1) pyramidal edge-maps and attention based guided thermal super-resolution (PAGSR) and 2) unaligned guided thermal super-resolution (UGSR). Among single-image super-resolution methods, we compare the channel split convolutional neural network (ChasNet), an infrared image super-resolution method, together with several state-of-the-art visible image super-resolution methods: enhanced deep super-resolution network (EDSR), residual channel attention network (RCAN), information multi-distillation network (IMDN), holistic attention network (HAN), and image restoration using Swin Transformer (SwinIR). The super-resolution results are evaluated by peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as sketched below. Our network achieves the best average PSNR and SSIM on the 126 images of the FLIR-aligned test set. Specifically: 1) compared with the guided super-resolution method UGSR (2021), the PSNR is 0.75 dB higher and the SSIM is 0.041 higher; 2) compared with the infrared image super-resolution method ChasNet (2021), the PSNR and SSIM are improved by 1.106 dB and 0.06, respectively; 3) compared with the advanced visible image super-resolution method RCAN, the PSNR is improved by 0.763 dB and the SSIM by 0.049.
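For reference, PSNR and SSIM can be computed with scikit-image; this evaluation sketch reflects a typical setup, not necessarily the paper's exact protocol:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(sr, hr):
    """PSNR/SSIM for one super-resolved vs. ground-truth pair (uint8 grayscale)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255)
    return psnr, ssim

# Averaging over a test set, e.g. 126 FLIR-aligned test pairs:
# scores = [evaluate_pair(sr, hr) for sr, hr in pairs]
# mean_psnr = np.mean([s[0] for s in scores])
# mean_ssim = np.mean([s[1] for s in scores])
```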
Conclusion
We present a guided transformer super-resolution model that extracts high-frequency information from high-resolution visible images and provides detailed textures for generating high-resolution infrared images. The correlated information between infrared and visible images is shown to benefit infrared image super-resolution. In terms of PSNR and SSIM, our model demonstrates strong potential for reconstructing high-frequency details and preserving object structures.
Keywords: image super-resolution; image fusion; infrared image; Transformer; deep learning
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 184-199 [DOI: 10.1007/978-3-319-10593-2_13]
Fang Q Y, Han D P and Wang Z K. 2022. Cross-modality fusion transformer for multispectral object detection [EB/OL]. [2021-12-01]. https://arxiv.org/pdf/2111.00273.pdf
Gupta H and Mitra K. 2020. Pyramidal edge-maps and attention based guided thermal super-resolution//Proceedings of the European Conference on Computer Vision. Glasgow, UK: Springer: 698-715 [DOI: 10.1007/978-3-030-67070-2_42]
Gupta H and Mitra K. 2022. Toward unaligned guided thermal super-resolution. IEEE Transactions on Image Processing, 31: 433-445 [DOI: 10.1109/tip.2021.3130538]
Han T Y, Kim Y J and Song B C. 2017. Convolutional neural network-based infrared image super resolution under low light environment//Proceedings of the 25th European Signal Processing Conference (EUSIPCO). Kos, Greece: IEEE: 803-807 [DOI: 10.23919/EUSIPCO.2017.8081318]
Hui Z, Gao X B, Yang Y C and Wang X M. 2019. Lightweight image super-resolution with information multi-distillation network//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: ACM: 2024-2032 [DOI: 10.1145/3343031.3351084]
Johnson J, Alahi A and Li F F. 2016. Perceptual losses for real-time style transfer and super-resolution//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 694-711 [DOI: 10.1007/978-3-319-46475-6_43]
Kim J, Lee J K and Lee K M. 2016a. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654 [DOI: 10.1109/CVPR.2016.182]
Kim J, Lee J K and Lee K M. 2016b. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1637-1645 [DOI: 10.1109/CVPR.2016.181]
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z H and Shi W Z. 2017. Photo-realistic single image super-resolution using a generative adversarial network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 105-114 [DOI: 10.1109/CVPR.2017.19]
Lee K, Lee J, Lee J, Hwang S and Lee S. 2017. Brightness-based convolutional neural network for thermal image enhancement. IEEE Access, 5: 26867-26879 [DOI: 10.1109/access.2017.2769687]
Liang J Y, Cao J Z, Sun G L, Zhang K, Van Gool L and Timofte R. 2021. SwinIR: image restoration using Swin Transformer//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE: 1833-1844 [DOI: 10.1109/ICCVW54120.2021.00210]
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140 [DOI: 10.1109/CVPRW.2017.151]
Niu B, Wen W L, Ren W Q, Zhang X D, Yang L P, Wang S Z, Zhang K H, Cao X C and Shen H F. 2020. Single image super-resolution via a holistic attention network//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 191-207 [DOI: 10.1007/978-3-030-58610-2_12]
Sajjadi M S M, Schölkopf B and Hirsch M. 2017. EnhanceNet: single image super-resolution through automated texture synthesis//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4501-4510 [DOI: 10.1109/ICCV.2017.481]
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/tip.2003.819861]
Wu H L, Li W Y and Zhang L B. 2022. Cross-scale coupling network for continuous-scale image super-resolution. Journal of Image and Graphics, 27(5): 1604-1615 [DOI: 10.11834/jig.210815]
Xu W J, Song H H, Yuan X T and Liu Q S. 2021. Lightweight attention feature selection recursive network for super-resolution. Journal of Image and Graphics, 26(12): 2826-2835 [DOI: 10.11834/jig.200555]
Zhang X D, Li C L, Meng Q P, Liu S J, Zhang Y and Wang J Y. 2018a. Infrared image super resolution by combining compressive sensing and deep learning. Sensors, 18(8): #2587 [DOI: 10.3390/s18082587]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018b. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 294-310 [DOI: 10.1007/978-3-030-01234-2_18]
Zhang Z F, Wang Z W, Lin Z and Qi H R. 2019. Image super-resolution by neural texture transfer//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7974-7983 [DOI: 10.1109/CVPR.2019.00817]
Zhao X L, Zhang Y L, Zhang T and Zou X M. 2019. Channel splitting network for single MR image super-resolution. IEEE Transactions on Image Processing, 28(11): 5649-5662 [DOI: 10.1109/tip.2019.2921882]