Guided transformer for high-resolution visible image guided infrared image super-resolution
Vol. 28, Issue 1, Pages: 196-206 (2023)
Published: 16 January 2023
Accepted: 28 October 2022
DOI: 10.11834/jig.220604
Defen Qiu, Junjun Jiang, Xingyu Hu, Xianming Liu, Jiayi Ma. 2023. Guided transformer for high-resolution visible image guided infrared image super-resolution [J]. Journal of Image and Graphics, 28(1): 196-206
Objective
Infrared images play an important role in industry. However, for technical reasons, infrared images generally have low resolution, which limits their broad applicability. Many low-resolution infrared sensors are deployed together with high-resolution visible-light sensors, so a feasible approach is to use the high-resolution images captured by the visible-light sensor to guide super-resolution reconstruction of the infrared image.
Method
This paper proposes a neural network model that uses a high-resolution visible image to guide infrared image super-resolution. The model contains two modules: a guided Transformer module and a super-resolution reconstruction module. Since infrared and visible image pairs generally exhibit a certain parallax and are therefore not fully aligned, we use a guided-Transformer-based information guidance and fusion method to search the high-resolution visible image for relevant texture information and fuse it with the information of the low-resolution infrared image, obtaining synthesized features. These synthesized features then pass through a super-resolution reconstruction sub-network to produce the final super-resolved infrared image. In the super-resolution reconstruction module, a channel-splitting strategy is used to eliminate redundant features in the deep model, reducing computation and improving performance. A high-level sketch of this two-module design follows.
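As a high-level illustration only, the following minimal PyTorch sketch shows the two-module data flow; the class name, layer sizes, and the pooling stand-in for the Transformer-guided texture search are our own assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedSRPipeline(nn.Module):
    """Minimal two-module skeleton: guided fusion, then reconstruction.
    All layer shapes are illustrative, not the paper's configuration."""

    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.ir_encoder = nn.Conv2d(1, channels, 3, padding=1)   # low-resolution infrared branch
        self.vis_encoder = nn.Conv2d(3, channels, 3, padding=1)  # high-resolution visible branch
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.reconstruct = nn.Sequential(                        # super-resolution reconstruction module
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, ir_lr, vis_hr):
        f_ir = self.ir_encoder(ir_lr)                            # features on the LR grid
        f_vis = self.vis_encoder(vis_hr)                         # features on the HR grid
        # Stand-in for the guided-Transformer texture search: align the visible
        # features to the infrared grid before fusion.
        f_vis = F.adaptive_avg_pool2d(f_vis, f_ir.shape[-2:])
        fused = self.fuse(torch.cat([f_ir, f_vis], dim=1))       # synthesized features
        return self.reconstruct(fused)

# Example: a 32x32 infrared input and a 128x128 visible guide yield a 128x128 result.
net = GuidedSRPipeline()
sr = net(torch.randn(1, 1, 32, 32), torch.randn(1, 3, 128, 128))
```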
Result
Our method is compared with other representative image super-resolution methods on the FLIR-aligned dataset. Experimental results show that our method achieves better super-resolution performance than the compared methods. Quantitatively, our method outperforms other guided infrared image super-resolution methods by 0.75 dB in peak signal-to-noise ratio (PSNR); qualitatively, it generates super-resolved images with more realistic visual effects and sharper textures. Ablation experiments verify the effectiveness of each module of the proposed algorithm.
Conclusion
The proposed guided super-resolution algorithm can fully exploit the correlated information between infrared and visible images and obtain high-quality super-resolution reconstructions of infrared images.
Objective
Infrared sensors can cope with poor visibility and extreme weather conditions such as fog or sleet. However, infrared imaging suffers from poor spatial resolution compared with similar visible-range RGB cameras, so the applicability of commonly used infrared imaging systems is limited by this spatial resolution constraint. To address low-resolution infrared images, many infrared sensors are paired with high-resolution visible-range RGB cameras; the idea is to use the higher-resolution visible modality to guide super-resolution of the lower-resolution infrared modality and recover finer detail. One challenging issue is to keep the result consistent with the target (infrared) modality while suppressing artifacts or textures that appear only in the visible modality. The other challenging problem concerns stereo-paired infrared and visible images: the difference in their spectral ranges makes pixel-wise alignment of the two images difficult, whereas most guided super-resolution methods are based on aligned image pairs.
Method
We propose a guided transformer super-resolution network (GTSR) for infrared image super-resolution, in which the infrared and visible images serve as the queries and keys of a transformer. The network consists of two modules: 1) a guided transformer module that transfers accurate texture features, and 2) a super-resolution reconstruction module that generates the high-resolution result. Because infrared and visible image pairs are not aligned, a certain parallax exists between them. The guided transformer therefore performs information guidance and fusion: it searches the high-resolution visible image for relevant texture information and fuses it with the low-resolution infrared features. The guided transformer module has four parts: a) texture extractor, b) relevance calculation, c) hard-attention-based feature transfer, and d) soft-attention-based feature synthesis. First, the texture extractor extracts features from the infrared and visible images. Second, the extracted infrared and visible features are formulated as the query and key of a transformer, and their relevance is calculated to obtain a hard-attention map and a soft-attention map. Finally, the two attention maps are used to transfer high-resolution features from the visible image and fuse them with the extracted infrared features, yielding a set of synthesized features that are fed into the following super-resolution reconstruction module to generate the final high-resolution infrared image. A sketch of this attention-based transfer follows.
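The following minimal PyTorch sketch illustrates the relevance calculation and the hard/soft attention transfer described above; the function name, tensor shapes, and the cosine-similarity relevance are illustrative assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def guided_attention_transfer(q_ir, k_vis, v_vis):
    """Hard/soft attention transfer between unaligned features (illustrative).

    q_ir:  (B, C, H, W) query features from the low-resolution infrared image
    k_vis: (B, C, H, W) key features from the visible image
    v_vis: (B, C, H, W) value features carrying visible textures
    """
    B, C, H, W = q_ir.shape
    q = F.normalize(q_ir.flatten(2), dim=1)            # (B, C, HW)
    k = F.normalize(k_vis.flatten(2), dim=1)           # (B, C, HW)
    rel = torch.bmm(q.transpose(1, 2), k)              # (B, HW, HW) cosine relevance

    conf, idx = rel.max(dim=2)                         # soft map (confidence), hard map (index)
    v = v_vis.flatten(2)                               # (B, C, HW)
    # Hard attention: fetch, for every infrared position, the single most
    # relevant visible feature.
    transferred = torch.gather(v, 2, idx.unsqueeze(1).expand(-1, C, -1)).view(B, C, H, W)
    # Soft attention: weight the transferred texture by its relevance
    # before fusing it with the infrared features.
    return q_ir + conf.view(B, 1, H, W) * transferred
```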
Most deep networks extract highly redundant features: because of their depth, similar features are extracted by different layers. In the super-resolution reconstruction module, a channel-splitting strategy is therefore implemented to eliminate redundant features. At each scale, the feature maps extracted by the residual groups are split into two streams of $C$ channels: one stream is passed to the following residual group to extract richer information, while the other stream is connected directly to later residual groups. Channel splitting thus extracts diversified features from the low-resolution infrared image and preserves high-frequency details in the super-resolved images (see the sketch below).
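A minimal PyTorch sketch of the channel-splitting idea, assuming a 64-channel feature map split into two 32-channel streams (the class name and layer sizes are hypothetical):

```python
import torch
import torch.nn as nn

class SplitResidualGroup(nn.Module):
    """Illustrative channel splitting: one stream passes through the next
    residual body, the other bypasses it and is re-joined afterwards."""

    def __init__(self, channels=64, split=32):
        super().__init__()
        self.split = split
        self.body = nn.Sequential(                     # stand-in for a residual group
            nn.Conv2d(split, split, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(split, split, 3, padding=1),
        )
        self.merge = nn.Conv2d(channels, channels, 1)  # recombine the two streams

    def forward(self, x):
        deep, skip = torch.split(x, [self.split, x.size(1) - self.split], dim=1)
        deep = deep + self.body(deep)                  # refined stream: richer information
        return self.merge(torch.cat([deep, skip], dim=1))  # bypass keeps diverse features
```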
Result
To evaluate the proposed method, our model is trained and tested on the FLIR-aligned dataset. The FLIR-aligned training set contains 1 480 pairs, each composed of an infrared image and a visible image, and the test set contains 126 image pairs. We compare our method with guided and single-image super-resolution methods proposed for visible or infrared images. Two deep-learning-based guided super-resolution methods are compared: 1) pyramidal edge-maps and attention based guided thermal super-resolution (PAGSR) and 2) unaligned guided thermal super-resolution (UGSR). Among single-image super-resolution methods, we compare the channel split convolutional neural network (ChasNet), an infrared image super-resolution method, together with several state-of-the-art visible image super-resolution methods: enhanced deep super-resolution network (EDSR), residual channel attention network (RCAN), information multi-distillation network (IMDN), holistic attention network (HAN), and image restoration using Swin Transformer (SwinIR). The super-resolution results are evaluated by peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as sketched below. Our network achieves the best average PSNR and SSIM on the 126 images of the FLIR-aligned test set. Specifically: 1) compared with the guided super-resolution method UGSR (2021), the PSNR is 0.75 dB higher and the SSIM is 0.041 higher; 2) compared with the infrared image super-resolution method ChasNet (2021), the PSNR and SSIM are improved by 1.106 dB and 0.06, respectively; 3) compared with the advanced visible image super-resolution method RCAN, the PSNR is improved by 0.763 dB and the SSIM by 0.049.
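For reference, PSNR and SSIM can be computed with scikit-image; this evaluation sketch reflects a typical setup, not necessarily the paper's exact protocol:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(sr, hr):
    """PSNR/SSIM for one super-resolved vs. ground-truth pair (uint8 grayscale)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255)
    return psnr, ssim

# Averaging over a test set, e.g. 126 FLIR-aligned test pairs:
# scores = [evaluate_pair(sr, hr) for sr, hr in pairs]
# mean_psnr = np.mean([s[0] for s in scores])
# mean_ssim = np.mean([s[1] for s in scores])
```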
Conclusion
We present a guided transformer super-resolution model that extracts high-frequency information from high-resolution visible images and provides detailed textures for generating high-resolution infrared images. The correlated information between infrared and visible images is shown to benefit infrared image super-resolution. In terms of PSNR and SSIM, our model demonstrates strong potential for reconstructing high-frequency details and preserving object structures.
Keywords: image super-resolution; image fusion; infrared image; Transformer; deep learning
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 184-199 [DOI: 10.1007/978-3-319-10593-2_13]
Fang Q Y, Han D P and Wang Z K. 2022. Cross-modality fusion transformer for multispectral object detection [EB/OL]. [2021-12-01]. https://arxiv.org/pdf/2111.00273.pdf
Gupta H and Mitra K. 2020. Pyramidal edge-maps and attention based guided thermal super-resolution//Proceedings of the European Conference on Computer Vision. Glasgow, UK: Springer: 698-715 [DOI: 10.1007/978-3-030-67070-2_42]
Gupta H and Mitra K. 2022. Toward unaligned guided thermal super-resolution. IEEE Transactions on Image Processing, 31: 433-445 [DOI: 10.1109/tip.2021.3130538]
Han T Y, Kim Y J and Song B C. 2017. Convolutional neural network-based infrared image super resolution under low light environment//Proceedings of the 25th European Signal Processing Conference (EUSIPCO). Kos, Greece: IEEE: 803-807 [DOI: 10.23919/EUSIPCO.2017.8081318]
Hui Z, Gao X B, Yang Y C and Wang X M. 2019. Lightweight image super-resolution with information multi-distillation network//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: ACM: 2024-2032 [DOI: 10.1145/3343031.3351084]
Johnson J, Alahi A and Li F F. 2016. Perceptual losses for real-time style transfer and super-resolution//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 694-711 [DOI: 10.1007/978-3-319-46475-6_43]
Kim J, Lee J K and Lee K M. 2016a. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654 [DOI: 10.1109/CVPR.2016.182]
Kim J, Lee J K and Lee K M. 2016b. Deeply-recursive convolutional network for image super-resolution//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1637-1645 [DOI: 10.1109/CVPR.2016.181]
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z H and Shi W Z. 2017. Photo-realistic single image super-resolution using a generative adversarial network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 105-114 [DOI: 10.1109/CVPR.2017.19]
Lee K, Lee J, Lee J, Hwang S and Lee S. 2017. Brightness-based convolutional neural network for thermal image enhancement. IEEE Access, 5: 26867-26879 [DOI: 10.1109/access.2017.2769687]
Liang J Y, Cao J Z, Sun G L, Zhang K, Van Gool L and Timofte R. 2021. SwinIR: image restoration using Swin Transformer//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE: 1833-1844 [DOI: 10.1109/ICCVW54120.2021.00210]
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140 [DOI: 10.1109/CVPRW.2017.151]
Niu B, Wen W L, Ren W Q, Zhang X D, Yang L P, Wang S Z, Zhang K H, Cao X C and Shen H F. 2020. Single image super-resolution via a holistic attention network//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 191-207 [DOI: 10.1007/978-3-030-58610-2_12]
Sajjadi M S M, Schölkopf B and Hirsch M. 2017. EnhanceNet: single image super-resolution through automated texture synthesis//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4501-4510 [DOI: 10.1109/ICCV.2017.481]
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/tip.2003.819861]
Wu H L, Li W Y and Zhang L B. 2022. Cross-scale coupling network for continuous-scale image super-resolution. Journal of Image and Graphics, 27(5): 1604-1615 [DOI: 10.11834/jig.210815]
Xu W J, Song H H, Yuan X T and Liu Q S. 2021. Lightweight attention feature selection recursive network for super-resolution. Journal of Image and Graphics, 26(12): 2826-2835 [DOI: 10.11834/jig.200555]
Zhang X D, Li C L, Meng Q P, Liu S J, Zhang Y and Wang J Y. 2018a. Infrared image super resolution by combining compressive sensing and deep learning. Sensors, 18(8): #2587 [DOI: 10.3390/s18082587]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018b. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 294-310 [DOI: 10.1007/978-3-030-01234-2_18]
Zhang Z F, Wang Z W, Lin Z and Qi H R. 2019. Image super-resolution by neural texture transfer//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7974-7983 [DOI: 10.1109/CVPR.2019.00817]
Zhao X L, Zhang Y L, Zhang T and Zou X M. 2019. Channel splitting network for single MR image super-resolution. IEEE Transactions on Image Processing, 28(11): 5649-5662 [DOI: 10.1109/tip.2019.2921882]