A deep progressive infrared and visible image fusion network
2023, Vol. 28, No. 1, Pages 156-165
Received: 2022-04-18; Revised: 2022-07-28; Accepted: 2022-08-04; Published in print: 2023-01-16
DOI: 10.11834/jig.220319
Objective
The goal of infrared and visible image fusion is to obtain a high-quality fused image with a complete representation of the scene. Because deep features offer good generalization ability, robustness and development potential, many deep learning based fusion methods have been proposed that perform image fusion in the deep feature space and achieve good results. In addition, inspired by traditional fusion methods based on multi-scale decomposition, features at different scales help preserve more information from the source images. On this basis, a novel progressive infrared and visible image fusion framework (progressive fusion, ProFuse) is proposed.
Method
The framework takes U-Net as its backbone to extract multi-scale features and then fuses these features progressively: fusion is performed both on high-level features containing global information and on low-level features containing more details, and both on original-size features (preserving more details) and on smaller-size features (preserving semantic information); the fused image is finally reconstructed layer by layer.
Result
Experiments on the TNO (Toegepast Natuurwetenschappelijk Onderzoek) and INO (Institut National d'Optique) datasets compare our method with six other methods on six selected objective metrics. Our method improves mutual information (MI) by 115.64% over FusionGAN (generative adversarial network for infrared and visible image fusion), standard deviation (STD) by 19.93% over GANMcC (generative adversarial network with multiclassification constraints for infrared and visible image fusion), edge preservation (Qabf) by 1.91% over DWT (discrete wavelet transform), and entropy (EN) by 1.30% over GANMcC. Subjectively, the fusion results obtained by our method show higher contrast, more details and clearer targets.
Conclusion
Extensive experiments demonstrate the effectiveness and generalization ability of our method. Compared with other state-of-the-art methods, it achieves better results in both subjective and objective evaluations.
Objective
Multi-modal images are produced by different imaging techniques. An infrared image records the radiation emitted by targets in the infrared band, while a visible image is better suited to human visual perception, offering higher spatial resolution, richer effective information and lower noise. Infrared and visible image fusion (IVIF) integrates the complementary information of multiple sensors to alleviate the limitations of hardware equipment and to obtain high-quality images at low cost. IVIF is used in a wide range of applications such as surveillance, remote sensing and agriculture. However, several challenges remain in multi-modal image fusion, for instance, how to extract effective information from the different modalities and how to design fusion rules for their complementary information. Existing research can be roughly divided into two categories: 1) traditional methods and 2) deep learning based methods. Traditional methods decompose the infrared and visible images into a transform domain so that the decomposed representation has properties beneficial to fusion, then perform fusion in the transform domain, which suppresses information loss and avoids the artifacts caused by direct pixel manipulation, and finally reconstruct the fused image. These methods rely on assumptions about the source image pair and on hand-designed decomposition schemes to extract features. However, such hand-crafted features are not comprehensive; they may be overly sensitive to high-frequency or principal components and can introduce image distortion and artifacts. In recent years, data-driven deep learning based image fusion methods have been developed, most of which perform infrared and visible image fusion in the deep feature space. They fall into two categories: 1) convolutional neural network (CNN) based methods and 2) generative adversarial network (GAN) based methods that generate fused images. CNN-based methods do not fully exploit the information extracted by intermediate layers, while GAN-based methods struggle to preserve image details adequately.
Method
We develop a novel progressive infrared and visible image fusion framework (ProFuse), which extracts multi-scale features with U-Net as the backbone, merges the multi-scale features, and reconstructs the fused image layer by layer. The network is composed of three parts: 1) an encoder, 2) a fusion module, and 3) a decoder. First, a series of multi-scale feature maps is generated from the infrared image and the visible image by the encoder. Next, the multi-scale features of the infrared and visible image pair are fused in the fusion layer to obtain fused features. Finally, the fused features pass through the decoder to reconstruct the fused image. The architecture of the encoder and decoder is designed based on U-Net. The encoder consists of repeated applications of a recurrent residual convolutional unit (RRCU) and a max pooling operation; each down-sampling step doubles the number of feature channels so that more features can be extracted. The decoder reconstructs the final fused image: every step consists of an up-sampling of the feature map followed by a 3 × 3 convolution that halves the number of feature channels, a concatenation with the corresponding feature maps from the encoder, and an RRCU. At the fusion layer, our spatial attention-based fusion method is used to combine the two modalities. This design has two advantages. First, fusion is performed both on high-level features containing global information (at the bottleneck semantic layer) and on detail-related low-level features (at the shallow layers). Second, fusion is performed not only on the original scale (maintaining more details) but also on smaller scales (maintaining semantic information). The progressive fusion design is therefore reflected in two aspects: 1) image fusion proceeds progressively from high level to low level and 2) from small scale to large scale.
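To make the encoder / progressive-fusion / decoder pipeline above concrete, the following is a minimal PyTorch-style sketch. The channel widths, the number of recurrence steps in the RRCU, the bilinear up-sampling and the exact form of the spatial attention (here an L1-based activity map normalized with softmax) are not specified in this abstract; they are assumptions made only for illustration, not the authors' implementation.

```python
# Hedged sketch of the progressive fusion structure described above (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RRCU(nn.Module):
    """Recurrent residual convolutional unit (after R2U-Net); t recurrence steps assumed."""
    def __init__(self, in_ch, out_ch, t=2):
        super().__init__()
        self.t = t
        self.conv_in = nn.Conv2d(in_ch, out_ch, 1)             # match channel count
        self.conv = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1),
                                  nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        x = self.conv_in(x)
        h = self.conv(x)
        for _ in range(self.t):                                # recurrent refinement
            h = self.conv(x + h)
        return x + h                                           # residual connection

def spatial_attention_fuse(f_ir, f_vis):
    """Assumed spatial-attention fusion rule: weight each modality by an
    L1-norm-based spatial activity map, normalized with softmax."""
    a_ir = torch.mean(torch.abs(f_ir), dim=1, keepdim=True)
    a_vis = torch.mean(torch.abs(f_vis), dim=1, keepdim=True)
    w = torch.softmax(torch.cat([a_ir, a_vis], dim=1), dim=1)
    return w[:, 0:1] * f_ir + w[:, 1:2] * f_vis

class ProFuseSketch(nn.Module):
    """U-Net-like backbone: the encoder doubles channels at each scale, the decoder
    halves them, and fusion is applied at every scale."""
    def __init__(self, base=32, depth=4):
        super().__init__()
        chs = [base * 2 ** i for i in range(depth)]            # e.g. 32, 64, 128, 256
        self.enc = nn.ModuleList([RRCU(1 if i == 0 else chs[i - 1], chs[i])
                                  for i in range(depth)])
        self.up = nn.ModuleList([nn.Conv2d(chs[i], chs[i - 1], 3, padding=1)
                                 for i in range(depth - 1, 0, -1)])
        self.dec = nn.ModuleList([RRCU(2 * chs[i - 1], chs[i - 1])
                                  for i in range(depth - 1, 0, -1)])
        self.out = nn.Conv2d(chs[0], 1, 1)

    def encode(self, x):
        feats = []
        for i, block in enumerate(self.enc):
            x = block(x if i == 0 else F.max_pool2d(x, 2))     # down-sample, double channels
            feats.append(x)
        return feats

    def forward(self, ir, vis):
        f_ir, f_vis = self.encode(ir), self.encode(vis)
        # fuse the two modalities at every scale; the decoder then consumes the fused
        # features progressively from the bottleneck (small scale, high level) up to
        # the original scale (large scale, low level)
        fused = [spatial_attention_fuse(a, b) for a, b in zip(f_ir, f_vis)]
        x = fused[-1]
        for up, dec, skip in zip(self.up, self.dec, reversed(fused[:-1])):
            x = up(F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False))
            x = dec(torch.cat([x, skip], dim=1))               # concat with fused skip features
        return torch.sigmoid(self.out(x))
```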
Result
To evaluate the fusion performance of our method, we conduct experiments on the publicly available Toegepast Natuurwetenschappelijk Onderzoek (TNO) dataset and compare it with several state-of-the-art (SOTA) fusion methods, including DenseFuse, discrete wavelet transform (DWT), FusionGAN, ratio of low-pass pyramid (RP), generative adversarial network with multiclassification constraints for infrared and visible image fusion (GANMcC), and curvelet transform (CVT). All these competitors are implemented from their public code, and their parameters are set according to the original papers. Our method is compared with the others both in subjective evaluation and with quality metrics that assess fusion performance objectively. Overall, the fusion results of our method show noticeably higher contrast, more details and clearer targets. Compared with the other methods, our method preserves the detailed information of the visible image and the infrared radiation to the greatest extent, while introducing very little noise and few artifacts. We quantify the performance of the different fusion methods with six metrics: entropy (EN), structural similarity (SSIM), edge-based similarity (Qabf), mutual information (MI), standard deviation (STD), and the sum of the correlations of differences (SCD). Our method achieves the largest values on EN, Qabf, MI and STD. The maximum EN value indicates that our method retains richer information than the other competitors. Qabf is an objective quality metric for fused images; the higher its value, the better the quality of the fused image. STD measures the richness of image information: the larger the value, the more scattered the gray-level distribution, the more information the image carries, and the better the quality of the fused image. The larger the MI value, the more information is transferred from the source images and the better the fusion effect. Our method improves MI by 115.64% compared with the generative adversarial network for infrared and visible image fusion (FusionGAN) method, STD by 19.93% compared with GANMcC, edge preservation (Qabf) by 1.91% compared with DWT, and EN by 1.30% compared with GANMcC. This indicates that our method is effective for the IVIF task.
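For reference, the sketch below shows how three of the quoted metrics (EN, STD, MI) are commonly computed on 8-bit grayscale images. The exact evaluation code and settings used in the paper are not given in this abstract, so this is only an illustration of the standard definitions.

```python
# Hedged sketch of common EN, STD and MI computations on 8-bit grayscale images.
import numpy as np

def entropy(img, bins=256):
    """EN = -sum_i p_i * log2(p_i), Shannon entropy of the gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def std_dev(img):
    """STD: standard deviation of the gray levels, a proxy for contrast."""
    return float(np.std(img.astype(np.float64)))

def mutual_information(src, fused, bins=256):
    """MI between a source image and the fused image, from the joint histogram."""
    joint, _, _ = np.histogram2d(src.ravel(), fused.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

# The fusion MI usually reported is MI(ir, fused) + MI(vis, fused).
```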
Conclusion
Extensive experiments demonstrate the effectiveness and generalization ability of our method. Compared with other state-of-the-art methods, it shows better results in both qualitative and quantitative evaluations.
Alom M Z, Hasan M, Yakopcic C, Taha T M and Asari V K. 2018. Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation [EB/OL]. [2022-05-29]. https://arxiv.org/pdf/1802.06955.pdf
Aslantas V and Bendes E. 2015. A new image quality metric for image fusion: the sum of the correlations of differences. AEU-International Journal of Electronics and Communications, 69(12): 1890-1896 [DOI: 10.1016/j.aeue.2015.09.004]
Bhatnagar G and Liu Z. 2015. A novel image fusion framework for night-vision navigation and surveillance. Signal, Image and Video Processing, 9(1): 165-175 [DOI: 10.1007/s11760-014-0740-6]
Bulanon D M, Burks T F and Alchanatis V. 2009. Image fusion of visible and thermal images for fruit detection. Biosystems Engineering, 103(1): 12-22 [DOI: 10.1016/j.biosystemseng.2009.02.009]
Burt P J and Adelson E H. 1985. Merging images through pattern decomposition//Proceedings Volume 0575, Applications of Digital Image Processing Ⅷ. San Diego, USA: SPIE: 173-181 [DOI: 10.1117/12.966501]
Eslami M and Mohammadzadeh A. 2016. Developing a spectral-based strategy for urban object detection from airborne hyperspectral TIR and visible data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(5): 1808-1816 [DOI: 10.1109/JSTARS.2015.2489838]
Fu Y and Wu X J. 2021. A dual-branch network for infrared and visible image fusion//Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Milan, Italy: IEEE: 10675-10680 [DOI: 10.1109/ICPR48806.2021.9412293]
Huo X, Zhou Y, Chen Y and Tan J Q. 2021. Dual-scale decomposition and saliency analysis based infrared and visible image fusion. Journal of Image and Graphics, 26(12): 2813-2825 [DOI: 10.11834/jig.200405]
Li H, Manjunath B S and Mitra S K. 1995. Multisensor image fusion using the wavelet transform. Graphical Models and Image Processing, 57(3): 235-245 [DOI: 10.1006/gmip.1995.1022]
Li H and Wu X J. 2019. DenseFuse: a fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 28(5): 2614-2623 [DOI: 10.1109/TIP.2018.2887342]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]
Liu M W, Wang R H, Li J and Jiao Y Z. 2021. Infrared and visible image fusion with multi-scale anisotropic guided filtering. Journal of Image and Graphics, 26(10): 2421-2432 [DOI: 10.11834/jig.200339]
Liu Y, Chen X, Cheng J, Peng H and Wang Z F. 2018. Infrared and visible image fusion with convolutional neural networks. International Journal of Wavelets, Multiresolution and Information Processing, 16(3): #1850018 [DOI: 10.1142/S0219691318500182]
Liu Y, Chen X, Ward R K and Wang Z J. 2016. Image fusion with convolutional sparse representation. IEEE Signal Processing Letters, 23(12): 1882-1886 [DOI: 10.1109/LSP.2016.2618776]
Ma J Y, Yu W, Liang P W, Li C and Jiang J J. 2019. FusionGAN: a generative adversarial network for infrared and visible image fusion. Information Fusion, 48: 11-26 [DOI: 10.1016/j.inffus.2018.09.004]
Ma J Y, Zhang H, Shao Z F, Liang P W and Xu H. 2021. GANMcC: a generative adversarial network with multiclassification constraints for infrared and visible image fusion. IEEE Transactions on Instrumentation and Measurement, 70: #5005014 [DOI: 10.1109/TIM.2020.3038013]
Nencini F, Garzelli A, Baronti S and Alparone L. 2007. Remote sensing image fusion using the curvelet transform. Information Fusion, 8(2): 143-156 [DOI: 10.1016/j.inffus.2006.02.001]
Piella G and Heijmans H. 2003. A new quality metric for image fusion//Proceedings of 2003 International Conference on Image Processing. Barcelona, Spain: IEEE: III-173 [DOI: 10.1109/ICIP.2003.1247209]
Qu G H, Zhang D L and Yan P F. 2002. Information measure for performance of image fusion. Electronics Letters, 38(7): 313-315 [DOI: 10.1049/el:20020212]
Rao Y J. 1997. In-fibre Bragg grating sensors. Measurement Science and Technology, 8(4): 355-375 [DOI: 10.1088/0957-0233/8/4/002]
Roberts J W, Van Aardt J A and Ahmed F B. 2008. Assessment of image fusion procedures using entropy, image quality, and multispectral classification. Journal of Applied Remote Sensing, 2(1): #023522 [DOI: 10.1117/1.2945910]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Toet A. 1989. Image fusion by a ratio of low-pass pyramid. Pattern Recognition Letters, 9(4): 245-253 [DOI: 10.1016/0167-8655(89)90003-2]
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/TIP.2003.819861]
Yang B and Li S T. 2010. Multifocus image fusion and restoration with sparse representation. IEEE Transactions on Instrumentation and Measurement, 59(4): 884-892 [DOI: 10.1109/TIM.2009.2026612]
Yang B, Li S T and Sun F M. 2007. Image fusion using nonsubsampled contourlet transform//Proceedings of the 4th International Conference on Image and Graphics (ICIG 2007). Chengdu, China: IEEE: 719-724 [DOI: 10.1109/ICIG.2007.124]
Yu N N, Qiu T S, Bi F and Wang A Q. 2011. Image features extraction and fusion based on joint sparse representation. IEEE Journal of Selected Topics in Signal Processing, 5(5): 1074-1082 [DOI: 10.1109/JSTSP.2011.2112332]