Multi-scale decomposition and octave convolution based infrared and visible image fusion
2023, Vol. 28, No. 1, Pages: 179-195
Print publication date: 2023-01-16
Accepted: 2022-09-12
DOI: 10.11834/jig.220517
Zihan Zhang, Xiaojun Wu, Tianyang Xu. Multi-scale decomposition and octave convolution based infrared and visible image fusion[J]. Journal of Image and Graphics, 2023, 28(1): 179-195.
Objective
In deep learning-based infrared and visible image fusion methods, multi-scale decomposition is an important way to extract features at different scales. To address the coarse scale settings of traditional multi-scale decomposition methods, an improved image fusion algorithm based on octave convolution is proposed.
Method
The fusion method consists of four parts: an encoder, feature enhancement, fusion strategies, and a decoder. First, the improved encoder extracts the low-frequency, sub-low-frequency, and high-frequency features of the source images at multiple scales. These features are enhanced from the top level down to the bottom level. Second, the features are fused according to their corresponding fusion strategies. Finally, the fused deep features are reconstructed into an informative fused image by the decoder designed in this paper.
Result
Experiments compare the proposed algorithm with nine image fusion algorithms on the TNO and RoadScene datasets. In subjective evaluation, the proposed algorithm fully preserves the effective information of the source images, and the fused results accord with human visual perception. In objective evaluation, on the TNO dataset the proposed algorithm performs best on five metrics: entropy, standard deviation, visual information fidelity, mutual information, and feature mutual information based on wavelet-transform local features, improving on the best values among the nine compared methods by 0.54%, 4.14%, 5.01%, 0.55%, and 0.68%, respectively. On the RoadScene dataset, it achieves the best values on four metrics (entropy, standard deviation, visual information fidelity, and mutual information), improving on the best values of the nine compared methods by 0.45%, 6.13%, 7.43%, and 0.45%, respectively, while its feature mutual information based on wavelet-transform local features falls short of the best value by only 0.002 05.
Conclusion
The proposed fusion method achieves excellent results in both subjective and objective evaluation and can effectively accomplish the image fusion task.
Objective
Image fusion is an image processing technique in computer vision that aims to integrate the salient features of multiple input images into a single composite image. In recent years, image fusion has been applied in areas such as video analysis and medical image interpretation. Existing fusion algorithms fall into two categories: 1) traditional methods and 2) deep learning-based methods. Most traditional methods introduce signal processing operators to complete the fusion task. However, their feature extraction schemes and fusion rules are hand-crafted, and designing them well enough to achieve good fusion results is quite complicated. Thanks to the rapid development of deep learning, current image fusion methods increasingly build on this technique. Multi-scale decomposition is an effective way to extract features for deep learning-based infrared and visible image fusion. To alleviate the coarse scale settings of traditional multi-scale decomposition methods, we develop an improved octave convolution-based image fusion algorithm, in which deep features are divided by frequency via octave convolution.
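To make the frequency division concrete, the following is a minimal PyTorch sketch of the standard two-branch octave convolution of Chen et al. (2019), on which this design builds; the channel ratio alpha and module layout are illustrative, and the paper's encoder further splits features into three bands (low, sub-low, high), which the abstract does not detail:

```python
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Two-branch octave convolution in the style of Chen et al. (2019).

    A fraction ``alpha`` of the channels lives at half resolution
    (low frequency); the rest stay at full resolution (high frequency).
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, alpha=0.5):
        super().__init__()
        self.in_lo = int(alpha * in_ch)
        self.in_hi = in_ch - self.in_lo
        self.out_lo = int(alpha * out_ch)
        self.out_hi = out_ch - self.out_lo
        pad = kernel_size // 2
        # Four paths: high->high, high->low, low->high, low->low.
        self.hh = nn.Conv2d(self.in_hi, self.out_hi, kernel_size, padding=pad)
        self.hl = nn.Conv2d(self.in_hi, self.out_lo, kernel_size, padding=pad)
        self.lh = nn.Conv2d(self.in_lo, self.out_hi, kernel_size, padding=pad)
        self.ll = nn.Conv2d(self.in_lo, self.out_lo, kernel_size, padding=pad)

    def forward(self, x_hi, x_lo):
        # x_hi: (B, in_hi, H, W); x_lo: (B, in_lo, H/2, W/2).
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2)
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo
```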
Method
Our fusion method is composed of four parts: 1) an encoder, 2) feature enhancement, 3) fusion strategies, and 4) a decoder. The encoder extracts deep features at four scales from the source image through convolution and pooling. The deep features extracted at each scale are subdivided into low-frequency, sub-low-frequency, and high-frequency features by octave convolution. In the enhancement phase, high-level features are added to low-level features across scales: high-level high-frequency features are used to enhance low-level sub-low-frequency features, and high-level sub-low-frequency features are used to enhance low-level low-frequency features. The low-frequency, sub-low-frequency, and high-frequency features of each scale are then fused with their corresponding fusion strategies. To produce an informative fused image, the fused features are reconstructed by the designed decoder.
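A minimal sketch of this cross-scale enhancement, under the assumption (not stated in the abstract) that "adding" means upsampling the smaller high-level map to the low-level map's spatial size and summing element-wise; the function and tensor names are ours:

```python
import torch.nn.functional as F

def enhance(low_level_feat, high_level_feat):
    """Upsample a high-level feature map and add it to a low-level one.

    Assumes matching channel counts, so only the spatial sizes differ.
    """
    up = F.interpolate(high_level_feat, size=low_level_feat.shape[-2:],
                       mode='bilinear', align_corners=False)
    return low_level_feat + up

# Per the abstract's pairing (names are illustrative):
#   sub_lo[k] = enhance(sub_lo[k], hi[k + 1])   # high freq -> sub-low freq
#   lo[k]     = enhance(lo[k], sub_lo[k + 1])   # sub-low freq -> low freq
```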
Our experiments run on Ubuntu with an NVIDIA GTX 1080Ti GPU; the implementation uses Python 3.6.10 and PyTorch. In the training phase, the network does not use the fusion strategy. Pairs of infrared and visible images are not required for training, because the network only needs to learn deep feature extraction and image reconstruction from those features. We choose 80 000 images from the MS COCO (Microsoft common objects in context) dataset as the training set of our auto-encoder network; each image is converted to grayscale and resized to 256 × 256 pixels. The Adam optimizer is used to optimize the model, with the learning rate, batch size, and number of epochs set to 1 × 10⁻⁴, 1, and 2, respectively.
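A sketch of this training configuration, assuming `encoder` and `decoder` are the nn.Module components described above and that the hypothetical `coco_gray_dataset` yields 1 × 256 × 256 grayscale tensors; the plain MSE reconstruction loss is our assumption, as the abstract does not name the training loss:

```python
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

def train_autoencoder(encoder, decoder, coco_gray_dataset):
    """Train encoder/decoder by image reconstruction (no fusion strategy)."""
    loader = DataLoader(coco_gray_dataset, batch_size=1, shuffle=True)
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = optim.Adam(params, lr=1e-4)  # learning rate 1e-4, as stated
    mse = nn.MSELoss()                 # assumed reconstruction loss
    for epoch in range(2):             # 2 epochs, as stated
        for img in loader:
            recon = decoder(encoder(img))  # extract features, reconstruct
            loss = mse(recon, img)
            opt.zero_grad()
            loss.backward()
            opt.step()
```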
After training, the network can complete the image fusion task. First, the improved encoder is used to obtain the low-frequency, sub-low-frequency, and high-frequency features of the source images at multiple scales. These features are enhanced between the top and bottom levels. Second, the features are fused with the corresponding fusion strategies. Finally, the fused features are reconstructed by the designed decoder to obtain the informative fused image.
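Putting the inference pipeline together (encode, enhance, fuse, decode), as a sketch only: `enhance_scales` stands for the cross-scale enhancement above, and the element-wise average is a placeholder, since the abstract does not spell out the band-specific fusion strategies:

```python
import torch

def fuse_pair(encoder, decoder, enhance_scales, ir, vis):
    """Fuse one infrared/visible pair with the trained auto-encoder."""
    with torch.no_grad():
        feats_ir = enhance_scales(encoder(ir))    # per-scale, per-band features
        feats_vis = enhance_scales(encoder(vis))
        # Placeholder rule: average corresponding feature maps; the paper
        # instead applies a dedicated strategy per frequency band.
        fused = [(a + b) / 2 for a, b in zip(feats_ir, feats_vis)]
        return decoder(fused)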
Result
The proposed fusion algorithm is compared with nine existing image fusion algorithms on the TNO and RoadScene datasets, and all algorithms are evaluated qualitatively and quantitatively. Qualitatively, the proposed algorithm fully preserves the effective information of the source images, and the fused results accord with human visual perception. Since qualitative comparison alone cannot reliably rank the algorithms, six objective metrics are chosen to evaluate their fusion performance. On the TNO dataset, the proposed algorithm achieves the best performance on five metrics: 1) entropy, 2) standard deviation, 3) visual information fidelity, 4) mutual information, and 5) wavelet transform-based feature mutual information. Compared with the best values of the nine existing fusion algorithms on these five metrics, it obtains improvements of 0.54%, 4.14%, 5.01%, 0.55%, and 0.68%, respectively. The performance on the RoadScene dataset is basically consistent with that on TNO. The best values are obtained on four quality metrics: 1) entropy, 2) standard deviation, 3) visual information fidelity, and 4) mutual information; compared with the nine existing methods, the best values of these four metrics are exceeded by 0.45%, 6.13%, 7.43%, and 0.45%, respectively. In wavelet transform-based feature mutual information, the gap between our algorithm and the best value is only 0.002 05.
Conclusion
A novel and effective deep learning architecture based on a convolutional neural network and octave convolution is developed for infrared and visible image fusion. This network structure makes full use of multi-scale deep features. Octave convolution provides a finer division of the extracted features, so that appropriate fusion strategies can be selected for each kind of deep feature. Because each scale is divided into low-frequency, sub-low-frequency, and high-frequency features, more appropriate features can be selected to enhance the low-level features in the feature enhancement phase. The experimental results of both qualitative and quantitative evaluation show the potential of our algorithm for image fusion.
Keywords: image processing; image fusion; octave convolution; infrared image; visible image
Chen Y P, Fan H Q, Xu B, Yan Z C, Kalantidis Y, Rohrbach M, Yan S C and Feng J S. 2019. Drop an octave: reducing spatial redundancy in convolutional neural networks with octave convolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 3434-3443 [DOI: 10.1109/iccv.2019.00353]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
Haghighat M and Razian M A. 2014. Fast-FMI: non-reference image fusion metric//Proceedings of the 8th IEEE International Conference on Application of Information and Communication Technologies. Astana, Kazakhstan: IEEE: 1-3 [DOI: 10.1109/icaict.2014.7036000]
Han Y, Cai Y Z, Cao Y and Xu X M. 2013. A new image fusion performance metric based on visual information fidelity. Information Fusion, 14(2): 127-135 [DOI: 10.1016/j.inffus.2011.08.002]
Kumar B K S. 2013. Multifocus and multispectral image fusion based on pixel significance using discrete cosine harmonic wavelet transform. Signal, Image and Video Processing, 7(6): 1125-1143 [DOI: 10.1007/s11760-012-0361-x]
Li H and Wu X J. 2019. DenseFuse: a fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 28(5): 2614-2623 [DOI: 10.1109/tip.2018.2887342]
Li H, Wu X J and Durrani T. 2020a. NestFuse: an infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Transactions on Instrumentation and Measurement, 69(12): 9645-9656 [DOI: 10.1109/tim.2020.3005230]
Li H, Wu X J and Kittler J. 2018. Infrared and visible image fusion using a deep learning framework//Proceedings of the 24th International Conference on Pattern Recognition. Beijing, China: IEEE: 2705-2710 [DOI: 10.1109/icpr.2018.8546006]
Li H, Wu X J and Kittler J. 2020b. MDLatLRR: a novel decomposition method for infrared and visible image fusion. IEEE Transactions on Image Processing, 29: 4733-4746 [DOI: 10.1109/tip.2020.2975984]
Li H, Wu X J and Kittler J. 2021. RFN-Nest: an end-to-end residual fusion network for infrared and visible images. Information Fusion, 73: 72-86 [DOI: 10.1016/j.inffus.2021.02.023]
Li X X, Guo X P, Han P F, Wang X, Li H G and Luo T. 2020c. Laplacian redecomposition for multimodal medical image fusion. IEEE Transactions on Instrumentation and Measurement, 69(9): 6880-6890 [DOI: 10.1109/TIM.2020.2975405]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]
Liu Y, Chen X, Peng H and Wang Z F. 2017. Multi-focus image fusion with a deep convolutional neural network. Information Fusion, 36: 191-207 [DOI: 10.1016/j.inffus.2016.12.001]
Liu Y, Chen X, Ward R K and Wang Z J. 2016. Image fusion with convolutional sparse representation. IEEE Signal Processing Letters, 23(12): 1882-1886 [DOI: 10.1109/lsp.2016.2618776]
Ma J L, Zhou Z Q, Wang B and Zong H. 2017. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Physics and Technology, 82: 8-17 [DOI: 10.1016/j.infrared.2017.02.005]
Ma J Y, Yu W, Liang P W, Li C and Jiang J J. 2019. FusionGAN: a generative adversarial network for infrared and visible image fusion. Information Fusion, 48: 11-26 [DOI: 10.1016/j.inffus.2018.09.004]
Nunez J, Otazu X, Fors O, Prades A, Pala V and Arbiol R. 1999. Multiresolution-based image fusion with additive wavelet decomposition. IEEE Transactions on Geoscience and Remote Sensing, 37(3): 1204-1211 [DOI: 10.1109/36.763274]
Piella G. 2003. A general framework for multiresolution image fusion: from pixels to regions. Information Fusion, 4(4): 259-280 [DOI: 10.1016/S1566-2535(03)00046-0]
Rao Y J. 1997. In-fibre Bragg grating sensors. Measurement Science and Technology, 8(4): 355-375 [DOI: 10.1088/0957-0233/8/4/002]
Roberts J W, van Aardt J A and Ahmed F B. 2008. Assessment of image fusion procedures using entropy, image quality, and multispectral classification. Journal of Applied Remote Sensing, 2(1): #023522 [DOI: 10.1117/1.2945910]
Sulaiman M A and Labadin J. 2015. Feature selection based on mutual information//Proceedings of the 9th International Conference on IT in Asia (CITA). Sarawak, Malaysia: IEEE: 1-6 [DOI: 10.1109/CITA.2015.7349827]
Toet A. 2014. TNO image fusion dataset [EB/OL]. [2021-02-20]. https://figshare.com/articles/TN_Image_Fusion_Dataset/1008029
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Xu H, Ma J Y, Jiang J J, Guo X J and Ling H B. 2020a. U2Fusion: a unified unsupervised image fusion network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1): 502-518 [DOI: 10.1109/TPAMI.2020.3012548]
Xu H, Ma J Y, Le Z L, Jiang J J and Guo X J. 2020b. FusionDN: a unified densely connected network for image fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7): 12484-12491 [DOI: 10.1609/aaai.v34i07.6936]
Xu T Y, Feng Z H, Wu X J and Kittler J. 2019a. Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Transactions on Image Processing, 28(11): 5596-5609 [DOI: 10.1109/TIP.2019.2919201]
Xu T Y, Feng Z H, Wu X J and Kittler J. 2019b. Joint group feature selection and discriminative filter learning for robust visual object tracking//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7950-7960 [DOI: 10.1109/iccv.2019.00804]
Xu T Y, Feng Z H, Wu X J and Kittler J. 2019c. Learning low-rank and sparse discriminative correlation filters for coarse-to-fine visual object tracking. IEEE Transactions on Circuits and Systems for Video Technology, 30(10): 3727-3739 [DOI: 10.1109/tcsvt.2019.2945068]
Yang B and Li S T. 2010. Multifocus image fusion and restoration with sparse representation. IEEE Transactions on Instrumentation and Measurement, 59(4): 884-892 [DOI: 10.1109/tim.2009.2026612]
Yin M, Duan P H, Chu B and Liang X Y. 2016. Image fusion based on non-subsampled quaternion shearlet transform. Journal of Image and Graphics, 21(10): 1289-1297 [DOI: 10.11834/jig.20161003]
Zhang Y, Liu Y, Sun P, Yan H, Zhao X L and Zhang L. 2020. IFCNN: a general image fusion framework based on convolutional neural network. Information Fusion, 54: 99-118 [DOI: 10.1016/j.inffus.2019.07.011]
Zhou T, Liu S, Dong Y L, Huo B Q and Ma Z J. 2021. Research on pixel-level image fusion based on multi-scale transformation: progress, application and challenges. Journal of Image and Graphics, 26(9): 2094-2110 [DOI: 10.11834/jig.200803]
Zhou W, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/tip.2003.819861]