Infrared and visible image fusion with a multi-level feature-guided network

Wang Yanshun, Nie Rencan, Zhang Gucheng, Yang Xiaofei (School of Information Science and Engineering, Yunnan University, Kunming 650505, China)

Abstract
Objective Deep learning techniques built on convolutional neural networks have shown superior performance in image fusion. Among the various image fusion tasks, infrared and visible image fusion is applied very widely: the two modalities have distinctly different characteristics, and a fused image that integrates their information carries significant value. To improve the quality of infrared and visible image fusion, this paper proposes a fusion framework based on a multi-level feature-guided network.

Method In this framework, an encoder extracts features from the source images, and multi-level features are guided into a decoder that reconstructs the fusion result. To achieve effective multi-scale information interaction, and unlike an ordinary encoder-decoder structure, the proposed method performs feature guidance at every part of every encoder layer: pooling shrinks the spatial size in the encoding stage and upsampling enlarges it in the decoding stage, realizing multi-scale fusion and reconstruction and effectively compensating for the information lost as convolutional layers stack up during training. Features are guided out of the encoding stage at the appropriate points and fused into the decoding layers in time. To train the network effectively, a hybrid loss function is designed: a weighted fidelity term constrains the pixel similarity between the fusion result and the source images, while a structure tensor loss encourages the fused image to extract more structural features from the sources. After the network structure is built, a loss-fusion algorithm is proposed that starts from the respective characteristics of the infrared and visible images: a 2-norm loss based on visual-saliency weight estimation and an F-norm loss based on the structure tensor (a minimal sketch of this hybrid loss is given after the abstract).

Result To demonstrate the feasibility of the fusion method, experiments are conducted on the TNO and RoadScene datasets, with visual and objective comparisons against traditional and deep learning fusion methods. The proposed method achieves ideal results on key image evaluation metrics, including the information fidelity criterion, gradient-based edge-information preservation, nonlinear correlation entropy, and structural-similarity-based image quality measurement. In addition, ablation experiments verify the effectiveness of the proposed network structure and loss function, ensuring the completeness of the proposed model.

Conclusion The proposed fusion model combines the advantages of traditional models and deep learning models, produces high-quality fused images, and achieves good fusion results.
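The abstract names the two loss terms but gives no formulas. The sketch below is a minimal PyTorch rendering, assuming Sobel-filter gradients for the structure tensor, elementwise-max aggregation of the two source tensors, externally supplied saliency weight maps w_ir/w_vis, and a balance weight lam; none of these specifics are given in the abstract.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for image gradients, shaped for conv2d: (out_ch, in_ch, kH, kW).
SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def structure_tensor(img):
    """Per-pixel structure-tensor entries (Ix^2, Iy^2, Ix*Iy) of a (B,1,H,W) batch."""
    ix = F.conv2d(img, SOBEL_X.to(img), padding=1)
    iy = F.conv2d(img, SOBEL_Y.to(img), padding=1)
    return torch.cat([ix * ix, iy * iy, ix * iy], dim=1)

def hybrid_loss(fused, ir, vis, w_ir, w_vis, lam=1.0):
    # Saliency-weighted 2-norm fidelity to each source image; w_ir and w_vis
    # are per-pixel weight maps from some visual-saliency estimate (assumed given).
    fidelity = (w_ir * (fused - ir) ** 2).mean() + (w_vis * (fused - vis) ** 2).mean()
    # F-norm structure-tensor term: push the fused tensor toward the stronger
    # source structure (elementwise max is one plausible aggregation rule).
    target = torch.maximum(structure_tensor(ir), structure_tensor(vis))
    structural = ((structure_tensor(fused) - target) ** 2).mean()  # mean squared F-norm
    return fidelity + lam * structural
```

In a training loop, hybrid_loss(net(ir, vis), ir, vis, w_ir, w_vis) would then be backpropagated as usual.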
Keywords
Infrared and visible image fusion based on multi-level guided network

Wang Yanshun, Nie Rencan, Zhang Gucheng, Yang Xiaofei(School of Information Science and Engineering, Yunnan University, Kunming 650505, China)

Abstract
Objective Multi-source image fusion focuses on extracting and integrating the effective information of diverse images. It helps overcome the insufficient information of any single image and improves the efficiency of multi-source data processing. Infrared and visible images are widely used in image processing and carry complementary information. To obtain a clear and accurate description of the scene, the fusion should preserve the detailed texture of the visible image and the clear target information of the infrared image. We therefore develop a fusion algorithm based on deep learning.

Method First, a densely connected convolutional network is adapted, and an end-to-end encoder-decoder fusion model is trained. The encoder extracts the features of the source images and guides the multi-level features into the decoder, which reconstructs the fusion result. Unlike a plain encoder-decoder structure, every encoder layer provides feature guidance. Pooling reduces the spatial size in the encoding part, and upsampling enlarges it in the decoding part, realizing multi-scale fusion and reconstruction; this compensates for the information lost as convolution layers are stacked during training. To train the network effectively, a hybrid loss function is designed: a weighted fidelity term constrains the pixel similarity between the fusion result and the source images, while a structure tensor loss encourages the fused image to extract more structural features from the sources. The encoder is divided into three layers, separated by pooling layers, to ensure effective feature extraction in depth; the number of convolution blocks per layer decreases gradually from 3 to 1, so that more features are extracted by the shallow layers and fewer by the deep ones. To realize multi-scale information interaction, features are guided out of the encoding part and fused into the corresponding decoding layers in time. In the decoder, the first layer consists of five convolution blocks, and the fusion result is taken from the output of the fifth block; the second layer consists of three convolution blocks, and the third layer of a single block, with sampling operations connecting the layers (a minimal architecture sketch is given after this abstract). After the network structure is constructed, a loss-fusion algorithm is proposed that combines an L2-norm constraint based on saliency detection with an F-norm constraint computed from the structure tensors of the infrared and visible images. Under the joint control of the network structure and the loss algorithm, the fusion result benefits from both sources.

Result To demonstrate the feasibility of the fusion method, experiments are carried out on the TNO and RoadScene datasets. Compared with traditional and deep learning fusion methods, the proposed method achieves strong scores on a series of evaluation indicators. Furthermore, to validate the effectiveness of the network structure and the loss function, both are ablated following the principle of controlled variables.
Conclusion The proposed fusion model integrates the strengths of traditional models and deep learning models, obtains high-quality fused images, and achieves a good fusion effect.
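The Method section fixes the block counts (encoder layers of 3, 2 and 1 convolution blocks; decoder layers of 5, 3 and 1) and the pooling/upsampling structure, but not the channel widths, the internal block design, the dense connections it mentions, or how the multi-level guidance is injected. The following is a minimal PyTorch sketch under the assumptions that each block is a 3x3 convolution with ReLU, guidance is channel concatenation of the matching encoder features, and the two source images are stacked at the input; the dense connections are omitted for brevity.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # A plain 3x3 conv + ReLU stands in for the paper's convolution block.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class GuidedFusionNet(nn.Module):
    """Minimal sketch of the multi-level feature-guided encoder-decoder.

    Inputs are expected to be (B, 1, H, W) with H and W divisible by 4.
    The channel width ch is an assumption, not a value from the paper.
    """
    def __init__(self, ch=32):
        super().__init__()
        # Encoder: three levels with 3, 2 and 1 convolution blocks, separated by pooling.
        self.enc1 = nn.Sequential(conv_block(2, ch), conv_block(ch, ch), conv_block(ch, ch))
        self.enc2 = nn.Sequential(conv_block(ch, ch), conv_block(ch, ch))
        self.enc3 = conv_block(ch, ch)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        # Decoder: 1, 3 and 5 convolution blocks from the deepest level up;
        # levels 1 and 2 take the matching encoder features as guidance.
        self.dec3 = conv_block(ch, ch)
        self.dec2 = nn.Sequential(conv_block(2 * ch, ch), conv_block(ch, ch), conv_block(ch, ch))
        self.dec1 = nn.Sequential(conv_block(2 * ch, ch), conv_block(ch, ch),
                                  conv_block(ch, ch), conv_block(ch, ch),
                                  nn.Conv2d(ch, 1, 3, padding=1))  # fifth block yields the output

    def forward(self, ir, vis):
        x = torch.cat([ir, vis], dim=1)                       # stack the two sources
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([self.up(d3), e2], dim=1))   # level-2 feature guidance
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))   # level-1 feature guidance
        return torch.sigmoid(d1)                              # fused image in [0, 1]
```

Routing each encoder level into the decoder at matching scale lets shallow, detail-rich features bypass the deep bottleneck, which is the compensation for stacking-induced information loss that both abstracts describe.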
Keywords
