Multi-focus image fusion with a self-learning fusion rule
2020, Vol. 25, No. 8, Pages: 1637-1648
Received: 2019-12-16
Revised: 2020-02-23
Accepted: 2020-03-02
Published in print: 2020-08-16
DOI: 10.11834/jig.190614
Objective
Deep learning-based multi-focus image fusion methods mainly use a convolutional neural network (CNN) to classify pixels as focused or defocused. The supervised learning process usually relies on synthetic datasets, and the accuracy of the label data directly determines the classification accuracy, which in turn affects the accuracy of the subsequently handcrafted fusion rule and the quality of the fused all-in-focus image. To allow the fusion network to adjust its fusion rule adaptively, a multi-focus image fusion algorithm based on a self-learning fusion rule is proposed.
Method
An autoencoder architecture is adopted to extract features while simultaneously learning the fusion rule and the reconstruction rule, yielding an unsupervised, end-to-end fusion network. The initial decision map of the multi-focus images is fed in as a prior so that the network learns rich image detail. A local strategy, combining the structural similarity index measure (SSIM) and the mean squared error (MSE), is added to the loss function to ensure that the image is reconstructed more accurately.
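As a rough illustration of how such a composite loss could be written in Keras/TensorFlow (the framework named in the Result section), the sketch below combines the windowed SSIM computed by tf.image.ssim with a plain MSE term; the weight alpha and the exact form of the paper's local strategy are assumptions for illustration only.

```python
import tensorflow as tf

def composite_loss(alpha=0.8):
    """Sketch of an SSIM + MSE reconstruction loss.

    alpha is an illustrative trade-off weight, not the paper's value;
    tf.image.ssim already evaluates SSIM over local windows, which
    stands in here for the local strategy described above.
    """
    def loss(y_true, y_pred):
        ssim = tf.image.ssim(y_true, y_pred, max_val=1.0)                # per-image SSIM score
        mse = tf.reduce_mean(tf.square(y_true - y_pred), axis=[1, 2, 3])  # per-image MSE
        return alpha * (1.0 - ssim) + (1.0 - alpha) * mse
    return loss

# e.g. fusion_net.compile(optimizer="adam", loss=composite_loss())
```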
Result
The model is evaluated subjectively and objectively on Lytro and other public datasets to verify the rationality of the fusion algorithm design. Subjectively, the model not only fuses the focused regions well and effectively avoids artifacts in the fused image, but also preserves sufficient detail, giving a naturally clear visual result. Objectively, quantitative comparison with the fused images of other mainstream multi-focus fusion algorithms shows that the average scores on entropy, Q_w, correlation coefficient, and visual information fidelity are all the best, at 7.457 4, 0.917 7, 0.978 8, and 0.890 8, respectively.
Conclusion
A fusion algorithm for multi-focus images is proposed that can learn and adjust its fusion rule by itself while producing fused images comparable to those of existing methods, which helps to further understand the mechanism of deep learning-based multi-focus image fusion.
Objective
The existing multi-focus image fusion approaches based on deep learning treat a convolutional neural network (CNN) as a classifier. These methods use CNNs to classify pixels into focused or defocused pixels, and the corresponding fusion rules are designed in accordance with the classified pixels. The expected all-in-focus image therefore depends heavily on the handcrafted labels and fusion rule and is constructed from the learned feature maps, and the training process is supervised by pixel-level labels. However, manually labeling a pixel as focused or defocused is arduous and may lead to inaccurate focus prediction. Moreover, existing multi-focus datasets are constructed by adding Gaussian blur to parts of all-in-focus images, which makes the training data unrealistic. To address these issues and enable the CNN to adjust its fusion rule adaptively, a novel multi-focus image fusion algorithm based on a self-learning fusion rule is proposed.
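For context, synthetic training pairs of this kind are typically produced by blurring complementary regions of a single all-in-focus image; a minimal sketch is shown below, where the left/right split mask and the blur strength sigma are illustrative choices rather than the construction used by any particular dataset.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_multifocus_pair(sharp, sigma=3.0):
    """Build a synthetic multi-focus pair from one all-in-focus image.

    sharp: grayscale image as a float array in [0, 1], shape (H, W).
    Returns two images with complementary focused regions; the left/right
    mask and sigma are illustrative, not taken from the paper.
    """
    blurred = gaussian_filter(sharp, sigma=sigma)
    mask = np.zeros_like(sharp)
    mask[:, : sharp.shape[1] // 2] = 1.0          # left half kept sharp in image A
    img_a = mask * sharp + (1.0 - mask) * blurred
    img_b = (1.0 - mask) * sharp + mask * blurred
    return img_a, img_b
```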
Method
Autoencoders are unsupervised learning networks whose hidden layers can be regarded as a feature representation of the input samples. Multi-focus images are usually captured from the same scene, so they share common scene information while differing in their private focus information, and the paired images should be encoded into common and private feature spaces, respectively. This study uses joint convolutional autoencoders (JCAEs) to learn such structured features. The JCAEs consist of public and private branches: the public branches share weights to extract the common encoding features of the input images, and the private branches capture the image-specific (private) encoding features.
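A minimal Keras sketch of this branch layout is given below; a single convolution reused on both inputs plays the role of the weight-shared public branch, each input also gets its own private convolution, and the layer widths are illustrative assumptions rather than the paper's configuration.

```python
from tensorflow.keras import layers, Model

def jcae_encoder(shape=(None, None, 1)):
    """Sketch of the JCAE encoding stage: a weight-shared public branch
    plus a private branch per source image (filter counts are illustrative)."""
    img_a = layers.Input(shape=shape, name="source_a")
    img_b = layers.Input(shape=shape, name="source_b")

    # One Conv2D instance applied to both inputs -> shared (public) weights.
    public = layers.Conv2D(32, 3, padding="same", activation="relu", name="public_conv")
    pub_a, pub_b = public(img_a), public(img_b)

    # Separate Conv2D instances -> private, image-specific features.
    priv_a = layers.Conv2D(32, 3, padding="same", activation="relu", name="private_conv_a")(img_a)
    priv_b = layers.Conv2D(32, 3, padding="same", activation="relu", name="private_conv_b")(img_b)

    feat_a = layers.Concatenate(name="code_a")([pub_a, priv_a])
    feat_b = layers.Concatenate(name="code_b")([pub_b, priv_b])
    return Model([img_a, img_b], [feat_a, feat_b], name="jcae_encoder")
```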
A fusion layer with a concatenation operation is designed to obtain a self-learned fusion rule and to constrain the entire fusion network to work in an end-to-end style. The initial focus map is used as a prior input so that the network can learn precise details. Current deep learning-based multi-focus fusion algorithms train their networks with heavy data augmentation and various tuning tricks, so the design of the fusion rule is significant. Fusion rules generally fall into direct cascading fusion and pixel-level rules: cascading fusion stacks multiple inputs and blends them in the next convolutional layer, which helps the network gain rich image features, while pixel-level fusion rules include the maximum, sum, and mean rules.
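One plausible way to realize such a self-learned rule is sketched below under assumption: the cascaded (concatenated) features are mixed by a 1x1 convolution whose weights start out as a uniform average, so the layer behaves like a mean rule at initialization and training is then free to reshape it. The initialization trick is an illustrative reading of the abstract, not the paper's verified code.

```python
from tensorflow.keras import layers, initializers

def fusion_layer(feat_a, feat_b):
    """Sketch of a learnable fusion rule: concatenate both feature stacks,
    then mix them with a 1x1 convolution.  Initializing every kernel entry
    to 1 / (number of stacked channels) makes the layer start as a uniform
    average over the cascaded features (an illustrative stand-in for the
    mean rule); training then adjusts these weights, i.e. the rule is learned."""
    stacked = layers.Concatenate(name="cascade")([feat_a, feat_b])
    in_ch = int(stacked.shape[-1])
    out_ch = int(feat_a.shape[-1])
    return layers.Conv2D(out_ch, kernel_size=1, use_bias=False,
                         kernel_initializer=initializers.Constant(1.0 / in_ch),
                         name="learned_fusion")(stacked)
```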
The choice among them depends on the characteristics of the dataset; here, the mean rule is applied on top of cascading fusion so that the network can autonomously adjust the fusion rule during training. The fusion rules learned by the JCAEs are then discussed quantitatively and qualitatively to identify how they work. Image entropy is used to measure the amount of information carried by the grayscale distribution of a feature map, and the fusion rule is examined by computing how much information the feature maps in the fusion layer retain. In this study, a pair of multi-focus images is fed into the network, the feature maps of the convolution in the fusion layer are trained to produce the fused image, and the fusion rule can be interpreted visually by comparing the amount of image information with the learned weight values.
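A minimal sketch of that entropy measurement: the activations of the fusion layer are pulled out with an intermediate Keras sub-model (the layer name learned_fusion below is hypothetical), and each feature map's value distribution is summarized with Shannon entropy.

```python
import numpy as np
from tensorflow.keras import Model

def feature_entropy(fmap, bins=256):
    """Shannon entropy (in bits) of one feature map's value distribution."""
    hist, _ = np.histogram(fmap, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical usage: probe the fusion layer of a trained network.
# probe = Model(fusion_net.inputs, fusion_net.get_layer("learned_fusion").output)
# fmaps = probe.predict([img_a[None, ..., None], img_b[None, ..., None]])
# print([feature_entropy(fmaps[0, ..., k]) for k in range(fmaps.shape[-1])])
```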
Instead of training the CNN with only a basic global loss, the model adds a local strategy to the loss function, combining the structural similarity index measure (SSIM) and the mean squared error (MSE). Such a strategy effectively drives the fusion unit to learn pixel-wise features and ensures accurate image restoration. More accurate and abstract features can be obtained when the source images are passed through deep rather than shallow networks; however, problems such as vanishing gradients and long convergence times occur during back-propagation in deep networks. A residual network skips a few layers through skip (shortcut) connections and can more easily learn residual mappings than the original mapping. Therefore, we use the shortcut connection strategy to improve the feature learning ability of the JCAEs.
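A minimal sketch of the shortcut strategy in Keras: the block learns a residual that is added back onto its input, which is what eases gradient flow in the deeper JCAE branches. Filter counts are illustrative, and the input is assumed to already have the matching channel count.

```python
from tensorflow.keras import layers

def residual_block(x, filters=32):
    """Two convolutions plus a skip connection; assumes x already has
    `filters` channels so the addition is shape-compatible."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])              # shortcut / skip connection
    return layers.Activation("relu")(y)
```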
Result
The model is trained with the Keras framework on a TensorFlow backend. We test it on the Lytro dataset, which is widely used in multi-focus image fusion research, and conduct subjective and objective comparisons with existing multi-focus fusion algorithms to verify the performance of the proposed method. We magnify key areas, such as the boundary between focused and defocused regions of the fused image, to show the differences between fusion results in detail. From the perspective of subjective evaluation, the model effectively fuses the focused regions and avoids artifacts in the fused image; detailed information is preserved, so the visual effect is naturally clear. From the perspective of objective evaluation, comparing the fused images of the model with those of other mainstream multi-focus image fusion algorithms shows that the average scores on entropy, Q_w, correlation coefficient, and visual information fidelity are the best, at 7.457 4, 0.917 7, 0.978 8, and 0.890 8, respectively.
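For reference, the correlation-coefficient score can be computed directly as below (averaging over both source images is an illustrative convention); the entropy score follows the same Shannon-entropy recipe sketched in the Method section, while Q_w and visual information fidelity have more involved definitions and are not reproduced here.

```python
import numpy as np

def correlation_coefficient(fused, sources):
    """Mean Pearson correlation between the fused image and each source image."""
    return float(np.mean([np.corrcoef(fused.ravel(), src.ravel())[0, 1]
                          for src in sources]))
```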
Conclusion
Most deep learning-based multi-focus image fusion methods follow a common pattern: a CNN classifies pixels into focused and defocused ones, fusion rules are designed manually according to the classified pixels, and a fusion operation on the original spatial domain or on the learned feature maps yields the fused all-in-focus image. This pipeline ignores considerable useful information in the middle layers and relies heavily on labeled data. To solve these problems, this study proposes a multi-focus image fusion algorithm with a self-learning style. A fusion layer is designed on top of the JCAEs; we discuss the network structure, the design of the loss function, and how to embed pixel-wise prior knowledge, so that the network can output vivid fused images. We also provide a reasonable geometric interpretation of the learnable fusion operation at the quantitative and qualitative levels. The experiments demonstrate that the model is reasonable and effective: it not only achieves self-learning of the fusion rule but also performs well in terms of subjective visual perception and objective evaluation metrics. This work offers a new idea for multi-focus image fusion, which should help to further explain the mechanism of deep learning-based multi-focus image fusion and motivate interpretable fusion methods built on popular neural networks.