Medical image fusion using improved U-Net3+ and cross-modal attention blocks
2022, Vol. 27, No. 12: 3622-3636
Received: 2021-11-07; Revised: 2022-01-11; Accepted: 2022-01-18; Published in print: 2022-12-16
DOI: 10.11834/jig.211066
Objective
To address the insufficient deep-feature extraction capability of current multi-modal medical image fusion methods and the problem that some modal features are ignored, a medical image fusion algorithm based on U-Net3+ and cross-modal attention blocks with a dual-discriminator generative adversarial network (UC-DDGAN) is proposed.
Method
The UC-DDGAN framework is built by combining the ability of U-Net3+ to extract deep features with very few parameters and the ability of cross-modal attention blocks to extract the features of both modalities. UC-DDGAN contains one generator and two discriminators, and the generator consists of feature extraction and feature fusion. In the feature extraction part, cross-modal attention blocks are embedded into the down-sampling path along which U-Net3+ extracts deep image features, so that cross-modal feature extraction and deep feature extraction are performed alternately, producing composite feature maps at each level; these maps are concatenated along the channel dimension, reduced in dimension, and up-sampled to output feature maps that contain the full-scale deep features of both modalities. In the feature fusion part, the fused image is obtained by concatenating the feature maps along the channel dimension. The two discriminators perform targeted discrimination on the source images, which follow different distributions. The loss function introduces a gradient loss, which is weighted together with a pixel loss to optimize the generator.
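A minimal sketch of the generator content loss described here, with the pixel and gradient terms weighted as stated; the weights $$\lambda_1, \lambda_2$$ and the use of squared L2 norms are illustrative assumptions rather than the paper's reported settings:
$$ L_{\rm content}=\lambda_1\frac{1}{HW}\left\|I_{\rm f}-I_{\rm s}\right\|_2^2+\lambda_2\frac{1}{HW}\left\|\nabla I_{\rm f}-\nabla I_{\rm s}\right\|_2^2 $$
where $$I_{\rm f}$$ is the fused image, $$I_{\rm s}$$ a source image, $$\nabla$$ the image gradient operator, and $$H, W$$ the image height and width; the first term is the pixel loss and the second the gradient loss.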
Result
UC-DDGAN is compared with five classical image fusion methods on the brain disease image dataset published by Harvard Medical School. Its fused images are improved in spatial frequency (SF), structural similarity (SSIM), edge information transfer factor ($$ {\rm{Q}}^{{\rm{AB/F}}}$$), correlation coefficient (CC), and the sum of the correlations of differences (SCD): SF is 5.87% higher than that of DDcGAN (dual-discriminator conditional generative adversarial network), SSIM is 8% higher than that of FusionGAN (fusion generative adversarial network), $$ {\rm{Q}}^{{\rm{AB/F}}}$$ is 12.66% higher than that of FusionGAN, CC is 14.47% higher than that of DDcGAN, and SCD is 14.48% higher than that of DDcGAN.
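For reference, spatial frequency (SF), one of the metrics listed above, is conventionally computed from row and column differences; a minimal NumPy sketch under that standard definition (not code released with the paper):

import numpy as np

def spatial_frequency(img: np.ndarray) -> float:
    # Spatial frequency of a grayscale image: combines row and column activity.
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))   # row frequency
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))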
Conclusion
The fused images generated by UC-DDGAN contain rich deep features and the key features of both modalities; their subjective visual quality and objective evaluation metrics are both better than those of the compared methods, which provides assistance for clinical diagnosis.
Objective
Multi-modal medical image fusion can reveal detailed features that a single modality cannot provide, and the deep features of lesions are essential for clinical diagnosis. However, current multi-modal medical image fusion methods struggle to capture deep features, and the integrity of the fused image suffers when features are extracted from only one modality. In recent years, deep learning techniques have developed rapidly in image processing, and the generative adversarial network (GAN), an important branch of deep learning, has been widely used in image fusion. A GAN not only reduces information loss but also highlights key features through the adversarial interaction between different source images. Since the deep-feature extraction ability of current multi-modal medical image fusion methods is insufficient and some modal features are ignored, we develop a medical image fusion method based on improved U-Net3+ and cross-modal attention blocks combined with a dual-discriminator generative adversarial network (UC-DDGAN).
Method
The UC-DDGAN fusion model is mainly composed of a full-scale connected U-Net3+ network and cross-modal attention blocks that integrate the features of the two modalities. The U-Net3+ network extracts deep features, and the cross-modal attention blocks extract the features of the different modalities according to the correlation between the images. Computed tomography (CT) and magnetic resonance (MR) images are fused by the trained UC-DDGAN, which has one generator and two discriminators. The generator extracts the deep features of the images and generates the fused image; it consists of a feature extraction part and a feature fusion part. In the feature extraction part, feature extraction is completed by the encoding and decoding stages of the U-Net3+ network. In the encoding stage, the input image is down-sampled four times to extract features, and a cross-modal attention block is added after each down-sampling to obtain composite feature maps of the two modalities. The cross-modal attention block not only computes self-attention within a single image but also extends the attention computation to both modalities; by relating the local features of one modality to the global features of the other, it allows the fused image to preserve the overall image information.
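A minimal PyTorch sketch of a cross-modal attention block in this spirit, where queries come from one modality and keys/values from the other so that attention is computed across modalities rather than within a single image; the module name, projection sizes, and residual weighting are illustrative assumptions, not the paper's exact implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    # Attend from one modality (e.g. CT features) to the other (e.g. MR features).
    def __init__(self, channels: int, reduced: int = 32):
        super().__init__()
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)   # from modality A
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)     # from modality B
        self.value = nn.Conv2d(channels, channels, kernel_size=1)  # from modality B
        self.gamma = nn.Parameter(torch.zeros(1))                  # learned residual weight

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_a.shape
        q = self.query(feat_a).flatten(2).transpose(1, 2)          # (B, HW, C')
        k = self.key(feat_b).flatten(2)                            # (B, C', HW)
        v = self.value(feat_b).flatten(2).transpose(1, 2)          # (B, HW, C)
        attn = F.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)     # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return feat_a + self.gamma * out                           # composite feature map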
In the decoding stage, each decoder layer receives the feature map of the same-scale encoder layer, the shallower, higher-resolution feature maps reduced to size by max pooling, and the deeper, lower-resolution feature maps enlarged by up-sampling. Each incoming map is passed through 64 filters of size 3×3, and the resulting feature maps of each layer are concatenated and up-sampled. After a 1×1 convolution for channel dimension reduction, the output feature maps contain the full-scale deep features of the two modalities.
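The full-scale aggregation just described can be sketched as follows, following the U-Net3+ scheme of one 3×3 convolution with 64 filters per incoming scale; the class name and channel configuration are assumptions for illustration, not the paper's exact layers:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleDecoderBlock(nn.Module):
    # Aggregates same-scale, shallower (max-pooled) and deeper (up-sampled) feature maps.
    def __init__(self, in_channels, cat_channels=64):
        super().__init__()
        # One 3x3 convolution with 64 filters per incoming scale, as in U-Net3+
        self.convs = nn.ModuleList([
            nn.Conv2d(c, cat_channels, kernel_size=3, padding=1) for c in in_channels
        ])
        # 1x1 convolution for channel dimension reduction after concatenation
        self.fuse = nn.Conv2d(cat_channels * len(in_channels), cat_channels, kernel_size=1)

    def forward(self, feats, out_size):
        resized = []
        for conv, f in zip(self.convs, feats):
            if f.shape[-1] > out_size[-1]:      # shallower, higher-resolution map: max pool down
                f = F.adaptive_max_pool2d(f, out_size)
            elif f.shape[-1] < out_size[-1]:    # deeper, lower-resolution map: up-sample
                f = F.interpolate(f, size=out_size, mode="bilinear", align_corners=False)
            resized.append(conv(f))
        return self.fuse(torch.cat(resized, dim=1))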
In the feature fusion part, to obtain a fused image with deep details and the key features of the two modalities, the two feature maps are concatenated through a concat layer and then passed through five convolution modules that reduce the channel dimension layer by layer. Each discriminator is trained to distinguish its source image from the fused image according to the distribution of the samples; because the two modalities follow different distributions, a separate discriminator is assigned to each of them to judge the credibility of the input images. In addition, a gradient loss is introduced into the loss function, and the weighted sum of the gradient loss and the pixel loss is used to optimize the generator.
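Putting these pieces together, one training step with a generator and two modality-specific discriminators might look like the following; the loss weights, the binary cross-entropy adversarial loss, and the optimizer handling are assumptions for illustration and not the paper's exact training procedure:

import torch
import torch.nn.functional as F

def train_step(gen, disc_ct, disc_mr, opt_g, opt_d, ct, mr, lam_pixel=1.0, lam_grad=5.0):
    # One UC-DDGAN-style update: both discriminators first, then the generator.
    fused = gen(ct, mr)

    # Each discriminator separates its own source modality from the fused image.
    d_loss = 0.0
    for disc, real in ((disc_ct, ct), (disc_mr, mr)):
        real_logits, fake_logits = disc(real), disc(fused.detach())
        d_loss = d_loss \
            + F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: adversarial terms plus the weighted pixel and gradient losses.
    adv = 0.0
    for d in (disc_ct, disc_mr):
        logits = d(fused)
        adv = adv + F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    pixel = F.mse_loss(fused, ct) + F.mse_loss(fused, mr)
    grad = sum(F.l1_loss(fused.diff(dim=-1), src.diff(dim=-1)) +
               F.l1_loss(fused.diff(dim=-2), src.diff(dim=-2)) for src in (ct, mr))
    g_loss = adv + lam_pixel * pixel + lam_grad * grad
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return g_loss.item(), d_loss.item()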
Result
To validate the quality of the fused images, UC-DDGAN is compared with five popular multi-modal image fusion methods: the Laplacian pyramid (LAP), the pulse-coupled neural network (PCNN), the convolutional neural network (CNN), the fusion generative adversarial network (FusionGAN), and the dual-discriminator conditional generative adversarial network (DDcGAN). Qualitatively, the edges of the LAP fusion results are blurred, making it difficult to observe the contour of the lesion; the brightness of the PCNN results is too low; the CNN-based results lack deep details, so internal structures cannot be observed; the FusionGAN results pay too much attention to the MR modality and lose the bone information of the CT images; and the edges of the DDcGAN results are not smooth enough. In contrast, 1) the UC-DDGAN fusion results for cerebral infarction show clear brain sulci, 2) the results for cerebral apoplexy present clear color features, 3) the results for cerebral tumor fully preserve the brain medulla and bone information, and 4) the results for cerebrovascular disease contain deep information of the brain lobes.
To evaluate the performance of UC-DDGAN quantitatively, thirty typical image pairs are selected and the five classical methods are used for comparison. The fused images generated by UC-DDGAN are improved in spatial frequency (SF), structural similarity (SSIM), edge information transfer factor ($$ {\rm{Q}}^{{\rm{AB/F}}}$$), correlation coefficient (CC), and the sum of the correlations of differences (SCD): 1) SF is improved by 5.87% compared with DDcGAN, 2) SSIM is improved by 8% compared with FusionGAN, 3) $$ {\rm{Q}}^{{\rm{AB/F}}}$$ is improved by 12.66% compared with FusionGAN, and 4) CC and SCD are improved by 14.47% and 14.48%, respectively, compared with DDcGAN.
Conclusion
A medical image fusion method based on improved U-Net3+, cross-modal attention blocks, and a dual-discriminator generative adversarial network (UC-DDGAN) is developed. The fused images generated by UC-DDGAN contain richer deep features and the key features of the two modalities.
Buslaev A, Iglovikov V I, Khvedchenya E, Parinov A, Druzhinin M and Kalinin A A. 2020. Albumentations: fast and flexible image augmentations. Information, 11(2): #125 [DOI: 10.3390/info11020125]
Cai X, Liu X Y, An M Y and Han G. 2021. Vision-based fall detection using dense block with multi-channel convolutional fusion strategy. IEEE Access, 9: 18318-18325 [DOI: 10.1109/ACCESS.2021.3054469]
Cherry J M, Adler C, Ball C A, Chervitz S A, Dwight S S, Hester E T, Jia Y K, Juvik G, Roe T, Schroeder M, Weng S and Botstein D. 1998. SGD: saccharomyces genome database. Nucleic Acids Research, 26(1): 73-79 [DOI: 10.1093/nar/26.1.73]
Custódio A, Rocha H and Vicente L N. 2010. Incorporating minimum Frobenius norm models in direct search. Computational Optimization and Applications, 46(2): 265-278 [DOI: 10.1007/s10589-009-9283-0]
Gai D, Shen X J, Cheng H and Chen H P. 2019. Medical image fusion via PCNN based on edge preservation and improved sparse representation in NSST domain. IEEE Access, 7: 85413-85429 [DOI: 10.1109/ACCESS.2019.2925424]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/cvpr.2016.90]
Huang F S and Lin S Z. 2019. Multi-band image fusion rules comparison based on the Laplace pyramid transformation method. Infrared Technology, 41(1): 64-71
Huang H M, Lin L F, Tong R F, Hu H J, Zhang Q W, Iwamoto Y, Han X H, Chen Y W and Wu J. 2020. UNet 3+: a full-scale connected UNet for medical image segmentation//Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona, Spain: IEEE: 1055-1059 [DOI: 10.1109/icassp40776.2020.9053405]
Indhumathi R, Nagarajan S and Indira K P. 2021. Hybrid pixel-based method for multimodal medical image fusion based on integration of pulse-coupled neural network (PCNN) and genetic algorithm (GA)//Patnaik S, Yang X S and Sethi I K. Advances in Machine Learning and Computational Intelligence. Singapore, Singapore: Springer: 853-867 [DOI: 10.1007/978-981-15-5243-4_82]
Jung H, Kim Y, Jang H, Ha N and Sohn K. 2020. Unsupervised deep image fusion with structure tensor representations. IEEE Transactions on Image Processing, 29: 3845-3858 [DOI: 10.1109/TIP.2020.2966075]
Kingma D P and Ba J. 2017. Adam: a method for stochastic optimization [EB/OL]. [2021-01-30]. https://arxiv.org/pdf/1412.6980.pdf
Kurakin A, Goodfellow I J and Bengio S. 2018. Adversarial examples in the physical world//Proceedings of the 5th International Conference on Learning Representations. Toulon, France: OpenReview.net: 98-111
Lin S Z and Han Z. 2017. Images fusion based on deep stack convolutional neural network. Chinese Journal of Computers, 40(11): 2506-2518 [DOI: 10.11897/SP.J.1016.2017.02506]
Liu Y, Chen X, Peng H and Wang Z F. 2017. Multi-focus image fusion with a deep convolutional neural network. Information Fusion, 36: 191-207 [DOI: 10.1016/j.inffus.2016.12.001]
Liu Y, Chen X, Wang Z F, Wang Z J, Ward R K and Wang X S. 2018. Deep learning for pixel-level image fusion: recent advances and future prospects. Information Fusion, 42: 158-173 [DOI: 10.1016/j.inffus.2017.10.007]
Ma J Y, Xu H, Jiang J J, Mei X G and Zhang X P. 2020. DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Transactions on Image Processing, 29: 4980-4995 [DOI: 10.1109/TIP.2020.2977573]
Ma J Y, Yu W, Liang P W, Li C and Jiang J J. 2019. FusionGAN: a generative adversarial network for infrared and visible image fusion. Information Fusion, 48: 11-26 [DOI: 10.1016/j.inffus.2018.09.004]
Maqsood S and Javed U. 2020. Multi-modal medical image fusion based on two-scale image decomposition and sparse representation. Biomedical Signal Processing and Control, 57: #101810 [DOI: 10.1016/j.bspc.2019.101810]
Mbilinyi A and Schuldt H. 2020. Cross-modality medical image retrieval with deep features//Proceedings of 2020 IEEE International Conference on Bioinformatics and Biomedicine. Seoul, Korea (South): IEEE: 2632-2639 [DOI: 10.1109/bibm49941.2020.9313211]
Nikolaev A V, de Jong L, Weijers G, Groenhuis V, Mann R M, Siepel F J, Maris B M, Stramigioli S, Hansen H H G and de Korte C L. 2021. Quantitative evaluation of an automated cone-based breast ultrasound scanner for MRI-3D US image fusion. IEEE Transactions on Medical Imaging, 40(4): 1229-1239 [DOI: 10.1109/TMI.2021.3050525]
Nour M, Cömert Z and Polat K. 2020. A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization. Applied Soft Computing, 97: #106580 [DOI: 10.1016/j.asoc.2020.106580]
Pan Y, Pi D C, Khan I A, Khan Z U, Chen J F and Meng H. 2021. DenseNetFuse: a study of deep unsupervised DenseNet to infrared and visual image fusion. Journal of Ambient Intelligence and Humanized Computing, 12(11): 10339-10351 [DOI: 10.1007/s12652-020-02820-3]
Qin X B, Zhang Z C, Huang C Y, Dehghan M, Zaiane O R and Jagersand M. 2020. U2-Net: going deeper with nested U-structure for salient object detection. Pattern Recognition, 106: #107404 [DOI: 10.1016/j.patcog.2020.107404]
Ratliff L J, Burden S A and Sastry S S. 2013. Characterization and computation of local Nash equilibria in continuous games//Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing. Monticello, USA: IEEE: 917-924 [DOI: 10.1109/allerton.2013.6736623]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Song H, Kang J and Lee S. 2018. ConcatNet: a deep architecture of concatenation-assisted network for dense facial landmark alignment//Proceedings of the 25th IEEE International Conference on Image Processing. Athens, Greece: IEEE: 2371-2375 [DOI: 10.1109/icip.2018.8451375]
Song X R, Guo H T, Xu X N, Chao H Q, Xu S, Turkbey B, Wood B J, Wang G and Yan P K. 2021. Cross-modal attention for MRI and ultrasound volume registration//Proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention. Strasbourg, France: Springer: 66-75 [DOI: 10.1007/978-3-030-87202-1_7]
Veshki F G, Ouzir N, Vorobyov S A and Ollila E. 2021. Coupled feature learning for multimodal medical image fusion [EB/OL]. [2021-02-17]. https://arxiv.org/pdf/2102.08641.pdf
Wang L F, Zhang C C, Qin P L, Lin S Z, Gao Y and Dou J L. 2020. Image registration method with residual dense relativistic average CGAN. Journal of Image and Graphics, 25(4): 745-758 [DOI: 10.11834/jig.190116]
Wang X L, Girshick R, Gupta A and He K M. 2017. Non-local neural networks [EB/OL]. [2021-04-13]. https://arxiv.org/pdf/1711.07971.pdf
Wang Y C, Xu S, Liu J M, Zhao Z X, Zhang C X and Zhang J S. 2021. MFIF-GAN: a new generative adversarial network for multi-focus image fusion. Signal Processing: Image Communication, 96: #116295 [DOI: 10.1016/j.image.2021.116295]
Xiao B, Xu B C, Bi X L and Li W S. 2021. Global-feature encoding U-Net (GEU-Net) for multi-focus image fusion. IEEE Transactions on Image Processing, 30: 163-175 [DOI: 10.1109/TIP.2020.3033158]
Xiong Y J, Gao Y B, Wu H and Yao Y. 2021. Attention U-Net with feature fusion module for robust defect detection. Journal of Circuits, Systems and Computers, 30(15): #2150272 [DOI: 10.1142/S0218126621502728]
Yan L, Hao Q, Cao J, Saad R, Li K, Yan Z G and Wu Z M. 2021. Infrared and visible image fusion via octave Gaussian pyramid framework. Scientific Reports, 11(1): #1235 [DOI: 10.1038/s41598-020-80189-1]
Yang Z G, Chen Y P, Le Z L and Ma Y. 2021. GANFuse: a novel multi-exposure image fusion method based on generative adversarial networks. Neural Computing and Applications, 33(11): 6133-6145 [DOI: 10.1007/s00521-020-05387-4]
Zhang X C, Ye P and Xiao G. 2020b. VIFB: a visible and infrared image fusion benchmark//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Seattle, USA: IEEE: 468-478 [DOI: 10.1109/cvprw50498.2020.00060]
Zhang Y, Liu Y, Sun P, Yan H, Zhao X L and Zhang L. 2020a. IFCNN: a general image fusion framework based on convolutional neural network. Information Fusion, 54: 99-118 [DOI: 10.1016/j.inffus.2019.07.011]
Zhou Z W, Siddiquee M R, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-Net architecture for medical image segmentation//Proceedings of the 4th International Workshop on Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Granada, Spain: Springer: 3-11 [DOI: 10.1007/978-3-030-00889-5_1]