Multi-scale feature fusion and additive attention guide brain tumor MR image segmentation

Published: 2023-04-20
Abstract views: 1388
Full-text downloads: 1058
DOI: 10.11834/jig.211073
2023 | Volume 28 | Number 4

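Sun Jiakuo¹, Zhang Rong¹, Guo Lijun¹, Wang Jianhua² (1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China; 2. Affiliated Hospital of Medical School, Ningbo University, Ningbo 315211, China)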

Abstract

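Objective U-Net is the most widely used baseline segmentation network in medical image segmentation. However, U-Net and its various enhanced variants use only same-scale features in their skip connections and ignore the guidance that multi-scale features carrying complementary information can give to the features at the current scale. In addition, the encoder features and decoder features joined by a skip connection sit at different network depths, so concatenating them directly produces a semantic feature gap. To address these two problems, a new segmentation network is proposed. Method First, encoder features from different levels, which have receptive fields of different scales, are fused, and additive attention is introduced between the fused features and the features at each encoder level to guide the encoder features and make them more discriminative. Second, additive attention is applied between encoder features and decoder features to adaptively learn the important information in the skip-connection features, which reduces the semantic feature gap between the two. Result The proposed network was evaluated on the multimodal brain tumor dataset BraTS2020 (multimodal brain tumor segmentation challenge 2020), with both ablation and comparison experiments. On the BraTS2020 validation dataset, the average Dice of the proposed network for whole tumor, tumor core, and enhancing tumor is 0.8875, 0.7194, and 0.7064, respectively. This is better than the 2D network DR-Unet104 (deep residual Unet with 104 convolutional layers); the tumor core and enhancing tumor results are 4.73% and 3.08% higher than those of DR-Unet104. Conclusion The proposed segmentation network fuses the multi-scale encoder features that carry complementary information and uses them to guide the features at the current scale through additive attention. It also applies an additive attention mechanism between encoder and decoder features to reduce the semantic feature gap between them at the skip connections. As a result, it segments brain tumor sub-regions in MR (magnetic resonance) images more accurately.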
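The abstract does not give the exact form of its additive attention, so the following is only a minimal sketch that assumes the common additive gate: a ReLU over the sum of two linear projections, followed by a sigmoid score that rescales the gated feature. The function name, the weight layout, the omission of bias terms, and the toy inputs are all illustrative assumptions, not the authors' implementation.

```python
import math


def _dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def additive_attention_gate(x, g, w_x, w_g, psi):
    """Gate features x with guidance g (assumed additive-attention form).

    x, g  : lists of feature vectors, one per spatial position.
    w_x   : projection matrix for x (one row per hidden unit).
    w_g   : projection matrix for g (one row per hidden unit).
    psi   : vector mapping the hidden units to one attention score.
    Returns (gated_features, attention_coefficients).
    """
    gated, alphas = [], []
    for x_i, g_i in zip(x, g):
        # Additive step: project both inputs, add them, apply ReLU.
        hidden = [
            max(0.0, _dot(row_x, x_i) + _dot(row_g, g_i))
            for row_x, row_g in zip(w_x, w_g)
        ]
        # Sigmoid squashes the score into (0, 1).
        alpha = 1.0 / (1.0 + math.exp(-_dot(psi, hidden)))
        alphas.append(alpha)
        # The coefficient rescales every channel at this position.
        gated.append([alpha * value for value in x_i])
    return gated, alphas


# Toy example: two positions, two channels, identity projections.
x = [[1.0, 2.0], [0.0, -1.0]]
g = [[0.5, -1.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
gated, alphas = additive_attention_gate(x, g, identity, identity, psi=[1.0, 1.0])
```

In this reading, `x` would play the role of an encoder feature and `g` the role of either the fused multi-scale feature or the decoder feature; positions where the summed projection is large receive a coefficient close to 1 and pass through almost unchanged.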

Keywords

medical image segmentation; brain tumor; magnetic resonance (MR) images; U-Net; multi-scale feature fusion; additive attention

Multi-scale feature fusion and additive attention guide brain tumor MR image segmentation

Sun Jiakuo¹, Zhang Rong¹, Guo Lijun¹, Wang Jianhua² (1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China; 2. Affiliated Hospital of Medicine School of Ningbo University, Ningbo University, Ningbo 315211, China)

Abstract

Objective U-Net is widely used as the basic network in medical image segmentation. In U-Net and its various augmented networks, the encoder extracts features from input images through a series of convolution and down-sampling operations. With the convolution and down-sampling operations at each layer of the encoder, the feature maps shrink while the receptive fields keep growing. During training, each level of the encoder learns discriminative feature information at its own scale. To improve feature utilization, these networks use skip connections between the encoder features and the decoder features so that feature information from shallow layers is reused. However, the skip connections concatenate only features of the same scale, and the role of multi-scale features with complementary information is ignored. In addition, encoder features sit at a relatively shallow position in the overall network structure, while decoder features sit at a relatively deep position. As a result, a semantic feature gap arises between encoder features and decoder features when skip connections are made. To address these shortcomings of U-Net and its augmented networks, a novel segmentation network model is developed. Method We construct a segmentation network based on multi-scale feature fusion and an additive attention mechanism. First, features with receptive fields of different scales, taken from different levels of the encoder, are fused. To guide the encoder features and enhance their discriminative ability, additive attention is introduced between the fused features and the encoder features at each level of the encoder. Second, to bridge the semantic gap between encoder and decoder features, additive attention between the two is used to adaptively learn the important feature information in the skip-connection features. Experiments are carried out with five-fold cross-validation.
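As a rough illustration of the five-fold protocol, the sketch below splits cases per tumor grade so that every fold keeps the HGG/LGG ratio. The totals (293 HGG and 74 LGG cases) are taken from the training and validation counts reported in the abstract; the case identifiers, the function name, and the unshuffled strided assignment are assumptions made for the example, and the authors' actual split procedure is not described.

```python
def stratified_folds(groups, n_folds=5):
    """Build (train_ids, val_ids) pairs, holding out every n-th case per group.

    groups : dict mapping a group name (e.g. tumor grade) to a list of case ids.
    """
    folds = []
    for k in range(n_folds):
        train_ids, val_ids = [], []
        for ids in groups.values():
            held_out = ids[k::n_folds]  # cases k, k + n_folds, k + 2 * n_folds, ...
            held_out_set = set(held_out)
            val_ids.extend(held_out)
            train_ids.extend(i for i in ids if i not in held_out_set)
        folds.append((train_ids, val_ids))
    return folds


# Hypothetical case identifiers; only the counts come from the abstract.
groups = {
    "HGG": [f"HGG_{i:03d}" for i in range(293)],  # 234 training + 59 validation
    "LGG": [f"LGG_{i:03d}" for i in range(74)],   # 59 training + 15 validation
}
folds = stratified_folds(groups)
train0, val0 = folds[0]
```

With these counts the first fold holds out 59 HGG and 15 LGG cases and trains on 234 HGG and 59 LGG cases, which matches the split sizes given in the abstract. A practical pipeline would normally shuffle the case list with a fixed seed before splitting.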
Multimodal magnetic resonance (MR) images of 234 high-grade glioma (HGG) samples and 59 low-grade glioma (LGG) samples in the BraTS2020 training dataset are used as the training data. MR images of 59 HGG samples and 15 LGG samples from the BraTS2020 training dataset are used as the validation data. The validation dataset of BraTS2020 is used as the final test data. The images of each modality are normalized with the Z-score approach on the basis of the original data. Categorical cross-entropy is used as the loss function. The proposed model is implemented in Keras using PyCharm on the Ubuntu 18.04 operating system, and the network is trained and evaluated on a workstation with an NVIDIA Quadro P5000 GPU with 16 GB of graphics memory. An Adam optimizer is used with a learning rate of 0.0001, and the network parameters are initialized with the he_normal initialization method. The batch size for training is set to 12, and the model took 3 days to train for 150 iterations. Result To evaluate the performance of the proposed model, the Dice coefficient and the 95% Hausdorff distance (HD95) are used as evaluation metrics for the segmented regions of whole tumor (WT), tumor core (TC), and enhancing tumor (ET). To obtain quantitative results for these metrics, the segmentation results of the network are uploaded to the BraTS2020 online evaluation platform. First, the segmentation effectiveness of the proposed network is verified on the BraTS2020 validation dataset. The experimental results show that the average Dice of the proposed network for ET, WT, and TC is 0.7064, 0.8875, and 0.7194, respectively. Then, ablation experiments are conducted to validate the effectiveness of the proposed multi-scale feature fusion module, the fused feature additive attention module, and the encoder-decoder additive attention concatenate module.
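For readers unfamiliar with the Dice coefficient used above, here is a minimal sketch of the standard definition, Dice = 2|P ∩ T| / (|P| + |T|), for one binary region such as WT, TC, or ET. The function name, the flat-list input, and the choice to return 1.0 when both masks are empty are assumptions of this example; the scores in the abstract come from the BraTS2020 online evaluation platform, not from this code.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 3 predicted voxels, 3 true voxels, 2 of them shared.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```

A score of 1 means the predicted region and the reference region coincide exactly, and 0 means they do not overlap at all.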
The results of the ablation experiments show that adding the proposed multi-scale feature fusion module to the backbone network improves the average Dice for ET, WT, and TC by 2.23%, 2.13%, and 0.97%, respectively. The average Dice values for ET, WT, and TC increase by a further 1.54%, 0.58%, and 1.45% after the proposed multi-scale feature fusion and fused feature additive attention modules are added to the network. The average Dice values for ET, WT, and TC increase by a further 2.46%, 0.82%, and 3.51% after the proposed encoder-decoder additive attention concatenate module is added to the network. Finally, our optimal network is compared with U-Net and popular augmented networks, as well as other non-U-Net segmentation networks. Compared with the 2D network DR-Unet104, the proposed network improves the results for TC, ET, and WT by 4.73%, 3.08%, and 0.13%, respectively. Furthermore, the visualization results show that the proposed network segments the boundaries of different tumor regions more accurately and achieves a better overall segmentation effect. Conclusion To segment brain tumor sub-regions in MR images more accurately, we develop a novel segmentation network model. It fuses multi-scale features with complementary information in the encoder and applies additive attention guidance to the features at the current scale. To reduce the semantic gap between the two, an additive attention mechanism is also used between the enc

Keywords

medical image segmentation; brain tumor; magnetic resonance (MR) images; U-Net; multi-scale feature fusion; additive attention
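The abstract states that the images of each modality are normalized with the Z-score approach. A minimal sketch of that step, applied to the intensities of a single modality, is given below. The function name and the small `eps` guard are assumptions, and the abstract does not say whether the mean and standard deviation are taken over the whole volume or only over brain voxels; this sketch simply uses every value it is given.

```python
import statistics


def z_score_normalize(voxels, eps=1e-8):
    """Shift intensities to zero mean and scale them to unit standard deviation."""
    mean = statistics.fmean(voxels)
    std = statistics.pstdev(voxels)
    # eps avoids division by zero for a constant image.
    return [(value - mean) / (std + eps) for value in voxels]


# Toy intensities standing in for one modality of one case.
intensities = [1.0, 2.0, 3.0, 4.0, 5.0]
normalized = z_score_normalize(intensities)
```

Normalizing each modality separately puts the differently scaled MR sequences on a comparable range before they are stacked as network input channels.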
