Group attention-based medical image segmentation model
2023, Vol. 28, No. 10, Pages: 3231-3242
Print publication date: 2023-10-16
DOI: 10.11834/jig.220748
Zhang Xuefeng, Zhang Sheng, Zhang Donghui, Liu Rui. 2023. Group attention-based medical image segmentation model. Journal of Image and Graphics, 28(10): 3231-3242
Objective
Deep learning methods that combine convolutional neural networks with the U-Net architecture are widely used in a variety of medical image processing tasks and have achieved good results, performing particularly well at local feature extraction. However, owing to the inherent locality of the convolution operation, they perform poorly at capturing global information. Transformer-based methods, in contrast, offer good global modeling capability but are inferior to convolutional neural networks at local feature extraction. To fully combine the advantages of the two approaches, a group attention-based medical image segmentation model (GAU-Net) is proposed.
Method
Using the attention mechanism, a group attention module that integrates both the Swin Transformer and the convolutional neural network is designed and embedded in the network encoder, enabling the network to efficiently extract and fuse the important global and local features of the image. For the attention computation, the features are split into groups so that different attention computations are performed simultaneously within features of the same scale, further increasing the diversity of the semantic information the network extracts. The extracted features are then restored to the original image size by upsampling, and pixel-wise classification yields the final segmentation result.
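As an illustration of this grouped attention computation, the following is a minimal PyTorch sketch, not the authors' released code: the grouping ratio, the SE-style channel attention standing in for the local CNN branch, and the identity placeholder for the Swin Transformer branch are all assumptions made for brevity.

```python
import torch
import torch.nn as nn

class GroupedAttention(nn.Module):
    """Hypothetical sketch: split channels into groups and apply a
    different attention computation to each group in parallel."""
    def __init__(self, channels: int, ratio: float = 0.5):
        super().__init__()
        self.c1 = int(channels * ratio)      # assumed grouping ratio
        self.c2 = channels - self.c1
        # Group 1: stand-in for a Swin Transformer block (global attention);
        # nn.Identity() is only a placeholder, not the real Swin computation.
        self.global_branch = nn.Identity()
        # Group 2: SE-style channel attention as a simple local CNN branch.
        mid = max(self.c2 // 4, 1)
        self.local_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(self.c2, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, self.c2, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g, l = torch.split(x, [self.c1, self.c2], dim=1)
        g = self.global_branch(g)            # global-attention group
        l = l * self.local_branch(l)         # local channel attention
        out = self.fuse(torch.cat([g, l], dim=1))  # concat + 1x1 fusion
        return out + x                       # residual fusion

feat = torch.randn(1, 64, 56, 56)            # an encoder feature map
print(GroupedAttention(64)(feat).shape)      # torch.Size([1, 64, 56, 56])
```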
Result
Experiments are conducted on the Synapse multi-organ segmentation dataset and the ACDC (automated cardiac diagnosis challenge) dataset. On the Synapse dataset, the model achieves a Dice score of 82.93% and a Hausdorff distance (HD) of 12.32; compared with the second-ranked method, the Dice score is 0.97% higher and the HD is 5.88 lower. On the ACDC dataset, the model achieves a Dice score of 91.34%, 0.48% higher than the second-ranked method.
Conclusion
The proposed medical image segmentation model effectively fuses the respective strengths of the Transformer and the convolutional neural network and improves the accuracy of medical image segmentation results.
Objective
End-to-end automatic medical image segmentation models have attracted increasing attention in recent years. Deep learning methods that combine a convolutional neural network (CNN) with the U-Net architecture have been widely used in medical image processing tasks and are particularly strong at local feature extraction. However, because of the inherent locality of the convolution operation, capturing global information remains a challenge for them. Transformer-based methods offer strong global modeling capability but fall short of CNNs in local feature extraction. To fully integrate the strengths of the two approaches, we develop a group attention-based medical image segmentation model, called GAU-Net.
Method
First, to combine the strengths of the convolutional neural network and the Swin Transformer, a group attention module is designed in which a Swin Transformer branch and a CNN branch are connected in parallel through the attention mechanism. A stack of Swin Transformer blocks serves as one sub-module and extracts the global features of the image. The other sub-module is a mixed attention unit built from the CNN: a spatial attention module and a channel attention module are combined in series, so that key local features of the medical image are extracted along both the spatial scale and the pixel channel dimension. The features extracted by the two sub-modules are concatenated along the channel dimension and fused by a residual unit, so that the key global and local features captured by the attention branches are grouped and fused; the resulting group attention module is embedded in each layer of the network encoder. Second, the attention calculation is reorganized to remove computational redundancy and to match the structure of the group attention module. Before being fed into the module, the encoder features are split proportionally along the channel dimension, and different attention computations are performed on the groups simultaneously; this effectively reduces computational redundancy and enriches the diversity of the semantic features the network extracts. Finally, the extracted deep features are restored to the original image size by layer-by-layer 2× upsampling, and per-pixel classification yields the final segmentation result. In addition, because medical images suffer from class imbalance and model training is easily dominated by irrelevant background pixels, a linear combination of the generalized Dice loss and the cross-entropy loss is used to alleviate class imbalance and accelerate model convergence.
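The combined loss described above can be sketched as follows; this is a minimal illustration assuming integer label maps and an equal-weight linear combination (the exact mixing weight is not stated in this abstract). The generalized Dice term reweights classes by inverse squared area to counter class imbalance, and the cross-entropy term speeds up convergence.

```python
import torch
import torch.nn.functional as F

def generalized_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """Generalized Dice loss with 1/area^2 per-class weights."""
    # logits: (B, C, H, W); target: (B, H, W) integer class labels
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    w = 1.0 / (onehot.sum(dim=(0, 2, 3)) ** 2 + eps)   # class weights
    inter = (w * (probs * onehot).sum(dim=(0, 2, 3))).sum()
    union = (w * (probs + onehot).sum(dim=(0, 2, 3))).sum()
    return 1.0 - 2.0 * inter / (union + eps)

def combined_loss(logits, target, alpha: float = 0.5):
    # alpha is an assumed mixing weight; the paper's value may differ.
    return alpha * generalized_dice_loss(logits, target) \
         + (1 - alpha) * F.cross_entropy(logits, target)

logits = torch.randn(2, 9, 224, 224)        # e.g. 8 organs + background
target = torch.randint(0, 9, (2, 224, 224))
print(combined_loss(logits, target).item())
```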
Result
Experiments are carried out on the Synapse dataset and the ACDC dataset. The Synapse dataset consists of 30 cases with a total of 3 779 axial abdominal clinical computed tomography (CT) images; 18 patient samples are used as the training set and 12 patient samples as the test set. The dataset is labeled for eight abdominal organs: the aorta, gallbladder, spleen, left kidney, right kidney, liver, pancreas, and stomach. The ACDC dataset is collected from different patients with a magnetic resonance imaging (MRI) scanner; in each patient's image, the left ventricle, right ventricle, and myocardium are labeled. It comprises 70 training samples, 10 validation samples, and 20 test samples. The Dice similarity coefficient and the 95% Hausdorff distance (HD) are adopted as evaluation metrics for the accuracy of the segmentation results. Furthermore, ablation experiments are carried out to verify the effectiveness of each module and their combinations. On the Synapse dataset, the model reaches a Dice score of 82.93% and an HD of 12.32; compared with the second-ranked method, MISSFormer, the Dice score is 0.97% higher and the HD is 5.88 lower. On the ACDC dataset, the model reaches a Dice score of 91.34%, which is 0.48% higher than that of the second-ranked method, MISSFormer.
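For reference, the Dice similarity coefficient reported above can be computed per organ from the predicted and ground-truth label maps. A minimal NumPy sketch follows; the label indices 1-8 for the Synapse organs are an assumption for illustration.

```python
import numpy as np

def dice_per_class(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    """Dice similarity coefficient for one class: 2|P∩G| / (|P| + |G|)."""
    p, g = pred == label, gt == label
    denom = p.sum() + g.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom

# Average over the 8 labeled Synapse organs (labels 1..8 assumed).
pred = np.random.randint(0, 9, (224, 224))   # predicted label map
gt = np.random.randint(0, 9, (224, 224))     # ground-truth label map
mean_dice = np.mean([dice_per_class(pred, gt, c) for c in range(1, 9)])
print(f"mean Dice: {mean_dice:.4f}")
```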
Conclusion
The proposed medical image segmentation model effectively integrates the strengths of the Swin Transformer and the convolutional neural network. Together with the group attention module and the grouped attention computation, it further improves the accuracy of medical image segmentation results.
deep learning; convolutional neural network (CNN); medical image segmentation; U-Net; group attention; Swin Transformer
Cao H, Wang Y Y, Chen J, Jiang D S, Zhang X P, Tian Q and Wang M N. 2023. Swin-Unet: Unet-like pure Transformer for medical image segmentation//Proceedings of the European Conference on Computer Vision (ECCV 2022) Workshops. Tel Aviv, Israel: Springer: 205-218 [DOI: 10.1007/978-3-031-25066-8_9]
Chen J N, Lu Y Y, Yu Q H, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L and Zhou Y Y. 2021. TransUNet: Transformers make strong encoders for medical image segmentation [EB/OL]. [2022-06-12]. https://arxiv.org/pdf/2102.04306.pdf
Çiçek Ö, Abdulkadir A, Lienkamp S S, Brox T and Ronneberger O. 2016. 3D U-Net: learning dense volumetric segmentation from sparse annotation//Proceedings of the 19th International Conference on Medical Image Computing and Computer-Assisted Intervention. Athens, Greece: Springer: 424-432 [DOI: 10.1007/978-3-319-46723-8_49]
Dhruv P and Naskar S. 2020. Image classification using convolutional neural network (CNN) and recurrent neural network (RNN): a review//Swain D, Pattnaik P K and Gupta P K, eds. Machine Learning and Information Processing. Singapore: Springer: 367-381 [DOI: 10.1007/978-981-15-1884-3_34]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2020. An image is worth 16 × 16 words: Transformers for image recognition at scale//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview.net
Girum K B, Créhange G and Lalande A. 2021. Learning with context feedback loop for robust medical image segmentation. IEEE Transactions on Medical Imaging, 40(6): 1542-1554 [DOI: 10.1109/TMI.2021.3060497]
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292 [DOI: 10.1109/TMI.2019.2903562]
Hao X Y, Xiong J F, Xue X D, Shi J, Wen K, Han W T, Li X Y, Zhao J and Fu X L. 2020. 3D U-Net with dual attention mechanism for lung tumor segmentation. Journal of Image and Graphics, 25(10): 2119-2127 [DOI: 10.11834/jig.200282]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2020. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023 [DOI: 10.1109/TPAMI.2019.2913372]
Huang H M, Lin L F, Tong R F, Hu H J, Zhang Q W, Iwamoto Y, Han X H, Chen Y W and Wu J. 2020. UNet 3+: a full-scale connected UNet for medical image segmentation//Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona, Spain: IEEE: 1055-1059 [DOI: 10.1109/ICASSP40776.2020.9053405]
Huang X H, Deng Z F, Li D D and Yuan X G. 2021. MISSFormer: an effective medical image segmentation Transformer [EB/OL]. [2022-06-12]. https://arxiv.org/pdf/2109.07162.pdf
Isensee F, Jaeger P F, Full P M, Wolf I, Engelhardt S and Maier-Hein K H. 2018. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features//Proceedings of the 8th International Workshop on Statistical Atlases and Computational Models of the Heart. Quebec City, Canada: Springer: 120-129 [DOI: 10.1007/978-3-319-75541-0_13]
Isensee F, Jaeger P F, Kohl S A A, Petersen J and Maier-Hein K H. 2021. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2): 203-211 [DOI: 10.1038/s41592-020-01008-z]
Li C Y, Bai J and Zheng L. 2022. A U-Net based contour enhanced attention for medical image segmentation. Journal of Graphics, 43(2): 273-278 [DOI: 10.11996/JG.j.2095-302X.2022020273]
Li Z H, Li Y X, Li Q D, Wang P Y, Zhang Y, Guo D Z, Lu L, Jin D K and Hong Q Q. 2022. LViT: language meets vision Transformer in medical image segmentation [EB/OL]. [2022-08-14]. https://arxiv.org/pdf/2206.14718.pdf
Lin X, Yu L, Cheng K T and Yan Z Q. 2022. BATFormer: towards boundary-aware lightweight Transformer for efficient medical image segmentation [EB/OL]. [2023-04-19]. https://arxiv.org/pdf/2206.14409.pdf
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision Transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9992-10002 [DOI: 10.1109/ICCV48922.2021.00986]
Milletari F, Navab N and Ahmadi S A. 2016. V-Net: fully convolutional neural networks for volumetric medical image segmentation//Proceedings of the 4th International Conference on 3D Vision (3DV). Stanford, USA: IEEE: 565-571 [DOI: 10.1109/3DV.2016.79]
Park J, Woo S, Lee J Y and Kweon I S. 2018. BAM: bottleneck attention module//Proceedings of 2018 British Machine Vision Conference. Newcastle, UK: BMVA Press: #147
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B and Rueckert D. 2019. Attention gated networks: learning to leverage salient regions in medical images. Medical Image Analysis, 53: 197-207 [DOI: 10.1016/j.media.2019.01.012]
Simantiris G and Tziritas G. 2020. Cardiac MRI segmentation with a dilated CNN incorporating domain-specific constraints. IEEE Journal of Selected Topics in Signal Processing, 14(6): 1235-1243 [DOI: 10.1109/JSTSP.2020.3013351]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010 [DOI: 10.5555/3295222.3295349]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of 2018 European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Xiao X, Lian S, Luo Z M and Li S Z. 2018. Weighted Res-UNet for high-quality retina vessel segmentation//Proceedings of the 9th International Conference on Information Technology in Medicine and Education. Hangzhou, China: IEEE: 327-331 [DOI: 10.1109/ITME.2018.00080]
Zhang Y D, Liu H Y and Hu Q. 2021. TransFuse: fusing Transformers and CNNs for medical image segmentation//Proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention. Strasbourg, France: Springer: 14-24 [DOI: 10.1007/978-3-030-87193-2_2]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239 [DOI: 10.1109/CVPR.2017.660]
Zhou R, Guo F M, Azarpazhooh M R, Hashemi S, Cheng X Y, Spence J D, Ding M Y and Fenster A. 2021. Deep learning-based measurement of total plaque area in B-mode ultrasound images. IEEE Journal of Biomedical and Health Informatics, 25(8): 2967-2977 [DOI: 10.1109/JBHI.2021.3060163]
Zhou Z W, Rahman Siddiquee M, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-Net architecture for medical image segmentation//Proceedings of the 4th International Workshop on Deep Learning in Medical Image Analysis, ML-CDS: International Workshop on Multimodal Learning for Clinical Decision Support. Granada, Spain: Springer: 3-11 [DOI: 10.1007/978-3-030-00889-5_1]