Ultrasound Image Segmentation Using SAM Combined with Counterfactual Prompt and Cascaded Decoder
Pages: 1-14 (2025)
Published Online: 19 February 2025
DOI: 10.11834/jig.240447
Huo Yiru, Feng Jun, Liu Na, et al. Ultrasound Image Segmentation Using SAM Combined with Counterfactual Prompt and Cascaded Decoder[J]. Journal of Image and Graphics.
目的 (Objective)
The Segment Anything Model (SAM) has achieved remarkable success in natural image segmentation. However, when it is applied to medical imaging, especially ultrasound images with low contrast, blurred boundaries, and complex shapes, segmentation often requires manual intervention and performance degrades. To address these problems, an improved method, SAM combined with Counterfactual prompt and cascaded Decoder (SAMCD), is proposed.
方法 (Method)
Building on SAM, SAMCD adds a bypass CNN image encoder, a cross-branch interaction adapter, a prompt generator, and a cascaded decoder. First, the bypass CNN encoder, together with the designed cross-branch interaction adapter, supplements the local information that the ViT encoder lacks, improving the model's ability to capture details. Second, a counterfactual intervention mechanism from causal learning is introduced: by generating counterfactual prompts, the model is forced to focus on factual prompt generation, which improves segmentation accuracy. Third, the proposed cascaded decoder is used to obtain rich edge information: SAM's original decoder first creates a prior mask, which is then refined by a Transformer decoder with boundary attention and a pixel decoder. Finally, a two-stage training strategy is adopted, consisting of an interactive segmentation training stage and an automatic segmentation training stage.
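The two-stage training strategy described above can be sketched as a toy example in plain Python. Everything here is a hypothetical simplification: `derive_prompt_from_mask` and `generate_prompt` are placeholder stand-ins for the paper's actual interactive prompting and prompt generator, and masks/images are flat lists rather than 2D tensors.

```python
# Toy sketch of the two-stage schedule (assumed simplification):
# stage 1 derives prompts from ground-truth masks (interactive model),
# stage 2 obtains prompts from a learned prompt generator (automatic model).

def derive_prompt_from_mask(mask):
    # Stage 1: an "interactive" prompt taken from the annotation,
    # here simply the mean index of the foreground pixels.
    fg = [i for i, v in enumerate(mask) if v]
    return sum(fg) / len(fg)

def generate_prompt(image):
    # Stage 2: a trivial stand-in for the learned prompt generator.
    return sum(image) / len(image)

def two_stage_schedule(data):
    steps = []
    # Stage 1: interactive segmentation training.
    for image, mask in data:
        steps.append(("interactive", derive_prompt_from_mask(mask)))
    # Stage 2: automatic segmentation training with the prompt generator.
    for image, mask in data:
        steps.append(("automatic", generate_prompt(image)))
    return steps

data = [([0.1, 0.9, 0.8], [0, 1, 1])]
for stage, prompt in two_stage_schedule(data):
    print(stage, round(prompt, 3))
```

The point of the split is that stage 1 learns segmentation under reliable annotation-derived prompts, after which stage 2 only has to learn to produce prompts that match that behavior.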
结果 (Result)
Experiments on the TN3K and BUSI datasets show that SAMCD reaches DSC values of 83.66% and 84.29%, respectively, improvements of 0.73 and 0.90 percentage points over SAMCT, while being more lightweight than the compared SAM and its variants. Compared with nine state-of-the-art methods, SAMCD achieves the best DSC, mIoU, HD, sensitivity, and specificity. Ablation experiments and visualization analyses confirm the clear gains brought by the proposed SAMCD.
结论 (Conclusion)
The proposed SAMCD method for ultrasound medical image segmentation builds on SAM's strong feature representation capability; through improvements to the encoder, prompt generator, decoder, and training strategy, it accurately captures complex local details and small targets in ultrasound images and improves automatic segmentation of ultrasound medical images.
Objective
Ultrasound imaging is a fundamental tool in medical diagnosis owing to its convenience, lack of ionizing radiation, and cost-effectiveness, making it an indispensable component of clinical practice. However, accurately localizing and extracting detailed features from ultrasound images, particularly in cases involving complex pathological boundaries such as nodules and cysts, remains a significant challenge. Traditional convolutional neural networks (CNNs) are proficient at feature extraction through convolutional layers, but their limited receptive fields often result in a loss of global information. Conversely, Transformer-based models are adept at capturing global features through self-attention, yet they frequently fail to capture local details effectively, and their high computational requirements limit their practical use in real-time medical applications. The recent Segment Anything Model (SAM) has shown notable success in natural image segmentation. However, its performance declines when applied to medical imaging, particularly ultrasound image segmentation, often necessitating manual intervention. This decline arises primarily because SAM is trained exclusively on natural images, whose domain distribution differs vastly from that of medical images. To address this limitation, we propose an enhanced SAM model, SAM combined with Counterfactual Prompt and Cascaded Decoder (SAMCD).
Method
SAMCD enhances the existing SAM framework by incorporating a bypass CNN image encoder, a Simple Cross-Branch Interaction Adapter (SCIA), a counterfactual intervention prompt generator, and a cascaded decoder. First, integrating the bypass CNN encoder with the SCIA module compensates for the ViT encoder's lack of local information, thereby enhancing the model's ability to capture fine details. Second, to adapt to the prompts produced by the prompt generator and to optimize its output, we introduce a counterfactual intervention mechanism based on causal learning. This mechanism forces the model to focus on factual prompt generation, enhancing the learning capability of the prompt generator, improving segmentation precision, and reducing the dependency on high-quality prompts. Third, to capture richer edge information, we design a cascaded decoder: SAM's original decoder creates a prior mask, after which an edge-attention-enhanced Transformer decoder and a pixel decoder exploit the rich edge information and refine the segmentation results. Finally, we employ a two-stage training strategy to improve segmentation performance and accelerate convergence: the first stage trains the interactive segmentation model, while the second stage trains the automatic segmentation model that incorporates the prompt generator. In the experiments, the hardware platform is an NVIDIA GeForce RTX 3090, the programming language is Python 3.9, and the deep learning framework is PyTorch. The network is trained with a batch size of 4, a learning rate of 0.0001, and 200 epochs, using the Adam optimizer. Before training, SAMCD is initialized with SAM weights, and during training the images are scaled to 256×256 resolution using bilinear interpolation.
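The counterfactual intervention can be illustrated with a small toy example. The functions below are hypothetical stand-ins (a dot-product "model"), not the paper's implementation; they only show the core idea that the prompt's causal contribution is the difference between the model's output under the learned (factual) prompt and its output under a random (counterfactual) prompt.

```python
import random

def model_logit(image_feat, prompt):
    # Stand-in "segmentation model": a dot product of features and prompt.
    return sum(f * p for f, p in zip(image_feat, prompt))

def counterfactual_effect(image_feat, factual_prompt, rng):
    # Counterfactual intervention: replace the learned prompt with a
    # random one that carries no image-specific information.
    cf_prompt = [rng.uniform(-1, 1) for _ in factual_prompt]
    y_factual = model_logit(image_feat, factual_prompt)
    y_counter = model_logit(image_feat, cf_prompt)
    # Supervising this difference rewards prompts whose contribution a
    # random prompt cannot replicate, sharpening the prompt generator.
    return y_factual - y_counter

rng = random.Random(0)
feat = [0.2, -0.5, 0.9]
prompt = [1.0, 0.0, 1.0]
print(round(counterfactual_effect(feat, prompt, rng), 3))
```

A large positive effect means the factual prompt genuinely steers the model; an effect near zero means the prompt is no better than noise, which is exactly the signal used to train the generator harder.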
Result
Experiments were conducted on the TN3K and BUSI datasets to evaluate SAMCD using a range of metrics, including DSC, mIoU, HD, ACC, Sen, and Spe. Lower HD values indicate better segmentation, while higher values of the other metrics, which range from 0 to 1, indicate better performance. SAMCD achieves a DSC of 83.66% on TN3K and 84.29% on BUSI, higher than the original SAM, MedSAM, SAMed, and SAMCT. Compared with SAMCT on the TN3K dataset, SAMCD improves mIoU and ACC by 0.91% and 0.16%, respectively, and improves on the remaining SAM-related comparison models by 20.43% and 12.91% on average. In comparison with non-SAM approaches, its DSC on TN3K is 4.65%, 3.29%, 13.58%, 5.16%, and 2.22% higher than that of U-Net, CE-Net, SwinUnet, TransUNet, and TransFuse, respectively; its ACC is 0.79% and 0.29% higher than that of TransUNet and TransFuse; and its Sen and Spe are, on average, 4.95% and 0.46% higher than those of the five non-SAM methods. In addition, SAMCD requires fewer training parameters and consumes less computational resources than the SAM-related models. Ablation experiments and visual analyses further validate the performance gains of the SAMCD method.
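For reference, the two headline overlap metrics can be computed from binary masks as follows. This is a generic sketch, not code from the paper: flat 0/1 lists stand in for mask images, and a small epsilon guards against empty masks.

```python
# DSC (Dice) and IoU on binary masks represented as flat 0/1 lists.

def dice_score(pred, gt, eps=1e-8):
    """Dice similarity coefficient: 2|P∩G| / (|P| + |G|)."""
    inter = sum(p * g for p, g in zip(pred, gt))
    return (2.0 * inter + eps) / (sum(pred) + sum(gt) + eps)

def iou_score(pred, gt, eps=1e-8):
    """Intersection over union: |P∩G| / |P∪G|."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(pred) + sum(gt) - inter
    return (inter + eps) / (union + eps)

pred = [1, 1, 0, 0, 1]
gt   = [1, 0, 0, 1, 1]
print(round(dice_score(pred, gt), 4))  # 2*2/(3+3) ≈ 0.6667
print(round(iou_score(pred, gt), 4))   # 2/4 = 0.5
```

Note that Dice is always at least as large as IoU on the same pair of masks, which is why DSC and mIoU values reported for the same model differ in the way seen above.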
Conclusion
SAMCD leverages the strong feature extraction capability of SAM. By enhancing the encoder, prompt generator, decoder, and training strategy, SAMCD accurately captures the complex local details and small targets in ultrasound images and improves the automatic segmentation of ultrasound medical images.
Al-Dhabyani W, Gomaa M, Khaled H and Fahmy A. 2020. Dataset of breast ultrasound images. Data in Brief. [DOI: 10.1016/j.dib.2019.104863]
Cao H, Wang Y Y, Chen J, Jiang D S, Zhang X P, Tian Q and Wang M. 2023. Swin-Unet: Unet-like pure Transformer for medical image segmentation // Proceedings of the European Conference on Computer Vision (ECCV) Workshops. Tel Aviv, Israel: Springer: 205-218. [DOI: 10.1007/978-3-031-25066-8_9]
Chen J, Lu Y, Yu Q, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L and Zhou Y Y. 2021. TransUNet: Transformers make strong encoders for medical image segmentation [EB/OL]. [2021-04-12]. https://arxiv.org/abs/2102.04306
Chen L X, Lin C H, Zheng Z L, Mo Z F, Huang X Y and Zhao G S. 2023. Research overview of Transformer in computer vision scenarios. Computer Science, 2023(12): 130-147
Cheng J L, Ye J, Deng Z Y, Chen J P, Li T B, Wang H Y, Su Y Z, Huang Z Y, Chen J L, Jiang L, He J J, Zhang S T, Zhu M and Qian Y. 2023. SAM-Med2D [EB/OL]. [2023-10-04]. https://doi.org/10.48550/arXiv.2308.16184
Cheng Z, Wei Q, Zhu H, Wang Y, Qu L, Shao W and Zhou Y. 2024. Unleashing the potential of SAM for medical adaptation via hierarchical decoding // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 3511-3522. [DOI: 10.48550/arXiv.2403.18271]
Dhruv P and Naskar S. 2020. Image classification using convolutional neural network (CNN) and recurrent neural network (RNN): a review // Swain D, Pattnaik P K and Gupta P K, eds. Machine Learning and Information Processing. Singapore: Springer: 367-381. [DOI: 10.1007/978-981-15-1884-3_34]
Dou H, Zhang P, Su W, Yu Y, Lin Y and Li X. 2023. GaitGCI: Generative counterfactual intervention for gait recognition // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 5578-5588. [DOI: 10.1109/CVPR52729.2023.00540]
Xia F, Shao H J and Deng X. 2022. Cross-stage deep-learning-based MRI fused images of human brain tumor segmentation. Journal of Image and Graphics, 27(3): 873-884. [DOI: 10.11834/jig.210330]
Fu Z, Yang H, So A M C, Lam W, Bing L and Collier N. 2023. On the effectiveness of parameter-efficient fine-tuning // Proceedings of the AAAI Conference on Artificial Intelligence, 37(11): 12799-12807
Gong H, Chen J, Chen G, Li H, Li G and Chen F. 2023. Thyroid region prior guided attention for ultrasound segmentation of thyroid nodules. Computers in Biology and Medicine, 155: 106389. [DOI: 10.1016/j.compbiomed.2022.106389]
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: Context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292. [DOI: 10.1109/tmi.2019.2903562]
Han K, Wang Y, Chen H T, Chen X H, Guo J Y, Liu Z H, Tang Y H, Xiao A, Xu C J, Xu Y X, Yang Z H, Zhang Y M and Tao D C. 2022. A survey on vision Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1): 87-110. [DOI: 10.1109/tpami.2022.3152247]
Kirillov A, Mintun E, Ravi N, Mao H Z, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg A C, Lo W Y, Dollár P and Girshick R B. 2023. Segment Anything // Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 3992-4003. [DOI: 10.1109/ICCV51070.2023]
Lin X, Xiang Y, Zhang L, Yang X, Yan Z Q and Yu L. 2023. SAMUS: Adapting Segment Anything Model for clinically-friendly and generalizable ultrasound image segmentation [EB/OL]. [2023-09-19]. https://doi.org/10.48550/arXiv.2309.06824
Lin X, Xiang Y, Wang Z, Cheng K T, Yan Z Q and Yu L. 2024. SAMCT: Segment any CT allowing labor-free task-indicator prompts [EB/OL]. [2024-03-13]. https://doi.org/10.48550/arXiv.2403.13258
Liu Z L, Chen G, Shan Z Y and Jiang X Q. 2018. Segmentation of spine CT image based on deep learning. Computer Applications and Software, 35(10): 200-204, 273. [DOI: 10.3969/j.issn.1000-386x.2018.10.036]
Ma J, He Y T, Li F F, Han L, You C Y and Wang B. 2023. Segment Anything in medical images [EB/OL]. [2024-01-10]. https://arxiv.org/pdf/2304.12306.pdf
Noble J A and Boukerroui D. 2006. Ultrasound image segmentation: a survey. IEEE Transactions on Medical Imaging, 25(8): 987-1010. [DOI: 10.1109/tmi.2006.877092]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: Convolutional networks for biomedical image segmentation // Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015). Munich, Germany: Springer: 234-241. [DOI: 10.1007/978-3-319-24574-4_28]
Siddique N, Paheding S, Elkin C P and Devabhaktuni V K. 2021. U-Net and its variants for medical image segmentation: a review of theory and applications. IEEE Access, 9: 82031-82057. [DOI: 10.1109/access.2021.3086020]
Shi P, Qiu J, Abaxi S M D, Wei H, Lo F P W and Yuan W. 2023. Generalist vision foundation models for medical imaging: a case study of Segment Anything Model on zero-shot medical segmentation [EB/OL]. [2024-02-21]. https://doi.org/10.48550/arXiv.2304.12637
Wang M, Huang Z Z, He H G, Lu H C, Shan H M and Zhang J P. 2024. Potential and prospects of Segment Anything Model: a survey. Journal of Image and Graphics, 29(6): 1479-1509. [DOI: 10.11834/jig.230792]
Wang Y, Chen K, Yuan W, Meng C and Bai X Z. 2024. SAMIHS: adaptation of Segment Anything Model for intracranial hemorrhage segmentation [EB/OL]. [2024-02-05]. https://doi.org/10.48550/arXiv.2311.08190
Wang Y, Ge X, Ma H, Qi S L, Zhang G J and Yao Y D. 2021. Deep learning in medical ultrasound image analysis: a review. IEEE Access, 9: 54310-54324. [DOI: 10.1109/access.2021.3071301]
Wu J, Fu R, Fang H, Liu Y P, Wang Z W, Xu Y W, Jin Y M and Arbel T. 2023. Medical SAM Adapter: Adapting Segment Anything Model for medical image segmentation [EB/OL]. [2023-03-03]. https://doi.org/10.48550/arXiv.2304.12620
Xiao H, Li L, Liu Q, Zhu X H and Zhang Q H. 2023. Transformers in medical image segmentation: a review. Biomedical Signal Processing and Control, 84: 104791. [DOI: 10.1016/j.bspc.2023.104791]
Yu D, Peng Y J and Guo Y F. 2023. Ultrasonic image segmentation of thyroid nodules: relevant multi-scale feature based h-shape network. Journal of Image and Graphics, 28(7): 2195-2207. [DOI: 10.11834/jig.220078]
Zhang Y, Liu H and Hu Q. 2021. TransFuse: Fusing Transformers and CNNs for medical image segmentation // Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). Strasbourg, France: Springer: 14-24. [DOI: 10.1007/978-3-030-87193-2_2]
Zhang K and Liu D. 2023. Customized Segment Anything Model for medical image segmentation [EB/OL]. [2023-03-03]. https://doi.org/10.48550/arXiv.2304.13785