Combination of latent diffusion and U-shaped networks for HIFU treatment target region extraction

Zhai Jintao; Wang Runmin; Li Ang; Tian Feng; Gong Jinru; Qian Shengyou; Zou Xiao

Published: 2024-05-20
DOI: 10.11834/jig.230516
2024 | Volume 29 | Number 5

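Zhai Jintao^1,2, Wang Runmin^3, Li Ang^1,2, Tian Feng^1,2, Gong Jinru^1,2, Qian Shengyou^1,2, Zou Xiao^1,2 (1. School of Physics and Electronics, Hunan Normal University, Changsha 410081, China; 2. Key Laboratory of Physics and Devices in Post-Moore Era, College of Hunan Province, Changsha 410081, China; 3. School of Information Science and Engineering, Hunan Normal University, Changsha 410081, China)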

Abstract

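Objective Because of data acquisition constraints and privacy protection, ultrasound monitoring image data for high intensity focused ultrasound (HIFU) treatment are scarce, so existing strongly supervised segmentation methods extract the treatment target region poorly. We therefore propose a HIFU treatment target region extraction method that combines a latent diffusion model with a U-shaped network. Method In the generation stage, a latent diffusion model and an automatic filtering module are used to augment the ultrasound monitoring image data. In the target region extraction stage, we propose a novel U-shaped segmentation network (NUNet): atrous spatial pyramid pooling (ASPP) is incorporated at the encoder to enlarge the receptive field of the network; a dual attention skip connection module (DAttention-SK) is designed to reduce the risk of losing edge texture information; and multiple cross-entropy losses are introduced to improve segmentation performance. Result Experiments show that, compared with other generative models, the ultrasound monitoring images generated with the latent diffusion model achieve better Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS) scores (0.172 and 0.072, respectively). On the ultrasound monitoring dataset of uterine fibroids clinically treated with HIFU, the proposed segmentation algorithm improves mean intersection over union (MIoU) and Dice similarity coefficient (DSC) by 2.67% and 1.39%, respectively, over the state-of-the-art PDF-UNet (U-shaped pyramid-dilated network). To further examine generalization, the algorithm was validated on the public breast ultrasound images dataset (BUSI), where it improves MIoU and DSC by 2.11% and 1.36%, respectively, over M2SNet (multi-scale in multi-scale subtraction network). Conclusion The proposed algorithm alleviates, to a certain extent, the scarcity of ultrasound monitoring image data and achieves accurate extraction of the target region in monitoring ultrasound images. The code is available at https://github.com/425877/based-on-latent-diffusion-model-for-HIFU-treatment-target-region-extraction.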
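
The MIoU and DSC figures reported above compare a predicted mask with a ground-truth mask pixel by pixel. The following is a minimal sketch of both metrics, assuming binary masks stored as nested lists (1 = target region, 0 = background) and MIoU averaged over the background and target classes. The function names are illustrative and are not taken from the authors' repository, and the averaging convention used in the paper is an assumption here.

```python
from typing import Sequence, Tuple

# Assumption: a mask is a 2-D grid of 0/1 labels (1 = target region).
Mask = Sequence[Sequence[int]]


def _counts(pred: Mask, truth: Mask, cls: int) -> Tuple[int, int, int]:
    """Return (intersection, predicted pixels, ground-truth pixels) for one class."""
    inter = pred_n = truth_n = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += int(p == cls and t == cls)
            pred_n += int(p == cls)
            truth_n += int(t == cls)
    return inter, pred_n, truth_n


def iou(pred: Mask, truth: Mask, cls: int) -> float:
    """Intersection over union for a single class."""
    inter, pred_n, truth_n = _counts(pred, truth, cls)
    union = pred_n + truth_n - inter
    # If the class is absent from both masks, treat the match as perfect.
    return inter / union if union else 1.0


def mean_iou(pred: Mask, truth: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Mean IoU over the listed classes (background and target by default)."""
    return sum(iou(pred, truth, c) for c in classes) / len(classes)


def dice(pred: Mask, truth: Mask, cls: int = 1) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for one class."""
    inter, pred_n, truth_n = _counts(pred, truth, cls)
    total = pred_n + truth_n
    return 2 * inter / total if total else 1.0


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 0, 0]]
    reference = [[1, 0, 0], [0, 0, 0]]
    print(mean_iou(predicted, reference), dice(predicted, reference))
```

For the toy masks in the usage lines, the target class has IoU 1/2 and the background class 4/5, so MIoU is 0.65 and DSC is 2/3.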

Keywords

high intensity focused ultrasound (HIFU); image segmentation; image generation; loss function; latent diffusion model

Combination of latent diffusion and U-shaped networks for HIFU treatment target region extraction

Zhai Jintao^1,2, Wang Runmin^3, Li Ang^1,2, Tian Feng^1,2, Gong Jinru^1,2, Qian Shengyou^1,2, Zou Xiao^1,2 (1. School of Physics and Electronics, Hunan Normal University, Changsha 410081, China; 2. Key Laboratory of Physics and Devices in Post-Moore Era, College of Hunan Province, Changsha 410081, China; 3. School of Information Science and Engineering, Hunan Normal University, Changsha 410081, China)

Abstract

Objective In high intensity focused ultrasound (HIFU) treatment, the target area contains a large amount of pathological information; thus, it must be accurately located and extracted from ultrasound monitoring images. Because biological tissues and target regions change their relative positions during treatment, the location of the treatment area may also change. At the same time, the diversity of diseases, the variability of tissues, and the complexity of target shapes make target region extraction in ultrasound medical images challenging. Computers can use image processing and analysis algorithms, combined with big data and machine learning methods, to identify and locate target areas quickly and accurately, providing a reliable basis for quantitative clinical analysis. Traditional image segmentation algorithms mainly include threshold segmentation, edge detection, and region growing. However, these methods are sensitive to the complexity of ultrasound images, noise, and other image quality issues, which leads to segmentation results with poor accuracy and robustness. Traditional methods also usually require manual parameter selection, which makes them strongly image dependent and limits their adaptivity and generalization. In recent years, deep learning-based methods have attracted widespread attention and made remarkable progress in medical image segmentation. Most of these methods are trained under strong supervision, which requires a large amount of data to achieve good predictions. Ultrasound monitoring image data for HIFU therapy are scarce because of patient privacy, differences among acquisition devices, and the need for specialized physicians to label target areas manually. As a result, the network cannot be adequately trained, and the segmentation results are poor in accuracy and robustness. Therefore, this study proposes a method for extracting the target region of HIFU treatment by combining latent diffusion and a U-shaped network.
Method First, we train latent diffusion on existing ultrasound monitoring images and their masks; the masks are input into the model as condition vectors to generate ultrasound monitoring images with the same contours. To further ensure that the quality of the generated images is close to that of the original images, we design an automatic filtering module that calculates the Fréchet inception distance (FID) of the generated images with respect to the original images and applies a threshold to the FID, which makes the expansion of the ultrasound monitoring image data reliable. Second, we propose a novel U-shaped segmentation network (NUNet), whose main body adopts the encoder and decoder of U-Net. Atrous spatial pyramid pooling (ASPP) on the encoder side expands the receptive field of the network so that image features are extracted more efficiently. Inspired by the spatial attention and channel attention mechanisms, we design the dual attention skip connection module (DAttention-SK) to replace the original skip connection layer, which improves the efficiency of splicing low-level information with high-level information and reduces the risk of losing information such as edge texture. In addition, multiple cross-entropy losses supervise the network to retain useful details and contextual information. Finally, the images generated using latent diffusion are combined with the existing ultrasound monitoring images as a training set, which reduces the segmentation errors caused by data scarcity and further improves segmentation accuracy.
Result All experiments were implemented in PyTorch on an NVIDIA GeForce RTX 3080 GPU. We trained latent diffusion on datasets collected from clinical treatments and assessed the quality of the generated images by FID. For the generative network, the initial learning rate was set to 1×10^-4, the batch size to 2, and the number of training epochs to 200. For the segmentation network, the initial learning rate was set to 1×10^-4, the batch size to 24, and the number of training epochs to 100. To evaluate the proposed method, we compared it with popular generative and segmentation models. The ultrasound monitoring images generated using latent diffusion achieved better FID and learned perceptual image patch similarity (LPIPS) scores than those of other generative models (0.172 and 0.072, respectively). On the ultrasound monitoring dataset of uterine fibroids clinically treated with HIFU, the proposed segmentation algorithm improved mean intersection over union (MIoU) and Dice similarity coefficient (DSC) by 2.67% and 1.39%, respectively, compared with the state-of-the-art PDF-UNet. To explore generalization, validation was continued on a breast ultrasound image dataset, where the proposed algorithm improved MIoU and DSC by 2.11% and 1.36%, respectively, compared with the state-of-the-art M2SNet.
Conclusion A method for extracting the target region of HIFU treatment was proposed by combining latent diffusion and a U-shaped network. Latent diffusion was introduced, for the first time, into the generation of ultrasound monitoring images for HIFU treatment to address insufficient dataset diversity and data scarcity. Combining ASPP and the dual attention skip connection module in the segmentation network reduces the risk of losing information such as the edge texture of the target region. The proposed algorithm thereby alleviates, to a certain extent, the insufficient diversity of monitoring ultrasound image datasets and achieves accurate extraction of target regions in monitoring ultrasound images.
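
The automatic filtering step described in the Method can be illustrated with a small sketch. It assumes that each image has already been mapped to a feature vector, that batches of generated images are scored against the real images, and that covariances are diagonal, so the Fréchet distance reduces to a per-dimension formula. Standard FID instead uses Inception-v3 features with full covariance matrices and a matrix square root. The function names, the batch-level granularity, and the threshold value are hypothetical and are not taken from the authors' implementation.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple

# Assumption: one feature vector per image (e.g. from a pretrained encoder).
Features = Sequence[Sequence[float]]


def diag_stats(features: Features) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population variance of a set of feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diag(real: Features, generated: Features) -> float:
    """Squared Fréchet distance between two Gaussians with diagonal covariance.

    d^2 = ||mu_r - mu_g||^2 + sum_i (sqrt(var_r_i) - sqrt(var_g_i))^2,
    a simplification of the full-covariance expression used by FID.
    """
    mu_r, var_r = diag_stats(real)
    mu_g, var_g = diag_stats(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_g))
    return mean_term + cov_term


def filter_batches(
    real: Features, batches: Sequence[Features], threshold: float
) -> List[Features]:
    """Keep only the generated batches whose distance to the real data is within threshold."""
    return [b for b in batches if frechet_distance_diag(real, b) <= threshold]


if __name__ == "__main__":
    real_feats = [[0.0, 0.0], [2.0, 2.0]]
    close_batch = [[0.0, 0.0], [2.0, 2.0]]     # same statistics as the real data
    far_batch = [[10.0, 10.0], [12.0, 12.0]]   # shifted mean, same variance
    kept = filter_batches(real_feats, [close_batch, far_batch], threshold=1.0)
    print(len(kept))
```

Only batches that pass the threshold would be merged with the real images to form the segmentation training set; a lower threshold trades augmentation volume for fidelity.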

Keywords

high intensity focused ultrasound (HIFU); image segmentation; image generation; loss function; latent diffusion