翟锦涛1, 王润民2, 李昂1, 田峰1, 龚瑾儒1, 钱盛友1, 邹孝1 (1. School of Physics and Electronics, Hunan Normal University; 2. School of Information and Engineering, Hunan Normal University)
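Objective Owing to data acquisition constraints and privacy protection, too few ultrasound monitoring images are available for high intensity focused ultrasound (HIFU) treatment, so existing strongly supervised segmentation methods extract the treatment target region poorly. This paper therefore proposes a HIFU treatment target region extraction method that combines a latent diffusion model (Latent Diffusion) with a U-shaped network. Method In the generation stage, a latent diffusion model and an automatic filtering module are used to augment the ultrasound monitoring image data. In the target region extraction stage, a novel U-shaped segmentation network (NUNet) is proposed: atrous spatial pyramid pooling (ASPP) is incorporated at the encoder to enlarge the receptive field of the network; a dual attention skip connection module (DAttention-SK) is designed to reduce the risk of losing edge and texture information; and multiple cross-entropy losses are introduced to improve segmentation performance. Result Experiments show that, compared with other generative models, the ultrasound monitoring images generated by the latent diffusion model achieve better FID and LPIPS scores (0.172 and 0.072, respectively). On a dataset of ultrasound monitoring images from clinical HIFU treatment of uterine fibroids, the proposed segmentation algorithm improves MIoU and DSC by 2.67% and 1.39%, respectively, over the state-of-the-art PDF-UNet. To examine its generalization, the algorithm was also validated on the public breast ultrasound images dataset (BUSI), where it improves MIoU and DSC by 2.11% and 1.36%, respectively, over the state-of-the-art M2SNet. Conclusion The proposed algorithm alleviates, to some extent, the shortage of ultrasound monitoring image data and achieves accurate extraction of the target region in ultrasound monitoring images.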
A combination of Latent-Diffusion and U-shaped network for HIFU treatment target region extraction
(Hunan Normal University)
Objective In high intensity focused ultrasound (HIFU) treatment, the target region carries a large amount of pathological information, so it must be accurately located and extracted from ultrasound monitoring images. This is challenging because biological tissues and target regions shift relative to one another during treatment, and because diseases are diverse, tissues vary and target shapes are complex. Computers can nevertheless apply advanced image processing and analysis algorithms, combined with big data and machine learning methods, to identify and locate target regions quickly and accurately, providing a reliable basis for quantitative clinical analysis. Traditional image segmentation algorithms mainly include threshold segmentation, edge detection and region growing. These methods are sensitive to the complexity of ultrasound images, to noise and to other image quality issues, which leads to poor segmentation accuracy and robustness. They also usually require manual parameter selection, which limits their adaptability and generalisation and makes their performance strongly image-dependent. In recent years, deep learning-based methods have attracted widespread attention and made significant progress in medical image segmentation. Most of these methods are trained under strong supervision, which requires a large amount of data to predict well. However, few ultrasound monitoring images are available for HIFU therapy because of patient privacy, differences between acquisition devices and the need for specialised physicians to label target regions manually.
As a result, the network cannot be adequately trained, and its segmentation results have poor accuracy and robustness. Therefore, this paper proposes a method for extracting the HIFU treatment target region that combines Latent Diffusion with a U-shaped network. Method First, we train Latent Diffusion on existing ultrasound monitoring images and their masks; the masks are fed to the model as condition vectors so that it generates ultrasound monitoring images with the same contours. To further ensure that the quality of the generated images is close to that of the original images, we design an automatic filtering module that computes the Fréchet inception distance (FID) of the generated images with respect to the original images and applies a threshold on the FID, which makes the augmented ultrasound monitoring image data reliable. Second, we propose a novel U-shaped segmentation network (NUNet) whose main body adopts the encoder and decoder of UNet. Atrous spatial pyramid pooling (ASPP) on the encoder side enlarges the receptive field of the network so that image features are extracted more effectively. Inspired by spatial and channel attention mechanisms, we design a dual attention skip connection module (DAttention-SK) to replace the original skip connection layer; it improves the efficiency of concatenating low-level information with high-level information and reduces the risk of losing information such as edge texture. In addition, multiple cross-entropy losses supervise the network so that it retains more useful details and contextual information. Finally, the images generated by Latent Diffusion are combined with the existing ultrasound monitoring images to form the training set, which reduces the segmentation errors caused by data scarcity and further improves segmentation accuracy.
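The automatic filtering step described in the Method can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the threshold value nor the exact way the score is applied, so the sketch assumes that candidates scoring at or below the threshold are kept. The names `filter_generated` and `frechet_1d` are hypothetical, and `frechet_1d` is a one-dimensional stand-in (Gaussians fitted to raw pixel intensities) for the real FID, which compares multivariate Gaussians fitted to Inception features.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Sequence, Tuple

# Flattened pixel intensities; a toy stand-in for Inception features (assumption).
Pixels = Sequence[float]


def frechet_1d(a: Pixels, b: Pixels) -> float:
    """Squared Frechet distance between 1-D Gaussians fitted to a and b."""
    return (fmean(a) - fmean(b)) ** 2 + (pstdev(a) - pstdev(b)) ** 2


def filter_generated(
    candidates: Sequence[Pixels],
    score_fn: Callable[[Pixels], float],
    threshold: float,
) -> Tuple[List[int], List[float]]:
    """Return the indices of candidates scoring at or below threshold, plus all scores."""
    scores = [score_fn(c) for c in candidates]
    kept = [i for i, s in enumerate(scores) if s <= threshold]
    return kept, scores


# Toy data: one reference image and three "generated" candidates.
reference = [0.0, 0.5, 1.0, 0.5]
candidates = [
    [0.0, 0.5, 1.0, 0.5],  # identical statistics, score 0
    [0.1, 0.6, 1.1, 0.6],  # mean shifted by 0.1, small score
    [2.0, 2.0, 2.0, 2.0],  # very different statistics, large score
]

# The threshold 0.05 is an arbitrary value chosen for this toy example.
kept, scores = filter_generated(
    candidates, lambda c: frechet_1d(c, reference), threshold=0.05
)
print(kept)
```

In practice `score_fn` would wrap a proper FID implementation; passing the scorer as a callable keeps the thresholding logic independent of how the distance is computed.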
Result All experiments were implemented in PyTorch on an NVIDIA GeForce RTX 3080 GPU. We train Latent Diffusion on datasets collected from clinical treatments and assess the quality of the generated images by FID. For the generative network, the initial learning rate was set to 1 × 10⁻⁴, the batch size to 2 and the number of training epochs to 200. For the segmentation network, the initial learning rate was set to 1 × 10⁻⁴, the batch size to 24 and the number of training epochs to 100. To verify the superiority of the proposed method, we compare it with popular generative and segmentation models. The experimental results show that the ultrasound monitoring images generated by Latent Diffusion achieve better FID and learned perceptual image patch similarity (LPIPS) scores than those of other generative models (0.172 and 0.072, respectively). On the dataset of ultrasound monitoring images of uterine fibroids clinically treated with HIFU, the proposed segmentation algorithm improves mean intersection over union (MIoU) and Dice similarity coefficient (DSC) by 2.67% and 1.39%, respectively, over the state-of-the-art PDF-UNet. To further explore the generalization of the proposed algorithm, it was also validated on the public breast ultrasound images dataset (BUSI), where it improves MIoU and DSC by 2.11% and 1.36%, respectively, over the state-of-the-art M2SNet. Conclusion We propose a method for extracting the HIFU treatment target region that combines Latent Diffusion with a U-shaped network. Latent Diffusion is introduced for the first time into the generation of ultrasound monitoring images for HIFU treatment, which addresses the problems of insufficient dataset diversity and data scarcity.
Combining ASPP with the dual attention skip connection module in the segmentation network reduces the risk of losing information such as the edge texture of the target region. To a certain extent, the proposed algorithm solves the problem of insufficient dataset diversity in ultrasound monitoring images and achieves accurate extraction of the target regions in them.
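For reference, the MIoU and DSC values reported in the Result can be computed for a binary segmentation as sketched below. The sketch assumes that MIoU is the mean of the foreground and background IoU for one image, a common convention that the abstract itself does not specify; the function names and the toy masks are illustrative only.

```python
from typing import Sequence, Tuple

# Binary mask as nested sequences; 1 marks the target region (assumed encoding).
Mask = Sequence[Sequence[int]]


def confusion(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def dice(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou(pred: Mask, truth: Mask) -> float:
    """Mean of foreground IoU and background IoU."""
    tp, fp, fn, tn = confusion(pred, truth)
    fg_denom = tp + fp + fn
    bg_denom = tn + fp + fn
    fg = tp / fg_denom if fg_denom else 1.0
    bg = tn / bg_denom if bg_denom else 1.0
    return (fg + bg) / 2


# Toy 2 x 3 masks: two pixels agree on the target, one false positive, one false negative.
truth = [[0, 1, 1], [0, 1, 0]]
pred = [[0, 1, 0], [0, 1, 1]]
dsc_value = dice(pred, truth)
miou_value = mean_iou(pred, truth)
print(round(dsc_value, 4), miou_value)
```

Both metrics are built from the same confusion counts, so DSC and foreground IoU always rank a pair of predictions in the same order; reporting both mainly helps comparison with prior work.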