3D multi-organ segmentation network combining local and global features and multi-scale interaction

Chai Jingwen, Li Ankang, Zhang Hao, Ma Yong, Mei Xiaoguang, Ma Jiayi (School of Electronic Information, Wuhan University, Wuhan 430072, China)

Abstract
Objective Highly conformal radiotherapy is a common cancer treatment whose effectiveness depends on precise delineation of the cancerous tissue and the anatomical structures of the surrounding organs at risk (OARs); high-precision automatic multi-organ segmentation of 3D images is therefore of great importance. 3D medical image segmentation methods that combine the vision Transformer (ViT) with convolutional neural networks (CNNs) have shown clear practical advantages. However, such methods often ignore information interaction within a single scale and across different scales, which limits the extraction and fusion of CNN and ViT features. This paper proposes an end-to-end multi-organ segmentation network, LoGoFUNet (local-global-features fusion UNet), to address these shortcomings. Method First, for single-organ segmentation, a LoGoF (local-global-features fusion) encoder is proposed that extracts and fuses CNN and ViT features in parallel at the same scale, and an end-to-end multi-scale 3D medical image segmentation network, M0, is built on it. Furthermore, to account for relationships within and between organs, a multi-scale interaction (MSI) module and an attention guidance (AG) structure are designed and added to M0, yielding the final LoGoFUNet. Result On the Synapse and SegTHOR (segmentation of thoracic organs at risk) datasets, the proposed method improves the DSC (Dice similarity coefficient) by 2.94% and 4.93%, respectively, over the second-best model, while reducing HD95 (95th percentile Hausdorff distance) by 8.55 and 2.45, a solid gain in multi-organ segmentation performance. On the ACDC (automatic cardiac diagnosis challenge) dataset, most 3D segmentation methods transfer poorly, yet LoGoFUNet still outperforms advanced 2D methods, indicating stronger adaptability across datasets. Conclusion By integrating intra-scale and inter-scale information interaction, the proposed segmentation model achieves better segmentation results and generalizes better across datasets.
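The core idea of the LoGoF encoder, extracting local CNN-style and global ViT-style features in parallel at one scale and then fusing them, can be illustrated with a minimal sketch. This is not the paper's implementation: the 3x3x3 mean filter stands in for a learned 3D convolution, the single-head self-attention with random untrained projections stands in for a ViT block, and the weighted-sum fusion weight `alpha` is a hypothetical simplification of the paper's fusion rule.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_branch(x):
    # CNN-style local features: 3x3x3 mean filter as a stand-in
    # for a learned 3D convolution (zero padding at the borders).
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for dz in range(3):
        for dy in range(3):
            for dx in range(3):
                out += p[dz:dz + x.shape[0],
                         dy:dy + x.shape[1],
                         dx:dx + x.shape[2]]
    return out / 27.0

def global_branch(x, dim=8, seed=0):
    # ViT-style global features: single-head self-attention over
    # voxel tokens, with random (untrained) Q/K/V projections.
    rng = np.random.default_rng(seed)
    tokens = x.reshape(-1, 1)                 # (N, 1): one scalar channel per voxel
    wq, wk, wv = (rng.standard_normal((1, dim)) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(dim))    # (N, N): every voxel attends to all others
    return (attn @ v).mean(axis=1).reshape(x.shape)

def logof_block(x, alpha=0.5):
    # Parallel extraction, then fusion; a weighted sum is the
    # simplest possible fusion rule.
    return alpha * local_branch(x) + (1 - alpha) * global_branch(x)
```

Because the attention matrix is N x N over all voxels, this brute-force sketch is only feasible for tiny volumes; the appeal of combining the two branches is that convolution supplies cheap local detail while attention supplies the long-range context a pure CNN lacks.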
Keywords
3D multi-organ segmentation network combining local and global features and multi-scale interaction

Chai Jingwen, Li Ankang, Zhang Hao, Ma Yong, Mei Xiaoguang, Ma Jiayi(School of Electronic Information, Wuhan University, Wuhan 430072, China)

Abstract
Objective Highly conformal radiotherapy is a widely adopted cancer treatment modality requiring meticulous characterization of cancer tissues and comprehensive delineation of the surrounding anatomical structures. The efficacy and safety of this technique depend largely on the ability to precisely target the tumor, necessitating a thorough understanding of the corresponding organ-at-risk (OAR) anatomy. Thus, accurate and detailed depiction of the neoplastic and adjacent normal tissues using advanced imaging techniques is critical to optimizing the outcomes of highly conformal radiotherapy. Given that conventional segmentation methods cannot yet delineate multi-organ structures from 3D medical images accurately and efficiently, there is a promising opportunity for research on precise, automated segmentation techniques based on deep learning. By leveraging the capacity of deep neural networks (DNNs) to learn complex hierarchical representations from vast amounts of labeled data, such techniques can identify and extract specific features and patterns from medical images, leading to considerably more reliable and efficient segmentation outcomes. This capability could significantly enhance the clinical utility of imaging data in diagnostic and therapeutic applications including, but not limited to, radiation therapy planning, surgical navigation, and disease assessment. Over the past few years, there has been increasing interest in integrating the vision Transformer (ViT) with convolutional neural networks (CNNs) to improve the quality and accuracy of semantic segmentation. One promising research direction involves multi-scale representation, which is critical for robust and precise segmentation across diverse medical imaging datasets.
However, current state-of-the-art methods have not fully exploited the potential of multi-scale interaction between CNNs and ViTs. Some methods disregard multi-scale structures entirely or obtain them only by limiting the computational scope of the ViT. Others rely solely on either a CNN or a ViT at a given scale, ignoring their complementary strengths. In addition, existing multi-scale interaction methods often neglect the spatial association between 2D slices, resulting in poor performance on volumetric data. Further research is therefore needed to address these problems. Method This work addresses the limitations of existing multi-organ segmentation methods for 3D medical images with a new approach. Recognizing the importance of extracting local and global features simultaneously at the same scale, a universal feature encoder, the LoGoF module, is introduced for multi-organ segmentation networks. It enables an end-to-end 3D medical image multi-organ segmentation network (denoted M0) built on the LoGoF module. To further strengthen the model's ability to capture complex relationships between organs at different scales, a multi-scale interaction (MSI) module and an attention guidance (AG) structure are incorporated into M0. These components introduce spatial priors into the features extracted at different scales, enabling the network to perceive inter-organ relationships and identify organ boundaries accurately. With these components combined, the proposed model, LoGoFUNet, performs robust and efficient multi-organ segmentation of 3D medical images, a significant step forward in accuracy and efficiency for clinical applications. Result In experiments on two well-known medical imaging datasets (i.e., Synapse and SegTHOR), LoGoFUNet demonstrated clear accuracy gains over the second-best model: a 2.94% improvement in the Dice similarity coefficient (DSC) on Synapse and a 4.93% improvement on SegTHOR. Furthermore, the 95th percentile Hausdorff distance (HD95) decreased by 8.55 and 2.45 on Synapse and SegTHOR, respectively, indicating an overall improvement in multi-organ segmentation performance. On the ACDC dataset, where most 3D segmentation methods transfer poorly, LoGoFUNet still obtained better results than advanced 2D methods, indicating superior adaptability and versatility across dataset types. These findings suggest that LoGoFUNet is a highly competitive and robust framework for accurate multi-organ segmentation in various clinical settings. Further ablation experiments verify the role and contribution of each proposed component, namely the LoGoF encoder, the multi-scale interaction module, and the attention guidance structure, to the observed segmentation performance. By systematically removing each component and evaluating its impact on segmentation accuracy, these experiments confirm that the module design is rational and effective, reinforcing the value and potential clinical significance of LoGoFUNet for multi-organ segmentation in 3D medical imaging. Conclusion The experimental evaluation suggests that the proposed segmentation model effectively integrates information exchange within and between scales.
This integration leads to improved segmentation performance and superior generalization across datasets. By facilitating the interaction of multi-scale representations through intra- and inter-scale information exchange mechanisms, the approach enables the model to capture complex spatial relationships and produce high-quality segmentations across a range of 3D medical imaging datasets. The findings highlight the importance of multi-scale features and information exchange for robust and accurate medical image segmentation, and suggest that the proposed framework could benefit a variety of clinical applications.
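The two evaluation metrics reported throughout, DSC and HD95, are standard and can be computed as below. This is a brute-force NumPy sketch suitable only for small volumes; production pipelines would typically use an optimized implementation (e.g., distance-transform-based surface distances).

```python
import numpy as np

def dice(pred, gt):
    # Dice similarity coefficient (DSC) for two binary masks:
    # 2|A ∩ B| / (|A| + |B|).
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def _surface(mask):
    # Surface voxels: foreground voxels with at least one background
    # face-neighbour. Padding makes the volume border count as background.
    m = np.pad(mask.astype(bool), 1)
    interior = m.copy()
    for axis in range(m.ndim):
        for shift in (1, -1):
            interior &= np.roll(m, shift, axis=axis)
    return np.argwhere(m & ~interior) - 1  # undo the padding offset

def hd95(pred, gt):
    # 95th-percentile symmetric Hausdorff distance between mask surfaces,
    # in voxel units (scale by voxel spacing to get millimetres).
    a = _surface(pred).astype(float)
    b = _surface(gt).astype(float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # all pairs
    return np.percentile(np.concatenate([d.min(axis=1), d.min(axis=0)]), 95)
```

Taking the 95th percentile instead of the maximum is what makes HD95 robust to a few outlier voxels, which is why it is preferred over the plain Hausdorff distance for reporting organ-boundary accuracy.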
Keywords
