Medical image segmentation fusing cross attention and dual encoders

Li He1, Liu Jianjun1, Xiao Liang2 (1. Jiangnan University; 2. Nanjing University of Science and Technology)

Abstract
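Objective Among existing medical image segmentation algorithms, methods that combine convolutional neural networks (CNN) with Transformers have become mainstream. However, these methods usually fail to combine effectively the local and global information extracted by the CNN and the Transformer. To address this problem, this paper proposes a dual-encoder segmentation network based on global-local cross attention (Dual-encoder global-local cross attention network, DGLCANet). Method DGLCANet is built on the encoder-decoder structure of UNet. First, a dual-encoder structure consisting mainly of a CNN and a cross-shaped window Transformer (CSWin Transformer) is used to extract rich global context features and local texture features from the image. Second, a global-local cross attention Transformer module is introduced in the CNN branch to correlate the information extracted by the two branches. Finally, a feature adaptation module is added to the original skip connections to improve the network's feature adaptation capability and to reduce the feature gap between the encoder and the decoder. Result DGLCANet was compared experimentally with 9 state-of-the-art segmentation algorithms on 4 public datasets. Its segmentation performance improved on the Intersection over Union (IoU), Dice coefficient, Accuracy (Acc) and Recall metrics. On the 4 datasets, the IoU reached 85.1%, 83.34%, 68.01% and 85.63%, respectively, which is 8.07%, 6.01%, 7.83% and 3.87% higher than the classic UNet. Conclusion DGLCANet combines the advantages of CNN-based and Transformer-based methods, makes full use of the global and local information in images, and achieves better segmentation performance.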
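The abstract does not give the internal design of the global-local cross attention module, so the following is only a minimal sketch of generic scaled dot-product cross attention. It assumes that tokens from the CNN branch act as queries and tokens from the Transformer branch act as keys and values; this assignment, the function names, and the omission of learned projections and multiple heads are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(local_tokens, global_tokens):
    """Scaled dot-product cross attention without learned projections.

    local_tokens:  list of d-dimensional vectors, used as queries
                   (assumed here to come from the CNN branch).
    global_tokens: list of d-dimensional vectors, used as keys and values
                   (assumed here to come from the Transformer branch).
    Returns one fused d-dimensional vector per local token.
    """
    d = len(local_tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in local_tokens:
        # Similarity of this local token to every global token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in global_tokens]
        weights = softmax(scores)
        # Weighted sum of the global tokens (the values).
        fused.append([sum(w * v[j] for w, v in zip(weights, global_tokens))
                      for j in range(d)])
    return fused
```

Each local token receives a convex combination of the global tokens, weighted by similarity, which is one common way to let local features attend to global context.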
Keywords
Dual-encoder global-local cross attention network for medical image segmentation

Li He1, Liu Jianjun1, Xiao Liang2 (1. Jiangnan University; 2. Nanjing University of Science and Technology)

Abstract
Objective Medical image segmentation has broad applications and research value in medical research and practice. In recent years, UNet, which is based on convolutional neural networks (CNNs), has become a baseline architecture for medical image segmentation. However, because of the limited receptive field of CNNs, it cannot effectively extract global context information. The Transformer can address this problem, but it is limited in capturing local information. In this paper, we propose DGLCANet, a dual-encoder global-local cross attention network that combines a CNN and a Transformer. Method DGLCANet combines the advantages of the CNN and the Transformer to extract rich global context and local texture features. Specifically, we propose a global-local cross attention Transformer block, which correlates the local and global information extracted by the CNN and Transformer branches. To improve the feature adaptation capability and reduce the feature gap between the encoder and decoder, we design a feature adaptation block in the skip connections of DGLCANet. We conduct experiments on four public datasets: ISIC-2017, ISIC-2018, BUSI and 2018 Data Science Bowl. ISIC-2017 and ISIC-2018 consist of dermoscopic images for melanoma detection and contain 2000 and 2596 images, respectively. The BUSI dataset is a breast ultrasound dataset for breast cancer detection and contains 780 images. The 2018 Data Science Bowl dataset is used for examining cell nuclei in different microscope images and contains 670 images in total. All images are resized to 256×256 pixels and randomly divided into training and test sets at a ratio of 8:2. DGLCANet is implemented in the PyTorch framework and trained on an NVIDIA GeForce RTX 3090 Ti GPU with 24 GB of memory. In the experiments, we combine the binary cross-entropy loss and the Dice loss in proportion to construct the loss function.
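The abstract does not state the proportion in which the two losses are mixed. As a framework-free illustration of such a combined loss, the sketch below computes alpha × BCE + (1 − alpha) × Dice loss over flattened pixels; the function name `bce_dice_loss`, the weight `alpha` and its default of 0.5, and the smoothing term `eps` are hypothetical choices, not values taken from the paper.

```python
import math


def bce_dice_loss(probs, targets, alpha=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and Dice loss.

    probs:   predicted foreground probabilities, one per pixel.
    targets: ground-truth labels (0 or 1), one per pixel.
    alpha:   hypothetical mixing weight; the paper's proportion is not given.
    eps:     small constant to avoid log(0) and division by zero.
    """
    n = len(probs)

    # Binary cross-entropy, averaged over pixels.
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep the logs finite
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= n

    # Soft Dice coefficient computed on probabilities.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

    return alpha * bce + (1.0 - alpha) * (1.0 - dice)
```

With `alpha=1.0` the function reduces to plain binary cross-entropy, and with `alpha=0.0` to the Dice loss alone, which makes the effect of the mixing proportion easy to inspect.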
For optimization, we employ the Adam optimizer with an initial learning rate of 0.001, a momentum parameter of 0.9, and a weight decay of 0.0001. Result We use four evaluation metrics to assess the effectiveness of the proposed method: Intersection over Union (IoU), Dice coefficient (Dice), Accuracy and Recall. For all four metrics, larger values indicate better segmentation. Experimental results show that on the four datasets the Dice coefficient reaches 91.88%, 90.82%, 80.71%, and 92.25%, respectively, which is 5.87%, 5.37%, 4.65%, and 2.92% higher than the classic UNet. Our method also outperforms recent state-of-the-art methods. Furthermore, the visualized results show that our method better predicts the boundary regions of the image and better distinguishes lesion areas from normal areas. Compared with other methods, our method still achieves segmentation results very close to the ground truth in the presence of multiple interference factors such as brightness. A series of ablation experiments also shows that each proposed component contributes as intended. Conclusion In this study, we propose a dual-encoder medical image segmentation method that integrates a global-local attention mechanism. The experimental results demonstrate that our method outperforms a number of state-of-the-art methods.
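The four evaluation metrics named above can all be derived from the pixel-wise confusion counts of a binary mask. The sketch below uses their standard definitions; the function name `segmentation_metrics` and the convention of returning 1.0 when a denominator is zero are assumptions made here, and the paper may aggregate the metrics over images differently.

```python
def segmentation_metrics(pred, target):
    """Compute IoU, Dice, Accuracy and Recall for flat binary masks.

    pred, target: sequences of 0/1 labels of equal length.
    """
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1      # foreground predicted, foreground in ground truth
        elif p and not t:
            fp += 1      # foreground predicted, background in ground truth
        elif not p and t:
            fn += 1      # background predicted, foreground in ground truth
        else:
            tn += 1      # background in both

    def ratio(num, den):
        # Convention chosen here: an empty denominator counts as a perfect score.
        return num / den if den else 1.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "recall": ratio(tp, tp + fn),
    }
```

For example, with `pred = [1, 1, 0, 0, 1, 0]` and `target = [1, 0, 0, 0, 1, 1]` the counts are 2 true positives, 1 false positive, 1 false negative and 2 true negatives, giving an IoU of 0.5 and a Dice coefficient of about 0.667. For a single mask the two are related by Dice = 2·IoU / (1 + IoU).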
Keywords
