SA-TF-UNet: MRI hippocampus segmentation based on a spatial attention mechanism and Transformer

Ou Yuxuan; Gao Min; Zhao Di; Liu Jun

Published: 2023-10-19
DOI: 10.11834/jig.220567
2023 | Volume 28 | Number 10


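Ou Yuxuan^1,2, Gao Min^3, Zhao Di^1,4, Liu Jun^3 (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2. International School, Beijing University of Posts and Telecommunications, Beijing 100876; 3. Department of Radiology, Second Xiangya Hospital, Central South University, Changsha 410011; 4. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049)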

Abstract

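Objective The hippocampus and entorhinal cortex occupy only a small pixel volume, and these characteristics pose a major challenge for medical image segmentation. Drawing on the morphology of the hippocampus and on the workflow clinicians follow when segmenting it, we propose a new hippocampus segmentation method that aims at accurate segmentation in clinical image processing and supports the early diagnosis of Alzheimer's disease. Method We propose SA-TF-UNet (hippocampus segmentation network based on Transformer and spatial attention mechanisms), a U-shaped network built on self-attention and spatial attention mechanisms. It is an end-to-end prediction network that takes a 3D magnetic resonance imaging (MRI) volume of arbitrary size as input and outputs class labels. SA-TF-UNet uses an encoder-decoder structure. The encoder consists purely of Transformer blocks and contains no convolution modules. Multi-head self-attention serves as the feature extractor within each Transformer block; it models global information and extracts features from it. Using a Transformer for feature extraction therefore matches the basic way clinicians approach hippocampus segmentation. The decoder upsamples with simple convolution modules. Attention gate (AG) modules serve as the skip connections, automatically increasing the weight of the foreground and replacing the direct connections of conventional networks. To verify the effectiveness of the AG, we compare experiments that add an AG to a single layer only against an experiment that adds AGs to all four layers. To explore the source of the AG gating signal further, we design two variants of SA-TF-UNet in which the gating signal is the output of the Transformer block two layers deeper and three layers deeper than the AG's feature map, respectively. Result To verify the effectiveness of SA-TF-UNet for hippocampus segmentation on clinical data, we run experiments on a brain MRI dataset composed of scans from Alzheimer's disease patients. The SA-TF-UNet model with AGs in all four layers, and with each gating signal taken from the Transformer block one layer deeper than the AG's feature map, segments best. SA-TF-UNet reaches Dice coefficients of 0.9001 and 0.9091 for the left and right hippocampus, a significant improvement over the compared semantic segmentation networks, with Dice gains of 2.82% and 3.43%, respectively. Conclusion A segmentation network with a pure-Transformer encoder and an added spatial attention mechanism effectively improves the accuracy of hippocampus segmentation in brain MRI.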
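The attention gate used in the skip connections can be illustrated with a small sketch. This is a minimal pure-Python version of the additive attention gate commonly used in attention-gated U-Nets, not the authors' implementation. The names `attention_gate`, `W_x`, `W_g` and `psi` are hypothetical, the weights would normally be learned, and the sketch assumes the gating tokens from the deeper layer have already been resampled so that they pair one-to-one with the skip-connection tokens.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def attention_gate(skip_tokens, gate_tokens, W_x, W_g, psi):
    """Additive attention gate over paired tokens (illustrative only).

    skip_tokens: feature vectors arriving over the skip connection.
    gate_tokens: gating vectors from a deeper layer, assumed to be
                 already aligned one-to-one with skip_tokens.
    W_x, W_g:    hypothetical projection matrices into a shared hidden space.
    psi:         hypothetical vector that collapses the hidden space to a score.
    Returns (gated_tokens, alphas), where each alpha lies in (0, 1).
    """
    gated, alphas = [], []
    for x, g in zip(skip_tokens, gate_tokens):
        # Project both inputs, add them, and apply ReLU.
        hidden = [max(0.0, a + b)
                  for a, b in zip(matvec(W_x, x), matvec(W_g, g))]
        # Collapse to a scalar score and squash it with a sigmoid.
        score = sum(p * h for p, h in zip(psi, hidden))
        alpha = 1.0 / (1.0 + math.exp(-score))
        alphas.append(alpha)
        # Scale the skip features: a higher alpha keeps more of the token.
        gated.append([alpha * value for value in x])
    return gated, alphas


# Toy usage: two tokens with 2-D skip features and 1-D gating features.
skip = [[1.0, 2.0], [3.0, 4.0]]
gate = [[0.5], [0.25]]
W_x = [[1.0, 0.0], [0.0, 1.0]]
W_g = [[1.0], [1.0]]
gated, alphas = attention_gate(skip, gate, W_x, W_g, psi=[1.0, 1.0])
```

Tokens whose combined projection scores higher receive an attention coefficient closer to 1 and pass through almost unchanged, while low-scoring tokens are attenuated; this is the sense in which the gate raises the relative weight of the foreground.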

Keywords

hippocampus; medical image processing; Transformer; spatial attention mechanism; semantic segmentation

SA-TF-UNet: a Transformer and spatial attention mechanisms based hippocampus segmentation network

Ou Yuxuan^1,2, Gao Min^3, Zhao Di^1,4, Liu Jun^3 (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. International School, Beijing University of Posts and Telecommunications, Beijing 100876, China; 3. Department of Radiology, Second Xiangya Hospital, Central South University, Changsha 410011, China; 4. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract

Objective Early diagnosis of and intervention in Alzheimer's disease (AD) have high clinical and social value. The hippocampus is one of the earliest brain regions affected in AD, and its dysfunction underlies a core feature of the disease: memory impairment. Manual analysis of magnetic resonance imaging (MRI) scans for AD is labor-intensive and time-consuming. Emerging artificial intelligence (AI) techniques can segment the hippocampus in MRI scans accurately and efficiently. Among the AI algorithms developed for AD diagnosis, deep learning methods based on convolutional neural networks (CNNs) can be employed for hippocampus segmentation. In the encoder of such a network, down-sampling steps with convolutions of various kernel sizes contract the image and extract its features. The decoder then expands the encoded feature map back to the original spatial size of the input image, using transposed convolutions or bilinear interpolation. However, a convolution integrates context information only within its receptive field: pixels outside the receptive field are ignored even when they are correlated with pixels inside it, and redundant information is produced. To improve hippocampus segmentation, we focus on the natural characteristics of the hippocampus and on how clinicians segment it. Two characteristics matter: first, the shape of the hippocampus is irregular; second, it is very small, occupying a fraction of only 0.0002 of the pixels in an MRI scan. Regarding the first, convolutions extract only local features and therefore struggle to extract features effectively from irregularly shaped objects.
Regarding the second, an encoder may contain many feature extraction layers, so information about the hippocampus tends to be lost, because the hippocampus covers only a few pixels in the original image. A common way to segment small objects is to add a detection network that first locates the region of interest around the hippocampus, so that the semantic segmentation network is applied only inside the resulting bounding box. However, the two networks then learn essentially the same features, so redundant use of computing resources is inevitable. Method To extract features effectively from irregularly shaped targets and to highlight target areas automatically, we recast medical image segmentation as a sequence-to-sequence prediction task. We develop a U-shaped network based on self-attention and spatial attention mechanisms, called SA-TF-UNet. SA-TF-UNet has an encoder-decoder architecture in which the encoder is built purely from Transformer blocks. The self-attention mechanism in the Transformer blocks enables global modeling. Attention gates (AGs) replace the plain concatenation in the skip connections of U-Net; each AG takes its gating signal from a deeper Transformer layer and automatically increases the weights on target areas. To validate the effectiveness of the AGs, we ran experiments in which the network contains an AG in only a single layer and compared them with an experiment in which AGs are applied to all four layers. To investigate the source of the gating signal further, we designed two variant structures in which the gating signal is the output of the Transformer block two layers deeper and three layers deeper, respectively. Result The proposed models are tested on a dataset of 54 clinical MRI scans from AD patients. The dataset is randomly divided into training and testing data at a ratio of 8:1.
Three independent experiments are carried out, and the results are averaged to reduce the effect of chance. Averaged over the three experiments, SA-TF-UNet reaches Dice coefficients of 0.9001 for the left hippocampus and 0.9091 for the right hippocampus, improvements of 2.82% and 3.37%, respectively. The two variant structures also reach Dice coefficients above 0.88. Conclusion Combining self-attention with spatial attention improves the precision of hippocampus segmentation. Taking the AG gating signal from the Transformer block only one layer deeper proves effective.
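For readers unfamiliar with the metric, the Dice coefficient reported above measures the overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|). The sketch below is a minimal pure-Python illustration under stated assumptions: masks are flattened binary sequences, the function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something the paper specifies.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flat binary masks of equal length.

    Dice = 2 * |P ∩ T| / (|P| + |T|), ranging from 0 (no overlap)
    to 1 (identical masks).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Toy usage: 3 predicted and 3 true foreground voxels, 2 of them shared.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
```

Because the hippocampus covers such a small fraction of each scan, an overlap measure of this kind is more informative than plain pixel accuracy, which would be dominated by the background.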

Keywords

hippocampus; medical image processing; Transformer; spatial attention; semantic segmentation
