Lung tumor segmentation using 3D U-Net with a dual attention mechanism

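Hao Xiaoyu1, Xiong Junfeng2,3, Xue Xudong4, Shi Jun1, Wen Ke1, Han Wenting1, Li Xiaoyang4, Zhao Jun2, Fu Xiaolong5 (1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026; 2. School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240; 3. Tencent Healthcare, Shanghai 200000; 4. Department of Radiation Oncology, The First Affiliated Hospital of University of Science and Technology of China, Hefei 230001; 5. Department of Radiation Oncology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai 200030)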

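Abstract
Objective Accurate lung tumor segmentation is important for the diagnosis, surgical planning, and radiotherapy of lung cancer. Computed tomography (CT) is the most important auxiliary tool in lung cancer diagnosis and treatment, but reading CT images is labor-intensive work that depends on the physician's subjective experience, which easily leads to unstable diagnostic results. Fast, stable, and accurate automatic lung tumor segmentation is therefore an active research topic. With the development of deep learning, convolutional neural networks have become the mainstream approach to automatic lung tumor segmentation. To address the insufficient accuracy of 3D U-Net and its tendency to produce false positives, this paper designs and implements a 3D convolutional neural network, DAU-Net (dual attention U-Net). Method The data are first preprocessed: the in-plane pixel spacing of the CT slices is adjusted, the window width and window level are set, and redundant information in the CT images is removed by cropping. DAU-Net takes 3D U-Net as its base structure, replaces every two adjacent convolutional layers with a residual structure, and inserts a position attention module and a channel attention module, connected in parallel, between the contracting path and the expanding path. At prediction time, connected component analysis is applied as postprocessing to the binary image output by the network: all connected components are obtained by checking the connectivity between each pixel and its 26 neighbors, and every region other than the largest connected component is removed, which further improves segmentation accuracy. Result The experimental data come from Shanghai Chest Hospital and comprise 1 010 lung cancer patients. Each case contains only one lesion, and professional radiologists provided the gold standard. The experiments used 10-fold cross-validation. Compared with 3D U-Net, the proposed lung tumor segmentation algorithm improves the Dice coefficient and the Hausdorff distance by 2.5% and 9.7%, respectively, and reduces the false positive rate by 13.6%. Conclusion The proposed algorithm effectively improves lung tumor segmentation accuracy and helps achieve fast, stable, and accurate segmentation of lung cancer.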
3D U-Net with dual attention mechanism for lung tumor segmentation

Hao Xiaoyu1, Xiong Junfeng2,3, Xue Xudong4, Shi Jun1, Wen Ke1, Han Wenting1, Li Xiaoyang4, Zhao Jun2, Fu Xiaolong5(1.School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China;2.School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;3.Tencent HealthCare, Co. Ltd., Shanghai 200000, China;4.Department of Radiation Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230001, China;5.Department of Radiation Oncology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai 200030, China)

Abstract
Objective Precise lung tumor segmentation is a necessary step in the computer-aided diagnosis, surgical planning, and radiotherapy of lung cancer. Computed tomography (CT) images are important auxiliary tools in clinical medicine. Diagnosing lung tumors is labor intensive: professional radiologists must carefully examine hundreds of CT slices to find and confirm the location of tumor lesions, and final reports need to be verified by other experienced radiologists. This process consumes time and effort. Different doctors commonly make different diagnoses of the same case, and the same doctor may make different decisions at different times, because diagnosis depends on subjective experience. To solve these problems, a growing number of researchers have combined artificial intelligence with medical imaging, and the automatic segmentation of lung tumors has been widely investigated. To address the insufficient accuracy of 3D U-Net and its tendency to produce false positive pixels, this paper proposes a new network named dual attention U-Net (DAU-Net) that incorporates dual attention mechanisms and residual modules. A postprocessing method based on connected component analysis is used to remove false positive regions outside the region of interest. Method In accordance with the characteristics of lung CT images, we propose a three-step pipeline to preprocess the CT images. The first step standardizes the pixel spacing, because differing pixel spacings affect the speed and quality of network convergence during training. All 2D slices are 5 mm thick, and the in-plane resolution ranges from 0.607 mm to 0.976 mm. Linear interpolation is therefore applied to each CT slice to obtain a 1 mm in-plane resolution. The interpolated CT images remain 3D volumes.
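The spacing standardization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the 512-pixel axis is an assumed matrix size, and the interpolation is shown along a single axis only (applying it along both in-plane axes gives bilinear resampling).

```python
def resampled_size(size_px, spacing_mm, target_spacing_mm=1.0):
    """Pixels along one axis after resampling to the target spacing,
    keeping the physical extent (size * spacing) unchanged."""
    return round(size_px * spacing_mm / target_spacing_mm)


def resample_line(values, spacing_mm, target_spacing_mm=1.0):
    """Linearly interpolate one row of pixel values onto a grid with the target spacing."""
    n_out = resampled_size(len(values), spacing_mm, target_spacing_mm)
    last = len(values) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i, expressed in input index units.
        pos = i * target_spacing_mm / spacing_mm
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


# Hypothetical 512-pixel axis at the two extreme spacings quoted in the text.
print(resampled_size(512, 0.607))
print(resampled_size(512, 0.976))
```

Under the assumed 512-pixel matrix, the resampled axis has between 311 and 500 pixels, so every resampled slice is smaller than 512×512.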
The window width and window level are then set to 1 600 and -200, respectively; that is, pixel values in the CT image greater than 600 are set to 600 and those less than -1 000 are set to -1 000. The truncated intensities in [-1 000, 600] are then linearly normalized to [0, 1], which enhances the regions of interest and helps the automatic segmentation of lesions. After these steps, each CT image is smaller than N×512×512, where N is the number of slices. After padding to N×512×512, the CT images and their corresponding annotations are cropped to a constant size of N×320×260 starting from the fixed coordinate (0, 90, 130) of the first slice, and interpolation is used to scale the images to 64×320×260. The main architecture of the network adopts the 3D form of U-Net; replacing every two adjacent convolutional layers with a residual structure and adding two attention mechanisms between the contracting path and the expanding path yields DAU-Net. The residual structures alleviate the degradation, vanishing gradient, and exploding gradient problems caused by increasing the depth of the neural network. Encoder-decoder networks such as U-Net merge high-resolution feature maps carrying position information with low-resolution feature maps carrying contextual information through skip connections, which lets them capture targets at different scales. However, they cannot exploit the positional relationships among different objects in the whole image or the associations between different categories. To retain the advantages of the encoder-decoder structure while overcoming these limitations, a position attention module and a channel attention module connected in parallel are combined with 3D U-Net.
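The windowing and normalization step in the preprocessing pipeline can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name is hypothetical, and the input is a flat list of CT intensity values rather than a full volume.

```python
def window_and_normalize(values, level=-200.0, width=1600.0):
    """Clip intensities to [level - width/2, level + width/2] and scale to [0, 1]."""
    low = level - width / 2.0   # -1000 for the settings quoted in the text
    high = level + width / 2.0  # 600 for the settings quoted in the text
    # Clip each value to the window, then map [low, high] linearly onto [0, 1].
    return [(min(max(v, low), high) - low) / (high - low) for v in values]


print(window_and_normalize([-2000, -1000, -200, 600, 3000]))
```

Values below -1 000 map to 0, values above 600 map to 1, and the window level -200 lands at the midpoint 0.5.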
The position attention module encodes context information from a wide range into local features, and the channel attention module finds the dependencies between different channels, thereby strengthening interdependent features. The network is trained end to end by optimizing the soft Dice loss. After inference, connected component analysis removes wrongly segmented false positive regions by keeping only the largest connected component and discarding the rest. Because this paper uses a 3D convolutional neural network (CNN), a 26-neighborhood connected component analysis is used, which determines the connectivity between a central pixel and its 26 adjacent pixels. The output of the network has two channels, and softmax maps the outputs to values between zero and one. In binarization, each pixel is assigned the index of the channel with the higher probability, and the connected component analysis is applied to the resulting binary image. This postprocessing method effectively improves segmentation accuracy and decreases the false positive rate (FPR). It relies on the premise that each case in our dataset contains only one lesion. Result We retrospectively collected data from patients in Shanghai Chest Hospital from 2013 to 2017. The study was approved by Shanghai Chest Hospital, Shanghai Jiao Tong University, and ethical approval (ID: KS 1716) was obtained for use of the CT images. Experienced radiologists provided the gold standard for each case. In the experiments, we compared DAU-Net with the standard 3D U-Net and a reproduced 3D attention U-Net. All networks were evaluated with 10-fold cross-validation, and we adopted the widely used Dice coefficient, Hausdorff distance (HD), FPR, and true positive rate to evaluate the predicted outputs.
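The 26-neighborhood postprocessing described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, the mask is assumed to be a nested list indexed as [z][y][x], and a breadth-first search stands in for whatever labeling routine was actually used.

```python
from collections import deque
from itertools import product

# The 26 neighbor offsets of a voxel: every combination of -1/0/+1 except (0, 0, 0).
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def keep_largest_component(mask):
    """Return a copy of a binary 3D mask in which only the largest
    26-connected foreground component is kept."""
    depth, height, width = len(mask), len(mask[0]), len(mask[0][0])
    seen = set()
    largest = []
    for start in product(range(depth), range(height), range(width)):
        z, y, x = start
        if not mask[z][y][x] or start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            cz, cy, cx = queue.popleft()
            component.append((cz, cy, cx))
            for dz, dy, dx in OFFSETS:
                nz, ny, nx = cz + dz, cy + dy, cx + dx
                inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                if inside and mask[nz][ny][nx] and (nz, ny, nx) not in seen:
                    seen.add((nz, ny, nx))
                    queue.append((nz, ny, nx))
        if len(component) > len(largest):
            largest = component
    cleaned = [[[0] * width for _ in range(height)] for _ in range(depth)]
    for z, y, x in largest:
        cleaned[z][y][x] = 1
    return cleaned
```

Two voxels that touch only at a corner, such as (0, 0, 0) and (1, 1, 1), belong to the same component under 26-connectivity, whereas 6-connectivity would split them. Keeping only the largest component is reasonable here only because each case contains a single lesion.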
The results show that the proposed DAU-Net performs well on the lung tumor segmentation task and that the postprocessing method effectively reduces the interference of false positive regions with the segmentation results. Compared with 3D U-Net, Dice and HD are improved by 2.5% and 9.7%, respectively, and FPR is reduced by 13.6%. Conclusion The proposed lung tumor segmentation algorithm effectively improves the accuracy of tumor segmentation and helps achieve rapid, stable, and accurate segmentation of lung cancer.
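For reference, the Dice coefficient used for evaluation and a soft Dice loss of the kind used for training can be sketched as below. The abstract does not give the exact loss formulation, so the smoothing term `eps` and both function names are assumptions; masks are flattened to plain lists for brevity.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P & G| / (|P| + |G|) for flat binary masks of equal length."""
    intersection = sum(p * g for p, g in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so Dice is defined as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total


def soft_dice_loss(probs, target, eps=1e-6):
    """One common soft Dice loss: the same overlap ratio computed on
    foreground probabilities instead of hard labels, subtracted from 1."""
    intersection = sum(p * g for p, g in zip(probs, target))
    total = sum(probs) + sum(target)
    # eps (an assumed smoothing constant) avoids division by zero.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```

Because the soft version works on probabilities, it stays differentiable with respect to the network output, which is what makes it usable as a training objective.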
