A Transformer network based CT image segmentation for COVID-19-derived lung disease
2023, Vol. 28, No. 10: 3203-3213
Print publication date: 2023-10-16
DOI: 10.11834/jig.220865
Fan Shenglan, Bai Zhengyao, Lu Qianjie, Zhou Xue. 2023. A Transformer network based CT image segmentation for COVID-19-derived lung disease. Journal of Image and Graphics, 28(10):3203-3213
Objective
Lung CT (computed tomography) lesions in COVID-19 (corona virus disease 2019) patients are multi-scale and irregularly shaped. Because convolutional layers lack long-range dependency, semantic segmentation methods based on convolutional neural networks (CNNs) pay insufficient attention to false negatives in these lesions, resulting in low sensitivity despite high specificity. To address the multi-scale nature of COVID-19 lesions, we exploit the Transformer's strong ability to capture global context and propose a Transformer network for segmenting lung CT images of COVID-19 patients: COVID-TransNet.
Method
The network uses Swin Transformer as its backbone. In the encoder, we propose a linear feed-forward module with residual connections and layer normalization (LN) to adjust the channel dimension of feature maps, and replace the skip connections with axial attention modules to strengthen the network's attention to global information. In the decoder, we introduce a new feature fusion module that progressively refines local information during upsampling, apply multi-level prediction for deep supervision, and finally use Swin Transformer modules to decode the feature maps at each decoder level.
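The channel-adjustment module described above can be sketched in plain Python. This is an illustrative sketch only: the paper does not publish the module's exact layer order or sizes, so the LN → Linear → GELU → Linear layout, the equal input/output width needed for the residual addition, and all weight shapes are assumptions.

```python
import math

def layer_norm(x, eps=1e-5):
    """Layer normalization: zero mean, unit variance over the channel vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight, bias):
    """Fully connected layer applied along the channel dimension only."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def linear_feed_forward(x, w1, b1, w2, b2):
    """LN -> Linear -> GELU -> Linear with a residual connection.

    A hypothetical layout for the abstract's linear feed-forward module;
    the residual addition requires the output width to match the input.
    """
    h = layer_norm(x)
    h = linear(h, w1, b1)
    h = [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in h]  # GELU
    h = linear(h, w2, b2)
    return [xi + hi for xi, hi in zip(x, h)]
```

Because both fully connected layers act only on the channel dimension (never across spatial positions), the module's cost grows linearly with the number of tokens, which matches the abstract's claim of a lightweight channel adjustment.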
Result
On the COVID-19 CT segmentation dataset, the network achieves a Dice coefficient of 0.789, a sensitivity of 0.807, a specificity of 0.960, and a mean absolute error of 0.055. Compared with Semi-Inf-Net, the Dice coefficient, sensitivity, and specificity improve by 5%, 8.2%, and 0.9%, respectively, and the mean absolute error drops by 0.9%, reaching state-of-the-art performance.
Conclusion
The Transformer-based COVID-19 CT image segmentation network improves the segmentation accuracy of COVID-19 lesions and effectively alleviates the low-sensitivity, high-specificity problem of CNN methods.
Objective
Screening for corona virus disease 2019 (COVID-19) currently relies mainly on reverse transcription-polymerase chain reaction (RT-PCR), which suffers from low sensitivity and long turnaround times. To improve diagnostic accuracy and reduce labor, chest X-ray (CXR) and computed tomography (CT) imaging have become two key screening techniques for COVID-19 patients. These methods, however, still depend on clinicians' experience in visual interpretation, and the diagnostic time required for CT scanning remains a challenge. To enable rapid diagnosis of COVID-19 patients, deep learning techniques have been applied to segment and identify lesion regions in patients' CT images. Most semantic segmentation methods are implemented with convolutional neural networks (CNNs). COVID-19 lesions are multi-scale and irregular, and the limited receptive field of CNNs makes it difficult to capture complete lesion information. As a result, CNN-based semantic segmentation pays insufficient attention to false negatives when handling such lesions, leading to low sensitivity and high specificity.
Method
First, Swin Transformer is adopted as the backbone, and the outputs of the second, fourth, eighth, and twelfth Swin Transformer modules are extracted, yielding four multi-scale feature maps. Because Transformers require large amounts of training data, transfer learning with weights pre-trained on ImageNet is used. Second, a linear feed-forward module based on residual connections and layer normalization (LN) is designed to adjust the channel dimension of the feature maps, and axial attention modules are applied to strengthen the network's attention to global information. The fully connected layers in the linear feed-forward module operate only along the channel dimension, and the self-attention in the axial attention module is computed only along a single axis, so the added computational cost is small. Finally, in the decoder, a structure that refines local information step by step is designed to improve the segmentation accuracy of edge information, and a multi-level prediction approach is used for deep supervision. Swin Transformer modules decode the feature maps at each decoder level, which helps the network gradually refine local information.
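The cost argument for axial attention can be made concrete with a minimal sketch: self-attention restricted to one axis of the feature grid. Full 2-D self-attention over an H × W grid compares (H·W)² position pairs, whereas attending along rows only costs H·W² pairs. The learned query/key/value projections and the column-axis pass are omitted here for brevity (an assumption; real axial attention modules include both).

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def axial_attention_rows(feat):
    """Self-attention computed only along each row of an H x W grid.

    `feat[i][j]` is a d-dimensional feature vector. Each position attends
    only to the W positions in its own row, not to all H*W positions.
    """
    d = len(feat[0][0])
    scale = 1.0 / math.sqrt(d)        # scaled dot-product attention
    out = []
    for row in feat:
        new_row = []
        for q in row:
            scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in row]
            weights = softmax(scores)
            new_row.append([sum(w * v[c] for w, v in zip(weights, row))
                            for c in range(d)])
        out.append(new_row)
    return out
```

Stacking a row pass and a column pass lets information propagate across the whole grid while keeping the per-layer cost linear in one spatial dimension.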
Result
On the COVID-19 CT segmentation dataset, without data augmentation, the network achieves a Dice coefficient of 0.789, a sensitivity of 0.807, a specificity of 0.960, and a mean absolute error (MAE) of 0.055. Compared with Semi-Inf-Net, the Dice coefficient, sensitivity, and specificity increase by 5%, 8.2%, and 0.9%, respectively, and the MAE decreases by 0.9%. Ablation experiments further verify the contribution of each module to segmentation accuracy. Generalization ability is evaluated on 638 slices of the COVID-19 infection segmentation dataset, where the network attains a Dice coefficient of 0.704, a sensitivity of 0.807, and a specificity of 0.960, improvements of 10.7%, 0.1%, and 1.3% over Semi-Inf-Net.
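The four metrics reported above can be computed from binary masks as follows. This is the standard formulation of each metric, not the authors' published evaluation code; masks are flattened to 0/1 lists for simplicity.

```python
def segmentation_metrics(pred, gt):
    """Dice, sensitivity, specificity, and MAE for flat binary masks."""
    tp = sum(p == 1 and g == 1 for p, g in zip(pred, gt))  # lesion hit
    tn = sum(p == 0 and g == 0 for p, g in zip(pred, gt))  # background hit
    fp = sum(p == 1 and g == 0 for p, g in zip(pred, gt))  # false alarm
    fn = sum(p == 0 and g == 1 for p, g in zip(pred, gt))  # missed lesion
    dice = 2 * tp / (2 * tp + fp + fn)   # overlap with ground truth
    sensitivity = tp / (tp + fn)         # true positive rate: penalizes misses
    specificity = tn / (tn + fp)         # true negative rate
    mae = sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)
    return dice, sensitivity, specificity, mae
```

Sensitivity directly measures how many lesion pixels are missed (false negatives), which is why it is the metric on which the low-sensitivity problem of CNN baselines shows up.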
Conclusion
A Transformer network is applied to segment COVID-19 CT images. Through its pure-Transformer structure, the proposed network handles both local and global information effectively, improves the segmentation accuracy of COVID-19 lesions, and to a certain extent alleviates the low-sensitivity, high-specificity problem of traditional CNNs. Experiments demonstrate that COVID-TransNet generalizes well and segments with high accuracy, which can help clinicians diagnose COVID-19 patients efficiently.
Keywords: corona virus disease 2019 (COVID-19); computed tomography (CT) image segmentation; Swin Transformer; axial attention; multi-level prediction
Cao H, Wang Y Y, Chen Y, Jiang D S, Zhang X P, Tian Q and Wang M N. 2021. Swin-Unet: Unet-like pure Transformer for medical image segmentation [EB/OL]. [2022-08-05]. https://arxiv.org/pdf/2105.05537.pdf
Chen J N, Lu Y Y, Yu Q H, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L and Zhou Y Y. 2021. TransUNet: Transformers make strong encoders for medical image segmentation [EB/OL]. [2022-08-05]. https://arxiv.org/pdf/2102.04306.pdf
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16 × 16 words: Transformers for image recognition at scale [EB/OL]. [2022-08-05]. https://arxiv.org/pdf/2010.11929.pdf
Elharrouss O, Subramanian N and Al-Maadeed S. 2022. An encoder-decoder-based method for segmentation of COVID-19 lung infection in CT images. SN Computer Science, 3(1): #13 [DOI: 10.1007/s42979-021-00874-4]
Fan D P, Zhou T, Ji G P, Zhou Y, Chen G, Fu H Z, Shen J B and Shao L. 2020. Inf-Net: automatic COVID-19 lung infection segmentation from CT images. IEEE Transactions on Medical Imaging, 39(8): 2626-2637 [DOI: 10.1109/TMI.2020.2996645]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Ho J, Kalchbrenner N, Weissenborn D and Salimans T. 2019. Axial attention in multidimensional Transformers [EB/OL]. [2022-08-05]. https://arxiv.org/pdf/1912.12180.pdf
Li X M, Chen H, Qi X J, Dou Q, Fu C W and Heng P A. 2018. H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Transactions on Medical Imaging, 37(12): 2663-2674 [DOI: 10.1109/TMI.2018.2845918]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision Transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9992-10002 [DOI: 10.1109/ICCV48922.2021.00986]
Lu Q J, Bai Z Y, Fan S L, Zhou X and Xu Z. 2022. Multiscale codec network based CT image segmentation for human lung disease derived of COVID-19. Journal of Image and Graphics, 27(3): 827-837 [DOI: 10.11834/jig.210523]
Lu R J, Zhao X, Li J, Niu P H, Yang B, Wu H L, Wang W L, Song H, Huang B Y, Zhu N, Bi Y H, Ma X J, Zhan F X, Wang L, Hu T, Zhou H, Hu Z H, Zhou W M, Zhao L, Chen J, Meng Y, Wang J, Lin Y, Yuan J Y, Xie Z H, Ma J M, Liu W J, Wang D Y, Xu W B, Holmes E C, Gao G F, Wu G Z, Chen W J, Shi W F and Tan W J. 2020. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet, 395(10224): 565-574 [DOI: 10.1016/s0140-6736(20)30251-8]
Oktay O, Schlemper J, Le Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N Y, Kainz B, Glocker B and Rueckert D. 2018. Attention U-Net: learning where to look for the pancreas [EB/OL]. [2022-08-05]. https://arxiv.org/pdf/1804.03999.pdf
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B and Rueckert D. 2019. Attention gated networks: learning to leverage salient regions in medical images. Medical Image Analysis, 53: 197-207 [DOI: 10.1016/j.media.2019.01.012]
Shelhamer E, Long J and Darrell T. 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651 [DOI: 10.1109/TPAMI.2016.2572683]
Valanarasu J M J, Oza P, Hacihaliloglu I and Patel V M. 2021. Medical Transformer: gated axial-attention for medical image segmentation//Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France: Springer: 36-46 [DOI: 10.1007/978-3-030-87193-2_4]
Wang H Y, Xie S A, Lin L F, Iwamoto Y, Han X H, Chen Y W and Tong R F. 2022. Mixed Transformer U-Net for medical image segmentation//Proceedings of ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Singapore: IEEE: 2390-2394 [DOI: 10.1109/ICASSP43922.2022.9746172]
Wang L W, Lee C Y, Tu Z W and Lazebnik S. 2015. Training deeper convolutional networks with deep supervision [EB/OL]. [2022-08-05]. https://arxiv.org/pdf/1505.02496.pdf
Wu Z, Su L and Huang Q M. 2019. Stacked cross refinement network for edge-aware salient object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7263-7272 [DOI: 10.1109/ICCV.2019.00736]
Zhao X Y, Zhang P, Song F, Fan G D, Sun Y Y, Wang Y J, Tian Z Y, Zhang L Q and Zhang G L. 2021. D2A U-Net: automatic segmentation of COVID-19 CT slices based on dual attention and hybrid dilated convolution. Computers in Biology and Medicine, 135: #104526 [DOI: 10.1016/j.compbiomed.2021.104526]
Zhou X, Bai Z Y, Lu Q J and Fan S L. 2023. Colorectal polyp segmentation combining pyramid Transformer and axial attention. Computer Engineering and Applications, 59(11): 222-230 [DOI: 10.3778/j.issn.1002-8331.2203-0110]
Zhou Z W, Siddiquee M M R, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-Net architecture for medical image segmentation//Proceedings of the 4th International Workshop and the 8th International Workshop on Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Granada, Spain: Springer: 3-11 [DOI: 10.1007/978-3-030-00889-5_1]