An MHA-based integrated diagnosis and segmentation method for the COVID-19 pandemic
2022, Vol. 27, No. 12, pp. 3651-3662
Published in print: 2022-12-16
Accepted: 2022-03-08
DOI: 10.11834/jig.211015
Jinxing Li, Jun Sun, Chao Li, Bilal Ahmad. An MHA-based integrated diagnosis and segmentation method for the COVID-19 pandemic[J]. Journal of Image and Graphics, 2022, 27(12): 3651-3662.
Objective
The COVID-19 pandemic has swept the globe. To diagnose pneumonia patients rapidly and identify the infected regions of their lungs, a large number of detection networks have been proposed, but most existing networks handle only a single task, either diagnosis or segmentation. This paper proposes a joint diagnosis and segmentation network that fuses a multi-head attention mechanism and can simultaneously perform pneumonia classification and COVID-19 infection-region segmentation on chest X-ray images.
Method
The network consists of three parts: a dual-path embedding layer extracts the shallow intuitive features and the deep abstract features of a chest X-ray through two different image embedding schemes; a Transformer module jointly considers the extracted shallow intuitive and deep abstract features; and a segmentation decoder enlarges the feature maps to output the segmented region. To support joint training, a hybrid loss function is used to dynamically balance the classification and segmentation objectives: the classification loss is defined as the sum of a classification contrastive loss and a cross-entropy loss, and the segmentation loss is a binary cross-entropy loss, as sketched below.
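For illustration, the hybrid loss can be sketched in PyTorch as follows. The pairwise form of the contrastive term and the balancing weight `alpha` are my own assumptions; the text only specifies which terms are combined and that the balance is dynamic.

```python
import torch
import torch.nn.functional as F

def contrastive_term(feats, labels, margin=1.0):
    """Pairwise contrastive loss over a batch (illustrative form)."""
    f = F.normalize(feats, dim=1)
    d = torch.cdist(f, f)                          # pairwise distances (B, B)
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos = same * d.pow(2)                          # pull same-class pairs together
    neg = (1 - same) * F.relu(margin - d).pow(2)   # push different classes apart
    return (pos + neg).mean()

def hybrid_loss(cls_logits, labels, seg_logits, seg_masks, feats, alpha=0.5):
    # classification loss = contrastive term + multi-class cross-entropy
    cls_loss = F.cross_entropy(cls_logits, labels) + contrastive_term(feats, labels)
    # segmentation loss = binary cross-entropy on the predicted masks
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_masks)
    # `alpha` stands in for the dynamic balancing described in the text
    return alpha * cls_loss + (1 - alpha) * seg_loss
```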
Result
Experiments on data merged from six public datasets show that the proposed network achieves 95.37% accuracy, 96.28% recall, a 95.95% F1 score and a 93.88% kappa coefficient, surpassing mainstream networks such as ResNet50, VGG16 (Visual Geometry Group) and Inception_v3 in diagnostic classification. For COVID-19 lesion segmentation, compared with the popular U-Net and its improved variants, it attains the highest accuracy (95.96%), excellent sensitivity (78.89%), the best Dice coefficient (76.68%) and the best AUC (area under ROC curve, 98.55%). In terms of efficiency, it outputs a diagnosis and segmentation result every 0.56 s.
Conclusion
The joint network adopts a Transformer architecture: it attends to global features through self-attention and jointly considers deep abstract features and shallow features through cross-attention, yielding excellent classification and segmentation performance.
Objective
To contain the COVID-19 (corona virus disease 2019) pandemic, the initial priority is to identify and isolate infectious patients in time. The traditional PCR (polymerase chain reaction) screening method is challenged by its cost and turnaround time. Emerging AI (artificial intelligence) deep learning networks have therefore been applied to medical imaging for COVID-19 diagnosis and pathological lung segmentation. However, current networks are mostly restricted by experimental datasets with a limited number of chest X-ray (CXR) images, and each typically addresses a single task, either diagnosis or segmentation. Most of these networks are based on the convolutional neural network (CNN); the convolution operation, however, extracts local features from neighboring pixels and is constrained in explicitly modeling long-range dependencies. We develop a vision transformer network (ViTNet) in which the multi-head attention (MHA) mechanism models long-range dependencies between pixels.
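For reference, MHA builds on the scaled dot-product attention of Vaswani et al. (2017): each head computes

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,\qquad \mathrm{head}_i=\mathrm{Attention}(QW_i^Q,\,KW_i^K,\,VW_i^V),$$

and the head outputs are concatenated and projected, $\mathrm{MHA}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^O$, where $d_k$ is the key dimension. Because every query position attends to every key position, dependencies between arbitrarily distant pixels are captured within a single layer.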
Method
We built a novel transformer network, called ViTNet, for both diagnosis and segmentation. ViTNet is composed of three parts: a dual-path feature embedding, a transformer module and a segmentation-oriented feature decoder. 1) The dual-path feature embedding encodes the input CXR in two ways. The first applies a 2D convolution whose sliding step equals its kernel size, dividing the CXR into non-overlapping patches and building one input vector per patch. The second passes the CXR through a pre-trained ResNet34 backbone to extract a deep feature map.
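A minimal sketch of the first embedding path, assuming a 16×16 patch and a 256-dimensional embedding (both hypothetical; the abstract does not give these values):

```python
import torch
import torch.nn as nn

patch, dim = 16, 256   # hypothetical patch size and embedding width
# Stride equal to kernel size -> non-overlapping patches, one vector each.
patch_embed = nn.Conv2d(in_channels=3, out_channels=dim,
                        kernel_size=patch, stride=patch)

x = torch.randn(1, 3, 448, 448)                     # one resized CXR
tokens = patch_embed(x).flatten(2).transpose(1, 2)  # (1, 784, 256): 28x28 patches
# The second path would pass the same image through a pre-trained ResNet34
# (e.g. torchvision.models.resnet34) truncated before its pooling/FC layers.
```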
2) The transformer module is composed of six encoders and one cross-attention module. The vector sequence generated by the 2D convolution is the input to the transformer encoders; because these vectors are extracted directly from image pixels, they can be regarded as shallow, intuitive CXR features. The six encoders run in sequence, transforming the shallow features into advanced global features. The cross-attention module takes the outputs of the backbone and of the transformer encoders as its inputs, so the network combines the deep abstract features with the encoded shallow features and absorbs both global information (from the encoded shallow features) and local information (from the deep abstract features). 3) The feature decoder for segmentation doubles the size of the feature map and produces the segmentation result.
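A sketch of how the encoders, cross-attention and one decoder stage could be assembled from standard PyTorch modules; dimensions, head count and sequence lengths are assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

dim, heads = 256, 8                                   # hypothetical sizes
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)  # the six encoders
cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

shallow = encoder(torch.randn(1, 784, dim))           # encoded shallow tokens
deep = torch.randn(1, 196, dim)                       # flattened backbone feature map
# Deep features query the encoded shallow features.
fused, _ = cross_attn(query=deep, key=shallow, value=shallow)  # (1, 196, 256)

# One decoder stage: a transposed convolution doubling H and W.
fmap = fused.transpose(1, 2).reshape(1, dim, 14, 14)
up = nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2)
print(up(fmap).shape)                                 # torch.Size([1, 128, 28, 28])
```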
Our network must handle the classification and segmentation tasks simultaneously, so a hybrid loss function is employed for training, balancing the training effort between classification and segmentation. The classification loss is the sum of a contrastive loss and a multi-class cross-entropy loss; the segmentation loss is a binary cross-entropy loss.
In addition, a new five-class CXR dataset is compiled, containing 2 951 COVID-19 CXRs, 16 964 healthy CXRs, 6 103 bacterial pneumonia CXRs, 5 725 viral pneumonia CXRs and 6 723 lung-opacity CXRs. In this dataset, every COVID-19 CXR is labeled with a mask of the COVID-19-infected lung region. In our training process, the input images were resized to 448×448 pixels, the learning rate was initially set to 2×10⁻⁴ and decreased gradually in a self-adaptive manner, the total number of iterations was 200, and Adam optimization was run on four Tesla K80 GPU devices.
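The stated setup maps onto standard PyTorch roughly as below; ReduceLROnPlateau is my stand-in for the unspecified self-adaptive decay, and the one-layer model and random batch are placeholders:

```python
import torch
import torch.nn as nn
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),   # stated input size
    transforms.ToTensor(),
])

model = nn.Conv2d(3, 1, 1)           # placeholder for the joint network
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # stated initial rate
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                       factor=0.5, patience=5)

for epoch in range(200):             # stated number of iterations
    loss = model(torch.randn(2, 3, 448, 448)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())      # gradual, self-adaptive decay
# Multi-GPU training (four K80s) could wrap the model in nn.DataParallel.
```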
Result
In the classification experiments, we compared ViTNet with a general transformer network and five popular CNN deep-learning models (ResNet18, ResNet50, VGG16 (Visual Geometry Group), Inception_v3 and the deep layer aggregation network (DLAN)) in terms of overall prediction accuracy, recall rate, F1 score and kappa coefficient. Our model performs best with 95.37% accuracy, followed by Inception_v3 and DLAN with 95.17% and 94.40% accuracy, respectively, while VGG16 reaches 94.19% accuracy. On recall rate, F1 and kappa, our model likewise outperforms the remaining networks.
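The four reported classification metrics can be computed with scikit-learn as below; the labels are dummies and the weighted averaging mode for multi-class recall/F1 is an assumption:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             recall_score)

y_true = [0, 1, 2, 3, 4, 0, 1]   # dummy five-class labels
y_pred = [0, 1, 2, 3, 4, 0, 2]   # dummy predictions

print(accuracy_score(y_true, y_pred))
print(recall_score(y_true, y_pred, average="weighted"))
print(f1_score(y_true, y_pred, average="weighted"))
print(cohen_kappa_score(y_true, y_pred))
```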
In the segmentation experiments, ViTNet is compared with four commonly used segmentation networks: the pyramid scene parsing network (PSPNet), U-Net, UNet++ and the context encoder network (CE-Net). The evaluation indicators are accuracy, sensitivity, specificity, the Dice coefficient and the area under the ROC (receiver operating characteristic) curve (AUC). The experimental results show that our model performs best on accuracy and AUC, and its sensitivity is second only to UNet++. More specifically, our model achieves 95.96% accuracy, 78.89% sensitivity, 97.97% specificity, 98.55% AUC and a Dice coefficient of 76.68%.
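The segmentation metrics can be computed per image as below; the 0.5 threshold and pixel-level flattening are common conventions, assumed here rather than taken from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

prob = np.random.rand(448, 448)            # dummy predicted probability map
target = np.random.rand(448, 448) > 0.5    # dummy ground-truth infection mask
pred = prob > 0.5                          # assumed binarization threshold

tp = np.logical_and(pred, target).sum()
tn = np.logical_and(~pred, ~target).sum()
dice = 2 * tp / (pred.sum() + target.sum())
sensitivity = tp / target.sum()            # recall on infected pixels
specificity = tn / (~target).sum()
auc = roc_auc_score(target.ravel(), prob.ravel())
print(dice, sensitivity, specificity, auc)
```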
In terms of efficiency, our model produces one diagnosis and segmentation result per CXR in 0.56 s. In addition, we show the segmentation results of six COVID-19 CXR images produced by all the segmentation networks; as illustrated in Fig. 5, our model delivers the best segmentation quality. A limitation of our model is that it occasionally misclassifies a COVID-19 case as healthy, which is unacceptable in screening. The PCR test for COVID-19 is probably more reliable than a deep-learning method, but its result typically takes one to two days to return.
Conclusion
A novel ViTNet method is developed that simultaneously performs automatic diagnosis on CXRs and segmentation of COVID-19-infected lung regions. ViTNet achieves superior diagnostic performance and demonstrates promising segmentation ability.
Keywords: corona virus disease 2019 (COVID-19); automatic diagnosis; lung region segmentation; multi-head attention mechanism; hybrid loss
Apostolopoulos I D and Mpesiana T A. 2020. Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, 43(2): 635-640 [DOI: 10.1007/s13246-020-00865-4]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13]
Das D, Santosh K C and Pal U. 2020. Truncated inception net: COVID-19 outbreak screening using chest X-rays. Physical and Engineering Sciences in Medicine, 43(3): 915-925 [DOI: 10.1007/s13246-020-00888-x]
Degerli A, Ahishali M, Yamac M, Kiranyaz S, Chowdhury M E H, Hameed K, Hamid T, Mazhar R and Gabbouj M. 2021. COVID-19 infection map generation and detection from chest X-Ray images. Health Information Science and Systems, 9(1): #15 [DOI: 10.1007/s13755-021-00146-8]
Dinleyici K. 2020. Covid-normal-viral-opacity_V2 [DB/OL]. [2021-10-15]. https://www.kaggle.com/kamildinleyici/covid-normal-viral-opacity-v2
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16×16 words: transformers for image recognition at scale//Proceedings of the 9th International Conference on Learning Representations. Vienna, Austria: IEEE: 1-21
Farooq M and Hafeez A. 2020. COVID-ResNet: a deep learning framework for screening of COVID19 from radiographs [EB/OL]. [2020-03-31]. https://arxiv.org/pdf/2003.14395.pdf
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292 [DOI: 10.1109/TMI.2019.2903562]
He J, Chen J N, Liu S, Kortylewski A, Yang C, Bai Y T and Wang C H. 2021. TransFG: a transformer architecture for fine-grained recognition [EB/OL]. [2021-03-14].https://arxiv.org/pdf/2103.07976.pdfhttps://arxiv.org/pdf/2103.07976.pdf
Kang B, Guo J, Wang S, Xu B and Meng X F. 2020. Supercomputing-supported COVID-19 CT image comprehensive analysis assistant system. Journal of Image and Graphics, 25(10): 2142-2150 [DOI: 10.11834/jig.200239]
Meng L and Li R H. 2020. Progress of artificial intelligence diagnosis and prognosis technology for COVID-19 medical imaging. Journal of Image and Graphics, 25(10): 2058-2067 [DOI: 10.11834/jig.200222]
Owais M, Lee Y W, Mahmood T, Haider A, Haider H and Park K R. 2021. Multilevel deep-aggregated boosted network to recognize COVID-19 infection from large-scale heterogeneous radiographic data. IEEE Journal of Biomedical and Health Informatics, 25(6): 1881-1891 [DOI: 10.1109/JBHI.2021.3072076]
Park S, Kim G, Oh Y, Seo J B, Lee S M, Kim J H, Moon S, Lim J K and Ye J C. 2021. Vision transformer using low-level chest X-ray feature corpus for COVID-19 diagnosis and severity quantification [EB/OL]. [2021-04-15]. https://arxiv.org/pdf/2104.07235.pdf
Qatar University, Tampere University and Hamad Medical Corporation. 2021. QaTa-COV19 dataset [DB/OL]. [2021-10-15]. https://www.kaggle.com/aysendegerli/qatacov19-dataset
Qatar University and the University of Dhaka. 2021. COVID-19 radiography database [DB/OL]. [2021-10-15]. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database
Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Kashem S B A, Islam M T, Al Maadeed S, Zughaier S M, Khan M S and Chowdhury M E H. 2021. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in Biology and Medicine, 132: #104319 [DOI: 10.1016/j.compbiomed.2021.104319]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Thakur S. 2020. Classification of COVID viral bacterial pneumonia [DB/OL]. [2021-11-05]. https://www.kaggle.com/sriram-thakur/classification-of-covid-viral-bacterial-pneumonia
Unais S, Gokul L K V, Sunny P, Rahul B, Tarun K, Sanjana S and Kriti B. 2020. Curated dataset for COVID-19 posterior-anterior chest radiography images (X-Rays) [DB/OL]. [2021-10-15]. https://data.mendeley.com/datasets/9xkhgts2s6/1
Vantaggiato E, Paladini E, Bougourzi F, Distante C, Hadid A and Taleb-Ahmed A. 2021. COVID-19 recognition using ensemble-CNNs in two new chest X-ray databases. Sensors, 21(5): #1742 [DOI: 10.3390/s21051742]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 6000-6010
Zhang Y F, Wang C Y, Wang X G, Zeng W J and Liu W Y. 2021. FairMOT: on the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, 129(11): 3069-3087 [DOI: 10.1007/s11263-021-01513-4]
Zhou Z W, Siddiquee M R, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-Net architecture for medical image segmentation//Proceedings of the 4th International Workshop on Deep Learning in Medical Image Analysis and the 8th International Workshop on Multimodal Learning for Clinical Decision Support. Granada, Spain: Springer: 3-11