An MHA-based integrated diagnosis and segmentation method for the COVID-19 pandemic
2022, Vol. 27, No. 12, pp. 3651-3662
Published in print: 2022-12-16
Accepted: 2022-03-08
DOI: 10.11834/jig.211015
Jinxing Li, Jun Sun, Chao Li, Bilal Ahmad. An MHA-based integrated diagnosis and segmentation method for the COVID-19 pandemic[J]. Journal of Image and Graphics, 2022, 27(12): 3651-3662.
Objective
The COVID-19 pandemic has swept the globe. To diagnose pneumonia patients rapidly and identify the infected regions of their lungs, a large number of detection networks have been proposed, but most existing networks handle only a single task, either diagnosis or segmentation. This paper proposes a joint diagnosis and segmentation network that fuses a multi-head attention mechanism and can simultaneously perform pneumonia classification and COVID-19 infection-region segmentation on chest X-ray images.
Method
The network consists of three parts: a dual-path embedding layer extracts the shallow intuitive features and the deep abstract features of a chest X-ray through two different image embedding schemes; a Transformer module jointly considers the extracted shallow intuitive and deep abstract features; and a segmentation decoder enlarges the feature maps to output the segmented region. To support joint training, a hybrid loss function is used to dynamically balance the classification and segmentation objectives: the classification loss is defined as the sum of a classification contrastive loss and a cross-entropy loss, and the segmentation loss is a binary cross-entropy loss, as sketched below.
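For illustration, the hybrid loss can be sketched in PyTorch as follows. The pairwise form of the contrastive term and the balancing weight `alpha` are my own assumptions; the text only specifies which terms are combined and that the balance is dynamic.

```python
import torch
import torch.nn.functional as F

def contrastive_term(feats, labels, margin=1.0):
    """Pairwise contrastive loss over a batch (illustrative form)."""
    f = F.normalize(feats, dim=1)
    d = torch.cdist(f, f)                          # pairwise distances (B, B)
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos = same * d.pow(2)                          # pull same-class pairs together
    neg = (1 - same) * F.relu(margin - d).pow(2)   # push different classes apart
    return (pos + neg).mean()

def hybrid_loss(cls_logits, labels, seg_logits, seg_masks, feats, alpha=0.5):
    # classification loss = contrastive term + multi-class cross-entropy
    cls_loss = F.cross_entropy(cls_logits, labels) + contrastive_term(feats, labels)
    # segmentation loss = binary cross-entropy on the predicted masks
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_masks)
    # `alpha` stands in for the dynamic balancing described in the text
    return alpha * cls_loss + (1 - alpha) * seg_loss
```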
Result
Experiments on data merged from six public datasets show that the proposed network achieves 95.37% accuracy, 96.28% recall, a 95.95% F1 score and a 93.88% kappa coefficient, surpassing mainstream networks such as ResNet50, VGG16 (Visual Geometry Group) and Inception_v3 in diagnostic classification. For COVID-19 lesion segmentation, compared with the popular U-Net and its improved variants, it attains the highest accuracy (95.96%), excellent sensitivity (78.89%), the best Dice coefficient (76.68%) and the best AUC (area under ROC curve, 98.55%). In terms of efficiency, it outputs a diagnosis and segmentation result every 0.56 s.
Conclusion
The joint network adopts a Transformer architecture: it attends to global features through self-attention and jointly considers deep abstract features and shallow features through cross-attention, yielding excellent classification and segmentation performance.
Objective
To contain the COVID-19 (corona virus disease 2019) pandemic, the initial priority is to identify and isolate infectious patients in time. The traditional PCR (polymerase chain reaction) screening method is challenged by its cost and turnaround time. Emerging AI (artificial intelligence) deep learning networks have therefore been applied to medical imaging for COVID-19 diagnosis and pathological lung segmentation. However, current networks are mostly restricted by experimental datasets with a limited number of chest X-ray (CXR) images, and each typically addresses a single task, either diagnosis or segmentation. Most of these networks are based on the convolutional neural network (CNN); the convolution operation, however, extracts local features from neighboring pixels and is constrained in explicitly modeling long-range dependencies. We develop a vision transformer network (ViTNet) in which the multi-head attention (MHA) mechanism models long-range dependencies between pixels.
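For reference, MHA builds on the scaled dot-product attention of Vaswani et al. (2017): each head computes

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,\qquad \mathrm{head}_i=\mathrm{Attention}(QW_i^Q,\,KW_i^K,\,VW_i^V),$$

and the head outputs are concatenated and projected, $\mathrm{MHA}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^O$, where $d_k$ is the key dimension. Because every query position attends to every key position, dependencies between arbitrarily distant pixels are captured within a single layer.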
Method
We built a novel transformer network, called ViTNet, for both diagnosis and segmentation. ViTNet is composed of three parts: a dual-path feature embedding, a transformer module and a segmentation-oriented feature decoder. 1) The dual-path feature embedding encodes the input CXR in two ways. The first applies a 2D convolution whose sliding step equals its kernel size, dividing the CXR into non-overlapping patches and building one input vector per patch. The second passes the CXR through a pre-trained ResNet34 backbone to extract a deep feature map.
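A minimal sketch of the first embedding path, assuming a 16×16 patch and a 256-dimensional embedding (both hypothetical; the abstract does not give these values):

```python
import torch
import torch.nn as nn

patch, dim = 16, 256   # hypothetical patch size and embedding width
# Stride equal to kernel size -> non-overlapping patches, one vector each.
patch_embed = nn.Conv2d(in_channels=3, out_channels=dim,
                        kernel_size=patch, stride=patch)

x = torch.randn(1, 3, 448, 448)                     # one resized CXR
tokens = patch_embed(x).flatten(2).transpose(1, 2)  # (1, 784, 256): 28x28 patches
# The second path would pass the same image through a pre-trained ResNet34
# (e.g. torchvision.models.resnet34) truncated before its pooling/FC layers.
```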
2) The transformer module is composed of six encoders and one cross-attention module. The vector sequence generated by the 2D convolution is the input to the transformer encoders; because these vectors are extracted directly from image pixels, they can be regarded as shallow, intuitive CXR features. The six encoders run in sequence, transforming the shallow features into advanced global features. The cross-attention module takes the outputs of the backbone and of the transformer encoders as its inputs, so the network combines the deep abstract features with the encoded shallow features and absorbs both global information (from the encoded shallow features) and local information (from the deep abstract features). 3) The feature decoder for segmentation doubles the size of the feature map and produces the segmentation result.
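A sketch of how the encoders, cross-attention and one decoder stage could be assembled from standard PyTorch modules; dimensions, head count and sequence lengths are assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

dim, heads = 256, 8                                   # hypothetical sizes
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)  # the six encoders
cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

shallow = encoder(torch.randn(1, 784, dim))           # encoded shallow tokens
deep = torch.randn(1, 196, dim)                       # flattened backbone feature map
# Deep features query the encoded shallow features.
fused, _ = cross_attn(query=deep, key=shallow, value=shallow)  # (1, 196, 256)

# One decoder stage: a transposed convolution doubling H and W.
fmap = fused.transpose(1, 2).reshape(1, dim, 14, 14)
up = nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2)
print(up(fmap).shape)                                 # torch.Size([1, 128, 28, 28])
```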
Our network must handle the classification and segmentation tasks simultaneously, so a hybrid loss function is employed for training, balancing the training effort between classification and segmentation. The classification loss is the sum of a contrastive loss and a multi-class cross-entropy loss; the segmentation loss is a binary cross-entropy loss.
In addition, a new five-class CXR dataset is compiled, containing 2 951 COVID-19 CXRs, 16 964 healthy CXRs, 6 103 bacterial pneumonia CXRs, 5 725 viral pneumonia CXRs and 6 723 lung-opacity CXRs. In this dataset, every COVID-19 CXR is labeled with a mask of the COVID-19-infected lung region. In our training process, the input images were resized to 448×448 pixels, the learning rate was initially set to 2×10⁻⁴ and decreased gradually in a self-adaptive manner, the total number of iterations was 200, and Adam optimization was run on four Tesla K80 GPU devices.
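The stated setup maps onto standard PyTorch roughly as below; ReduceLROnPlateau is my stand-in for the unspecified self-adaptive decay, and the one-layer model and random batch are placeholders:

```python
import torch
import torch.nn as nn
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),   # stated input size
    transforms.ToTensor(),
])

model = nn.Conv2d(3, 1, 1)           # placeholder for the joint network
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # stated initial rate
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                       factor=0.5, patience=5)

for epoch in range(200):             # stated number of iterations
    loss = model(torch.randn(2, 3, 448, 448)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())      # gradual, self-adaptive decay
# Multi-GPU training (four K80s) could wrap the model in nn.DataParallel.
```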
Result
In the classification experiments, we compared ViTNet with a general transformer network and five popular CNN deep-learning models (ResNet18, ResNet50, VGG16 (Visual Geometry Group), Inception_v3 and the deep layer aggregation network (DLAN)) in terms of overall prediction accuracy, recall rate, F1 score and kappa coefficient. Our model performs best with 95.37% accuracy, followed by Inception_v3 and DLAN with 95.17% and 94.40% accuracy, respectively, while VGG16 reaches 94.19% accuracy. On recall rate, F1 and kappa, our model likewise outperforms the remaining networks.
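The four reported classification metrics can be computed with scikit-learn as below; the labels are dummies and the weighted averaging mode for multi-class recall/F1 is an assumption:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             recall_score)

y_true = [0, 1, 2, 3, 4, 0, 1]   # dummy five-class labels
y_pred = [0, 1, 2, 3, 4, 0, 2]   # dummy predictions

print(accuracy_score(y_true, y_pred))
print(recall_score(y_true, y_pred, average="weighted"))
print(f1_score(y_true, y_pred, average="weighted"))
print(cohen_kappa_score(y_true, y_pred))
```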
In the segmentation experiments, ViTNet is compared with four commonly used segmentation networks: the pyramid scene parsing network (PSPNet), U-Net, UNet++ and the context encoder network (CE-Net). The evaluation indicators are accuracy, sensitivity, specificity, the Dice coefficient and the area under the ROC (receiver operating characteristic) curve (AUC). The experimental results show that our model performs best on accuracy and AUC, and its sensitivity is second only to UNet++. More specifically, our model achieves 95.96% accuracy, 78.89% sensitivity, 97.97% specificity, 98.55% AUC and a Dice coefficient of 76.68%.
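The segmentation metrics can be computed per image as below; the 0.5 threshold and pixel-level flattening are common conventions, assumed here rather than taken from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

prob = np.random.rand(448, 448)            # dummy predicted probability map
target = np.random.rand(448, 448) > 0.5    # dummy ground-truth infection mask
pred = prob > 0.5                          # assumed binarization threshold

tp = np.logical_and(pred, target).sum()
tn = np.logical_and(~pred, ~target).sum()
dice = 2 * tp / (pred.sum() + target.sum())
sensitivity = tp / target.sum()            # recall on infected pixels
specificity = tn / (~target).sum()
auc = roc_auc_score(target.ravel(), prob.ravel())
print(dice, sensitivity, specificity, auc)
```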
In terms of efficiency, our model produces one diagnosis and segmentation result per CXR in 0.56 s. In addition, we show the segmentation results of six COVID-19 CXR images produced by all the segmentation networks; as illustrated in Fig. 5, our model delivers the best segmentation quality. A limitation of our model is that it occasionally misclassifies a COVID-19 case as healthy, which is unacceptable in screening. The PCR test for COVID-19 is probably more reliable than a deep-learning method, but its result typically takes one to two days to return.
Conclusion
A novel ViTNet method is developed that simultaneously performs automatic diagnosis on CXRs and segmentation of COVID-19-infected lung regions. ViTNet achieves superior diagnostic performance and demonstrates promising segmentation ability.
Keywords: corona virus disease 2019 (COVID-19); automatic diagnosis; lung region segmentation; multi-head attention mechanism; hybrid loss
Apostolopoulos I D and Mpesiana T A. 2020. Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, 43(2): 635-640 [DOI: 10.1007/s13246-020-00865-4]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13]
Das D, Santosh K C and Pal U. 2020. Truncated inception net: COVID-19 outbreak screening using chest X-rays. Physical and Engineering Sciences in Medicine, 43(3): 915-925 [DOI: 10.1007/s13246-020-00888-x]
Degerli A, Ahishali M, Yamac M, Kiranyaz S, Chowdhury M E H, Hameed K, Hamid T, Mazhar R and Gabbouj M. 2021. COVID-19 infection map generation and detection from chest X-Ray images. Health Information Science and Systems, 9(1): #15 [DOI: 10.1007/s13755-021-00146-8]
Dinleyici K. 2020. Covid-normal-viral-opacity_V2 [DB/OL]. [2021-10-15]. https://www.kaggle.com/kamildinleyici/covid-normal-viral-opacity-v2
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16×16 words: transformers for image recognition at scale//Proceedings of the 9th International Conference on Learning Representations. Vienna, Austria: IEEE: 1-21
Farooq M and Hafeez A. 2020. COVID-ResNet: a deep learning framework for screening of COVID19 from radiographs [EB/OL]. [2020-03-31]. https://arxiv.org/pdf/2003.14395.pdf
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292 [DOI: 10.1109/TMI.2019.2903562]
He J, Chen J N, Liu S, Kortylewski A, Yang C, Bai Y T and Wang C H. 2021. TransFG: a transformer architecture for fine-grained recognition [EB/OL]. [2021-03-14].https://arxiv.org/pdf/2103.07976.pdfhttps://arxiv.org/pdf/2103.07976.pdf
Kang B, Guo J, Wang S, Xu B and Meng X F. 2020. Supercomputing-supported COVID-19 CT image comprehensive analysis assistant system. Journal of Image and Graphics, 25(10): 2142-2150 [DOI: 10.11834/jig.200239]
Meng L and Li R H. 2020. Progress of artificial intelligence diagnosis and prognosis technology for COVID-19 medical imaging. Journal of Image and Graphics, 25(10): 2058-2067 [DOI: 10.11834/jig.200222]
Owais M, Lee Y W, Mahmood T, Haider A, Haider H and Park K R. 2021. Multilevel deep-aggregated boosted network to recognize COVID-19 infection from large-scale heterogeneous radiographic data. IEEE Journal of Biomedical and Health Informatics, 25(6): 1881-1891 [DOI: 10.1109/JBHI.2021.3072076]
Park S, Kim G, Oh Y, Seo J B, Lee S M, Kim J H, Moon S, Lim J K and Ye J C. 2021. Vision transformer using low-level chest X-ray feature corpus for COVID-19 diagnosis and severity quantification [EB/OL]. [2021-04-15]. https://arxiv.org/pdf/2104.07235.pdf
Qatar University, Tampere University and Hamad Medical Corporation. 2021. QaTa-COV19 dataset [DB/OL]. [2021-10-15]. https://www.kaggle.com/aysendegerli/qatacov19-dataset
Qatar University and the University of Dhaka. 2021. COVID-19 radiography database [DB/OL]. [2021-10-15]. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database
Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Kashem S B A, Islam M T, Al Maadeed S, Zughaier S M, Khan M S and Chowdhury M E H. 2021. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in Biology and Medicine, 132: #104319 [DOI: 10.1016/j.compbiomed.2021.104319]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Thakur S. 2020. Classification of COVID viral bacterial pneumonia [DB/OL]. [2021-11-05]. https://www.kaggle.com/sriram-thakur/classification-of-covid-viral-bacterial-pneumonia
Unais S, Gokul L K V, Sunny P, Rahul B, Tarun K, Sanjana S and Kriti B. 2020. Curated dataset for COVID-19 posterior-anterior chest radiography images (X-Rays) [DB/OL]. [2021-10-15]. https://data.mendeley.com/datasets/9xkhgts2s6/1
Vantaggiato E, Paladini E, Bougourzi F, Distante C, Hadid A and Taleb-Ahmed A. 2021. COVID-19 recognition using ensemble-CNNs in two new chest X-ray databases. Sensors, 21(5): #1742 [DOI: 10.3390/s21051742]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 6000-6010
Zhang Y F, Wang C Y, Wang X G, Zeng W J and Liu W Y. 2021. FairMOT: on the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, 129(11): 3069-3087 [DOI: 10.1007/s11263-021-01513-4]
Zhou Z W, Siddiquee M R, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-Net architecture for medical image segmentation//Proceedings of the 4th International Workshop on Deep Learning in Medical Image Analysis and the 8th International Workshop on Multimodal Learning for Clinical Decision Support. Granada, Spain: Springer: 3-11