The integration of attention mechanism and dense atrous convolution for lung image segmentation
2021, Vol. 26, No. 9: 2146-2155
Received: 2020-08-05
Revised: 2020-09-22
Accepted: 2020-09-29
Published in print: 2021-09-16
DOI: 10.11834/jig.200429

Objective
Convolutional neural networks (CNNs) are widely used in the computer-aided diagnosis (CAD) of lung diseases, mainly for lung parenchyma segmentation, pulmonary nodule detection, and lesion analysis; accurate segmentation of the lung parenchyma is the key to nodule detection and to the diagnosis of lung diseases. To better meet the requirements of CAD systems, this paper proposes a convolutional neural network with an encoder-decoder structure that integrates an attention mechanism and dense atrous convolution for lung segmentation.
Method
The attention mechanism is introduced into the decoding part of the network: the weights of key information are increased to highlight the target region and suppress interference from background pixels. To obtain wider and deeper semantic information, a dense atrous convolution module is deployed in the middle of the network. This module combines the advantages of Inception, residual structures, and multi-scale atrous convolution, and obtains deeper feature information without causing gradient explosion or gradient vanishing. To address the feature loss common in segmentation networks, the up-/down-sampling modules are improved: convolution kernels of several different scales are cascaded to widen the network, which effectively avoids feature loss.
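The attention-gated decoding described above can be sketched as an additive attention gate in the style of Attention U-Net (Oktay et al., 2018); the exact gating design, channel counts, and module name here are illustrative assumptions, not the authors' implementation:

```python
# Sketch of an additive attention gate on a skip connection (assumed
# design, following Attention U-Net): the decoder's gating signal
# re-weights the encoder features so that target-region pixels gain
# weight and background pixels are suppressed.
import torch
import torch.nn as nn


class AttentionGate(nn.Module):
    """Weights skip-connection features by a gating signal from the decoder."""

    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # skip branch
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)  # gating branch
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)        # attention map
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x, g):
        # x: skip-connection features; g: decoder gating signal (same H, W).
        a = self.relu(self.w_x(x) + self.w_g(g))
        alpha = self.sigmoid(self.psi(a))  # per-pixel weights in (0, 1)
        return x * alpha                   # background pixels are attenuated
```

The gate output keeps the skip features' shape, so it drops into a U-Net decoder wherever the skip connection is concatenated.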
Result
Comparison and ablation experiments with five mainstream segmentation networks on the LUNA (lung nodule analysis) dataset show that the prediction maps of the proposed model are closer to the label images. Its Dice similarity coefficient, intersection over union (IoU), accuracy (ACC), and sensitivity (SE) are all better than those of the compared methods, exceeding the second-best model by 0.443%, 0.272%, 0.512%, and 0.374%, respectively.
Conclusion
This paper proposes a lung segmentation network that integrates an attention mechanism with dense atrous convolution, and it achieves better segmentation results than the other segmentation networks compared.
Objective
Pulmonary nodule detection on chest computed tomography (CT) images is an important criterion for the diagnosis of early-stage lung cancer, relying on the location, extent, and shape of the lesions. A CT image contains lung structures such as the lung parenchyma together with contextual parts such as hydrops, the trachea, the bronchi, and the ribs, which makes automatic and precise interpretation of the lung parenchyma difficult. Accurate extraction of the lung parenchyma therefore plays a vital role in the analysis of lung diseases. Most lung segmentation has been conducted with conventional image processing algorithms such as thresholding or morphological operations. More recently, convolutional neural networks (CNNs) have been applied to computerized pulmonary disease analysis, and CNN-driven lung segmentation algorithms have been adopted in computer-aided diagnosis (CAD). The U-shaped structure, built on an end-to-end fully convolutional network (FCN), was designed for medical image segmentation; its symmetric encoder-decoder structure has proved reliable for biomedical image segmentation. We present a novel convolutional neural network based on the U-Net architecture that integrates an attention mechanism with dense atrous convolution (DAC).
Method
The network contains an encoder and a decoder. The encoder consists of convolution and down-sampling: the spatial dimensions of the feature maps are reduced so that more semantic information can be learned. The decoder applies de-convolution and up-sampling to restore the spatial dimensions of the feature maps. An attention mechanism in the decoding path makes the target area stand out in the output: features transmitted through the skip connections guide the network's attention and increase the weight of salient features. Because consecutive pooling operations and convolution striding reduce feature resolution below what dense spatial prediction requires, the DAC block is deployed between the encoder and the decoder to extract multi-scale contextual information sufficiently; the block inherits the advantages of Inception, ResNet, and atrous convolution and consequently captures features at multiple sizes. In the classic U-Net framework, max-pooling and up-sampling operators reduce and increase the resolution of the feature maps, which can cause feature loss and reduced accuracy during training. The original max-pooling and up-sampling operators are therefore replaced with down-sample and up-sample blocks of Inception structure, widening the network with multiple filters and avoiding feature loss. A Dice coefficient loss is used instead of cross-entropy loss to measure the gap between prediction and ground truth. The model was implemented in the deep learning framework PyTorch on a server with two NVIDIA GeForce RTX 2080Ti graphics cards, each with 11 GB of memory. In the experiments, the original images were resized to 256×256 pixels, 80% of them were used for training and the remainder for testing, and the model was trained for 120 epochs with the Adam optimizer at an initial learning rate of 0.000 1.
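How such a DAC block can combine Inception-style branches, residual fusion, and atrous convolution is sketched below. This follows the four-branch dense atrous convolution block of CE-Net (Gu et al., 2019); the branch layout, dilation rates, and channel counts are assumptions for illustration, not necessarily the authors' exact configuration:

```python
# Sketch of a dense atrous convolution (DAC) block: parallel branches of
# 3x3 atrous convolutions at increasing dilation rates (multi-scale
# receptive fields), fused residually with the input so gradients stay
# well-behaved. Padding equals the dilation rate, so spatial size is kept.
import torch
import torch.nn as nn


class DACBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.branch1 = nn.Conv2d(ch, ch, 3, padding=1, dilation=1)
        self.branch2 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=3, dilation=3),
            nn.Conv2d(ch, ch, 1))
        self.branch3 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, dilation=1),
            nn.Conv2d(ch, ch, 3, padding=3, dilation=3),
            nn.Conv2d(ch, ch, 1))
        self.branch4 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, dilation=1),
            nn.Conv2d(ch, ch, 3, padding=3, dilation=3),
            nn.Conv2d(ch, ch, 3, padding=5, dilation=5),
            nn.Conv2d(ch, ch, 1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual sum of all branches widens the receptive field at
        # several scales without deepening any single path too far.
        return self.relu(x + self.branch1(x) + self.branch2(x)
                         + self.branch3(x) + self.branch4(x))
```

Because the block preserves both channel count and spatial size, it can sit between the encoder and decoder without changing any other layer.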
Result
To verify the efficiency of the proposed method, we compared it with five networks: FCN-8s, U-Net, UNet++, ResU-Net, and CE-Net (context encoder network). Four metrics were adopted to assess the segmentation: the Dice similarity coefficient (DSC), the intersection over union (IoU), sensitivity (SE), and accuracy (ACC). The experimental results on the LUNA16 dataset demonstrate that the proposed method is superior on all metrics. The average Dice similarity coefficient reached 0.985 9, which is 0.443% higher than that of the second-performing CE-Net. The model achieved 0.972 2, 0.993 8, and 0.982 2 on IoU, ACC, and SE, respectively, exceeding the second-best results by 0.272%, 0.512%, and 0.374%. Compared with the other algorithms, the predictions of the model are closer to the labels, and the difficulty of adhesion between the left and right lungs is resolved well.
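All four reported metrics can be computed from the confusion counts of a binary prediction against its label. The helper below is our own sketch (the function name is not from the paper); the Dice loss used in training is simply one minus a differentiable (soft) version of DSC:

```python
# Sketch of the four segmentation metrics (DSC, IoU, ACC, SE) from the
# confusion counts TP/FP/TN/FN of binary masks. Assumes both masks are
# non-empty so no denominator is zero.
import numpy as np


def seg_metrics(pred, target):
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)    # true positives
    fp = np.sum(pred & ~target)   # false positives
    fn = np.sum(~pred & target)   # false negatives
    tn = np.sum(~pred & ~target)  # true negatives
    dsc = 2 * tp / (2 * tp + fp + fn)      # Dice similarity coefficient
    iou = tp / (tp + fp + fn)              # intersection over union
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    se = tp / (tp + fn)                    # sensitivity (recall)
    return dsc, iou, acc, se


# Toy example: one extra predicted pixel relative to the label.
print(seg_metrics(np.array([[1, 1], [0, 0]]),
                  np.array([[1, 0], [0, 0]])))
# → (0.666..., 0.5, 0.75, 1.0)
```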
Conclusion
A novel convolutional neural network with an encoder-decoder structure, integrating an attention mechanism with dense atrous convolution, is presented for lung segmentation. The experimental results illustrate that the framework segments the lung parenchyma area effectively and outperforms the compared methods.
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848[DOI:10.1109/TPAMI.2017.2699184]
Glorot X, Bordes A and Bengio Y. 2011. Deep sparse rectifier neural networks[EB/OL]. [2020-07-29]. http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292[DOI:10.1109/TMI.2019.2903562]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI:10.1109/CVPR.2016.90]
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift[EB/OL]. [2020-07-29]. https://arxiv.org/pdf/1502.03167.pdf
Kingma D P and Ba J L. 2015. Adam: a method for stochastic optimization[EB/OL]. [2020-07-29]. https://arxiv.org/pdf/1412.6980.pdf
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651[DOI:10.1109/CVPR.2015.7298965]
Luong M, Pham H and Manning C D. 2015. Effective approaches to attention-based neural machine translation//Proceedings of 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: ACL: 1412-1421[DOI:10.18653/v1/d15-1166]
Milletari F, Navab N and Ahmadi S A. 2016. V-Net: fully convolutional neural networks for volumetric medical image segmentation//Proceedings of the 4th International Conference on 3D Vision. Stanford, USA: IEEE: 565-571[DOI:10.1109/3DV.2016.79]
Oktay O, Schlemper J, Le Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N Y, Kainz B, Glocker B and Rueckert D. 2018. Attention U-Net: learning where to look for the pancreas[EB/OL]. [2020-07-29]. https://arxiv.org/pdf/1804.03999.pdf
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Navab N, Hornegger J, Wells W and Frangi A, eds. Medical Image Computing and Computer-Assisted Intervention. Lecture Notes in Computer Science, Vol. 9351. Munich, Germany: Springer: 234-241[DOI:10.1007/978-3-319-24574-4_28]
Shojaii R, Alirezaie J and Babyn P. 2005. Automatic lung segmentation in CT images using watershed transform//Proceedings of 2005 IEEE International Conference on Image Processing. Genova, Italy: IEEE: 1270-1273[DOI:10.1109/ICIP.2005.1530294]
Skourt B A, El Hassani A and Majda A. 2018. Lung CT image segmentation using deep neural networks. Procedia Computer Science, 127: 109-113[DOI:10.1016/j.procs.2018.01.104]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9[DOI:10.1109/CVPR.2015.7298594]
Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z. 2016. Rethinking the inception architecture for computer vision//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2818-2826[DOI:10.1109/CVPR.2016.308]
Szegedy C, Ioffe S, Vanhoucke V and Alemi A. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning[EB/OL]. [2020-07-29]. https://arxiv.org/pdf/1602.07261.pdf
Yu F and Koltun V. 2016. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2020-07-29]. https://arxiv.org/pdf/1511.07122.pdf
Yuan K H and Xiang L X. 2011. Automated lung segmentation for chest CT images used for computer aided diagnostics. Journal of Tsinghua University (Science and Technology), 51(1): 90-95[DOI:10.16511/j.cnki.qhdxxb.2011.01.018]
Zhang Z, Wu C D, Coleman S and Kerr D. 2020. DENSE-INception U-net for medical image segmentation. Computer Methods and Programs in Biomedicine, 192: #105395[DOI:10.1016/j.cmpb.2020.105395]
Zhou Z W, Rahman Siddiquee M, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-net architecture for medical image segmentation//Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Lecture Notes in Computer Science, Vol. 11045. Cham, Switzerland: Springer: 3-11[DOI:10.1007/978-3-030-00889-5_1]