Deep-supervision and feature-aggregation network for linear lesion segmentation of high myopia
Vol. 27, Issue 3, Pages: 961-972 (2022)
Published: 16 March 2022
Accepted: 04 January 2022
DOI: 10.11834/jig.210642
Xiao Tan, Yichao Diao, Xinjian Chen, Fei Shi, Ying Fan, Jiamin Xie, Weifang Zhu. Deep-supervision and feature-aggregation network for linear lesion segmentation of high myopia [J]. Journal of Image and Graphics, 27(3): 961-972 (2022)
Objective
Linear lesion is an important sign in the progression of high myopia to pathological myopia. Clinical studies have shown that, in non-invasive retinal optical coherence tomography (OCT) images, linear lesions appear as disruptions of the retinal pigment epithelium-Bruch's membrane-choriocapillaris complex (RBCC), specifically RBCC disorder and myopic stretch lines. Recently, convolutional neural networks (CNNs) have demonstrated excellent performance on computer vision tasks, and many CNN-based methods have been applied to medical image segmentation. However, automatic segmentation of linear lesions remains extremely challenging because the targets are small and their boundaries are blurred. To tackle this issue, a novel deep-supervision and feature-aggregation based network (DSFA-Net) is proposed for the segmentation of linear lesions in OCT images of eyes with high myopia.
Method
To reduce the number of network parameters, the proposed DSFA-Net takes a U-Net with halved channels as its baseline. A novel feature aggregation pooling module (FAPM) is designed and embedded in the encoder path to preserve more details of small targets by aggregating contextual and local spatial information during downsampling. FAPM works in two steps. First, the input feature map is fed in parallel into three pathways. The first two pathways each consist of a horizontal or vertical strip pooling layer, a 1D convolutional layer with kernel size 1×3 or 3×1, and a reshape layer, which together capture contextual information. The third pathway consists of a 2D convolutional layer with kernel size 7×7 followed by a sigmoid function, which assigns each pixel a normalized weight between 0 and 1; these weights are multiplied with the original input feature and passed through a reshape layer to capture local spatial information. Second, the outputs of the three pathways are combined by element-wise addition to obtain the aggregated output feature.
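The paper does not include source code for FAPM; the following is a minimal PyTorch sketch of the two-step module as described above. The class name, the use of adaptive average pooling to realize the strip pooling layers, and the final 2× max-pooling step (so the module can act as a downsampling block) are our own assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FAPM(nn.Module):
    """Feature aggregation pooling module (illustrative sketch).

    Pathways 1-2: strip pooling + 1D convolution to capture contextual information.
    Pathway 3: 7x7 convolution + sigmoid to re-weight local features.
    The three outputs are fused by element-wise addition.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (N, C, 1, W)
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.conv_local = nn.Conv2d(channels, channels, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Contextual pathways: strip pooling, 1D convolution, then broadcast back to H x W.
        ctx_h = self.conv_h(self.pool_h(x)).expand(-1, -1, h, w)
        ctx_w = self.conv_w(self.pool_w(x)).expand(-1, -1, h, w)
        # Local pathway: per-pixel weights in (0, 1) applied to the original input feature.
        local = x * torch.sigmoid(self.conv_local(x))
        out = ctx_h + ctx_w + local  # element-wise aggregation of the three pathways
        # Assumed 2x spatial reduction so the module can replace pooling in the encoder.
        return F.max_pool2d(out, kernel_size=2)


# Usage: aggregate and downsample a 64-channel encoder feature map.
if __name__ == "__main__":
    feat = torch.randn(2, 64, 128, 256)
    print(FAPM(64)(feat).shape)  # torch.Size([2, 64, 64, 128])
```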
A novel dense semantic flow supervision module (DSFSM) is designed and embedded in the decoder path to aggregate the detail and semantic information of features with different resolutions during decoding. DSFSM combines the advantages of deep supervision and a dense semantic flow strategy and increases the number of effective feature maps in the hidden layers of the network.
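The exact structure of DSFSM is not given in the abstract. As an illustration of the deep-supervision half of the idea only (not of the semantic-flow alignment), the sketch below attaches an auxiliary 1×1-convolution head to a decoder feature, upsamples its prediction to full resolution, and adds its loss to the main loss. The class and function names and the auxiliary weight of 0.4 are our own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeepSupervisionHead(nn.Module):
    """Auxiliary prediction head for one decoder stage (illustrative, not the paper's DSFSM).

    A 1x1 convolution maps an intermediate decoder feature to a lesion logit map,
    which is bilinearly upsampled to the input resolution and supervised with the
    same target as the final output.
    """

    def __init__(self, in_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor, out_size) -> torch.Tensor:
        logits = self.proj(feat)
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)


def deeply_supervised_loss(main_logits, aux_logits_list, target, criterion, aux_weight=0.4):
    """Total loss = main loss + weighted auxiliary losses from the decoder stages."""
    loss = criterion(main_logits, target)
    for aux_logits in aux_logits_list:
        loss = loss + aux_weight * criterion(aux_logits, target)
    return loss
```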
The proposed DSFA-Net is implemented in Python 3.8 and PyTorch and trained on an NVIDIA TITAN X GPU with an Intel i7-9700KF CPU. The initial learning rate is set to 0.001 and the batch size to 2. Stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001 is adopted as the optimizer. Because linear lesions vary greatly in size, which causes class imbalance, binary cross-entropy (BCE) loss and Dice loss are combined as the total loss function of DSFA-Net.
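A minimal sketch of this training setup is given below, using the reported SGD hyperparameters and assuming an equal weighting of the BCE and Dice terms (the exact weighting is not stated in the abstract).

```python
import torch
import torch.nn as nn


class BCEDiceLoss(nn.Module):
    """Combined binary cross-entropy and Dice loss (equal weighting assumed)."""

    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum(dim=(1, 2, 3))
        union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        dice = (2 * inter + self.smooth) / (union + self.smooth)
        return self.bce(logits, target) + (1 - dice).mean()


# Optimizer settings reported in the paper: SGD, lr 0.001, momentum 0.9, weight decay 0.0001.
# `model` stands for any segmentation network, e.g. the DSFA-Net described above.
def make_optimizer(model: nn.Module) -> torch.optim.SGD:
    return torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
```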
Result
The proposed DSFA-Net was evaluated on 751 2D retinal OCT B-scan images provided by the First People's Hospital affiliated to Shanghai Jiao Tong University. Each OCT B-scan image has a size of 256×512 pixels, and the ground truth was manually annotated under the supervision of experienced ophthalmologists. Compared with the original U-Net, the proposed DSFA-Net reduces the number of network parameters by 53.19%, while the average Dice similarity coefficient (DSC), Jaccard index, and sensitivity increase by 4.30%, 4.60%, and 2.35%, respectively. Compared with seven other existing semantic segmentation networks, such as CE-Net, SegNet, and Attention U-Net, the proposed DSFA-Net achieves the best segmentation performance while maintaining the smallest number of network parameters. Several ablation experiments were designed and conducted to evaluate the proposed FAPM and DSFSM modules. With FAPM embedded in the encoder path of the baseline (baseline+FAPM), the average DSC, Jaccard, and sensitivity increase by 1.05%, 1.35%, and 3.35%, respectively. With DSFSM embedded in the decoder path of the baseline (baseline+DSFSM), they increase by 4.90%, 5.35%, and 5.90%, respectively. With both FAPM and DSFSM embedded in the baseline (the proposed DSFA-Net), they increase by 6.00%, 6.45%, and 5.50%, respectively. These ablation results show that the proposed FAPM and DSFSM modules effectively improve the segmentation performance of the network.
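For reference, the reported metrics follow their standard definitions on binary masks; the NumPy sketch below is our own illustration of those definitions, not the authors' evaluation code.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """DSC, Jaccard index and sensitivity for binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true positives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    jaccard = tp / (tp + fp + fn + eps)
    sensitivity = tp / (tp + fn + eps)
    return dsc, jaccard, sensitivity
```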
Conclusion
We propose a novel deep-supervision and feature-aggregation based network (DSFA-Net) for the segmentation of linear lesions in OCT images of eyes with high myopia. The proposed FAPM and DSFSM modules can be conveniently inserted into other convolutional neural networks. The experimental results show that DSFA-Net improves the accuracy of linear lesion segmentation in retinal OCT images, indicating its potential value for clinical application.
Keywords: high myopia; linear lesion; optical coherence tomography (OCT); deep supervision; feature aggregation; convolutional neural network (CNN); medical image segmentation
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]
Boureau Y L, Bach F, LeCun Y and Ponce J. 2010. Learning mid-level features for recognition//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE: 2559-2566 [DOI: 10.1109/CVPR.2010.5539963]
Boer P D, Kroese D P, Mannor S and Rubinstein R Y. 2005. A tutorial on the cross-entropy method. Annals of Operations Research, 134(1): 19-67 [DOI: 10.1007/s10479-005-5724-z]
Chen J N, Lu Y Y, Yu Q H, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L and Zhou Y Y. 2021. TransUNet: transformers make strong encoders for medical image segmentation [EB/OL]. [2021-07-19]. https://arxiv.org/pdf/2102.04306.pdf
Chen L C, Papandreou G, Schroff F and Adam H. 2017. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2021-07-19]. https://arxiv.org/pdf/1706.05587.pdf
Fang Y X, Yokoi T, Nagaoka N, Shinohara K, Onishi Y, Ishida T, Yoshida T, Xu X, Jonas J B and Ohno-Matsui K. 2018. Progression of myopic maculopathy during 18-year follow-up. Ophthalmology, 125(6): 863-877 [DOI: 10.1016/j.ophtha.2017.12.005]
Feng S L, Zhao H M, Shi F, Cheng X N, Wang M, Ma Y H, Xiang D H, Zhu W F and Chen X J. 2020. CPFNet: context pyramid fusion network for medical image segmentation. IEEE Transactions on Medical Imaging, 39(10): 3008-3018 [DOI: 10.1109/TMI.2020.2983721]
Gao Z T, Wang L M and Wu G S. 2019. LIP: local importance-based pooling//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 3354-3363 [DOI: 10.1109/ICCV.2019.00345]
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292 [DOI: 10.1109/TMI.2019.2903562]
Hou Q B, Zhang L, Cheng M M and Feng J S. 2020. Strip pooling: rethinking spatial pooling for scene parsing//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4002-4011 [DOI: 10.1109/CVPR42600.2020.00406]
Huang S S, Zheng Y F, Foster P J, Huang W Y and He M G. 2009. Prevalence and causes of visual impairment in Chinese adults in urban southern China: the Liwan Eye Study. Archives of Ophthalmology, 127(10): 1362-1367 [DOI: 10.1001/archophthalmol.2009.138]
Jaderberg M, Simonyan K, Zisserman A and Kavukcuoglu K. 2015. Spatial transformer networks//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2017-2025
Jiang H J, Chen X J, Shi F, Ma Y H, Xiang D H, Ye L, Su J Z, Li Z Y, Chen Q Y, Hua Y H, Xu X, Zhu W F and Fan Y. 2019. Improved cGAN based linear lesion segmentation in high myopia ICGA images. Biomedical Optics Express, 10(5): 2355-2366 [DOI: 10.1364/BOE.10.002355]
Li X T, You A S, Zhu Z, Zhao H L, Yang M K, Yang K Y, Tan S H and Tong Y H. 2020. Semantic flow for fast and accurate scene parsing//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 775-793 [DOI: 10.1007/978-3-030-58452-8_45]
Mariotti S P, Kocur I, Resnikoff S, Jong M, Naidoo K S, He M G, Holden B A, Salomão S R, Sankaridurg P, Jonas J B, Saw S M, Smith E L Ⅲ, Kedir J, Trier K, Wong T Y, Minto H, Yekta A A, Vitale S, Morgan I G, Ohno-Matsui K, Pärssinen O, Rao G and Zhao J L. 2015. The impact of myopia and high myopia: report of the Joint World Health Organization-Brien Holden Vision Institute Global Scientific Meeting on Myopia [EB/OL]. [2021-07-19]. https://www.researchgate.net/publication/318216691
Marr D and Vaina L. 1982. Representation and recognition of the movements of shapes. Proceedings of the Royal Society of London. Series B. Biological Sciences, 214(1197): 501-524 [DOI: 10.1098/rspb.1982.0024]
Milletari F, Navab N and Ahmadi S A. 2016. V-Net: fully convolutional neural networks for volumetric medical image segmentation//Proceedings of the 4th International Conference on 3D Vision. Stanford, USA: IEEE: 565-571 [DOI: 10.1109/3DV.2016.79]
Nagi J, Ducatelle F, Di Caro G A, Cireşan D, Meier U, Giusti A, Nagi F, Schmidhuber J and Gambardella L M. 2011. Max-pooling convolutional neural networks for vision-based hand gesture recognition//Proceedings of 2011 IEEE International Conference on Signal and Image Processing Applications. Kuala Lumpur, Malaysia: IEEE: 342-347 [DOI: 10.1109/ICSIPA.2011.6144164]
Ohno-Matsui K, Yoshida T, Futagami S, Yasuzumi K, Shimada N, Kojima A, Tokoro T and Mochizuki M. 2003. Patchy atrophy and lacquer cracks predispose to the development of choroidal neovascularisation in pathological myopia. British Journal of Ophthalmology, 87(5): 570-573 [DOI: 10.1136/bjo.87.5.570]
Oktay O, Schlemper J, Le Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N Y, Kainz B, Glocker B and Rueckert D. 2018. Attention U-Net: learning where to look for the pancreas [EB/OL]. [2021-07-19]. https://arxiv.org/pdf/1804.03999.pdf
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Ruder S. 2016. An overview of gradient descent optimization algorithms [EB/OL]. [2021-07-19]. https://arxiv.org/pdf/1609.04747.pdf
Salamon J and Bello J P. 2017. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 24(3): 279-283 [DOI: 10.1109/LSP.2017.2657381]
Shelhamer E, Long J and Darrell T. 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651 [DOI: 10.1109/TPAMI.2016.2572683]
Shinohara K, Moriyama M, Shimada N, Tanaka Y and Ohno-Matsui K. 2014. Myopic stretch lines: linear lesions in fundus of eyes with pathologic myopia that differ from lacquer cracks. Retina, 34(3): 461-469 [DOI: 10.1097/IAE.0b013e3182a6b494]
Tang Y T, Wang X F, Wang J C, Huang W, Gao Y P, Luo Y and Lu Y. 2015. Prevalence and causes of visual impairment in a Chinese adult population: the Taizhou Eye Study. Ophthalmology, 122(7): 1480-1488 [DOI: 10.1016/j.ophtha.2015.03.022]
Tokoro T. 1988. On the definition of pathologic myopia in group studies. Acta Ophthalmologica, 66(S185): 107-108 [DOI: 10.1111/j.1755-3768.1988.tb02681.x]
Wang L H, Huang W Y, He M, Zheng Y F, Huang S S, Liu B, Jin L, Congdon N G and He M G. 2013. Causes and five-year incidence of blindness and visual impairment in urban southern China: the Liwan Eye Study. Investigative Ophthalmology and Visual Science, 54(6): 4117-4121 [DOI: 10.1167/iovs.13-11911]
Wang L W, Lee C Y, Tu Z W and Lazebnik S. 2015. Training deeper convolutional networks with deep supervision [EB/OL]. [2021-07-19]. https://arxiv.org/pdf/1505.02496.pdf
Zhang X L, Fu P F, Zhao Y J, Xie H and Wang W R. 2020. Point cloud data classification and segmentation model using graph CNN and different pooling functions. Journal of Image and Graphics, 25(6): 1201-1208 [DOI: 10.11834/jig.190367]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239 [DOI: 10.1109/CVPR.2017.660]