
Published: 2020-08-16
DOI: 10.11834/jig.190644
2020 | Volume 25 | Number 8




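Zhang Zizhen1,2, Liu Ming1,2, Zhu Dejiang1,2
1. College of Electronic and Information Engineering, Hebei University, Baoding 071000, China;
2. Key Laboratory of Digital Medical Engineering of Hebei Province, Baoding 071000, China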


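Objective Diabetic retinopathy (DR) is a complication of diabetes with high incidence and a high rate of blindness. In clinical practice, the small differences between grades of retinal images and the varying experience of clinicians lead to misdiagnosis and missed diagnosis, and current manual DR diagnosis and grading performs poorly and is time-consuming and laborious. This paper therefore proposes an automatic DR image classification and recognition method that combines an attention mechanism with a high-efficiency network (EfficientNet) in order to diagnose the lesion grade accurately. Method To address the problems in the DR datasets used in the experiments, the images are culled, denoised, augmented, and normalized. EfficientNet is used for feature extraction: with a transfer-learning strategy, EfficientNet is trained on the DR datasets to extract deep features. To handle the small differences between lesions and to prevent misclassification when the network learns features of DR images, an attention mechanism is added to the output of EfficientNet. The extracted features are then classified by a deep classifier that grades the retinal images into five classes. Result The classification accuracy, sensitivity, specificity, and quadratic weighted kappa of the proposed method are 97.2%, 95.6%, 98.7%, and 0.84, respectively, indicating good classification performance and robustness. Conclusion The DR classification algorithm based on EfficientNet with an attention mechanism (attention EfficientNet, A-EfficientNet) effectively improves the efficiency of DR screening, overcomes the limitations of hand-crafted feature extraction in manual classification, assists clinicians in diagnosis, and helps prevent the severe visual impairment or even blindness caused by this eye disease.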


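diabetic retinopathy classification; high-efficiency network (EfficientNet); attention mechanism; deep learning; transfer learning; deep features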

Automatic recognition and classification of diabetic retinopathy images by combining an attention mechanism and an efficient network
Zhang Zizhen1,2, Liu Ming1,2, Zhu Dejiang1,2
1. College of Electronic and Information Engineering, Hebei University, Baoding 071000, China;
2. Key Laboratory of Digital Medical Engineering of Hebei Province, Baoding 071000, China
Supported by: National Natural Science Foundation of China (61473112)


Objective Diabetic retinopathy (DR) is a diabetic complication with high incidence and a high rate of blindness. Diagnosing DR from color fundus images requires experienced clinicians to identify the presence and significance of many small features. This requirement, together with a complex grading system, makes diagnosis difficult and time-consuming. The small differences among grades of retinal images and the limited experience of clinicians often lead to misdiagnosis and missed diagnosis. The large population of diabetic patients and their massive screening needs have generated interest in computer-aided, fully automatic diagnosis of DR. Current manual DR diagnosis and grading performs poorly and is time-consuming and laborious. On this basis, this study proposes an automatic DR classification method that uses a high-efficiency network (EfficientNet) incorporating an attention mechanism (A-EfficientNet). Accurate classification is achieved by using transfer learning to guide the model to learn the features that distinguish lesions.

Method Noise in images and labels is unavoidable in the raw DR dataset, which places high demands on the robustness of the classification system. Preprocessing of the retinal images mainly includes image culling, noise reduction, enhancement, and normalization. Preprocessing improves the performance of the classification network and speeds up computation. EfficientNet is used as the backbone for feature extraction, and a transfer-learning strategy is adopted to train A-EfficientNet on the DR datasets. An attention mechanism is added to EfficientNet to address the small differences among lesions in the dataset and to prevent the network from misclassifying fine-grained features across retinal image categories. The mechanism not only extracts features from fundus images but also focuses on the lesion area. The A-EfficientNet model matches images to their label categories to classify retinal images. With fully connected layers and softmax, the model learns to classify DR status as DR-free, mild DR, moderate DR, severe DR, or proliferative DR.

Result Experimental results show that, when retinal image samples are insufficient, transfer learning and data augmentation help the classification model extract deep features for classification. A-EfficientNet learns the features in the training samples and attends to the differences among categories to achieve accurate classification. Introducing the attention mechanism greatly improves the performance of the classification network, which plays a positive role in extracting lesion features at the various stages of the disease. The classification accuracy of the model reaches 97.2%. For the two-class model, the sensitivity is 95.6% at the high-specificity operating point and the specificity is 98.7% at the high-sensitivity operating point. The model achieves a kappa score of 0.840, which is higher than that of existing non-ensemble models. The classification model based on the attention mechanism and EfficientNet can therefore accurately distinguish the grades of retinopathy, with transfer learning and data augmentation supporting the accurate classification of fundus images. These results reflect the good classification performance of A-EfficientNet and the robustness of its learning ability.

Conclusion In this study, A-EfficientNet is used to classify retinal images automatically. The proposed classification framework for DR benefits from four parts: the image-preprocessing stage, the transfer-learning strategy, data augmentation, and A-EfficientNet for feature extraction. The dataset used in the experiment has problems such as noise, artifacts, and out-of-focus images. After such problematic data are removed, the dataset is denoised and normalized so that the classification network can process and learn from the data. When training samples are insufficient, data augmentation meets the data requirements of the deep learning classification model and improves its generalization ability. Adding an attention mechanism to EfficientNet improves the classification performance of the algorithm. The automatic DR classification algorithm combining EfficientNet with the attention mechanism can therefore improve the efficiency of DR screening, avoid the limitations of manual feature extraction and image classification, and mitigate the overfitting caused by insufficient sample data. The experimental results show that this method can be used to diagnose DR. It operates without annotations of suspicious lesions, which avoids the time-consuming work of lesion labeling by experts and reduces false positives. The classification model can also serve as a reference for classification tasks on other images.

Key words

multi-classification of diabetic retinopathy; high-efficiency network (EfficientNet); attention mechanism; deep learning; transfer learning; deep features

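0 Introduction

Diabetic retinopathy (DR) is a retinal complication of the fundus caused by long-standing diabetes (Laud et al., 2017). According to World Health Organization statistics, about 425 million people worldwide have diabetes, and roughly one third of them develop DR. Studies show (Flaxman et al., 2017) that early-stage DR has almost no signs or symptoms, so many patients miss the best window for treatment. In addition, the large number of people who need DR examinations puts diagnostic pressure on doctors, so diagnoses are not fed back in time, which leads to missed diagnosis and misdiagnosis. Early detection and timely treatment of DR are therefore essential. DR shows different pathological features at different stages and eventually damages the eye and causes blindness. Computer-aided automatic diagnosis has great clinical potential: it can detect DR accurately in a short time, help doctors raise the DR screening rate, and reduce cases of blindness.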


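Deep learning has achieved remarkable results in retinal image detection and classification (Krizhevsky et al., 2017; Li et al., 2018; Voets et al., 2018; Li et al., 2017; Gargeya and Leng, 2017; Wang et al., 2018; Graham, 2015; Doshi et al., 2016; Zhou et al., 2018). Gulshan et al. (2016) used the InceptionV3 deep model to detect DR on a dataset of more than 128 000 fundus images; thanks to the large amount of training data and the screening of fundus images by ophthalmologists, the AUC (area under curve) values obtained on two different test sets were very high. Li et al. (2017) used transfer learning for binary DR detection: a pretrained convolutional neural network (CNN) was fine-tuned, the extracted retinal image features were classified with a support vector machine, and the binary classification task was validated on the MESSIDOR (methods for evaluating segmentation and indexing techniques dedicated to retinal ophthalmology) (French Ministry, 2014) and DRiDB databases. Gargeya and Leng (2017) trained a binary model that combines a residual network (ResNet) (He et al., 2016) with a decision tree classifier to distinguish diseased from healthy images and validated it on datasets including MESSIDOR-2. Wang et al. (2018) used DenseNet121 for feature extraction and a boosting tree algorithm for prediction, and tests on their dataset confirmed that the model performs better. Graham (2015) and others achieved good DR classification results in the Kaggle (2015) competition.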



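At present, five-class DR grading provides useful clinical assistance. Doshi et al. (2016) trained a deep convolutional neural network to diagnose DR and grade it into five classes, and validated it on the Kaggle dataset. GPU (graphics processing unit) acceleration sped up training, but the classification results did not improve significantly. Zhou et al. (2018) applied a multi-cell structure to multi-task learning on high-resolution fundus images, predicting labels through both classification and regression, and validated the method on the EyePACS dataset; however, it works well only on high-resolution images and cannot accurately recognize and classify low-resolution ones. Pratt et al. (2016) used data augmentation and a convolutional neural network for five-class DR grading on the Kaggle dataset and made some progress.

Based on the above analysis, and to improve the accuracy and efficiency of DR classification, this paper proposes a classification algorithm based on a high-efficiency network with an attention mechanism (attention EfficientNet, A-EfficientNet) to classify fundus images automatically. It builds on EfficientNet (Tan and Le, 2019), which improves performance by jointly optimizing network width, network depth, and input resolution; while matching the accuracy of existing classification networks, it greatly reduces the number of model parameters and the amount of computation and improves generalization. In particular, an attention mechanism is added during training: the attention map for the different lesions is multiplied by the input feature map so that the extracted features are learned adaptively, which focuses the network on the differences between fine-grained lesions in fundus images and suppresses the influence of irrelevant information on learning.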

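1 Proposed method

The DR classification model that combines the attention mechanism with the high-efficiency network is shown in Fig. 1. Data preprocessing and data augmentation are applied to the DR images to increase the amount of data, after which the classification model performs feature extraction and classification.

Fig. 1 Classification network combining attention mechanism and EfficientNet model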

1.1 Transfer learning


1.2 High-efficiency network (EfficientNet)



$ \boldsymbol{N} = \bigotimes\limits_{i = 1,2, \cdots ,s} \boldsymbol{F}^{L_i}(\boldsymbol{X}_{[H_i, W_i, C_i]}) $ (1)

In Eq. (1), $\boldsymbol{N}$ denotes the classification network, $\otimes$ the convolution operation, $\boldsymbol{X}$ the input tensor, $\boldsymbol{F}$ the base network layer, $i$ the index of the convolutional layer, and $L_{i}$ the network depth. The network is optimized by adjusting three dimensions (height $H$, width $W$, and number of channels $C$), which requires finding the optimal scaling parameters for these three dimensions so that the accuracy of the model improves within the limits on model parameters and computation. The maximum accuracy of the model is written $Acc_{\max}(\boldsymbol{N}(d, w, r))$, and the scaled network is given by Eq. (2).

$ \boldsymbol{N}(d,w,r) = \bigotimes\limits_{i = 1,2, \cdots ,s} \hat{\boldsymbol{F}}^{\,d \times \hat{L}_i}(\boldsymbol{X}_{[r \times \hat{H}_i,\; r \times \hat{W}_i,\; w \times \hat{C}_i]}) $ (2)


$ {\alpha ^2} \times {\beta ^2} \times {\gamma ^2} \approx 2,\alpha \ge 1,\beta \ge 1,\gamma \ge 1 $ (3)

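To obtain the three scaling parameters that satisfy Eq. (2), a compound coefficient $φ$ is used to optimize the depth, width, and resolution of the network. First $φ$ is fixed at 1, and then a grid search finds the optimal $α$, $β$, and $γ$ that satisfy Eq. (3). Experimental tuning gives $α=2.2$, $β=1.6$, and $γ=1.16$.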
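The scaling in Eq. (2) can be illustrated with a small sketch. It is not the authors' code: the relation $d=α^{φ}$, $w=β^{φ}$, $r=γ^{φ}$ is taken from Tan and Le (2019) rather than from this section, the function names are hypothetical, the baseline stage (2 layers, 224×224 input, 32 channels) is made up for illustration, and the rounding rules are assumptions.

```python
import math

def compound_coefficients(alpha, beta, gamma, phi=1.0):
    """Return (d, w, r) for a compound coefficient phi.

    Assumes d = alpha**phi, w = beta**phi, r = gamma**phi (Tan and Le, 2019).
    """
    return alpha ** phi, beta ** phi, gamma ** phi

def scale_stage(base_layers, base_height, base_width, base_channels, d, w, r):
    """Scale one stage as in Eq. (2): depth by d, resolution by r, channels by w.

    Rounding (ceil for layers, nearest integer otherwise) is an assumption.
    """
    return {
        "layers": math.ceil(d * base_layers),
        "height": round(r * base_height),
        "width": round(r * base_width),
        "channels": round(w * base_channels),
    }

# Values reported in the text, with phi fixed at 1.
d, w, r = compound_coefficients(alpha=2.2, beta=1.6, gamma=1.16, phi=1.0)

# Hypothetical baseline stage used only to show the arithmetic.
scaled = scale_stage(base_layers=2, base_height=224, base_width=224,
                     base_channels=32, d=d, w=w, r=r)
print(scaled)
```

With these inputs the stage grows from 2 to 5 layers, from 224 to 260 pixels per side, and from 32 to 51 channels.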


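1.3 Attention mechanism

Medical images often contain a great deal of irrelevant information that may interfere with decisions. In five-class DR grading, tiny features such as microaneurysms and exudates, which differ little between classes in these fine-grained images, are crucial for a clinician's DR grading. With an attention mechanism, the network can choose for itself which regions to focus on during feature extraction. During learning it attends to the lesion types in DR images and selects among the extracted features to obtain more descriptive information, which mimics the way clinicians concentrate on key features and assists clinical diagnosis. The attention network takes the output feature map $\boldsymbol{F}$ of EfficientNet as input and passes it through three $1×1$ convolutional layers with ReLU and Sigmoid activation functions, converting the input into nonlinear features. The network is shown in Fig. 2.

Fig. 2 Attention mechanism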

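The purpose of this network is to generate the attention map $\boldsymbol{A}$. The feature map $\boldsymbol{F}$ is multiplied by the attention map $\boldsymbol{A}$ to produce the image mask $\boldsymbol{M}$. To reduce the number of network parameters and avoid overfitting, global average pooling (GAP) is applied separately to the image mask $\boldsymbol{M}$ and the attention map $\boldsymbol{A}$. Finally, a division yields the image weights and filters out irrelevant information. The output of the attention mechanism is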

$ \mathit{\boldsymbol{O}} = GAP({\mathit{\boldsymbol{A}}^l})/GAP({\mathit{\boldsymbol{A}}^l} \times {\mathit{\boldsymbol{F}}^l}) $ (4)

where $\boldsymbol{A}^{l}$ and $\boldsymbol{F}^{l}$ denote the attention map and the feature map of layer $l$, respectively, and $GAP(\cdot)$ is the global average pooling operation.
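The pooling and division in Eq. (4) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the names `gap` and `attention_output` and the small constant `eps` (added to avoid division by zero) are assumptions, and the attention map is assumed to have the same shape as the feature map, consistent with the 15×15×2 048 shapes listed for the attention branch in Table 2. The order of the division follows Eq. (4) as printed.

```python
import numpy as np

def gap(x):
    """Global average pooling over the spatial axes of an (H, W, C) array."""
    return x.mean(axis=(0, 1))

def attention_output(feature_map, attention_map, eps=1e-8):
    """Compute O = GAP(A) / GAP(A * F), following Eq. (4) as printed.

    feature_map (F) and attention_map (A) are (H, W, C) arrays;
    eps is a hypothetical guard against a zero denominator.
    """
    mask = attention_map * feature_map      # image mask M = A * F
    return gap(attention_map) / (gap(mask) + eps)

# Toy example: constant feature and attention maps with 4 channels.
F = np.full((15, 15, 4), 2.0)
A = np.full((15, 15, 4), 0.5)
O = attention_output(F, A)
print(O.shape)
```

The result has one value per channel, which is the shape a following dense layer would consume.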

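1.4 Training algorithm


In the Adam optimizer, the gradient of the loss function $L$ is $\boldsymbol{g}$, and the first- and second-moment estimates are $\boldsymbol{s}$ and $\boldsymbol{r}$, respectively. The optimum is sought with a fixed learning step $\varepsilon $, and the update is derived as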

$ \boldsymbol{g} \leftarrow \frac{1}{m}\nabla_\theta \sum\limits_i L(f(x_i;\theta), y_i) $ (5)

$ \boldsymbol{s} \leftarrow \rho_1 \boldsymbol{s} + (1 - \rho_1)\boldsymbol{g} $ (6)

$ \boldsymbol{r} \leftarrow \rho_2 \boldsymbol{r} + (1 - \rho_2)\boldsymbol{g} \otimes \boldsymbol{g} $ (7)

$ \hat{\boldsymbol{s}} \leftarrow \frac{\boldsymbol{s}}{1 - \rho_1^t} $ (8)

$ \hat{\boldsymbol{r}} \leftarrow \frac{\boldsymbol{r}}{1 - \rho_2^t} $ (9)

$ \varDelta \theta = - \varepsilon \frac{\hat{\boldsymbol{s}}}{\sqrt{\hat{\boldsymbol{r}}} + \delta} $ (10)

$ \theta \leftarrow \theta + \varDelta \theta $ (11)

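where $θ$ denotes the parameters, $m$ is the number of samples, the function $f$ is the output of the network, $x_{i}$ and $y_{i}$ are the input and output of $f$, respectively, $t$ is the iteration step, and $ρ_{1}$ and $ρ_{2}$ are hyperparameters. The settings are $ρ_{1}=1.0$, $ρ_{2}=0.999$, $\varepsilon =0.001$, and $δ=10^{-8}$.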
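A single Adam update can be sketched in NumPy as below. It follows the standard algorithm of Kingma and Ba (2014), with bias corrections $1-ρ_{1}^{t}$ and $1-ρ_{2}^{t}$; the function name `adam_step` is hypothetical. The default `rho1=0.9` is the value from Kingma and Ba and is an assumption of this sketch, because with $ρ_{1}=1.0$ the first-moment estimate would never be updated.

```python
import numpy as np

def adam_step(theta, grad, s, r, t, lr=0.001, rho1=0.9, rho2=0.999, delta=1e-8):
    """One Adam update; returns the new (theta, s, r).

    theta, grad, s, r are arrays of the same shape; t is the 1-based step.
    rho1=0.9 is an assumed default (Kingma and Ba, 2014).
    """
    s = rho1 * s + (1.0 - rho1) * grad           # first-moment estimate
    r = rho2 * r + (1.0 - rho2) * grad * grad    # second-moment estimate
    s_hat = s / (1.0 - rho1 ** t)                # bias-corrected first moment
    r_hat = r / (1.0 - rho2 ** t)                # bias-corrected second moment
    theta = theta - lr * s_hat / (np.sqrt(r_hat) + delta)
    return theta, s, r

# Toy usage: one step on the loss L(theta) = sum(theta**2), gradient 2 * theta.
theta0 = np.array([1.0, -2.0])
theta1, s1, r1 = adam_step(theta0, 2.0 * theta0, np.zeros(2), np.zeros(2), t=1)
print(theta1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by about one learning step toward zero.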

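2 Dataset

2.1 Data and preprocessing

The experiments use the Kaggle (2015) and MESSIDOR (French Ministry, 2014) datasets. The Kaggle dataset comes from EyePACS, a platform that provides free retinal screening; it was contributed by multiple hospitals and includes tens of thousands of high-resolution color fundus images taken under a variety of imaging conditions. The MESSIDOR data come from a fundus screening project set up by the French Ministry of Research and Defense, with images provided by three different ophthalmology institutions.

Retinal images are divided into five classes by lesion severity, as shown in Fig. 3: healthy, mild, moderate, severe, and proliferative. Some images, however, contain artifacts, are out of focus, or are under- or overexposed. These poor-quality, meaningless images (Fig. 4) affect the pixel intensity values and introduce unnecessary variation that is unrelated to the classification grade. Preprocessing the fundus images is therefore necessary to standardize them and to reduce redundant information and environmental artifacts.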

Fig. 3 Diabetic retina images ((a) no DR; (b) mild DR; (c) moderate DR; (d) severe DR; (e) proliferative DR)
Fig. 4 Meaningless diabetic retinopathy images

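2.2 Normalization


$ I_i(x,y;\sigma) = \alpha (I(x,y) - G(x,y;\sigma) * I(x,y)) + \beta $ (12)

where $G(x, y; σ)$ is the Gaussian smoothing function and $σ$ is its standard deviation; the smoothed image estimates the image background, and subtracting it improves contrast. $β$ is the intensity, and after enhancement most pixel values lie in [0, 255]. During experimental tuning these parameters were set to $α=4$, $σ=r/30$, and $β=128$.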
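Eq. (12) can be sketched with NumPy alone, as below. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the Gaussian kernel is truncated at three standard deviations, image borders are handled by edge padding, the input is a single-channel 2-D array, and the final clipping to [0, 255] is an added assumption.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma (assumed)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a 2-D array, using edge padding."""
    kernel = gaussian_kernel_1d(sigma)
    pad = len(kernel) // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    # Convolve along rows, then along columns; "valid" restores the input size.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def normalize_contrast(image, sigma, alpha=4.0, beta=128.0):
    """Apply Eq. (12): alpha * (I - G * I) + beta, then clip to [0, 255]."""
    background = gaussian_blur(image, sigma)
    enhanced = alpha * (image - background) + beta
    return np.clip(enhanced, 0.0, 255.0)

# Toy usage: a flat image has no local contrast, so every pixel maps to beta.
flat = np.full((20, 30), 100.0)
out = normalize_contrast(flat, sigma=2.0)
print(out.shape, out.mean())
```

For a color fundus image the same function would be applied to each channel, with `sigma` set from the rule $σ=r/30$ given in the text.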

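Images before and after processing are shown in Fig. 5. Because the original retinal images are too large, all input images are resized to $456×456$ pixels to match the input size required for network training.

Fig. 5 Diabetic retinal images before and after preprocessing ((a) unprocessed image; (b) preprocessed image)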

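2.3 Data augmentation

The number of images in each class is listed in Table 1. The classes are severely imbalanced, which prevents high-accuracy recognition of medical images. In addition to the preprocessing steps, data augmentation is therefore used to enlarge the training data substantially and improve the generalization of the model. Each image is randomly rotated by 0 to 90°, randomly flipped horizontally and vertically, and randomly shifted horizontally and vertically, which increases the size of the under-represented classes and addresses the class imbalance.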

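Table 1 Number of images per grade of diabetic retinopathy

| Class | DR grade | Number of images | Proportion/% |
|---|---|---|---|
| 0 | Healthy retina | 25 810 | 73.5 |
| 1 | Mild DR | 2 443 | 7.0 |
| 2 | Moderate DR | 5 292 | 15.1 |
| 3 | Severe DR | 873 | 2.5 |
| 4 | Proliferative DR | 708 | 2.0 |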
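The imbalance in Table 1 can be quantified with a short sketch. It computes, for each class, how many copies of each image (original plus augmented) would be needed to roughly match the largest class. Balancing toward the majority class is an assumption made here for illustration; the text does not state the target class sizes, and the function name is hypothetical.

```python
import math

# Image counts per class, taken from Table 1.
CLASS_COUNTS = {0: 25810, 1: 2443, 2: 5292, 3: 873, 4: 708}

def augmentation_factors(counts):
    """Copies per image needed for each class to reach the largest class size."""
    target = max(counts.values())
    return {label: math.ceil(target / n) for label, n in counts.items()}

factors = augmentation_factors(CLASS_COUNTS)
total = sum(CLASS_COUNTS.values())
print(total, factors)
```

The total of 35 126 matches the size of the Kaggle set quoted in Section 3.1, and the rarest class would need the most augmented copies per image.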

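3 Experimental results and analysis

3.1 Experimental setup

The experiments were run on an Ubuntu 16.04 server with an Intel 2667V4 CPU, 32 GB of memory, and an NVIDIA Tesla K80 GPU. The model framework is TensorFlow, and the parameter configuration of the classification network is listed in Table 2.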

Table 2 Parameter configuration of classification network

| Name | Type | Output size ($H×W×C$) |
|---|---|---|
| Input | input_3 | 456×456×3 |
| Model | efficientnet | 15×15×2 048 |
| BatchN | batch_normalization_234 | 15×15×2 048 |
| Dropout | dropout_4 | 15×15×2 048 |
| Conv2D | conv2d_316 | 15×15×64 |
| Conv2D | conv2d_317 | 15×15×16 |
| Conv2D | conv2d_318 | 15×15×8 |
| Conv2D | conv2d_319 | 15×15×1 |
| Conv2D | conv2d_320 | 15×15×2 048 |
| Multiply | multiply_80 | 15×15×2 048 |
| Glo | global_average_pooling2d_3 | 1×1×2 048 |
| Glo | global_average_pooling2d_4 | 1×1×2 048 |
| Lambda | rescaleGAP | 1×1×2 048 |
| Dropout | dropout_5 | 1×1×2 048 |
| Dense | dense_3 | 1×1×128 |
| Dropout | dropout_6 | 1×1×128 |
| Dense | dense_4 | 1×1×5 |

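At present, DR classification commonly uses the Inception V3 network (Zeng et al., 2019; Pang and Wang, 2017; Rakhlin, 2017) for training. To compare results more thoroughly, four experiments were designed: training with InceptionV3, with EfficientNet, with InceptionV3 plus the attention mechanism, and with EfficientNet plus the attention mechanism. The 35 126 images of the Kaggle dataset and the 1 200 images of the MESSIDOR dataset (Decencière et al., 2014) are used, and each image is assigned one label from {0, 1, 2, 3, 4} according to its lesion grade. Of these images, 7 025 form the validation set, 5 096 form the test set, and the rest form the training set. To validate the model further, a binary classification experiment was also run on the data to test how well the model detects the presence of disease.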

3.2 Evaluation metrics

The results are evaluated with specificity (SP), sensitivity (SE), accuracy (AC), the confusion matrix, and the kappa coefficient, computed as

$ {SP = \frac{{TN}}{{TN + FP}}} $ (13)

$ {SE = \frac{{TP}}{{TP + FN}}} $ (14)

$ {AC = \frac{{TP + TN}}{{TP + FP + FN + TN}}} $ (15)




$ \kappa = 1 - \frac{{\sum\limits_{i,j} {{w_{i,j}}} {o_{i,j}}}}{{\sum\limits_{i,j} {{w_{i,j}}} ((\sum\limits_j {{o_{i,j}}} ) \times (\sum\limits_i {{o_{i,j}}} ))/n}} $ (16)

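where $\boldsymbol{o}$ is a $5×5$ matrix; $n$ is the number of samples; $o_{i, j}$ is the number of samples whose true class is $i$ and whose predicted class is $j$; and $w_{i, j}$ is the weighting coefficient, which penalizes misclassified entries in order to improve classification. With $N$ the number of classes (here $N=5$), the weighting coefficient is computed as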

$ {w_{i,j}} = \frac{{{{(i - j)}^2}}}{{{{(N - 1)}^2}}} $ (17)
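Eqs. (13) to (17) translate directly into a few lines of NumPy. The sketch below is illustrative rather than the authors' evaluation code; the function names are hypothetical and the toy inputs are made up to show the behavior at the two reference points of kappa (perfect agreement and chance-level agreement).

```python
import numpy as np

def binary_metrics(tp, fp, fn, tn):
    """Specificity, sensitivity, and accuracy as in Eqs. (13) to (15)."""
    sp = tn / (tn + fp)
    se = tp / (tp + fn)
    ac = (tp + tn) / (tp + fp + fn + tn)
    return sp, se, ac

def quadratic_weighted_kappa(confusion):
    """Quadratic weighted kappa from a confusion matrix, Eqs. (16) and (17).

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    o = np.asarray(confusion, dtype=float)
    n_classes = o.shape[0]
    n = o.sum()
    idx = np.arange(n_classes)
    # Eq. (17): squared class distance, normalized by (N - 1)^2.
    w = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected counts under independence: (row sums) x (column sums) / n.
    expected = np.outer(o.sum(axis=1), o.sum(axis=0)) / n
    return 1.0 - (w * o).sum() / (w * expected).sum()

# Toy inputs (not data from the paper).
sp, se, ac = binary_metrics(tp=90, fp=5, fn=10, tn=95)
kappa_perfect = quadratic_weighted_kappa(np.diag([10, 5, 5, 3, 2]))
kappa_chance = quadratic_weighted_kappa([[1, 1], [1, 1]])
print(sp, se, ac, kappa_perfect, kappa_chance)
```

A kappa of 1 means every sample lies on the diagonal, while a kappa of 0 means the agreement is no better than what the row and column totals alone would produce.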

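3.3 Experimental results

The Softmax classification results are evaluated statistically with a confusion matrix, and the evaluation metrics for five-class and binary DR classification are computed from Eqs. (13) to (17). The five-class confusion matrix is given in Table 3. Most of the data lie on the diagonal, and the quadratic weighted kappa is 0.84, which confirms the consistency of the classification. For the binary task, the specificity is 98.7%, the sensitivity is 95.6%, and the accuracy is 97.2%.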

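Table 3 Classification confusion matrix incorporating EfficientNet and attention mechanism (rows: true label; columns: predicted class)

| True label | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 4 535 | 240 | 312 | 3 | 6 |
| 1 | 117 | 302 | 59 | 3 | 7 |
| 2 | 287 | 201 | 449 | 70 | 51 |
| 3 | 4 | 3 | 32 | 90 | 45 |
| 4 | 0 | 1 | 41 | 16 | 83 |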

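3.4 Comparative analysis

3.4.1 Comparison with the method of Pratt et al. (2016)

The classification results are compared by means of confusion matrices. Because label 0 has far more data than the other four lesion classes, the comparison with the CNN method of Pratt et al. (2016) is made, after data augmentation, on the classification accuracy for the four classes with fewer data; the results are listed in Table 4. The proposed method is more accurate than the method of Pratt et al. (2016) on all four lesion classes. In addition, although the five-class confusion matrix of the proposed method (Table 3) contains misclassifications between classes, they fall mostly into adjacent classes. The confusion matrix of the method of Pratt et al. (2016) is given in Table 5; that method assigns most of the samples it misclassifies to the normal class. Clinically, this would label patients as normal, so they would miss the best opportunity for diagnosis and could eventually go blind. The comparison with the results of Pratt et al. confirms the advantage of the proposed model.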

Table 4 Classification accuracy (%) of the method of Pratt et al. (2016) and ours for the four DR classes

| Method | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|
| Pratt et al. (2016) | 0 | 23.3 | 7.8 | 44.3 |
| Ours | 61.8 | 42.4 | 51.7 | 58.8 |

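Table 5 Confusion matrix generated by method of Pratt et al. (2016) (rows: true label; columns: predicted class)

| True label | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 3 456 | 0 | 145 | 1 | 34 |
| 1 | 344 | 0 | 27 | 0 | 1 |
| 2 | 543 | 0 | 179 | 5 | 40 |
| 3 | 40 | 0 | 63 | 10 | 15 |
| 4 | 28 | 0 | 23 | 3 | 43 |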
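The per-class accuracies in Table 4 can be recomputed from the confusion matrices in Tables 3 and 5 as the diagonal entry divided by the row total. The sketch below is a check written for this purpose (the function name is hypothetical); the recomputed values agree with Table 4 to within about 0.1 percentage point.

```python
# Confusion matrices copied from Table 3 (proposed method) and
# Table 5 (Pratt et al., 2016); rows are true labels, columns predictions.
TABLE_3 = [
    [4535, 240, 312, 3, 6],
    [117, 302, 59, 3, 7],
    [287, 201, 449, 70, 51],
    [4, 3, 32, 90, 45],
    [0, 1, 41, 16, 83],
]
TABLE_5 = [
    [3456, 0, 145, 1, 34],
    [344, 0, 27, 0, 1],
    [543, 0, 179, 5, 40],
    [40, 0, 63, 10, 15],
    [28, 0, 23, 3, 43],
]

def per_class_accuracy(confusion):
    """Percentage of each true class that is predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]

ours = per_class_accuracy(TABLE_3)
pratt = per_class_accuracy(TABLE_5)
for label in range(1, 5):
    print(label, round(pratt[label], 1), round(ours[label], 1))
```

Only classes 1 to 4 are printed, matching the four lesion classes compared in Table 4.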

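3.4.2 Comparison with other network architectures

The confusion matrix of the classification results of EfficientNet with the attention mechanism is given in Table 3. To further verify the effect of adding attention to a classification network, the attention mechanism was also added to InceptionV3, and the resulting confusion matrix is given in Table 6. Tables 3 and 6 show that most of the data fall on the diagonal, which confirms that the models are suitable for DR image classification. Looking at each class individually, however, the proposed EfficientNet with attention (A-EfficientNet) clearly outperforms Inception V3 with attention.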

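Table 6 Confusion matrix generated by InceptionV3 network incorporating the attention mechanism (rows: true label; columns: predicted class)

| True label | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 3 466 | 878 | 620 | 63 | 69 |
| 1 | 233 | 240 | 12 | 1 | 2 |
| 2 | 327 | 389 | 269 | 35 | 41 |
| 3 | 26 | 29 | 50 | 42 | 27 |
| 4 | 3 | 12 | 37 | 25 | 64 |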

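Table 7 lists the results of training with InceptionV3 and EfficientNet, each with and without the attention mechanism. With attention, the accuracies of the InceptionV3 and EfficientNet models reach 80.4% and 97.2%, respectively. Without attention, the features of retinal images differ greatly from those of natural images, so although the EfficientNet classifier is more accurate than the InceptionV3 classifier on classes 0 to 2, it cannot extract the feature information in DR images effectively: among the three channels of a color retinal image, the green channel usually carries most of the physiological structure and lesion information, whereas the channels of natural images contribute more evenly, and this lowers the classification accuracy on the remaining classes. Feature extraction by EfficientNet across depth, width, and resolution, combined with the attention mechanism's focus on fine-grained image detail, therefore improves DR classification. The predicted class probabilities and the classification results of EfficientNet with attention are shown in Table 8 and Fig. 6: with attention, every class receives predictions, and the predicted class probabilities largely agree with the true labels. The model demonstrates the advantage of EfficientNet's adaptive optimization of the network structure and the effectiveness of the attention mechanism in extracting and classifying features across lesion grades, which shows the superiority of the A-EfficientNet classification network.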

Table 7 Comparison of experimental results

| Network | Accuracy/% | Specificity/% | Sensitivity/% | Kappa |
|---|---|---|---|---|
| InceptionV3 | 78.3 | 88.9 | 78.2 | 0.722 |
| InceptionV3+attention | 80.4 | 90.7 | 83.5 | 0.782 |
| EfficientNet | 92.5 | 97.5 | 91.8 | 0.80 |
| A-EfficientNet | 97.2 | 98.7 | 95.6 | 0.84 |

Table 8 Probability of predicting categories after adding attention mechanism to EfficientNet (rows: true label; columns: predicted class, in %)

| True label | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 98.75 | 1.25 | 0 | 0 | 0 |
| 1 | 2.1 | 97.90 | 0 | 0 | 0 |
| 2 | 1.72 | 9.87 | 88.41 | 0 | 0 |
| 3 | 0.05 | 2.74 | 7.21 | 89.91 | 0.09 |
| 4 | 1.76 | 3.21 | 5.36 | 8.68 | 80.99 |

Fig. 6 EfficientNet classification effect maps after adding attention mechanism ((a) actual label 0; (b) actual label 1; (c) actual label 2; (d) actual label 3; (e) actual label 4)


4 Conclusion



  • Casanova R, Saldana S, Chew E Y, Danis R P, Greven C M, Ambrosius W T. 2014. Application of random forests methods to diabetic retinopathy classification analyses. PLoS One, 9(6): e98587 [DOI:10.1371/journal.pone.0098587]
  • Decencière E, Zhang X W, Cazuguel G, Laÿ B, Cochener B, Trone C, Gain P, Ordóñez-Varela J R, Massin P, Erginay A, Charton B, Klein J C. 2014. Feedback on a publicly distributed image database:the Messidor database. Image Analysis and Stereology, 33(3): 231-234 [DOI:10.5566/ias.1155]
  • Doshi D, Shenoy A, Sidhpura D and Gharpure P. 2016. Diabetic retinopathy detection using deep convolutional neural networks//Proceedings of 2016 International Conference on Computing, Analytics and Security Trends (CAST). Pune, India: IEEE: 261-266[DOI: 10.1109/CAST.2016.7914977]
  • Flaxman S R, Bourne R R A, Resnikoff S, Ackland P, Braithwaite T, Cicinelli M V, Das A, Jonas J B, Keeffe J, Kempen J H, Leasher J, Limburg H, Naidoo K, Pesudovs K, Silvester A, Stevens G A, Tahhan N, Wong T Y, Zheng Y F. 2017. Global causes of blindness and distance vision impairment 1990-2020:a systematic review and meta-analysis. Lancet Global Health, 5(12): e1221-e1234 [DOI:10.1016/S2214-109X(17)30393-5]
  • Fleiss J L, Cohen J, Everitt B S. 1969. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5): 323-327 [DOI:10.1037/h0028106]
  • French Ministry. 2014. Methods to evaluate segmentation and indexing techniques in the field of retinal ophthalmology[EB/OL].[2019-11-11]
  • Gargeya R, Leng T. 2017. Automated identification of diabetic retinopathy using deep learning. Ophthalmology, 124(7): 962-969 [DOI:10.1016/j.ophtha.2017.02.008]
  • Graham B. 2015. Kaggle Diabetic Retinopathy Detection Competition Report. Coventry: University of Warwick
  • Gulshan V, Peng L, Coram M, Stumpe M C, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson P C, Mega J L, Webster D R. 2016. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22): 2402-2410 [DOI:10.1001/jama.2016.17216]
  • He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
  • Huang Y P, Cheng Y L, Bapna A, Firat O, Chen M X, Chen D H and Lee H J, Ngiam J, Le Q V, Wu Y H, Chen Z F. 2018. GPipe: efficient training of giant neural networks using pipeline parallelism[EB/OL].[2019-11-11].
  • Kaggle. 2015. Diabetic retinopathy detection[EB/OL].[2019-11-11].
  • Kingma D P and Ba J. 2014. Adam: a method for stochastic optimization[EB/OL].[2019-11-11].
  • Krizhevsky A, Sutskever I, Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90 [DOI:10.1145/3065386]
  • Laud K, Shabto U and Tello C. 2017. Diabetic Retinopathy//Principles of Diabetes Mellitus. 2010: 331-346[DOI: 10.1007/978-3-319-18741-9_2-1]
  • Li Q, Bai Z Y, Liu Y F. 2018. Automated classification of diabetic retinal images by using deep learning method. Journal of Image and Graphics, 23(10): 1594-1603 (李琼, 柏正尧, 刘莹芳. 2018. 糖尿病性视网膜图像的深度学习分类方法. 中国图象图形学报, 23(10): 1594-1603) [DOI:10.11834/jig.170683]
  • Li X G, Pang T T, Xiong B, Liu W X, Liang P and Wang T F. 2017. Convolutional neural networks based transfer learning for diabetic retinopathy fundus image classification//Proceedings of the 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). Shanghai, China: IEEE: 1-11[DOI: 10.1109/CISP-BMEI.2017.8301998]
  • Pan S J, Yang Q. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10): 1345-1359 [DOI:10.1109/TKDE.2009.191]
  • Pang H, Wang C. 2017. Deep learning model for diabetic retinopathy detection. Journal of Software, 28(11): 3018-3029 (庞浩, 王枞. 2017. 用于糖尿病视网膜病变检测的深度学习模型. 软件学报, 28(11): 3018-3029) [DOI:10.13328/j.cnki.jos.005332]
  • Pratt H, Coenen F, Broadbent D M, Harding S P, Zheng Y L. 2016. Convolutional neural networks for diabetic retinopathy. Procedia Computer Science, 90: 200-205 [DOI:10.1016/j.procs.2016.07.014]
  • Rakhlin. 2017. Diabetic retinopathy detection through integration of deep learning classification framework[EB/OL].[2019-11-11].
  • Shahin E M, Taha T E, Al-Nuaimy W, El Rabaie S, Zahran O F and El-Samie F E A. 2012. Automated detection of diabetic retinopathy in blurred digital fundus images//Proceedings of the 8th International Computer Engineering Conference (ICE-NCO). Cairo, Egypt: IEEE: 20-25[DOI: 10.1109/ICENCO.2012.6487084]
  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. 2014. Dropout:a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1): 1929-1958
  • Tan M X and Le Q V. 2019. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL].[2019-11-09].
  • Voets M, Møllersen K, Bongo L A. 2018. Reproduction study using public data of:development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. PLoS One, 14(6): e0217541 [DOI:10.1371/journal.pone.0217541]
  • Wang Y, Wang G A, Fan W G and Li J X. 2018. A deep learning based pipeline for image grading of diabetic retinopathy//Proceedings of 2018 International Conference on Smart Health. Wuhan, China: Springer: 240-248[DOI: 10.1007/978-3-030-03649-2_24]
  • Zagoruyko S and Komodakis N. 2016. Wide residual networks//Proceedings of the British Machine Vision Conference. York: BMVA Press: #87[DOI: 10.5244/C.30.87]
  • Zeng X L, Chen H Q, Luo Y, Ye W B. 2019. Automated diabetic retinopathy detection based on binocular siamese-like convolutional neural network. IEEE Access, 7: 30744-30753 [DOI:10.1109/ACCESS.2019.2903171]
  • Zhou K, Gu Z W, Liu W, Luo W X, Cheng J, Gao S H and Liu J. 2018. Multi-cell multi-task convolutional neural networks for diabetic retinopathy grading//Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Honolulu, HI, USA: IEEE: 2724-2727[DOI: 10.1109/EMBC.2018.8512828]