Multi-attention and cascaded context for diabetic retinopathy lesion segmentation

Guo Yanfei, Du Hangli, Yang Chenglong, Kong Xiangzhen (College of Computer Science, Qufu Normal University)

Abstract
Objective Diabetic retinopathy (DR) is a leading cause of blindness in humans, so automatic and accurate segmentation of DR lesions is crucial for DR grading, diagnosis and treatment. However, different types of DR lesions have complex structures and inconsistent scales, and exhibit inter-class similarity and intra-class variability, which makes accurate simultaneous segmentation of multiple lesions challenging. To address these problems, this paper proposes a multi-type DR lesion segmentation method based on multi-attention and cascaded context fusion. Method First, a triple attention module extracts channel attention, spatial attention and pixel-point attention features of lesions and fuses them by addition to ensure feature consistency. In addition, a cascaded context feature fusion module uses adaptive average pooling and non-local operations to extract global context information from different network layers, enlarging the receptive field for lesions. Finally, a balanced attention module computes foreground, background and boundary attention maps of lesions and uses a squeeze-and-excitation module to weight feature channels, rebalancing the attention over the three regions so that the network focuses more on lesion edge details and achieves fine-grained segmentation. Result Extensive comparison and ablation experiments on the public DR image datasets DDR, IDRiD and E-Ophtha yield mean AUC (area under curve) values of 0.6790, 0.7503 and 0.6601, respectively, for the four lesion types. Conclusion The proposed multi-attention and cascaded context fusion network (MCFNet) overcomes interference from other fundus tissues and lesion noise, accurately segments the four DR lesion types simultaneously, and shows good accuracy and robustness, providing strong support for clinicians in DR diagnosis and treatment.
Keywords
MCFNet: multi-attention and cascaded context fusion network for multiple lesion segmentation of diabetic retinopathy images

Guo Yanfei, Du Hangli, Yang Chenglong, Kong Xiangzhen (College of Computer Science and Engineering, Qufu Normal University)

Abstract
Objective Diabetic retinopathy (DR) is a leading cause of blindness in humans, and regular screening helps detect and contain DR early. Automated and accurate lesion segmentation is therefore crucial for DR grading and diagnosis, but the task is challenging because different kinds of lesions have complex structures, inconsistent scales and blurry edges. Moreover, manual segmentation of DR lesions is time-consuming and labor-intensive, and the limited number of doctors and the high cost of manual annotation make it difficult to apply at scale. It is thus crucial to develop an automatic DR lesion segmentation method to reduce clinical workload and increase efficiency. Recently, convolutional neural networks (CNNs) have been widely applied to medical image segmentation and disease classification. Existing deep learning-based methods for DR lesion segmentation fall into image-based and patch-based approaches. Some studies adopt attention mechanisms to segment lesions using the whole fundus image as input; however, these methods may lose the edge details of lesions, making it difficult to obtain fine-grained segmentation results. Other works crop the original images into patches and feed them into encoder-decoder networks for DR lesion segmentation. However, most of these approaches use fixed weights to fuse coding features at different levels, ignoring the information differences between them, which makes it hard to effectively integrate multi-level features for accurate lesion segmentation. To address the above issues, this paper proposes a multi-attention and cascaded context fusion network (MCFNet) for simultaneous segmentation of multiple lesions. Method The proposed network adopts an encoder-decoder framework comprising a VGG16 backbone, a triple attention module (TAM), a cascaded context fusion module (CFM) and a balanced attention module (BAM).
Directly fusing multi-level features from different stages of the encoder easily results in inconsistent feature scales and information redundancy. Dynamically selecting important information from multi-resolution feature maps not only preserves the contextual information in low-resolution feature maps but also effectively reduces background-noise interference in high-resolution feature maps. TAM is therefore proposed to extract three types of attention features: channel attention, spatial attention and pixel-point attention. First, the channel attention assigns different weights to different feature channels, enabling the selection of specific feature patterns for lesion segmentation. Second, the spatial attention highlights the location of lesions in the feature map, making the model pay more attention to lesion areas. Last, the pixel-point attention extracts small-scale lesion features. By learning and fusing these attention features, TAM ensures feature consistency and selectivity. In addition, conventional receptive fields can hardly capture the subtle features of small lesions in full. We propose the CFM, which captures global context information at different levels and sums it with the local context information from TAM. The module expands the range of multi-scale receptive fields, which helps improve the accuracy and robustness of small-scale lesion segmentation. Furthermore, we propose the BAM to address rough and inconspicuous lesion edges. It calculates foreground, background and boundary attention maps, respectively, to reduce the adverse interference of background noise and make lesion contours clearer. Result We quantitatively compared the lesion segmentation performance with current methods on the IDRiD, DDR and E-Ophtha datasets.
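The additive three-branch fusion in TAM can be sketched in NumPy. This is an illustrative sketch only, not the paper's implementation: the specific gating functions (global-average-pooled sigmoid gate for the channel branch, channel-mean gate for the spatial branch, element-wise sigmoid for the pixel-point branch) are assumptions standing in for the learned attention sub-networks.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triple_attention(feat):
    """Additive fusion of channel, spatial and pixel-point attention.

    feat: (C, H, W) feature map from one encoder stage.
    The three gates below are hand-rolled stand-ins for learned branches.
    """
    # Channel attention: squeeze each channel to a scalar, gate per channel.
    ch_gate = sigmoid(feat.mean(axis=(1, 2)))        # (C,)
    ch_att = feat * ch_gate[:, None, None]
    # Spatial attention: average over channels, gate per spatial location.
    sp_gate = sigmoid(feat.mean(axis=0))             # (H, W)
    sp_att = feat * sp_gate[None, :, :]
    # Pixel-point attention: element-wise gate, useful for tiny lesions.
    px_att = feat * sigmoid(feat)
    # Addition fusion keeps the three attention features on a common scale.
    return ch_att + sp_att + px_att

x = np.random.rand(4, 8, 8)
y = triple_attention(x)
```

The output keeps the input shape, so the fused map can be passed on to the decoder or to CFM unchanged.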
The experimental results show that, despite variations in the number and appearance of retinal images from different countries and ethnicities, the proposed model outperforms state-of-the-art methods with better accuracy and robustness. Specifically, on the IDRiD dataset, MCFNet achieves AUC values of 0.9171, 0.7197, 0.6557 and 0.7087 for EX, HE, MA and SE lesion segmentation, respectively, and the mAUC, mIoU and mDice over the four kinds of lesions reach 0.7503, 0.6387 and 0.7003, respectively. On the DDR dataset, the proposed model achieves mAUC, mIoU and mDice of 0.6790, 0.4347 and 0.5989, respectively; compared with PSPNet, these improve by 52.7%, 18.63% and 33.06%, respectively. On the E-Ophtha dataset, MCFNet achieves mAUC, mIoU and mDice of 0.6601, 0.4495 and 0.6285, respectively; compared with MLSF-Net, these improve by 15.11%, 4.06% and 20.68%, respectively. Moreover, we qualitatively compare our model with other methods to visually assess their segmentation performance. Compared with the other methods, our segmentation results are closer to the ground truth, with finer and more accurate edges. Furthermore, to verify the effectiveness of the proposed TAM, CFM and BAM, we performed comprehensive ablation experiments on the IDRiD, DDR and E-Ophtha datasets. Using only the baseline, the model achieves mAUC, mIoU and mDice of 0.5975, 0.4512 and 0.5848 on IDRiD. The fusion of VGG16 with TAM, CFM and BAM achieves the best segmentation results for all four types of multi-scale lesions, demonstrating that the proposed modules improve multiple-lesion segmentation performance to varying degrees. Conclusion This paper proposes a multi-attention and cascaded context fusion network for multiple-lesion segmentation of diabetic retinopathy images.
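The reported mAUC, mIoU and mDice average per-lesion scores over the four classes (EX, HE, MA, SE). A minimal sketch of that aggregation, assuming binarized prediction masks (the thresholding and exact averaging details of the paper are not specified here, so this is only the standard convention):

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred, gt, eps=1e-7):
    """Intersection over union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

# Per-class binary masks for the four DR lesion types (random placeholders).
classes = ("EX", "HE", "MA", "SE")
preds = {c: np.random.rand(16, 16) > 0.5 for c in classes}
gts = {c: np.random.rand(16, 16) > 0.5 for c in classes}

# mIoU / mDice: unweighted mean of the per-class scores.
mIoU = float(np.mean([iou(preds[c], gts[c]) for c in classes]))
mDice = float(np.mean([dice(preds[c], gts[c]) for c in classes]))
```

mAUC is computed analogously by averaging the per-class area under the ROC curve over the same four classes.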
The proposed MCFNet introduces TAM to learn and fuse channel attention, spatial attention and pixel-point attention features, ensuring feature consistency and selectivity. CFM uses adaptive average pooling and non-local operations to capture local and global contextual features and fuses them by concatenation to expand the receptive field for fundus lesions. BAM computes attention maps for the foreground, background and lesion contours and uses squeeze-and-excitation modules to rebalance the attention features of these regions, which preserves edge details and reduces interference from background noise. Experimental results on the IDRiD, DDR and E-Ophtha datasets demonstrate the superiority of the proposed method over the state of the art. It effectively overcomes interference from the background and other lesion noise, achieving accurate segmentation of different types of multi-scale lesions.
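The foreground/background/boundary rebalancing in BAM can be illustrated with a small NumPy sketch. Everything here is an assumption for illustration: the boundary band is derived morphologically (dilation minus erosion of a thresholded mask), and the squeeze-and-excitation gate is reduced to a sigmoid over per-map global averages rather than the learned two-layer excitation of the real module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dilate3x3(m):
    """3x3 binary dilation via padded shifts (max over the neighborhood)."""
    h, w = m.shape
    p = np.pad(m, 1)
    return np.max([p[i:i + h, j:j + w] for i in range(3) for j in range(3)],
                  axis=0)

def balanced_attention(prob):
    """prob: (H, W) lesion probability map in [0, 1].

    Builds foreground, background and boundary attention maps, then
    reweights them with an SE-style per-map gate (simplified here).
    """
    fg = prob
    bg = 1.0 - prob
    hard = (prob > 0.5).astype(float)
    # Boundary band = dilation minus erosion of the hard mask; erosion is
    # implemented as the complement of dilating the complement.
    eroded = 1.0 - dilate3x3(1.0 - hard)
    bd = dilate3x3(hard) - eroded
    maps = np.stack([fg, bg, bd])                 # (3, H, W)
    # Squeeze: global average per map; excite: sigmoid gate per map.
    gate = sigmoid(maps.mean(axis=(1, 2)))        # (3,)
    return (maps * gate[:, None, None]).sum(axis=0)

prob = np.random.rand(16, 16)
att = balanced_attention(prob)
```

The boundary band gives nonzero weight only to a thin ring around each lesion, which is how the module steers the network toward edge details.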
Keywords
