Optic disk segmentation with a UNet incorporating a residual attention mechanism

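Hou Xiangdan1,2, Zhao Yihao1,2, Liu Hongpu1,2, Guo Hongyong2, Yu Xixin2, Ding Mengyuan2 (1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; 2. Hebei Provincial Key Laboratory of Big Data Computing, Tianjin 300401, China)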

Abstract
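Objective Glaucoma, pathologic myopia, and related diseases can cause irreversible damage to vision, and early diagnosis of eye diseases can greatly reduce the incidence of such damage. Because fundus images are complex, optic disk segmentation is easily disturbed by blood vessels, lesions, and similar regions, so traditional methods cannot segment the optic disk accurately. To address this problem, we propose RA-UNet (residual attention UNet), a deep learning-based optic disk segmentation method that improves segmentation accuracy and performs automatic, end-to-end segmentation. Method The original UNet is improved in several ways. A ResNet34 fused with an attention mechanism serves as the downsampling path to strengthen image feature extraction, and pretrained weights are loaded to help counter the overfitting caused by the small number of training samples. The attention mechanism introduces global context information, enhancing useful features and suppressing the responses of useless ones. The upsampling layers of UNet are modified to reduce the number of model parameters, which eases training. The segmentation map output by the network is postprocessed to remove falsely segmented regions. In addition, DiceLoss replaces the ordinary cross-entropy loss for optimizing the network parameters. Result The method is compared with other methods on four datasets. On RIM-ONE (retinal image database for optic nerve evaluation)-R1, the F score and overlap rate are 0.9574 and 0.9182, which are 2.89% and 5.17% higher than those of UNet; on RIM-ONE-R3, they are 0.969 and 0.9398, 1.5% and 2.78% higher than UNet; on Drishti-GS1, they are 0.9662 and 0.9345, 1.65% and 3.04% higher than UNet; and on the iChallenge-PM pathologic myopia challenge dataset, they are 0.9424 and 0.8911, 3.59% and 6.22% higher than UNet. Ablation experiments on RIM-ONE-R1 and Drishti-GS1 further verify that each module of the improved algorithm contributes to better optic disk segmentation. Conclusion The proposed RA-UNet improves optic disk segmentation accuracy, segments the optic disk well even in images containing lesions, and generalizes well.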
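The abstract does not define its two metrics. A common convention in optic disk segmentation, assumed in the sketch below, is that the F score is the harmonic mean of pixel precision and recall (equal to the Dice coefficient for binary masks) and the overlap rate is the Jaccard index |P ∩ G| / |P ∪ G|. Under this assumption the two are linked by F = 2J / (1 + J), and each F score/overlap pair reported above satisfies that identity to within 0.0001, which is consistent with this reading. The function names are illustrative, not taken from the paper.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = optic disk, 0 = background


def f_score_and_overlap(pred: Mask, gt: Mask) -> Tuple[float, float]:
    """Return (F score, overlap rate) for one predicted mask and its ground truth.

    Assumed definitions: F = 2PR / (P + R) and overlap = |pred AND gt| / |pred OR gt|.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1  # disk pixel found
            elif p:
                fp += 1  # background predicted as disk
            elif g:
                fn += 1  # disk pixel missed
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    # 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN), which also avoids 0/0 when TP = 0.
    f_score = 2 * tp / (2 * tp + fp + fn)
    overlap = tp / (tp + fp + fn)
    return f_score, overlap


def f_from_overlap(overlap: float) -> float:
    """F score implied by an overlap rate J under the definitions above: 2J / (1 + J)."""
    return 2 * overlap / (1 + overlap)


pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
gt = [[0, 1, 1], [0, 1, 0], [0, 1, 0]]
print(f_score_and_overlap(pred, gt))  # (0.75, 0.6)

# (F score, overlap rate) pairs for RIM-ONE-R1, RIM-ONE-R3, Drishti-GS1, iChallenge-PM
reported = [(0.9574, 0.9182), (0.969, 0.9398), (0.9662, 0.9345), (0.9424, 0.8911)]
for f, j in reported:
    print(f, round(f_from_overlap(j), 4))
```

Because papers often average metrics per image, the identity need not hold exactly for dataset means; small last-digit differences are expected.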
Optic disk segmentation by combining UNet and a residual attention mechanism

Hou Xiangdan1,2, Zhao Yihao1,2, Liu Hongpu1,2, Guo Hongyong2, Yu Xixin2, Ding Mengyuan2 (1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; 2. Hebei Provincial Key Laboratory of Big Data Computing, Tianjin 300401, China)

Abstract
Objective Glaucoma and pathologic myopia are two major causes of irreversible vision damage, and early detection of these diseases is crucial for subsequent treatment. The optic disk, where the retinal blood vessels converge, is approximately elliptical in normal fundus images, and its accurate, automatic segmentation from fundus images is a basic task. Doctors often diagnose eye diseases from patients' color fundus images, but browsing the images repeatedly to reach a diagnosis is tedious and arduous. When tired, doctors may miss subtle changes in an image, resulting in missed diagnoses. Automatic optic disk segmentation by computer can therefore help doctors diagnose these diseases. Because glaucoma, pathologic myopia, and other eye diseases are reflected in the shape of the optic disk, accurate optic disk segmentation can assist diagnosis. However, accurate segmentation is challenging because fundus images are complex, and many existing deep learning-based methods are easily misled by pathologic regions. UNet has been widely used in medical image segmentation tasks, but it performs relatively poorly on optic disk segmentation. Convolution is the core operation of convolutional neural networks, yet the information at different spatial locations and in different channels varies in importance; attention mechanisms, which have received increasing attention in recent years, can weight these features accordingly. In this study, we present a new UNet-based network for automatic optic disk segmentation that improves segmentation accuracy. Method Following the design of UNet, the proposed model consists of an encoder and a decoder and is trained end to end. The encoder's ability to extract discriminative representations directly affects segmentation performance.
Because pixel-wise labels are expensive to obtain, especially in medical image analysis, transfer learning is adopted to train the model. Given ResNet's strong feature extraction capability, the encoder uses a modified, pretrained ResNet34 as the backbone to extract hierarchical features, and squeeze-and-excitation (SE) blocks are inserted at appropriate positions to further enhance performance. The final average pooling layer and the fully connected layer of ResNet34 are removed, and the remaining layers are kept. The SE block, which consists of a squeeze operation and an excitation operation, models the relationships between feature map channels to recalibrate channel-wise feature responses adaptively, thereby boosting feature discriminability. In the encoder, all modules except the four SE blocks are initialized with weights pretrained on ImageNet (the ImageNet Large-Scale Visual Recognition Challenge dataset), which speeds up convergence and helps prevent overfitting. The input images are downsampled five times in total to extract abstract semantic features. In the decoder, 2×2 deconvolutions with stride 2 perform upsampling, also five times in total. Unlike in the original UNet decoder, every deconvolution except the last outputs a 128-channel feature map, which reduces the number of model parameters. Shallow feature maps preserve more detailed spatial information, whereas deep feature maps carry more high-level semantic information. The successive downsampling layers enlarge the receptive field of the network but lose detailed location information. Skip connections between the encoder and decoder combine high-level semantic information with low-level detail for fine-grained segmentation: each encoder feature map first passes through a 1×1 convolution layer, whose output is then concatenated with the corresponding decoder feature map.
Skip connections are crucial for restoring image details in the decoder layers. Finally, the network outputs a two-channel probability map for the background and the optic disk with the same size as the input image: the last deconvolution has two output channels and is followed by SoftMax activation, producing the probability maps of the background and the optic disk simultaneously. Because the segmentation map predicted by the network is rough, postprocessing is applied to reduce false positives. In addition, DiceLoss replaces the traditional cross-entropy loss function. Because training images are limited, data augmentation with random horizontal, vertical, and diagonal flips is first performed to prevent overfitting. Training is accelerated on an NVIDIA GeForce GTX 1080 Ti GPU and uses Adam optimization with an initial learning rate of 0.001. Result To verify the effectiveness of our method, we conduct experiments on four public datasets: RIM-ONE (retinal image database for optic nerve evaluation)-R1, RIM-ONE-R3, Drishti-GS1, and iChallenge-PM. Two evaluation metrics, the F score and the overlap rate, are computed, and segmentation results are also shown for visual comparison. Extensive experiments demonstrate that our method outperforms several other deep learning-based methods, including UNet, DRIU, DeepDisc, and CE-Net, on all four datasets, and that its segmentation results are visually closer to the ground-truth labels. Compared with UNet on RIM-ONE-R1, RIM-ONE-R3, Drishti-GS1, and iChallenge-PM, the F score (higher is better) increases by 2.89%, 1.5%, 1.65%, and 3.59%, and the overlap rate (higher is better) increases by 5.17%, 2.78%, 3.04%, and 6.22%, respectively.
Compared with DRIU on RIM-ONE-R1, RIM-ONE-R3, Drishti-GS1, and iChallenge-PM, the F score increases by 1.89%, 1.85%, 1.14%, and 2.01%, and the overlap rate increases by 3.41%, 3.42%, 2.1%, and 3.53%, respectively. Compared with DeepDisc on the same datasets, the F score increases by 0.24%, 0.01%, 0.18%, and 1.44%, and the overlap rate by 0.42%, 0.01%, 0.33%, and 2.55%, respectively. Compared with CE-Net on these datasets, the F score increases by 0.42%, 0.2%, 0.43%, and 1.07%, and the overlap rate by 0.77%, 0.36%, 0.79%, and 1.89%, respectively. Ablation experiments on RIM-ONE-R1 and Drishti-GS1 demonstrate the contribution of each part of the algorithm. Conclusion In this study, we propose a new end-to-end convolutional network model based on UNet and apply it to optic disk segmentation in practical medical image analysis. Extensive experiments show that our method outperforms other state-of-the-art deep learning-based optic disk segmentation approaches and generalizes well. In future work, we intend to introduce recent loss functions that focus on segmenting the optic disk boundary.
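The channel recalibration performed by the SE block can be illustrated with a small sketch. It follows the standard squeeze-and-excitation formulation (global average pooling, a reduction fully connected layer with ReLU, an expansion fully connected layer with sigmoid, then channel-wise rescaling), written in plain Python on nested lists rather than in a deep learning framework. The reduction ratio, the toy weights, and the omission of biases are assumptions, and the abstract does not specify where exactly the four SE blocks sit inside ResNet34, so only the block itself is shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def se_block(x: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Squeeze-and-excitation: rescale each of the C channels of x.

    x  : C feature maps, each H x W
    w1 : (C // r) x C weights of the reduction layer (biases omitted)
    w2 : C x (C // r) weights of the expansion layer (biases omitted)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: reduce to C // r units, then ReLU.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to C units, then sigmoid gives weights in (0, 1).
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value in channel c by its weight s[c].
    return [[[v * sc for v in row] for row in ch] for ch, sc in zip(x, s)]


x = [
    [[1.0, 3.0], [1.0, 3.0]],  # channel 0, mean 2.0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1, mean 2.0
]
w1 = [[1.0, 1.0]]     # C = 2, r = 2: reduce 2 -> 1 unit
w2 = [[1.0], [-1.0]]  # expand 1 -> 2 units: favor channel 0, suppress channel 1
y = se_block(x, w1, w2)
# Channel 0 is scaled by sigmoid(4), about 0.98; channel 1 by sigmoid(-4), about 0.02.
print(y)
```

In a trained network, w1 and w2 are learned, so the block decides from global context which channels to emphasize; the output keeps the input's shape, which is why it can be dropped into a pretrained backbone.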
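The decoder arithmetic described in the Method section can be checked with two standard formulas: a transposed convolution with kernel k, stride s, and padding p maps a side length n to (n − 1)s − 2p + k, and a k×k transposed convolution from C_in to C_out channels has C_in·C_out·k² weights plus C_out biases. The sketch below assumes a hypothetical 512×512 input, zero padding, biases, and a first deconvolution that takes ResNet34's final 512-channel feature map directly. It traces five 2×2, stride-2 deconvolutions back to full resolution and compares a fixed 128-channel output with the channel halving of the original UNet. The channel layout of later decoder stages, after concatenation with the 1×1-convolved skip features, is not given in the abstract and is not modeled.

```python
def deconv_out_size(size: int, kernel: int = 2, stride: int = 2, padding: int = 0) -> int:
    """Spatial output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def deconv_params(c_in: int, c_out: int, kernel: int = 2, bias: bool = True) -> int:
    """Learnable parameters of a kernel x kernel transposed convolution."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


# Hypothetical 512 x 512 input: five stride-2 downsamplings leave 512 / 2**5 = 16.
size = 512 // 2 ** 5
trace = [size]
for _ in range(5):
    size = deconv_out_size(size)  # with k = 2, s = 2, p = 0 this doubles the size
    trace.append(size)
print(trace)  # [16, 32, 64, 128, 256, 512]

# First decoder step on a 512-channel feature map:
print(deconv_params(512, 128))  # 262272: fixed 128-channel output
print(deconv_params(512, 256))  # 524544: UNet-style halving of the channel count
```

With k = s = 2 and no padding each deconvolution exactly doubles the side length, so five of them undo the encoder's five halvings, and at this single step the fixed 128-channel output needs half the parameters of the halving scheme.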
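The abstract replaces cross-entropy with DiceLoss but does not give its formula. A common soft Dice formulation, assumed here, applies SoftMax to each pixel's two logits (background, optic disk), takes the optic disk probability p_i, and computes 1 − (2Σ p_i g_i + ε) / (Σ p_i + Σ g_i + ε) against the ground truth g_i. A commonly cited motivation is that the score is normalized by the size of the predicted and true disk regions rather than averaged over all pixels, so the large background dominates it less than pixel-wise cross-entropy. The smoothing term ε = 1, the flattened-pixel layout, and the function names are illustrative choices.

```python
import math
from typing import Sequence, Tuple


def softmax2(bg: float, od: float) -> float:
    """Optic disk probability from one pixel's two logits via a two-class SoftMax."""
    m = max(bg, od)  # subtract the max for numerical stability
    e_bg, e_od = math.exp(bg - m), math.exp(od - m)
    return e_od / (e_bg + e_od)


def dice_loss(logits: Sequence[Tuple[float, float]], target: Sequence[int], eps: float = 1.0) -> float:
    """Soft Dice loss over a flattened image.

    logits : per-pixel (background, optic disk) logits
    target : per-pixel ground truth, 1 = optic disk, 0 = background
    eps    : smoothing term (assumed value) that avoids 0/0 on empty masks
    """
    probs = [softmax2(bg, od) for bg, od in logits]
    inter = sum(p * g for p, g in zip(probs, target))
    denom = sum(probs) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


target = [1, 0]
print(dice_loss([(-10.0, 10.0), (10.0, -10.0)], target))  # confident and correct: close to 0
print(dice_loss([(10.0, -10.0), (-10.0, 10.0)], target))  # confident and wrong: close to 2/3
```

Because the probabilities are continuous, the loss is differentiable with respect to the logits and can be minimized with Adam like cross-entropy.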
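The abstract states only that postprocessing reduces false positives. One common choice for a structure that appears as a single region, such as the optic disk, is to binarize the two-channel output (for example by per-pixel argmax), keep the largest connected component, and discard the rest. The sketch below implements that assumed step with 4-connectivity and a breadth-first search; it illustrates the idea and is not necessarily the authors' procedure.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = optic disk, 0 = background


def keep_largest_component(mask: Mask) -> Mask:
    """Keep only the largest 4-connected foreground region of a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best: List[Tuple[int, int]] = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                # Breadth-first search collects one connected region.
                region: List[Tuple[int, int]] = []
                queue = deque([(i, j)])
                seen[i][j] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(region) > len(best):
                    best = region
    # Rebuild a mask that contains only the largest region.
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out


prediction = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],  # the pixel on the right is a spurious second region
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(keep_largest_component(prediction))
```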
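The flip-based data augmentation can be sketched as follows. A diagonal flip is taken here to mean transposition about the main diagonal, each of the three flips is applied independently with probability 0.5, and a single-channel grid stands in for a color fundus image; all three are assumptions. The same transform must be applied to the image and its mask so that the labels stay aligned. Transposition swaps height and width, so it preserves the input shape only for square images.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]  # one image channel or one mask, as rows of pixels


def hflip(g: Grid) -> Grid:
    """Mirror left-right."""
    return [row[::-1] for row in g]


def vflip(g: Grid) -> Grid:
    """Mirror top-bottom."""
    return [list(row) for row in g[::-1]]


def dflip(g: Grid) -> Grid:
    """Mirror about the main diagonal (transpose)."""
    return [list(col) for col in zip(*g)]


def random_flip(image: Grid, mask: Grid, rng: random.Random, p: float = 0.5) -> Tuple[Grid, Grid]:
    """Apply each flip independently with probability p, identically to image and mask."""
    for flip in (hflip, vflip, dflip):
        if rng.random() < p:
            image, mask = flip(image), flip(mask)
    return image, mask


image = [[1, 2], [3, 4]]
mask = [[0, 1], [0, 0]]
print(random_flip(image, mask, random.Random(0)))
```

Passing an explicit random.Random instance keeps the augmentation reproducible across runs.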
