Fundus Image Enhancement Algorithm Based on a Convolutional Dictionary Diffusion Model

Wang Zhen; Huo Guanglei; Lan Hai; Hu Jianmin; Wei Xian

Published: 2024-01-29
DOI: 10.11834/jig.230595


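Wang Zhen¹, Huo Guanglei², Lan Hai³, Hu Jianmin⁴, Wei Xian⁵ (1. College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, and Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences; 2. Quanzhou Tongwei Technology Co., Ltd.; 3. Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences; 4. School of Medical Technology and Engineering, Fujian Medical University; 5. Software Engineering Institute, East China Normal University)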

Abstract

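Objective Retinal fundus images are widely used in the clinical screening and diagnosis of ophthalmic diseases. However, fundus images blurred by defocus, poor lighting conditions, and similar causes can prevent physicians from making a correct diagnosis, and images restored by existing enhancement methods still suffer from blur, missing high-frequency information, and increased noise. This paper proposes a convolutional dictionary diffusion model that combines the denoising capability of convolutional dictionary learning with the flexibility of conditional diffusion models to address these problems. Method The algorithm consists of two processes: a diffusion process and a denoising process. First, random noise is gradually added to the input image until it approaches pure noise; then a neural network is trained to remove the noise step by step until a clear image is finally obtained. A convolutional network is used to implement convolutional dictionary learning and obtain a sparse representation of the image; by making full use of image priors, the algorithm effectively avoids missing high-frequency information and increased noise in the reconstructed image. Results The proposed model was trained on the EyePACS dataset and tested on the synthetic datasets DRIVE (digital retinal images for vessel extraction), CHASEDB1 (child heart and health study in England), and ROC (retinopathy online challenge) and on the real datasets RF (real fundus) and HRF (high-resolution fundus), which verifies its image enhancement performance and cross-dataset generalization. Compared with the original diffusion model (learning enhancement from degradation, Led), the peak signal-to-noise ratio (PSNR) and the learned perceptual image patch similarity (LPIPS) improved on average by 1.9929 dB and 0.0289, respectively. In addition, using the proposed method to preprocess real ophthalmic images for downstream tasks effectively improves downstream performance: in retinal vessel segmentation experiments on the DRIVE dataset, which provides segmentation labels, the area under the receiver operating characteristic curve (AUC), accuracy (Acc), and sensitivity (Sen) improved on average by 0.0314, 0.0030, and 0.0738, respectively, compared with the original diffusion model. Conclusion The proposed method removes blur and recovers richer detail while preserving genuine fundus features, which benefits the analysis and application of clinical images.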
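The forward (diffusion) process described above can be sketched with the standard DDPM closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, which lets any noise level be sampled in one step. This is a minimal sketch, not the authors' code: the linear beta schedule (1e-4 to 0.02 over T = 1000 steps), the 8-pixel toy image, and the helper names `linear_beta_schedule`, `cumulative_alphas` and `q_sample` are illustrative assumptions, since the abstract does not state the noise schedule.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (an assumed DDPM-style schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def cumulative_alphas(betas):
    """abar_t = prod_{s <= t} (1 - beta_s): fraction of signal variance kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I) in a single step."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
betas = linear_beta_schedule()
abar = cumulative_alphas(betas)
x0 = [0.5, -0.2, 0.9, 0.1, -0.7, 0.3, 0.0, 0.6]  # toy 8-pixel "image" scaled to [-1, 1]
x_early = q_sample(x0, 10, abar, rng)   # still close to x0
x_late = q_sample(x0, 999, abar, rng)   # essentially pure Gaussian noise
print(f"abar[10] = {abar[10]:.4f}, abar[999] = {abar[999]:.2e}")
```

Under this assumed schedule abar at the last step falls below 1e-4, so x_T is almost indistinguishable from Gaussian noise; the trained denoising network then reverses these steps one at a time.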

Keywords

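Fundus image enhancement; convolutional dictionary learning; sparse representation; diffusion model; conditional diffusion model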

Fundus Image Enhancement Algorithm Based on Convolutional Dictionary Diffusion Modeling

(Quanzhou Tongwei Technology Co., Ltd)

Abstract

Objective Retinal fundus images have important clinical applications in ophthalmology: they are used to screen for and diagnose a variety of eye diseases, such as diabetic retinopathy, macular degeneration, and glaucoma. However, in real scenarios image acquisition is often affected by factors including lens defocus, poor ambient lighting, patient eye movements, and camera performance. These issues often cause quality problems such as blurriness, unclear details, and unavoidable noise in fundus images, and such poor-quality images make diagnosis harder for ophthalmologists. For example, blurred images lose detailed information about the morphological structure of the retina, which makes it difficult for physicians to accurately localize and identify abnormalities, lesions, exudates, and other conditions. Existing fundus image enhancement methods have improved image quality to some extent, but problems such as blurring, artifacts, missing high-frequency information, and increased noise remain. Therefore, this paper proposes a convolutional dictionary diffusion model that combines convolutional dictionary learning with a conditional diffusion model. The algorithm aims to address the above problems of low-quality images and to provide an effective tool for fundus image enhancement. By improving the quality of fundus images, the approach can help physicians increase diagnostic confidence, improve assessment accuracy, monitor treatment progress, and provide better care for patients. It can also contribute to ophthalmic research and create more opportunities for prospective healthcare management and medical intervention, positively impacting patients' ocular health and overall quality of life.
Method The algorithm consists of two parts: a simulated diffusion process and a reverse denoising process. First, random noise is gradually added to the input image until it becomes pure noise; then a neural network is trained to remove the noise step by step until a clear image is finally obtained. To better preserve the fine-grained structure of the image, this paper uses the blurred fundus image as the conditioning information. Because paired blurred and clear fundus images are difficult to collect, synthetic fundus datasets are widely used for training; a Gaussian filtering algorithm is therefore designed to simulate defocus-blurred images. During training, the conditioning image and the noisy image are first concatenated and fed into the network, and abstract image features are extracted by progressively reducing the spatial size through downsampling, which significantly reduces the time and space complexity of computing the sparse representation. A convolutional network then implements convolutional dictionary learning to obtain a sparse representation of the image. Because self-attention can capture non-local similarity and long-range dependencies, self-attention is added to the convolutional dictionary learning module to improve reconstruction quality. Finally, features from different levels are concatenated to achieve hierarchical feature extraction, fusing information across levels and making better use of local image features. The downsampled features are restored to the original image size by a transposed convolutional layer. The model is trained by minimizing a negative log-likelihood loss, which measures the difference between the probability distributions of the generated and original images. After training, a clear fundus image is generated by gradually removing the noise from a noisy image, with the blurred image as the conditional input.
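The defocus simulation and the conditioning input described above can be illustrated with a small pure-Python sketch. It is hypothetical: the kernel size (5), sigma = 1.5, replicate border handling, and the helper names `gaussian_kernel`, `gaussian_blur` and `make_condition_input` are assumptions, because the abstract gives neither the blur parameters nor the network's input layout; a practical pipeline would apply the same idea to full RGB images with an image-processing library.

```python
import math
import random

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel; size must be odd so the kernel has a center tap."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    r = size // 2
    k = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma)) for j in range(-r, r + 1)]
         for i in range(-r, r + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def gaussian_blur(img, size=5, sigma=1.5):
    """Blur a grayscale image (list of rows) with a Gaussian, replicating border pixels."""
    k = gaussian_kernel(size, sigma)
    r = size // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(-r, r + 1):
                yy = min(max(y + i, 0), h - 1)        # clamp row index (replicate padding)
                for j in range(-r, r + 1):
                    xx = min(max(x + j, 0), w - 1)    # clamp column index
                    acc += k[i + r][j + r] * img[yy][xx]
            out[y][x] = acc
    return out

def make_condition_input(blurred, noisy):
    """Channel-wise concatenation: [condition, x_t] -> a 2-channel (2, H, W) network input."""
    return [blurred, noisy]

# Usage: synthesize a (clear, blurred) training pair and build one network input.
rng = random.Random(0)
clear = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]        # toy 6x6 image with a vertical edge
blurred = gaussian_blur(clear, size=5, sigma=1.5)         # simulated defocus
x_t = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(6)]  # stand-in noisy sample
net_input = make_condition_input(blurred, x_t)
```

Because the kernel is normalized, flat regions keep their intensity while edges are smeared; the blurred image then travels with every noisy sample as an extra input channel.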
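To make "sparse representation over a convolutional dictionary" concrete, the following sketch solves the classical convolutional sparse coding problem, min over z of 0.5 * ||x - sum_k d_k * z_k||^2 + lam * sum_k ||z_k||_1, with ISTA on a 1-D signal. It is a swapped-in textbook technique, not the paper's module: the authors implement convolutional dictionary learning with a convolutional network (with self-attention) on downsampled 2-D feature maps, whereas here the two fixed filters, lam = 0.05, the iteration count, and all function names are hypothetical.

```python
import math

def conv_full(z, d):
    """Full 1-D convolution of code z with filter d (len(z) + len(d) - 1 samples)."""
    out = [0.0] * (len(z) + len(d) - 1)
    for j, zj in enumerate(z):
        for k, dk in enumerate(d):
            out[j + k] += zj * dk
    return out

def correlate_valid(r, d):
    """Adjoint of z -> conv_full(z, d): g[j] = sum_k r[j + k] * d[k]."""
    m = len(d)
    return [sum(r[j + k] * d[k] for k in range(m)) for j in range(len(r) - m + 1)]

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrinks v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def reconstruct(codes, filters, n):
    """Synthesis x_hat = sum_k d_k * z_k."""
    x_hat = [0.0] * n
    for z, d in zip(codes, filters):
        for i, v in enumerate(conv_full(z, d)):
            x_hat[i] += v
    return x_hat

def csc_ista(x, filters, lam=0.05, iters=200):
    """ISTA for min_z 0.5 * ||x - sum_k d_k * z_k||^2 + lam * sum_k ||z_k||_1."""
    n = len(x)
    codes = [[0.0] * (n - len(d) + 1) for d in filters]
    # Young's inequality gives ||D||^2 <= sum_k ||d_k||_1^2, so this step size is safe.
    step = 1.0 / sum(sum(abs(c) for c in d) ** 2 for d in filters)
    for _ in range(iters):
        x_hat = reconstruct(codes, filters, n)
        resid = [xi - xh for xi, xh in zip(x, x_hat)]
        # Gradient step on the data term for every code map, then shrinkage.
        codes = [[soft_threshold(z + step * g, step * lam)
                  for z, g in zip(zk, correlate_valid(resid, d))]
                 for zk, d in zip(codes, filters)]
    return codes

# Usage: a 12-sample signal built from two spikes and a 2-filter dictionary.
filters = [[1.0, 0.5], [0.5, -1.0]]
x = [0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, -0.5, 1.0, 0.0, 0.0, 0.0]
codes = csc_ista(x, filters)
x_hat = reconstruct(codes, filters, len(x))
```

Each ISTA iteration takes a gradient step on the reconstruction error (correlating the residual with each filter, which is the adjoint of convolution) and then soft-thresholds, driving many coefficients to exactly zero. Unrolling a fixed number of such iterations into network layers is one common way to turn this solver into a trainable module.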
Results The proposed model was trained on the EyePACS dataset, and multiple experiments were performed on the synthetic datasets DRIVE (digital retinal images for vessel extraction), CHASEDB1 (child heart and health study in England), and ROC (retinopathy online challenge) and on the real datasets RF (real fundus) and HRF (high-resolution fundus) to demonstrate the generalizability of the model. The experimental results show that, compared with the original diffusion model (learning enhancement from degradation, Led), the evaluation metrics peak signal-to-noise ratio (PSNR) and learned perceptual image patch similarity (LPIPS) improved on average by 1.9929 dB and 0.0289, respectively. Moreover, the proposed approach was used as a pre-processing module for downstream tasks, and a retinal vessel segmentation experiment was conducted to show that it can benefit downstream tasks in clinical applications. The segmentation results on the DRIVE dataset show that all segmentation metrics improved compared with the original diffusion model: the area under the curve (AUC), accuracy (Acc), and sensitivity (Sen) improved on average by 0.0314, 0.0030, and 0.0738, respectively. Conclusion The proposed method provides a practical tool for fundus image deblurring and a new perspective on improving the quality and accuracy of diagnosis. It benefits both patients and ophthalmologists and is expected to promote further interdisciplinary research in ophthalmology and computer science.
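The image-quality and segmentation metrics reported above can be computed as in this illustrative sketch for flattened images and vessel maps. Assumptions: images are scaled to [0, 1], vessel probabilities are thresholded at 0.5 (an assumed value) to obtain Acc and Sen, and the helper names `psnr`, `accuracy_sensitivity` and `roc_auc` are hypothetical. LPIPS is omitted because it requires features from a pretrained deep network.

```python
import math

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized, flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def accuracy_sensitivity(labels, preds):
    """Pixel accuracy and sensitivity (recall on vessel pixels) for binary maps."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return correct / len(labels), tp / (tp + fn)

def roc_auc(labels, scores):
    """Area under the ROC curve as P(score_pos > score_neg), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Usage on toy data (4 pixels); vessel probabilities are thresholded at 0.5.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(psnr([0.0] * 4, [0.1] * 4), roc_auc(labels, scores), accuracy_sensitivity(labels, preds))
```

Higher PSNR, AUC, Acc, and Sen are better, whereas a lower LPIPS is better, so an LPIPS "improvement" is a decrease. The pairwise AUC is O(P·N) and only suits toy inputs; full-resolution DRIVE images call for a rank-based or library implementation.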

Keywords

Fundus image enhancement; convolutional dictionary learning; sparse representation; diffusion model; conditional diffusion model