
Huang Ying1, Cheng Bin2, Fang Shaojie3, Liu Xin3 (1. Chongqing University of Posts and Telecommunications; 2. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications; 3. School of Software Engineering, Chongqing University of Posts and Telecommunications)

Abstract
Objective Existing shadow removal methods usually rely on pixel-level reconstruction, aiming to learn a deterministic mapping between shadow images and shadow-free images. However, shadow removal focuses on the local restoration of shadow regions, which easily damages non-shadow regions while removing shadows. In addition, most existing diffusion models are time-consuming and sensitive to resolution when restoring images. To this end, a wavelet non-uniform diffusion model for shadow removal is proposed. Method First, the image is decomposed by the wavelet transform into low-frequency and high-frequency components; then, diffusion generation networks are designed separately for the low-frequency and high-frequency components to reconstruct the wavelet-domain distribution of shadow-free images and to restore the various degraded information in these components, such as low-frequency attributes (color, brightness) and high-frequency details. Result Experiments were conducted on three shadow datasets for training and testing. On the SRD (shadow removal dataset) dataset, compared with nine representative methods, the proposed method achieved the best or second-best PSNR (peak signal-to-noise ratio), SSIM (structural similarity index), and RMSE (root mean square error) in both non-shadow regions and the entire image. On the ISTD+ (augmented dataset with image shadow triplets) dataset, compared with six representative methods, performance was the best in non-shadow regions, with PSNR and RMSE improved by 0.47 dB and 0.1, respectively. Moreover, on the SRD dataset, the performance of ShadowDiffusion varies markedly when generating images at different resolutions, whereas the performance of the proposed method remains essentially stable. In addition, the generation time of the proposed method is reduced by roughly a factor of four compared with ShadowDiffusion. Conclusion The proposed method accelerates the sampling of the diffusion model and, while removing shadows, restores the missing color, brightness, and rich details in shadow regions.
Shadow removal with wavelet-based non-uniform diffusion model

(Chongqing University of Posts and Telecommunications)

Objective Shadows commonly appear in optical images captured under partial or complete obstruction of light. In such images, shadow regions typically exhibit various forms of degradation, such as low contrast, color distortion, and loss of scene structure. Shadows not only impair human visual perception but also hinder the deployment of many sophisticated computer vision algorithms. Shadow removal can therefore assist many computer vision tasks: it aims to enhance the visibility of shadow regions and achieve a consistent illumination distribution between shadow and non-shadow regions. Currently, deep learning-based shadow removal methods can be roughly divided into two categories. The first typically uses deep learning to minimize the pixel-level differences between shadow regions and their corresponding non-shadow regions, aiming to learn a deterministic mapping between shadow and shadow-free images. However, the primary focus of shadow removal lies in locally restoring shadow regions, often overlooking the constraints required to effectively restore the boundaries between shadow and non-shadow regions. As a result, brightness discrepancies arise between the restored shadow and non-shadow areas, along with artifacts along the boundaries. The second approach uses image generation models to directly model the complex distribution of shadow-free images, avoiding the direct learning of pixel-level mappings and treating shadow removal as a conditional generation task. While diffusion models have garnered significant attention for their powerful generation capabilities, most existing diffusion models are time-consuming and sensitive to resolution when restoring images.
Motivated by these challenges, a wavelet non-uniform diffusion model (WNDM) is proposed, which combines the advantages of wavelet decomposition with the generative power of diffusion models to address the above problems. Method First, the image is decomposed into low-frequency and high-frequency components via wavelet decomposition. Then, diffusion generation networks are designed separately for the low-frequency and high-frequency components to reconstruct the wavelet-domain distribution of shadow-free images and restore the various degraded information within these components, such as low-frequency attributes (color, brightness) and high-frequency details. Because the wavelet transform decomposes the image into high-frequency and low-frequency sub-bands without sacrificing information, and the spatial size of the decomposed images is halved, modeling diffusion in the wavelet domain not only greatly accelerates model inference but also captures information that may be lost in the pixel domain. Furthermore, the low-frequency and high-frequency components differ in the complexity of their distributions and in their sensitivity to noise; for example, the high-frequency components are sparse, which makes their features easier for neural networks to learn. Hence, this study devises two separate adaptive diffusion noise schedules tailored to the low-frequency and high-frequency components. The low-frequency diffusion adjustment branch independently fine-tunes the low-frequency information of shadow images, whereas the high-frequency diffusion adjustment branch independently refines the high-frequency information, so that more precise low-frequency and high-frequency images are generated, respectively. Additionally, to streamline model complexity and save computational resources, the low-frequency and high-frequency diffusion adjustment branches share a single denoising network.
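To make the wavelet-domain setup concrete, the sketch below implements one level of the 2-D Haar transform in plain NumPy. The abstract does not specify the wavelet basis, so Haar is assumed here purely for illustration. The sketch demonstrates the two properties the method relies on: each sub-band has half the spatial size of the input (LL carries the low-frequency color/brightness content; LH/HL/HH carry the high-frequency details), and the inverse transform reconstructs the image without loss of information.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar wavelet transform.

    Splits an (H, W) image into four (H/2, W/2) sub-bands:
    LL (low-frequency approximation) and LH/HL/HH (high-frequency details).
    """
    a = img[0::2, 0::2]  # even rows, even cols
    b = img[0::2, 1::2]  # even rows, odd cols
    c = img[1::2, 0::2]  # odd rows, even cols
    d = img[1::2, 1::2]  # odd rows, odd cols
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: perfectly reconstructs the original image."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    img[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    img[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return img
```

Because the sub-bands are half the resolution, a denoising network operating on them processes a quarter of the pixels per band, which is one source of the inference speed-up claimed above.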
The difference lies in the two prediction branches in the final layer of this network. These branches consist of several stacked convolution blocks that predict the low-frequency and high-frequency components of the shadow-free image, respectively. Finally, high-quality shadow-free images are reconstructed via the inverse wavelet transform. Result The experiments were conducted on three shadow removal datasets for training and testing. On the SRD (shadow removal dataset) dataset, comparisons were made with nine state-of-the-art shadow removal algorithms; the proposed method achieved the best or second-best results in PSNR (peak signal-to-noise ratio), SSIM (structural similarity index), and RMSE (root mean square error) in both non-shadow regions and the entire image. On the ISTD (dataset with image shadow triplets) dataset, performance was the best in non-shadow regions, with improvements of 0.36 dB in PSNR, 0.004 in SSIM, and 0.04 in RMSE over the second-best model, and second-best across all metrics for the entire image. On the ISTD+ (augmented dataset with image shadow triplets) dataset, compared with six state-of-the-art shadow removal algorithms, performance was the best in non-shadow regions, with improvements of 0.47 dB in PSNR and 0.1 in RMSE. Additionally, the advanced shadow removal diffusion model ShadowDiffusion achieved an RMSE of 3.63 over the entire image on the SRD dataset when generating 256×256 images, but its performance dropped significantly at the original 840×640 resolution, with the RMSE rising to 7.19. By contrast, the proposed approach yielded RMSE values of 3.80 and 4.06 at 256×256 and 840×640, respectively, showing consistent performance. Moreover, the time required to generate a single original 840×640 image was reduced roughly fourfold compared with ShadowDiffusion.
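The "non-uniform" aspect above rests on giving the low-frequency and high-frequency branches different noise schedules. The paper describes these as adaptive; the NumPy sketch below instead pairs two standard fixed DDPM-style schedules (a cosine cumulative schedule for the low-frequency band, a linear variance schedule for the sparse high-frequency bands) purely to illustrate the idea of branch-specific noising. The choice of which schedule goes with which band is a hypothetical assumption, not the paper's actual design.

```python
import numpy as np

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Standard DDPM linear variance schedule beta_1..beta_T."""
    return np.linspace(beta_start, beta_end, T)

def cosine_alpha_bar(T, s=0.008):
    """Cosine cumulative-noise schedule: alpha_bar_t = f(t)/f(0)."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f[1:] / f[0]

T = 1000
# Illustrative assumption: the low-frequency band keeps more signal
# for longer (cosine schedule), while the sparse high-frequency bands
# are noised on a plain linear schedule. alpha_bar_t is the fraction
# of the clean signal surviving at diffusion step t in each branch.
alpha_bar_low = cosine_alpha_bar(T)
alpha_bar_high = np.cumprod(1.0 - linear_betas(T))
```

Sampling then runs the two reverse processes with their respective alpha_bar sequences, through the shared denoiser with its two prediction heads, before the inverse wavelet transform merges the results.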
Furthermore, the method was extended to the image raindrop removal task, delivering competitive results on the RainDrop dataset. Conclusion The proposed method accelerates the sampling of the diffusion model and, while removing shadows, restores the missing color, brightness, and rich details in shadow regions. It treats shadow removal as an image generation task in the wavelet domain and designs two adaptive diffusion flows for the low-frequency and high-frequency components to address the degradation of low-frequency (color, brightness) and high-frequency detail information caused by shadows. Benefiting from the frequency decomposition of the wavelet transform, WNDM does not learn in the entangled pixel domain; instead, it effectively separates the frequency components and trains on each of them separately, thereby generating more refined low-frequency and high-frequency information for reconstructing the final image. Extensive experiments on multiple datasets demonstrate the effectiveness of WNDM, which achieves competitive results compared with state-of-the-art methods.