Image deblurring network combining Mamba and snake-like convolution
Pages: 1-12 (2024)
Published Online: 30 December 2024
DOI: 10.11834/jig.240618
Qiu Yunfei, Liu Zeyan, Wang Maohua. Image deblurring network combining Mamba and snake-like convolution[J]. Journal of Image and Graphics, 2024: 1-12.
Objective
To address the difficulty that Transformer-based methods have in accurately restoring image details during deblurring, this paper proposes an image deblurring network that combines the Mamba model with snake convolution, termed the Mamba Snake Convolution Network (MSNet).
Method
First, the Mamba framework is combined with snake convolution to build the Snake State-Space Module (SSSM). By adjusting the shape and path of its convolution kernels, the SSSM dynamically adapts to local image features and steers the convolution direction to align with different blur-streak patterns. Second, a Directional Scan Module (DSM) scans the features along multiple directions to capture long-range dependencies in the image, and discrete state-space equations then merge the multi-directional structural information, strengthening the model's ability to capture global structure. Finally, Snake Channel Attention (SCA) uses a gated design to filter and reweight blur information, ensuring that key details are preserved while blur is removed.
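To make the kernel-deformation idea concrete, the following is a minimal, illustrative sketch of offset-based sampling in the spirit of snake convolution, written in PyTorch. The names (SnakeConvSketch, offset, aggregate) are assumptions for illustration, not the authors' implementation: a small branch predicts per-tap offsets, and the module samples the feature map along the resulting curved path before a 1×1 convolution aggregates the taps.

```python
# Illustrative offset-based "snake" sampling sketch; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnakeConvSketch(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        self.kernel_size = kernel_size
        # Predict one (dx, dy) offset per kernel tap at every spatial position.
        self.offset = nn.Conv2d(channels, 2 * kernel_size, 3, padding=1)
        # A 1x1 convolution aggregates the samples gathered along the path.
        self.aggregate = nn.Conv2d(channels * kernel_size, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        offsets = torch.tanh(self.offset(x))  # bounded offsets, (b, 2k, h, w)
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=x.device),
            torch.linspace(-1.0, 1.0, w, device=x.device),
            indexing="ij",
        )
        taps = []
        for i in range(self.kernel_size):
            dx = offsets[:, 2 * i] * (2.0 / w)       # small per-tap shift
            dy = offsets[:, 2 * i + 1] * (2.0 / h)
            grid = torch.stack((xs + dx, ys + dy), dim=-1)  # (b, h, w, 2)
            taps.append(F.grid_sample(x, grid, align_corners=True))
        return self.aggregate(torch.cat(taps, dim=1))
```

In the paper's SSSM this deformed sampling is coupled with Mamba-style state-space updates; only the sampling path is sketched here.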
Result
In experiments on the GoPro and HIDE datasets, compared with mainstream CNN (convolutional neural network) and Transformer deblurring methods, MSNet improves the peak signal-to-noise ratio (PSNR) by 1.2 and 1.9 percentage points and the structural similarity (SSIM) by 0.6 and 0.7 percentage points, respectively.
Conclusion
The proposed method effectively removes image blur and restores fine details.
Objective
Image deblurring methods based on convolutional neural networks (CNNs) and Transformers have substantially improved deblurring performance. Despite these achievements, they remain constrained by high computational demands and by limitations in restoring intricate image details. Under complex conditions involving motion blur or high-frequency detail, existing approaches often rely on fixed convolution kernels or global self-attention. Such static designs lack the adaptability to handle diverse types of blur, leading to suboptimal detail recovery and inadequate reconstruction of global image structure. Moreover, Transformer-based deblurring methods frequently require extensive computational resources, which diminishes their feasibility for deployment on mobile devices or embedded systems and impedes their broader adoption in real-world applications. To address these challenges, this study proposes a novel image deblurring method, termed MSNet. By integrating the efficient state-space modeling of the Mamba framework with snake convolution, MSNet leverages the complementary strengths of the two techniques, reducing computational overhead while achieving high-fidelity recovery of fine details and structural information. With its enhanced adaptability and efficiency, MSNet is well suited to practical applications, offering robust performance on complex deblurring tasks across diverse scenarios.
Method
To achieve this objective, MSNet integrates three key modules: the Snake State-Space Module (SSSM), the Directional Scan Module (DSM), and the Snake Channel Attention (SCA) module. Each module serves a specific purpose, and together they address both local detail recovery and global structure restoration. The SSSM combines the Mamba framework with snake convolution (SConv) to strengthen the model's ability to capture subtle blur features. Unlike traditional CNN-based methods that rely on fixed convolution kernels, the SSSM dynamically adjusts the shape and path of its kernels, allowing them to adapt to local image features and blur-streak patterns; by altering the convolution path, snake convolution effectively captures local blur features. The Mamba framework, in turn, exploits state-space models to process long-range dependencies with linear computational complexity. In contrast to the quadratic cost of self-attention in Transformer-based models, Mamba captures long-term dependencies in the image more efficiently, avoiding their excessive computational burden, while snake convolution sharpens the network's adaptation to local features, offering notable advantages on complex motion blur and fine-detail blur. The DSM flattens image features into one-dimensional sequences and scans them in multiple directions (diagonal, horizontal, and vertical) to capture long-range dependencies. This markedly improves global structure restoration, particularly in scenes where objects move in several directions at once, allowing better reconstruction of the overall image structure. The SCA module uses a gating mechanism to filter and adjust the weights of blurred information. By combining snake convolution with channel attention, it lets the model dynamically reweight features, prioritizing key image details while suppressing irrelevant blur; this selective focus significantly enhances detail recovery and the overall deblurring performance.
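As a rough illustration of the multi-directional scanning idea, the sketch below flattens a feature map into 1D sequences along several orders, runs a shared sequence model over each, and merges the results. Everything here is an assumption for illustration: the class name DirectionalScanSketch is invented, a GRU stands in for the Mamba-style selective state-space block, and only row, column, and reversed orders are shown, whereas the paper also scans diagonally.

```python
# Illustrative multi-directional scan-and-merge sketch; not the paper's code.
import torch
import torch.nn as nn

class DirectionalScanSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Stand-in for a selective state-space (Mamba) block.
        self.seq = nn.GRU(channels, channels, batch_first=True)
        # A 1x1 convolution merges the four directional results.
        self.merge = nn.Conv2d(4 * channels, channels, 1)

    def scan(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)     # (b, h*w, c) 1D sequence
        out, _ = self.seq(seq)                 # sequential long-range mixing
        return out.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        views = [
            x,                                 # row-major (horizontal)
            x.transpose(2, 3),                 # column-major (vertical)
            torch.flip(x, dims=[3]),           # rows scanned right-to-left
            torch.flip(x, dims=[2]),           # rows scanned bottom-to-top
        ]
        outs = []
        for i, v in enumerate(views):
            o = self.scan(v)
            # Undo the reordering so all branches align spatially.
            if i == 1:
                o = o.transpose(2, 3)
            elif i == 2:
                o = torch.flip(o, dims=[3])
            elif i == 3:
                o = torch.flip(o, dims=[2])
            outs.append(o)
        return self.merge(torch.cat(outs, dim=1))
```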
Result
To validate the effectiveness of MSNet, we conducted comparative and ablation experiments on two widely used image deblurring benchmarks, GoPro and HIDE, comparing MSNet against several commonly used deblurring methods. The results show that MSNet excels at removing blur artifacts and restoring fine details. On the GoPro dataset, MSNet achieved significant PSNR and SSIM improvements over Transformer-based and CNN-based methods and restored blurred regions more accurately, overcoming the limitations of existing methods in complex scenes with intricate detail and challenging blur. On the HIDE dataset, MSNet likewise outperformed Transformer- and CNN-based methods, achieving higher PSNR and SSIM scores and showing remarkable accuracy in deblurring fine textual and facial details. Through its adaptive convolution design and multi-directional scanning, MSNet exhibited strong robustness and generalization, making it well suited to complex, dynamic scenarios. MSNet is also computationally efficient: it requires 67.3 GFLOPs on the GoPro dataset, significantly less than MIMO-UNet and the other comparison methods. This balance of high deblurring quality and low computational cost makes MSNet a practical choice for real-time deblurring in resource-constrained environments. Ablation studies further validated the contributions of the key modules: removing the Snake State-Space Module (SSSM) or the Snake Channel Attention (SCA) module caused a significant drop in PSNR, with the largest decrease when both were removed, highlighting their critical roles in deblurring accuracy and detail restoration. A network-depth analysis showed that MSNet-28 (28 layers) performed best, reaching a PSNR of 33.51 dB and an SSIM of 0.97, confirming the importance of jointly optimizing network depth and module design.
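For reference, the PSNR and SSIM figures reported on such benchmarks are typically computed per restored/sharp image pair and averaged over the test set. A minimal sketch using scikit-image is shown below, assuming 8-bit RGB numpy arrays; the function name evaluate_pair is illustrative.

```python
# Minimal PSNR/SSIM evaluation sketch (illustrative, standard metrics).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored: np.ndarray, sharp: np.ndarray):
    """PSNR/SSIM between one restored image and its sharp ground truth."""
    psnr = peak_signal_noise_ratio(sharp, restored, data_range=255)
    ssim = structural_similarity(sharp, restored, data_range=255,
                                 channel_axis=-1)
    return psnr, ssim

# Benchmark scores are the averages over all test pairs, e.g.:
# mean_psnr = np.mean([evaluate_pair(r, s)[0] for r, s in pairs])
```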
Conclusion
In conclusion, MSNet delivers outstanding performance across multiple datasets, demonstrating exceptional deblurring accuracy and detail recovery while maintaining good computational efficiency. By incorporating the state-space model of the Mamba framework and the flexibility of snake convolution, MSNet efficiently handles long-range dependencies and adapts particularly well to complex blur scenarios. The ablation experiments validate the importance of each module, with the Snake State-Space Module (SSSM) and Snake Channel Attention (SCA) playing key roles in detail recovery and global structure reconstruction. In summary, MSNet excels at deblurring through strong generalization, efficient computation, and superior detail recovery.
image deblurring; Mamba model; selective scan module; snake convolution; snake channel attention