Stable Diffusion Model for Few-shot Meter Defect Image Generation
2025, pp. 1-13
Received: 2024-12-27
Revised: 2025-03-04
Accepted: 2025-04-08
Published online: 2025-04-09
DOI: 10.11834/jig.240777
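Objective
Defect detection in substations is essential to the safe and stable operation of power systems. Meters are key equipment, and detecting their defects is especially important for accurate power dispatching and operational monitoring. Data on meter defects are currently extremely scarce, which is the main obstacle to efficient detection. Existing defect generation methods mostly rely on simple transformations of existing defect samples or on direct generation, and they struggle to produce diverse, high-quality defect images under few-shot conditions, which limits their practical effectiveness. To address this, this paper proposes a Stable Diffusion model for few-shot meter defect image generation that aims to produce high-quality, varied defect images that meet the needs of real-world scenarios.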
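Method
First, because images generated by existing models differ substantially from real meter images, the model is fine-tuned to bind a unique identifier to meter images. This embeds meter knowledge into the model and increases the similarity between generated images and real substation meters. Second, to overcome the weaknesses of conventional models in generation diversity and defect-style control, a crack feature modeling method is designed: line-art images and crack masks are fused pixel by pixel, and a constraint map restricts the region being modeled, yielding geometrically constrained control images that precisely express defect features. Finally, conditioned on these control images, a hypernetwork mechanism dynamically adjusts the generation process and strictly controls defect shape, position, and style so that the results meet practical requirements.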
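The pixel-wise fusion of line art, crack mask, and constraint map described above can be sketched in NumPy as follows. This is a minimal illustration under assumptions the abstract does not state: all three inputs are same-sized single-channel arrays, the line art is scaled to [0, 1], the crack mask and constraint map are binary, and crack pixels are drawn with one fixed intensity. The function name `fuse_control_image` and the `crack_value` parameter are hypothetical.

```python
import numpy as np


def fuse_control_image(line_art: np.ndarray,
                       crack_mask: np.ndarray,
                       constraint_map: np.ndarray,
                       crack_value: float = 1.0) -> np.ndarray:
    """Hypothetical sketch: overlay a crack mask on a meter line-art image,
    keeping only the crack pixels that fall inside the allowed region."""
    if not (line_art.shape == crack_mask.shape == constraint_map.shape):
        raise ValueError("line_art, crack_mask and constraint_map must share a shape")
    # A crack pixel survives only where the constraint map permits modeling.
    allowed_crack = np.logical_and(crack_mask > 0, constraint_map > 0)
    # Pixel-wise fusion: surviving crack pixels take crack_value,
    # every other pixel keeps its line-art value.
    control = np.where(allowed_crack, crack_value, line_art)
    return control.astype(np.float32)
```

For example, with a diagonal crack mask and a constraint map that only allows the lower half of the image, the upper part of the crack is discarded and the resulting control image carries the crack only where the geometric constraint allows it.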
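Result
Experiments on a meter dataset built from real inspection images show that the proposed method performs better in contour detail, defect style, and meter diversity, reaching a Fréchet Inception Distance (FID) of 76.72 and an Inception Score (IS) of 2.45. In the downstream detection task, adding the generated data increases detection precision by 26.9% and mAP50 (mean Average Precision at an IoU threshold of 0.5) by 19.1%, confirming that the generated data effectively improve detection performance.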
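The Inception Score is defined as IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's class distribution for a generated image and p(y) is its average over all generated images. The NumPy sketch below assumes the class probabilities have already been computed (normally by an Inception-v3 softmax) and omits the split-averaging that common implementations add; the function name `inception_score` and the `eps` guard are illustrative choices, not the authors' code.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Sketch of IS from an (N, K) array whose row i is p(y | x_i)."""
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal class distribution p(y), averaged over generated images.
    p_y = probs.mean(axis=0, keepdims=True)
    # Per-image KL(p(y|x) || p(y)); eps avoids log(0) for one-hot rows.
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    # IS is the exponential of the mean KL divergence.
    return float(np.exp(kl.mean()))
```

The two extremes illustrate the metric: if every image yields the same uniform prediction the score is 1, and if N images are confidently assigned to N different classes the score approaches N.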
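Conclusion
Under few-shot conditions, the proposed method effectively overcomes the insufficient image diversity and unstable generation quality of existing generative models and significantly increases the practical value of the generated samples. The high-quality defect images it produces provide solid data support for the efficient and reliable operation of power inspection systems and show broad potential for industrial application.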
Objective
Meters are critical components of substations and are essential for maintaining power grid stability. However, prolonged exposure to harsh environments, such as extreme weather and temperature fluctuations, makes them prone to defects like cracks and deformations. These defects can disrupt operations and threaten grid reliability, so early detection is vital to prevent cascading failures. Although recent advances in computer vision have improved defect identification, training high-performance models requires large, accurately labeled datasets, which are costly to build and constrained by the scarcity of real-world defect data for specialized equipment such as meters. Generative data augmentation offers a promising solution to this problem. Generative adversarial networks (GANs) and denoising diffusion probabilistic models (DDPMs) have proven able to generate visually compelling images when trained on large-scale datasets, and they are widely used to supplement existing datasets, increase data diversity, and improve model training efficiency. When applied to small-sample datasets of substation meter defects, however, these approaches suffer from contour distortion, insufficient texture detail, and excessive similarity to the original images. These problems degrade the generated images, fail to capture the subtle characteristics of meter defects, and limit their usefulness in tasks such as defect detection and segmentation. To overcome these limitations, this study proposes a novel defect generation method based on the Stable Diffusion model and designed specifically for small-sample scenarios. By exploiting Stable Diffusion's ability to balance image quality and diversity, the approach addresses the weaknesses of existing methods and improves both the fidelity and the variability of the generated defect images.
The proposed method aligns better with real-world applications and makes synthetic images more useful in downstream detection and analysis tasks, ultimately improving defect detection performance and reliability in industrial applications.
Method
This study proposes a novel diffusion-based method for generating small-sample defect images of substation meters. It addresses the failure of existing approaches to capture structural and defect-specific features while achieving high-quality image generation that significantly benefits downstream applications. First, the pre-trained Stable Diffusion model was fine-tuned with a meter knowledge embedding strategy, which integrated the structural characteristics of substation meters and their defect features into the model weights. This strengthened the model's understanding of meter patterns and improved the accuracy with which key features are represented and reconstructed. Second, a crack feature modeling module was developed. It applied structured preprocessing to normal meter images and fused the results with existing defect masks to produce control images with precise geometric and spatial constraints. These control images delineate the spatial distribution of defects, keep defect localization accurate and consistent, and provide reliable conditional guidance for the subsequent generation process. Finally, a hypernetwork-based conditional generation mechanism was introduced. While preserving the diversity of generated images, it precisely controls defect shape, position, and other characteristics. By dynamically adjusting model weights and refining the input conditions, the hypernetwork enforces local constraints while maintaining global coherence during generation. It tightly controls defect generation while relaxing constraints on other details, giving the model the flexibility to balance image quality and diversity.
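The abstract does not specify the hypernetwork's architecture. As a rough illustration of the general idea only (a small network that outputs condition-dependent adjustments to the weights of a frozen layer), the NumPy sketch below predicts a low-rank weight offset in the spirit of LoRA-style adaptation. The class `HyperLinear`, the low-rank factorization, the random initialization, and all dimensions are assumptions, not the authors' design, and no training loop is shown.

```python
import numpy as np


class HyperLinear:
    """Hypothetical sketch of a hypernetwork-modulated linear layer.

    The base weight W (out_dim x in_dim) stays frozen. A small linear
    hypernetwork maps a condition embedding c to a low-rank weight
    offset dW(c) = A(c) @ B, so the effective weight is W + scale * dW(c).
    """

    def __init__(self, in_dim, out_dim, cond_dim, rank=2, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.out_dim, self.rank, self.scale = out_dim, rank, scale
        # Frozen base weight, standing in for a pretrained layer.
        self.W = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
        # Hypernetwork parameters (would be trained): H maps c to the factor A(c).
        self.H = rng.standard_normal((out_dim * rank, cond_dim)) * 0.01
        # Shared low-rank factor B (would also be trained).
        self.B = rng.standard_normal((rank, in_dim)) * 0.01

    def weight_offset(self, cond):
        """Return the condition-dependent offset dW(c), whose rank is at most self.rank."""
        A = (self.H @ np.asarray(cond, dtype=float)).reshape(self.out_dim, self.rank)
        return A @ self.B

    def __call__(self, x, cond):
        """Apply the modulated layer to a batch x of shape (batch, in_dim)."""
        W_eff = self.W + self.scale * self.weight_offset(cond)
        return np.asarray(x, dtype=float) @ W_eff.T
```

In this sketch a zero condition (or `scale=0`) reproduces the frozen layer exactly, so the condition only steers the output through a small, low-rank change; that is one plausible way to keep the pretrained model's behavior while conditioning on a control signal.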
Result
Comprehensive experimental validation was conducted on the constructed Substation Meter Dataset (SMD), demonstrating that the proposed method can generate high-quality images with diverse and precise defect characteristics, closely aligning with real-world application scenarios. The introduction of synthetic data significantly improved the performance of the model in downstream defect detection tasks. Notably, when 40% synthetic data was added to the training set, model precision increased by 26.9%, and mAP50 improved by 19.1%, further verifying the effectiveness of the proposed method in enhancing detection accuracy and robustness. Moreover, comparative experiments with advanced mainstream methods highlighted the superiority of the proposed approach. Fréchet Inception Distance (FID) and Inception Score (IS) were used as evaluation metrics to measure the similarity between generated and real images and the diversity of generated images, respectively. A lower FID score indicates higher generation quality, reflecting a smaller gap between the distributions of generated and real images, while a higher IS score demonstrates better clarity and diversity of the generated images. Experimental results show that the proposed method achieved the best performance in both FID and IS metrics, with scores of 76.72 and 2.45, respectively, significantly surpassing other mainstream methods.
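FID compares Gaussian fits of image features: FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The NumPy sketch below assumes features have already been extracted (standard FID uses 2048-dimensional Inception-v3 pooling features, which are not computed here). Instead of `scipy.linalg.sqrtm`, it obtains the trace term from the eigenvalues of Sigma_r^(1/2) Sigma_g Sigma_r^(1/2), which share their nonzero eigenvalues with Sigma_r Sigma_g. The helper names are illustrative.

```python
import numpy as np


def _psd_sqrt(m):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    s1_half = _psd_sqrt(sigma1)
    middle = s1_half @ sigma2 @ s1_half
    # Tr((sigma1 sigma2)^(1/2)) = sum of square roots of the eigenvalues of `middle`.
    eig = np.clip(np.linalg.eigvalsh((middle + middle.T) / 2.0), 0.0, None)
    tr_covmean = np.sqrt(eig).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(real_feats, gen_feats):
    """FID from (N, D) arrays of real and generated image features."""
    real, gen = np.asarray(real_feats, float), np.asarray(gen_feats, float)
    return frechet_distance(real.mean(axis=0), np.cov(real, rowvar=False),
                            gen.mean(axis=0), np.cov(gen, rowvar=False))
```

Identical feature statistics give an FID of 0, and the score grows as the means or covariances of the generated features drift from the real ones, which matches the reading that a lower FID means a smaller gap between the two distributions.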
Conclusion
This study proposes a small-sample generation method based on the Stable Diffusion model, focusing on the generation of substation meter defect images. Experimental results demonstrate that the proposed method effectively addresses issues associated with existing generation models, such as poor quality and high redundancy of images generated from small-sample specialized datasets. By producing high-quality defect images, the method significantly enhances the accuracy and robustness of downstream defect detection tasks, providing a solid and reliable technical foundation for the stable operation of power systems.