Self-supervised E-Swin based transmission line fittings detection
2023, Vol. 28, No. 10, Pages: 3064-3076
Print publication date: 2023-10-16
DOI: 10.11834/jig.220888
Zhang Ke, Zhou Ruiheng, Shi Chaojun, Han Shuo, Du Mingkun, Zhao Zhenbing. 2023. Self-supervised E-Swin based transmission line fittings detection. Journal of Image and Graphics, 28(10):3064-3076
Objective
Transmission line fittings are numerous in type and varied in use, and they are closely tied to the safety of conductors and towers. Assessing the operating state of fittings and diagnosing their faults requires accurately locating and identifying fitting targets on the transmission line. However, as the volume of data collected by unmanned aerial vehicle (UAV) inspection grows, manually annotating all of it becomes increasingly impractical. To address the problem that unlabeled data cannot be used effectively, a transmission line fitting detection model based on a self-supervised E-Swin Transformer (efficient shifted windows Transformer) is proposed, which makes full use of unlabeled data to improve detection accuracy.
Method
First, to reduce the computational cost of self-attention and improve model efficiency, the self-attention computation of the Swin Transformer is optimized, yielding an efficient backbone network, E-Swin. Second, to strengthen feature extraction with unlabeled fitting data, a lightweight self-supervised method is designed for E-Swin and used for pre-training. Finally, to improve localization accuracy, a detection head with an additional branch is adopted and combined with the pre-trained backbone to build the detection model, which is fine-tuned on a small amount of labeled data to obtain the final detection results.
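The window-based self-attention that E-Swin builds on can be illustrated with a minimal sketch: attention is computed independently inside non-overlapping windows, so the cost grows linearly with the number of tokens instead of quadratically as in global attention. This is a toy illustration of the Swin-style mechanism only, with identity Q/K/V projections; it is not the paper's actual E-Swin implementation.

```python
import numpy as np

def window_self_attention(x, window):
    """Self-attention computed independently inside non-overlapping windows.

    x: (H, W, C) feature map, H and W divisible by `window`.
    Global attention costs O((H*W)^2 * C); windowed attention costs
    O(H*W * window^2 * C), i.e. linear in the number of tokens.
    """
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            # flatten one window into (window^2, C) tokens
            tokens = x[i:i + window, j:j + window].reshape(-1, C)
            # toy single-head attention with identity Q/K/V projections
            scores = tokens @ tokens.T / np.sqrt(C)
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            out[i:i + window, j:j + window] = (weights @ tokens).reshape(window, window, C)
    return out
```

Because each window attends only to its own tokens, Swin-style models additionally shift the window grid in alternating layers so that information can flow across window boundaries.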
Result
Experimental results show that on the transmission line fittings dataset, the model achieves an average precision (AP50) of 88.6%, about 10% higher than conventional detection models.
Conclusion
By improving the self-attention computation of the backbone network and adopting self-supervised learning, the model extracts features efficiently and makes effective use of unlabeled data. The proposed fitting detection model offers a new approach to the data utilization problem in transmission line fitting detection.
Objective
Transmission lines are key infrastructure of the power system. Maintaining grid stability requires keeping key components of the transmission line, such as fittings, in sound operating condition. Fittings are metal accessories, usually made of aluminum or iron, that serve many purposes, including protective fittings, connecting fittings, tension clamps, and suspension clamps. They are mainly used to support, fix, and connect bare conductors and insulators. Exposed to a complicated natural environment year after year, these components erode and are prone to displacement, deflection, and damage, which undermines the structural stability of the transmission system. If fitting defects are not identified promptly, they can cause serious circuit failures. Assessing the status of fittings and diagnosing their faults therefore requires accurately locating and identifying fitting targets on transmission lines. Emerging deep learning and unmanned aerial vehicle (UAV) inspection techniques have been developed to improve on conventional manual inspection, and a maintenance mode has gradually taken shape in which UAVs acquire images and deep learning methods process the aerial photos automatically. Most of these methods rely on supervised learning alone, that is, data must be manually annotated before model training. As UAV patrols collect more and more data on transmission line components, manual labeling demands substantial human resources and introduces missing and incorrect labels. To resolve this problem, we develop a fitting detection model based on a self-supervised Transformer.
Self-supervised learning designs pretext tasks on unlabeled data to mine the feature representation of the data itself and improve the feature extraction ability of the model; a small amount of supervised data is then used for fine-tuning on downstream tasks such as detection or segmentation. To reduce the model's large computational cost, the Swin Transformer is improved, and an efficient one-stage fitting detection model is built on self-supervised learning.
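The image-reconstruction pretext task described later is in the spirit of masked image modeling: hide random patches of the input, reconstruct them, and compute the loss only on the hidden pixels, so the model is rewarded for inferring missing content rather than copying its input. A minimal numpy sketch of that loss; the patch size, mask ratio, and L1 criterion are illustrative choices, not the paper's settings.

```python
import numpy as np

def masked_reconstruction_loss(image, reconstruction, patch=4, mask_ratio=0.6, seed=0):
    """L1 reconstruction loss computed only on randomly masked patches.

    image, reconstruction: (H, W) arrays with H, W divisible by `patch`.
    A fraction `mask_ratio` of the patches is hidden; the loss is the
    mean absolute error over the hidden pixels only.
    """
    H, W = image.shape
    gh, gw = H // patch, W // patch
    rng = np.random.default_rng(seed)
    # choose which patches to hide
    n_mask = int(gh * gw * mask_ratio)
    idx = rng.choice(gh * gw, size=n_mask, replace=False)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[idx] = True
    # expand the patch-level mask to a pixel-level mask
    pixel_mask = mask.reshape(gh, gw).repeat(patch, axis=0).repeat(patch, axis=1)
    return np.abs(image - reconstruction)[pixel_mask].mean()
```

In an actual training loop the masked patches would be replaced by a learnable token before being fed to the encoder, and a lightweight decoder would produce `reconstruction`.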
Method
Transformer models have shown great potential for computer vision in recent years. Thanks to global self-attention, a Transformer can, to some extent, extract more effective image feature information than a convolutional neural network (CNN). In addition, the self-supervised learning techniques that Transformers popularized in natural language processing (NLP) have gradually been adopted in computer vision (CV). The proposed fitting detection method consists of three main parts. First, the Swin Transformer is used as the backbone network; its self-attention computation is improved to reduce the computational cost, producing a smaller and more efficient backbone, E-Swin. Second, a self-supervised pretext task of image reconstruction is designed; the improved backbone is pre-trained with self-supervised learning, so its feature extraction ability is trained on a large amount of unlabeled data. After self-supervised training, the network serves as the backbone of the detection model. Finally, to improve detection accuracy, an optimized detection head is used to build a high-precision one-stage detection model, and a small amount of labeled data is used for fine-tuning to obtain the final model.
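The abstract does not detail the detection head's additional branch, but optimized one-stage heads of this kind commonly regress an extra localization-quality signal such as the IoU between the predicted box and the ground truth, which then reweights the classification confidence at inference. Whatever form the branch takes, its regression target is plain box IoU, sketched below (the corner-coordinate box format is an assumption):

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

The same quantity underlies the AP50 metric reported in the Result section: a detection counts as correct only when its IoU with a ground-truth box is at least 0.5.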
Result
The transmission line fittings dataset is used to train and evaluate the model. The image samples are cropped from photos taken during UAV inspection of transmission lines. The first dataset contains a large number of unlabeled images for self-supervised learning; the aerial photos are clipped directly to remove redundant background and preserve valid target information. The second dataset is a labeled dataset for fine-tuning, with a total of 1,600 images split into training and test samples at a ratio of 4:1. The samples cover 12 types of fittings with 10,178 labeled targets in total. The experimental results show that the average precision (AP50) of the model on the transmission line fittings dataset is 88.6%, which is about 10% higher than traditional detection models.
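The 4:1 split of the 1,600 labeled images works out to 1,280 training and 320 test images; a trivial helper (the function name is ours) makes the arithmetic explicit:

```python
def train_test_split_counts(total, ratio=(4, 1)):
    """Number of train/test images when `total` is split at the given ratio."""
    unit = total // sum(ratio)
    train = unit * ratio[0]
    return train, total - train

# 1,600 labeled images at a 4:1 ratio -> 1,280 train, 320 test
print(train_test_split_counts(1600))
```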
Conclusion
The self-attention computation of the backbone network is improved, self-supervised learning is used to extract features efficiently, and unlabeled data is applied effectively. The resulting one-stage fitting detection model offers a new way to resolve the data utilization problem in transmission line fittings detection.
Keywords: deep learning; object detection; transmission line fitting; self-supervised learning; E-Swin Transformer; one-stage detector