Research on lightweight neural network of aerial powerline image segmentation
2021, Vol. 26, No. 11: 2605-2618
Published in print: 2021-11-16
Accepted: 2021-04-20
DOI: 10.11834/jig.200690
Gang Xu, Guo Li. Research on lightweight neural network of aerial powerline image segmentation[J]. Journal of Image and Graphics, 2021,26(11):2605-2618.
Objective
Extracting power lines from aerial images is an important topic in intelligent inspection, and deep-learning-based semantic segmentation models have already achieved good results in this field. However, two problems remain to be solved: the small size of available image training sets and the excessive computational cost of the pre-trained models.
Method
First, the dataset was augmented with a generative adversarial network combined with conic curves and hue perturbation, and U-Net models trained with three different loss functions in two color spaces were compared to determine the best combination. Then, a saliency metric joining the first-order Taylor expansion and the output-channel 2-norm was proposed; an improved channel-wise parameter regularization method based on it was used to sparsify the weights of the full model, which was then pruned and retrained to reduce its computational cost. Finally, an adaptive decision threshold replaced the fixed value to improve robustness to luminance changes.
Result
Experiments show that the proposed grayscale-input lightweight model reaches an IoU (intersection-over-union) of 0.459 while using only 0.03% of the parameters and 3.05% of the computation of the full visible-light model, whose IoU is 0.573; within a suitable range of illumination change, the adaptive threshold method achieves results similar to those of the optimal threshold under the same conditions.
Conclusion
The effects of different dataset augmentation methods, loss functions, and input color spaces on convergence, training speed, and overfitting were verified, and the best combination in each color space was identified. Network pruning greatly reduces the parameter count and computation of the power-line segmentation network, which benefits its practical deployment.
Objective
Powerline semantic segmentation of aerial images, an important part of intelligent powerline inspection, has received widespread attention. Recently, several deep-learning-based methods have been proposed in this field and have achieved high accuracy. However, two major problems still need to be solved before such models can be applied in practice. First, the sample size of publicly available datasets is small. Unlike target objects in other semantic segmentation tasks (e.g., cars and buildings), powerlines have few texture and structural features, which makes them easy to misidentify, especially in scenes not covered by the training set. Therefore, constructing a training set that contains many different background samples is crucial for improving the generalization capability of the model. The second problem is the conflict between the model's computational cost and the limited computing resources of inference terminals. Previous work has demonstrated that an improved U-Net model can segment powerlines from aerial images with satisfactory accuracy; however, the model is computationally expensive for many resource-constrained inference terminals (e.g., unmanned aerial vehicles (UAVs)).
Method
In this study, the background images in the training set were learned using a generative adversarial network (GAN) to generate a series of pseudo-backgrounds, and curved powerlines were drawn on the generated images using conic curves. In detail, a multi-scale automatic growth model, progressive growing of GANs (PGGAN), was adopted to learn the mapping from a random noise vector to the background images in the training set, and its generator was then used to produce series of background images. These background images and the curved powerlines generated by the conic curves were fused in the alpha channel. We created three training sets: the first consisted of only 2 000 real background pictures, the second was a mixture of 10 000 real and generated background images, and the third was composed of 200 generated backgrounds and was used to evaluate the similarity between the generated and original images. At the input of the segmentation network, random hue perturbation was applied to the images to enhance the generalization of the model across seasons. Then,
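As a concrete sketch of the fusion step, the snippet below rasterizes a conic-curve "powerline" and alpha-blends it onto a background; the parabola coefficients, the line color, and the image size are illustrative assumptions rather than values from the paper, and a constant image stands in for a PGGAN-generated sample:

```python
import numpy as np

def draw_parabola_mask(h, w, a, b, c, thickness=2):
    """Rasterize a sagging 'powerline' y = a*x^2 + b*x + c into a binary mask.

    A parabola stands in for the paper's conic curves; any conic section
    could be sampled column by column in the same way.
    """
    mask = np.zeros((h, w), dtype=np.float32)
    xs = np.arange(w)
    ys = (a * xs ** 2 + b * xs + c).round().astype(int)
    for x, y in zip(xs, ys):
        lo, hi = max(0, y - thickness), min(h, y + thickness + 1)
        if lo < hi:
            mask[lo:hi, x] = 1.0
    return mask

def alpha_composite(background, line_color, alpha_mask):
    """Fuse a solid-colored line over the background through the alpha channel."""
    a = alpha_mask[..., None]  # (h, w, 1), broadcasts over the RGB channels
    return (1.0 - a) * background + a * np.asarray(line_color, dtype=np.float32)

# Hypothetical example: a dark wire sagging across a flat 64x64 pseudo-background.
bg = np.full((64, 64, 3), 120.0, dtype=np.float32)
mask = draw_parabola_mask(64, 64, a=0.02, b=-1.2, c=50)
img = alpha_composite(bg, (40, 40, 40), mask)
```

The same mask doubles as the ground-truth label for the composite, and hue perturbation would then be applied to such composites at the network input.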
the convergence accuracy of U-Net networks with three different loss functions was compared in the RGB and grayscale color spaces to determine the best combination. Specifically, we trained U-Net with the focal, soft-IoU, and Dice loss functions in the RGB and gray spaces and compared the convergence accuracy, convergence speed, and degree of overfitting of the six resulting models. Afterward,
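The three loss functions compared here have standard per-pixel forms; the NumPy sketch below is a generic formulation on soft predictions in [0, 1], not the paper's exact implementation:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Dice loss: 1 - 2|P∩T| / (|P| + |T|), on soft predictions."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def soft_iou_loss(pred, target, eps=1e-6):
    """Soft-IoU loss: 1 - |P∩T| / |P∪T|, with the union relaxed to sums."""
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return 1.0 - (inter + eps) / (union + eps)

def focal_loss(pred, target, gamma=2.0, eps=1e-6):
    """Focal loss: cross-entropy down-weighted on easy pixels by (1 - p_t)^gamma."""
    p = np.clip(pred, eps, 1.0 - eps)
    pt = np.where(target == 1, p, 1.0 - p)
    return (-((1.0 - pt) ** gamma) * np.log(pt)).mean()

# Sanity check: a confident correct prediction should score lower (better)
# than a confident wrong one under all three losses.
target = np.array([0.0, 1.0, 1.0, 0.0])
good = np.array([0.01, 0.99, 0.99, 0.01])
bad = np.array([0.90, 0.10, 0.10, 0.90])
```

Dice and soft-IoU directly optimize overlap and therefore tolerate the extreme foreground/background imbalance of thin powerlines, while focal loss attacks the same imbalance by re-weighting pixels.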
sparse regularization was applied to the pre-trained full model, and structured network pruning was performed to reduce the computational load of network inference. A saliency metric that combines the first-order Taylor expansion with the output-channel 2-norm was proposed to guide the regularization and pruning process; it provides a higher compression rate than the 2-norm used in previous pruning algorithms. Conventional saliency metrics based on the first-order expansion can change by orders of magnitude during regularization, which makes threshold selection during the iterative process difficult. Compared with these metrics, the proposed one has a more stable range of values, which enables iteration-based regularization; we adopted a 0-norm-based regularization method to widen the saliency gap between important and unimportant neurons. To select the decision threshold, we used an adaptive approach, which is more robust to luminance changes than the fixed-threshold method used in previous work.
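A minimal sketch of such a joint channel saliency, and of the structured pruning it guides, is given below. How the Taylor term and the 2-norm are fused is not stated in the abstract, so the min-max normalization and averaging used here are illustrative assumptions, and the 0-norm regularization loop is omitted:

```python
import numpy as np

def channel_saliency(weights, activations, grads, eps=1e-12):
    """Per-output-channel saliency combining a first-order Taylor term
    |sum(a * dL/da)| with the channel's weight 2-norm.

    weights:     (C_out, C_in, k, k) conv kernel
    activations: (C_out, H, W) feature maps for one sample
    grads:       (C_out, H, W) gradients of the loss w.r.t. the activations

    The min-max normalization and equal-weight average below are only an
    illustrative fusion rule; the paper may combine the terms differently.
    """
    c = len(weights)
    taylor = np.abs((activations * grads).reshape(c, -1).sum(axis=1))
    l2 = np.sqrt((weights.reshape(c, -1) ** 2).sum(axis=1))

    def norm(v):
        return (v - v.min()) / (v.max() - v.min() + eps)

    return 0.5 * (norm(taylor) + norm(l2))

def prune_channels(weights, saliency, keep_ratio=0.5):
    """Structured pruning: keep only the most salient output channels."""
    k = max(1, int(len(weights) * keep_ratio))
    keep = np.sort(np.argsort(saliency)[-k:])
    return weights[keep]

# Toy layer: 8 output channels, prune down to 2 before retraining.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))
act = rng.normal(size=(8, 16, 16))
grad = rng.normal(size=(8, 16, 16))
s = channel_saliency(w, act, grad)
w_pruned = prune_channels(w, s, keep_ratio=0.25)
```

Because whole output channels are removed, the pruned layer stays a dense convolution and needs no sparse-tensor support at inference time, which is what makes the FLOPs reduction real on UAV-class hardware.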
Result
Experimental results showed that the convergence accuracy on the curved-powerline dataset was higher than that on the straight-powerline dataset. In RGB space, the hybrid dataset built with the GAN yielded higher convergence accuracy than the dataset of real images alone, but no significant improvement was observed in gray space, possibly because of model collapse. We confirmed that hue disturbance can effectively improve the performance of the model across seasons. The experiments on the different loss functions revealed that the convergence intersection-over-union (IoU) of the RGB and gray spaces under their respective optimal loss functions was 0.578 and 0.586, respectively. Dice and soft-IoU differed negligibly in convergence speed and achieved the best accuracy in gray and RGB space, respectively; focal loss converged the slowest in both spaces and achieved the optimal accuracy in neither. At the pruning stage, using the conventional 2-norm saliency metric, the proposed gray-space lightweight model (IoU of 0.459) reduced the number of floating-point operations (FLOPs) and parameters to 3.05% and 0.03%, respectively, of the full RGB-space model (IoU of 0.573). With the proposed joint saliency metric, the FLOPs and parameters further decreased to 0.947% and 0.015% of the complete model, respectively, while maintaining an IoU of 0.42. The experiments also showed that the Otsu threshold method works stably within an appropriate range of illumination changes, differing negligibly from the optimal threshold.
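The Otsu method used here picks the decision threshold that maximizes the between-class variance of the per-pixel score histogram; a standard NumPy implementation sketch (the bimodal test scores are synthetic, not from the paper's data):

```python
import numpy as np

def otsu_threshold(scores, bins=256):
    """Otsu's method on network scores in [0, 1]: return the cut that
    maximizes between-class variance, instead of a fixed decision value."""
    hist, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    p = hist.astype(np.float64) / max(hist.sum(), 1)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # class-0 (background) probability mass
    mu = np.cumsum(p * centers)  # cumulative mean
    mu_t = mu[-1]
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(between)]

# Synthetic bimodal scores: background pixels near 0.1, powerline pixels near 0.9.
scores = np.concatenate([np.full(900, 0.1), np.full(100, 0.9)])
t = otsu_threshold(scores)
```

Because the threshold is recomputed from each image's own histogram, a global luminance shift moves it along with the score distribution, unlike a fixed decision value.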
Conclusion
Improvements in the dataset and the loss function independently enhanced the performance of the baseline model. Sparse regularization and network pruning reduced the network's parameters and computational load, which facilitates deploying the model on resource-constrained inference terminals such as UAVs. The proposed saliency measure exhibited better compression capability than the conventional 2-norm metric, and the adaptive threshold method improved the robustness of the model under luminance changes.
smart inspection; image semantic segmentation; sparse regularization; network pruning; generative adversarial network (GAN)
Arjovsky M and Bottou L. 2017. Towards principled methods for training generative adversarial networks//Proceedings of the 5th International Conference on Learning Representations. Toulon, France: [s. n.]
Arjovsky M, Chintala S and Bottou L. 2017. Wasserstein GAN[EB/OL]. [2020-10-27]. https://arxiv.org/pdf/1701.07875.pdf
Baker L, Mills S, Langlotz T and Rathbone C. 2016. Power line detection using Hough transform and line tracing techniques//Proceedings of 2016 International Conference on Image and Vision Computing New Zealand (IVCNZ). Palmerston North, New Zealand: IEEE: 1-6 [DOI: 10.1109/IVCNZ.2016.7804438]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. Montreal, Canada: MIT Press: 2672-2680 [DOI: 10.5555/2969033.2969125]
Han S, Mao H and Dally W J. 2016. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding//Proceedings of the 4th International Conference on Learning Representations Conference Track Proceedings. San Juan, Puerto Rico: [s. n.]
Karras T, Aila T, Laine S and Lehtinen J. 2018. Progressive growing of GANs for improved quality, stability, and variation//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: [s. n.]
Le Cun Y, Denker J S and Solla S A. 1989. Optimal brain damage//Proceedings of the 2nd International Conference on Neural Information Processing Systems. Denver, USA: MIT Press: 598-605 [DOI: 10.5555/2969830.2969903]
Li B L, Wu B W, Su J and Wang G R. 2020. EagleEye: fast sub-net evaluation for efficient neural network pruning//Proceedings of 2020 European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 639-654 [DOI: 10.1007/978-3-030-58536-5_38]
Liu J W, Li Y X, Gong Z, Liu X G and Zhou Y J. 2020. Power line recognition method via fully convolutional network. Journal of Image and Graphics, 25(5): 956-966 [DOI: 10.11834/jig.190316]
Liu Z, Sun M J, Zhou T H, Huang G and Darrell T. 2019. Rethinking the value of network pruning//Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: [s. n.]
Madaan R, Maturana D and Scherer S. 2017. Wire detection using synthetic data and dilated convolutional networks for unmanned aerial vehicles//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Vancouver, Canada: IEEE: 3487-3494 [DOI: 10.1109/IROS.2017.8206190]
Molchanov P, Tyree S, Karras T, Aila T and Kautz J. 2017. Pruning convolutional neural networks for resource efficient inference//Proceedings of the 5th International Conference on Learning Representations. Toulon, France: [s. n.]
Molchanov P, Mallya A, Tyree S, Frosio I and Kautz J. 2019. Importance estimation for neural network pruning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 11256-11264 [DOI: 10.1109/CVPR.2019.01152]
Paszke A, Chaurasia A, Kim S and Culurciello E. 2017. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL]. [2021-01-06]. https://arxiv.org/pdf/1606.02147.pdf
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Song B Q and Li X L. 2014. Power line detection from optical images. Neurocomputing, 129: 350-361[DOI: 10.1016/j.neucom.2013.09.023]
Wang X W. 2019. Research on Semantic Segmentation of Power Line Based on Image. Hangzhou: Zhejiang University
Yetgin Ö E and Gerek Ö N. 2019a. Powerline Image Dataset (Infrared-IR and Visible Light-VL)[DB/OL]. [2020-10-18]. https://data.mendeley.com/datasets/n6wrv4ry6v/8 [DOI: 10.17632/n6wrv4ry6v.8]
Yetgin Ö E and Gerek Ö N. 2019b. Ground Truth of Powerline Dataset (Infrared-IR and Visible Light-VL)[DB/OL]. [2020-10-18]. https://data.mendeley.com/datasets/twxp8xccsw/9 [DOI: 10.17632/twxp8xccsw.9]
Yu C Q, Wang J B, Peng C, Gao C X, Yu G and Sang N. 2018. BiSeNet: bilateral segmentation network for real-time semantic segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 334-349 [DOI: 10.1007/978-3-030-01261-8_20]
Zhang H, Yang W, Yu H, Zhang H J and Xia G S. 2019. Detecting power lines in UAV images with convolutional features and structured constraints. Remote Sensing, 11(11): #1342 [DOI: 10.3390/rs11111342]
Zhang J J, Liu L, Wang B H, Chen X G, Wang Q and Zheng T R. 2012. High speed automatic power line detection and tracking for a UAV-based inspection//Proceedings of 2012 International Conference on Industrial Control and Electronics Engineering. Xi'an, China: IEEE: 266-269 [DOI: 10.1109/ICICEE.2012.77]
Zhao H S, Qi X J, Shen X Y, Shi J P and Jia J Y. 2018. ICNet for real-time semantic segmentation on high-resolution images//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 418-434 [DOI: 10.1007/978-3-030-01219-9_25]
Zhao L, Wang X P, Yao H T and Tian M. 2021. Survey of power line extraction methods based on visible light aerial image. Power System Technology, 45(4): 1536-1546 [DOI: 10.13335/j.1000-3673.pst.2020.0300a]