Power line recognition method based on a fully convolutional network

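Liu Jia Wei, Li Yuan Xiang, Gong Zheng, Liu Xin Gang, Zhou Yongjun (School of Aeronautics and Astronautics, Shanghai Jiao Tong University; AVIC Leihua Electronic Technology Research Institute)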

Abstract
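Objective Power line warning is critical to the low-altitude flight safety of helicopters and unmanned aerial vehicles, and recognizing power lines in visible and infrared images is an effective approach. Traditional recognition methods rely on hand-designed filters to extract local features of power lines and then find straight lines with methods such as the Hough transform, whereas machine learning methods such as support vector machines and random forests only report whether an image contains a power line. This paper proposes a power line recognition method based on a fully convolutional network that learns the feature extractor automatically while also providing information such as the exact location of the power lines. Method First, a large amount of paired simulated data, consisting of power line images and pixel labels, is generated from complex backgrounds. The U-net architecture is then modified to suit the power line recognition task and trained on the simulated data. Because power lines occupy very few pixels in an image, focal loss is used to balance the influence of the large number of negative samples. Result On a power line inspection data set containing 4,000 infrared images and 4,000 visible images, the proposed method achieves a power line recognition rate above 99% with a false alarm rate below 2%, outperforming VGG16 and random forest methods with various features. In addition, the power lines are largely recovered in the pixel segmentation results output by the method. Conclusion The proposed fully convolutional network method can extract the optical image features of power lines and, compared with traditional machine learning methods, can accurately separate power lines from the scene, which gives the recognition result a more verifiable basis.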
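The abstract names focal loss but does not give its form. As a rough illustration, the sketch below implements the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), over a flattened list of pixels. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; they are not values reported by the authors.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flattened pixels.

    probs  : predicted power-line probabilities in [0, 1]
    labels : ground-truth pixel labels, 1 = power line, 0 = background
    gamma, alpha : assumed defaults, not taken from the abstract
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one pixel is required")

    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability given to the true class
        a_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
        # (1 - p_t) ** gamma shrinks the loss of well-classified pixels
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` and `alpha=0.5` this reduces to half of the usual binary cross-entropy. Larger `gamma` values down-weight the many easy background pixels, which is the balancing effect the abstract describes.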
Keywords
Power line recognition method via fully convolutional network

Liu Jia Wei, Li Yuan Xiang, Gong Zheng, Liu Xin Gang, Zhou Yongjun (School of Aeronautics and Astronautics, Shanghai Jiao Tong University; AVIC Leihua Electronic Technology Research Institute)

Abstract
Objective Helicopters suffer tens of accidents per year from collisions with trees, wires, poles and man-made buildings at low altitude. Warning of and avoiding wires is therefore very important for the low-altitude flight safety of helicopters and unmanned aerial vehicles. According to relevant studies, identifying wires in optical images is an effective approach. Some traditional methods use hand-designed filters to extract features of power lines and then detect lines with the Hough transform. Machine learning methods such as VGG16 and random forest only produce a classification result for a whole image, which makes it hard to confirm that the result is correct. The fully connected layers of a traditional convolutional neural network are very good at classification tasks, but they cannot perform pixel segmentation because location information is lost. In contrast, a fully convolutional network has no fully connected layer and therefore preserves location information. One kind of fully convolutional network, U-net, was proposed to solve problems such as cell segmentation and retina segmentation. It works well with a small number of samples and small image patches. A 3-channel image is fed into the network, passes through the encoder and decoder, and finally becomes a one-channel feature map via a convolution with a 1×1 kernel. To obtain final values between 0 and 1, a Sigmoid activation function is applied to the output of the last convolution layer. In this paper, a convolutional neural network recognition method based on U-net is proposed to detect power lines. Method First, we obtained a power line data set containing 8,000 images, organized as 4,000 pairs of visible and infrared images. The image size is 128×128, and each image has a pixel-level ground truth label. The depth of our network is determined according to the receptive field calculation formula. Several adjustments were then made to this base network to choose the best model. The base network is named the U-net-0 model.
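The receptive field formula is referred to above but not written out. A minimal sketch of the commonly used recurrence for a sequential stack of convolution and pooling layers is r_l = r_{l-1} + (k_l - 1) · j_{l-1} and j_l = j_{l-1} · s_l, where k_l is the kernel size, s_l the stride and j_l the cumulative stride. The function name `receptive_field` and the layer list in the usage note are hypothetical; they are not the actual U-net-0 configuration, which the abstract does not specify.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers : list of (kernel_size, stride) tuples, ordered from input to output.
             Applies to a plain sequential stack such as an encoder path.
    """
    rf = 1     # a single input pixel sees only itself
    jump = 1   # distance in input pixels between neighbouring units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf
```

For a hypothetical stack of two 3×3 convolutions, one 2×2 pooling layer with stride 2 and two more 3×3 convolutions, `receptive_field([(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)])` evaluates to 14. Comparing such a value with the 128×128 input size is one way the network depth could be chosen.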
The U-net-1 model removes the downsampling pooling layer of the U-net-0 model and changes the stride of the convolution layer before it to 2; it also removes the upsampling layer and replaces the convolution layer after it with a transposed convolution layer with a stride of 2. Compared with U-net-0, the U-net-2 model eliminates the upsampling and downsampling layers and the convolution layer between them, which reduces the network depth. In the U-net-3 model, decoding is expected to be a dimensionality reduction process. The number of convolution kernels in the decoding part is therefore limited, so that the size of the feature map output by each layer does not exceed that of the previous layer. In addition, pictures with complex backgrounds are used to generate a large amount of paired synthetic data, consisting of power line images and pixel labels, and the generated synthetic data is then used for network training. In each image, the power line occupies only a small number of pixels, so focal loss is used to balance the impact of the large number of negative samples. All four models use the same optimizer, Adam, which automatically adjusts the learning rate on the basis of SGD. Training of each model is accelerated with an NVIDIA GTX 1080 Ti device and takes approximately 18 hours for 6,000 iterations with a batch size of 64. Loss, F1 score and IoU are the three evaluation criteria for the trained models, because a better model usually has lower loss and higher F1 score and IoU. Each model is applied to both the visible and the infrared image of a pair, and the two results are combined to make a judgment: if either result of a pair contains a power line, the pair is considered to contain one. Result After the four models were tested on the data set, the number of correctly identified pixels and the IoU of each image were counted.
According to the statistics, the IoU of most recognition results exceeds 0.2, and a threshold of 30 pixels works relatively well for classifying the results: if more than 30 pixels are identified in an image, the image is judged likely to contain a power line. By this standard, the proposed method achieves a recognition rate above 99% with a false alarm rate below 2%. By comparison, VGG16, trained on 3,800 pairs of images and tested on 200 pairs, achieves a recognition rate of only 95% with a false alarm rate of 37%. Random forest (RF) depends on the feature extraction method, so its recognition rate and false alarm rate fluctuate greatly. For example, on infrared images, RF with local binary patterns (LBP) has a recognition rate of 63.5% and a false alarm rate of 36.3%, while RF with the discrete cosine transform (DCT) reaches a recognition rate of 92.95% and a false alarm rate of 13.95%. Furthermore, although U-net-3 has more learnable parameters than U-net-2, its performance is much worse. Conclusion Our model has a higher recognition rate and a lower false alarm rate than the other methods on the same data set, which shows that it is more effective and can even extract power lines clearly from the background. In addition, all our models are trained on synthetic data and tested on real data, which indicates good generalization performance. The comparison of the four models also shows that the number of parameters does not completely determine the performance of a network and that a reasonable structure is very important. Unfortunately, our current models have a small receptive field and cannot be used for power line recognition in high-resolution images. In future work, we will study how to enlarge the receptive field of the model so that it adapts to larger images without greatly increasing the number of parameters.
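The image-level decision described in the abstract (more than 30 identified pixels, with a pair counted as positive if either the visible or the infrared result is positive) can be sketched as follows. All function names are hypothetical. The masks are assumed to be already binarized segmentation outputs, and the definitions used for recognition rate (detected positives over all positives) and false alarm rate (false detections over all negatives) are assumptions, since the abstract does not define them.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks (2D lists of 0/1)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as perfect agreement.
    return inter / union if union else 1.0


def image_has_line(mask, min_pixels=30):
    """True if more than min_pixels pixels are marked as power line."""
    return sum(sum(row) for row in mask) > min_pixels


def pair_has_line(visible_mask, infrared_mask, min_pixels=30):
    """OR-fusion: a pair is positive if either modality is positive."""
    return (image_has_line(visible_mask, min_pixels)
            or image_has_line(infrared_mask, min_pixels))


def recognition_and_false_alarm_rates(decisions, truths):
    """decisions, truths: equal-length lists of booleans, one entry per pair."""
    positives = sum(1 for t in truths if t)
    negatives = len(truths) - positives
    hits = sum(1 for d, t in zip(decisions, truths) if d and t)
    false_alarms = sum(1 for d, t in zip(decisions, truths) if d and not t)
    recognition = hits / positives if positives else 0.0
    false_alarm = false_alarms / negatives if negatives else 0.0
    return recognition, false_alarm
```

Because the fusion is a logical OR, it can only raise the recognition rate relative to a single modality, at the cost of also accumulating false alarms from both.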
Keywords