Power line recognition method based on a fully convolutional network

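Liu Jiawei1, Li Yuanxiang1, Gong Zheng1, Liu Xingang2, Zhou Yongjun1 (1. School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240; 2. Leihua Electronic Technology Research Institute, Aviation Industry Corporation of China, Wuxi 214063)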

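Abstract
Objective Power line warning is critical to the low-altitude flight safety of helicopters and unmanned aerial vehicles, and recognizing power lines in visible and infrared images is an effective approach. Traditional recognition methods rely on hand-designed filters to extract local features of power lines and then use methods such as the Hough transform to find straight lines, while machine learning methods such as support vector machines and random forests only report whether an image contains a power line. This paper proposes a power line recognition method based on a fully convolutional network that learns the feature extractor automatically and also yields information such as the exact location of the power lines. Method First, complex backgrounds are used to generate a large amount of paired synthetic data consisting of power line images and pixel labels. The U-Net architecture is then modified to suit the power line recognition task, and the network is trained on the synthetic data. Because power lines occupy very few pixels in an image, focal loss is used to balance the influence of the large number of negative samples. Result On a power line inspection data set containing 4 000 infrared images and 4 000 visible images, and in comparison with random forest methods using various features such as VGG (visual geometry group) 16, the proposed method achieves a power line recognition rate above 99% with a false alarm rate below 2%. In addition, in the pixel segmentation results output by the method, essentially all power lines are identified. Conclusion The proposed fully convolutional power line recognition method extracts optical image features of power lines and, unlike traditional machine learning methods, can precisely extract the power lines from the scene, which gives the recognition result a clearer basis for judgment.
Keywords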
Power line recognition method via fully convolutional network

Liu Jiawei1, Li Yuanxiang1, Gong Zheng1, Liu Xingang2, Zhou Yongjun1(1.School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China;2.Leihua Electronic Technology Research Institute, Aviation Industry Corporation of China, Ltd. (AVIC), Wuxi 214063, China)

Abstract
Objective Dozens of helicopter accidents occur every year owing to low-altitude collisions with trees, wires, poles, and man-made buildings. From 2014 to 2016 alone, 96 crashes worldwide were caused by hitting power lines. Warning of and avoiding wires is therefore important for the low-altitude flight safety of helicopters and unmanned aerial vehicles. According to relevant studies, optical images offer an effective way to identify wires. Traditional methods use hand-designed filters to extract features of power lines and then apply the Hough transform to detect the lines. Machine learning methods, such as VGG (visual geometry group) 16 and random forest (RF), only produce a classification result for a whole image, which makes the result difficult to verify. The fully connected layers of a traditional convolutional neural network (CNN) are effective for classification tasks but cannot perform pixel segmentation because location information is lost. By contrast, a fully convolutional network has no fully connected layer and therefore retains location information. One kind of fully convolutional network, U-Net, was proposed to solve problems such as cell segmentation and retina segmentation. U-Net works well with a small number of samples and small image slices. A three-channel image is input into the network and, after passing through the encoder and decoder, becomes a one-channel feature map via a convolution with a 1×1 kernel. To obtain a final value between 0 and 1, a sigmoid activation function is applied after the last convolution layer. In this study, a CNN recognition method based on U-Net is proposed to detect power lines.

Method First, we obtain a power line data set of 8 000 images, consisting of 4 000 pairs of visible and infrared images. Each image is 128×128 pixels and has a pixel-level ground truth label. The receptive field formula for convolutional networks is used to determine the depth of our network.
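The receptive field calculation mentioned above can be sketched as follows. This is a minimal illustration of the standard recurrence for stacked convolution and pooling layers, not the exact procedure used in the paper; the layer list `encoder` is a hypothetical example, and only the downsampling (encoder) path is modeled.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels, of a stack of layers.

    Each layer is a (kernel_size, stride) pair, listed from input to output.
    Standard recurrence: rf += (kernel_size - 1) * jump; jump *= stride.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent output positions
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical encoder (an assumption, not the paper's configuration):
# two 3x3 convolutions followed by one 2x2 pooling layer, repeated three times.
encoder = [(3, 1), (3, 1), (2, 2)] * 3
print(receptive_field(encoder))
```

Comparing the value returned for a candidate layer stack with the 128×128 input size gives a rough indication of whether more depth is needed.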
Next, this base network is adjusted to find the best model. The base network is named the U-Net-0 model. The U-Net-1 model removes the downsampling (pooling) layer in U-Net-0 and changes the stride of the convolution layer before it to 2. It also removes the upsampling layer and replaces the convolution layer after it with a transposed convolution layer with a stride of 2. Compared with U-Net-0, the U-Net-2 model eliminates the upsampling and downsampling layers together with the convolution layer in the middle, thereby reducing the network depth. In the U-Net-3 model, decoding is treated as a dimensionality reduction process: the number of convolution kernels in the decoder is limited so that the size of the feature map output by each layer does not exceed that of the previous layer. Images with complex backgrounds are also used to generate a large amount of paired synthetic data, namely power line images with pixel labels, which are then used for network training. In each image, power lines occupy only a small number of pixels, so focal loss is used to balance the impact of the large number of negative samples. The four models use the same optimizer, Adam, which adapts the learning rate on top of SGD (stochastic gradient descent). Training of each model is accelerated with an NVIDIA GTX 1080 Ti GPU and takes approximately 18 hours for 6 000 iterations with a batch size of 64. Loss, F1 score, and intersection-over-union (IoU) are the three evaluation criteria for the trained models; the best model generally has low loss and high F1 score and IoU. Each model is applied to both the visible and the infrared image of a pair, and the two results are combined to make a judgment: a power line is considered detected in the combined result if it is found in either image of the pair.
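To illustrate how focal loss reduces the weight of the many easy background pixels, a minimal per-pixel sketch of the binary form is given below. The values `gamma=2.0` and `alpha=0.25` are common defaults and are assumptions here; the abstract does not state the settings actually used.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a flat list of pixels.

    probs  : predicted probability that each pixel belongs to a power line
    labels : ground truth, 1 for power line and 0 for background
    gamma, alpha : assumed defaults, not values taken from the paper
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
        p_t = p if y == 1 else 1.0 - p         # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified pixels
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy background pixel contributes far less than a hard one.
print(focal_loss([0.01], [0]), focal_loss([0.9], [0]))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary binary cross-entropy, which makes the effect of the modulating factor easy to check.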
Result After the four models are tested on the data set, the number of correctly identified pixels and the IoU are computed for each image. According to these statistics, the IoU of most recognition results exceeds 0.2, and a threshold of 30 pixels works relatively well for classifying the results: if more than 30 pixels are identified in an image, the image is judged likely to contain a power line. By this standard, the proposed method achieves a recognition rate above 99% with a false alarm rate below 2%. By comparison, VGG16, trained on 3 800 pairs of images and tested on 200 pairs, obtains a recognition rate of only 95% and a false alarm rate of 37%. RF depends on the feature extraction method, so its recognition rate and false alarm rate fluctuate greatly. For example, on infrared images, RF with local binary patterns has a recognition rate of 63.5% and a false alarm rate of 36.3%, whereas RF with discrete cosine transform obtains a recognition rate of 92.95% and a false alarm rate of 13.95%. Although U-Net-3 has more learnable parameters than U-Net-2, its performance is substantially worse.

Conclusion Our models achieve higher recognition rates and lower false alarm rates than traditional methods on the same data set. The results show that our models are more effective than the other methods and can clearly extract power lines from the background. Our models are trained on synthetic data and tested on real data, which indicates good generalization performance. The comparison of the four models also shows that the number of parameters does not by itself determine network performance and that a reasonable structure is important. However, our current models have a small receptive field and cannot be used for power line recognition in high-resolution images. In future work, the models will be studied further to enlarge their receptive field so that they can handle larger images without greatly increasing the number of parameters.
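The image-level decision described in the Result paragraph (more than 30 identified pixels, with the visible and infrared results combined) and the IoU measure can be sketched as follows. The function names are hypothetical, masks are assumed to be binary nested lists, and the treatment of two empty masks in `iou` is a convention chosen for this sketch.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks are treated as a perfect match (an assumption).
    return inter / union if union else 1.0


def has_power_line(mask, min_pixels=30):
    """Flag an image when more than min_pixels pixels are identified."""
    identified = sum(1 for row in mask for value in row if value)
    return identified > min_pixels


def fused_decision(visible_mask, infrared_mask, min_pixels=30):
    """Report a power line if either image of the pair is flagged."""
    return (has_power_line(visible_mask, min_pixels)
            or has_power_line(infrared_mask, min_pixels))


# Example: a 31-pixel line in the infrared mask only.
empty = [[0] * 128 for _ in range(128)]
line = [[1] * 31 + [0] * 97] + [[0] * 128 for _ in range(127)]
print(fused_decision(empty, line), iou(line, line))
```

Combining the two modalities with a logical OR favors recognition rate over false alarm rate, which matches the warning-oriented goal stated in the abstract.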
Keywords
