Multi-scenario lane line detection with auxiliary loss

Chen Lichao1, Xu Xiuzhi1, Cao Jianfang1,2, Pan Lihu1 (1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; 2. Department of Computer Science, Xinzhou Teachers University, Xinzhou 034099, China)

Abstract
Objective To address the poor real-time performance and accuracy of lane line detection caused by multi-scenario environmental factors such as object occlusion, illumination changes, and shadow interference during real-time vehicle driving, a lane line detection model that introduces an auxiliary loss is proposed. Method The model improves the efficient residual factorized network (ERFNet). A lane prediction branch and an auxiliary training branch are added after the ERFNet encoder so that the decoding stage runs in parallel with the lane prediction branch and the auxiliary training branch, and bilinear interpolation is applied after the convolution layers of the auxiliary training branch to match the resolution of the input image, thereby classifying the four lane lines and the image background. The auxiliary loss is computed and backpropagated, with a certain weight, together with the semantic segmentation loss and the lane prediction loss, which effectively alleviates the vanishing gradient problem. Semantic segmentation yields a probability distribution map for each lane line; on each map, the coordinates of the maximum-probability point exceeding a specific threshold are located row by row, and the corresponding coordinate points are selected according to certain rules to form the fitted lane line. Result In experiments on the public CULane dataset, the model achieves an F1 score of 91.85% in the normal scenario, an improvement of 1.25% over the spatial convolutional neural network (SCNN) model, with improvements of 1% to 7% in the other scenarios; the average F1 score over the nine scenarios is 73.76%, which is 1.96% higher than that of the currently best ResNet-101 with self-attention distillation (R-101-SAD) model (71.80%). Tested on a single GPU, the average running time per image is reduced to 1/13 of the original, and the number of model parameters is reduced to 1/10 of the original. Compared with ENet with self-attention distillation (ENet-SAD), the lane line detection model with the shortest average running time, the average running time per image is reduced by 2.3 ms. Conclusion Under various complex scenarios such as object occlusion, illumination changes, and shadow interference, the proposed model offers high accuracy and good real-time performance for real-time vehicle driving.
Keywords
Multi-scenario lane line detection with auxiliary loss

Chen Lichao1, Xu Xiuzhi1, Cao Jianfang1,2, Pan Lihu1 (1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; 2. Department of Computer Science, Xinzhou Teachers University, Xinzhou 034099, China)

Abstract
Objective In real-time driving, the vehicle must be localized to complete the basic tasks of lateral and longitudinal control. A prerequisite of vehicle localization is understanding road information. Road information includes all kinds of traffic signs, among which lane lines are important pavement information in the road scene. Such information is crucial for lane keeping, departure warning, and path planning; it is also important in research on advanced driving assistance systems. Therefore, lane line detection has become an important topic in real-time vehicle driving. Road scene images can be obtained using a vehicle-mounted camera, lidar, and other equipment, making lane line images easy to acquire. However, lane line detection still faces several difficulties. Traditional lane line detection methods usually rely on manually designed features. Starting from low-level features such as color, brightness, shape, and gray level, these methods preprocess images via denoising, binarization, and grayscale conversion. Lane line features are then extracted by combining edge detection, the Hough transform, color thresholding, perspective transformation, and other techniques. Afterward, the lane lines are fitted with straight-line or curve models. These methods are simple and easy to implement, but their detection accuracy is poor under multi-scenario environmental conditions such as object occlusion, illumination changes, and shadow interference; moreover, manual feature design is time consuming and cannot meet the real-time requirements of vehicle driving. To solve these problems, this study proposes a lane line detection model named efficient residual factorized network-auxiliary loss (ERFNet-AL), which embeds an auxiliary loss. Method The model improves the ERFNet semantic segmentation network. After the ERFNet encoder, a lane prediction branch and an auxiliary training branch are added so that the decoding phase runs in parallel with the lane prediction and auxiliary training branches. After the convolution layers of the auxiliary training branch, bilinear interpolation is used to match the resolution of the input images and to classify the four lane lines and the image background. The training images in the dataset are fed to the lane line detection model after preprocessing such as cropping, rotation, scaling, and normalization. Features are extracted through the ERFNet semantic segmentation network, yielding the probability distribution of each lane line. The auxiliary training branch extracts features with convolution operations, and bilinear interpolation replaces the deconvolution layer after the convolution layers to match the resolution of the input images and classify the four lane lines and the background. After convolution, batch normalization, dropout, and other operations, the lane prediction branch predicts the existence of lane lines or virtual lane lines and outputs the probability of lane line classification. An output probability greater than 0.5 indicates the existence of a lane line. If at least one lane line exists, the probability distribution map of the corresponding lane line must be determined.
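The branch layout described above can be illustrated with a minimal PyTorch-style sketch. The encoder stub, channel widths, and layer choices below are illustrative assumptions rather than the exact ERFNet-AL configuration; only the overall structure (a shared encoder followed by a parallel segmentation decoder, an auxiliary branch ending in bilinear upsampling, and a lane existence prediction branch) follows the description.

# Minimal sketch of the ERFNet-AL branch layout; layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 5  # 4 lane lines + background


class EncoderStub(nn.Module):
    """Placeholder for the ERFNet encoder (downsamples by 8 here)."""
    def __init__(self, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class LaneNetSketch(nn.Module):
    """Shared encoder followed by three parallel heads: segmentation decoder,
    auxiliary branch (convolutions + bilinear upsampling), lane existence branch."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.encoder = EncoderStub(128)
        # Main decoder: a single transposed-conv stage stands in for the
        # ERFNet decoder that restores resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
        )
        # Auxiliary training branch: plain convolutions, then bilinear
        # interpolation (instead of a deconvolution layer) back to input size.
        self.aux_head = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
        )
        # Lane prediction branch: probability that each of the 4 lanes exists.
        self.exist_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Dropout(0.1), nn.Linear(64, num_classes - 1),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        feat = self.encoder(x)
        seg = F.interpolate(self.decoder(feat), size=(h, w),
                            mode="bilinear", align_corners=False)
        aux = F.interpolate(self.aux_head(feat), size=(h, w),
                            mode="bilinear", align_corners=False)
        exist = torch.sigmoid(self.exist_head(feat))  # P(lane i exists); > 0.5 means present
        return seg, aux, exist


if __name__ == "__main__":
    model = LaneNetSketch()
    seg, aux, exist = model(torch.randn(1, 3, 288, 800))
    print(seg.shape, aux.shape, exist.shape)  # (1, 5, 288, 800) twice, then (1, 4)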
On the probability distribution map of each lane line, the coordinates of the maximum-probability point whose probability exceeds a specific threshold are identified row by row, and the corresponding coordinate points are selected in accordance with the point-selection rules of the SCNN (spatial convolutional neural network) model. If the number of points found is greater than 2, these points are connected to form a fitted lane line. Then, the cross-entropy loss between the predictions output by the auxiliary training branch and the real labels is calculated and used as the auxiliary loss. The weight of each of the four lane line classes is 1, and the weight of the background class is 0.4. The auxiliary, semantic segmentation, and lane prediction losses are weighted and summed with certain weights, and the network parameters are adjusted via backpropagation. The total loss thus comprises the main, auxiliary, and lane prediction losses. During training, the weights of ERFNet pretrained on the Cityscapes dataset are used for initialization, and the model with the largest mean intersection over union is taken as the best model. Result After testing on the nine scenarios of the CULane public dataset, the F1 score of the model in the normal scenario is 91.85%, a 1.25% increase over that of the SCNN model (90.6%). Moreover, the F1 score in seven scenarios, namely, crowded, night, no line, shadow, arrow, dazzle light, and curve, is increased by 1% to 7%; the average F1 score over the nine scenarios is 73.76%, which is 1.96% higher than that of the best ResNet-101 with self-attention distillation (R-101-SAD) model (71.80%). Tested on a single GeForce GTX 1080 GPU, the average running time per image is 11.1 ms, about one-eleventh of the average running time of the SCNN model, and the model has only 2.49 MB of parameters, about 7.3 times fewer than the SCNN model. On the CULane dataset, ENet with self-attention distillation (ENet-SAD) is the lane line detection model with the shortest average running time per image; its average running time is 13.4 ms, whereas that of our model is 11.1 ms, a reduction of 2.3 ms. When detecting lane lines in the crossroad scenario, the number of false positives is large, which may be because crossroads contain many lane lines, whereas only four lane lines are detected in our experiment. Conclusion In various complex scenarios, such as object occlusion, illumination changes, and shadow interference, the proposed model is minimally affected by the environment and offers improved accuracy and real-time performance for real-time driving vehicles. Future work will aim to increase the number of detectable lane lines, optimize the model, and improve its detection performance at crossroads.
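As a rough Python sketch (assuming PyTorch) of the loss combination and the row-wise point selection described above, the snippet below shows one way these steps could look. The class weights of 1 for the lane classes and 0.4 for the background, and the rule of keeping a lane only when more than 2 points are found, follow the text; the branch weighting coefficients, the 0.5 probability threshold for point picking, and the row sampling scheme are assumptions for illustration.

# Sketch of the combined loss and row-wise lane point selection; coefficients are assumptions.
import torch
import torch.nn.functional as F

# Class weights: four lane classes weighted 1, background weighted 0.4.
CLASS_WEIGHTS = torch.tensor([0.4, 1.0, 1.0, 1.0, 1.0])


def total_loss(seg_logits, aux_logits, exist_probs,
               seg_labels, exist_labels, aux_weight=0.4, exist_weight=0.1):
    """Weighted sum of the main segmentation, auxiliary, and lane existence losses."""
    w = CLASS_WEIGHTS.to(seg_logits.device)
    main_loss = F.cross_entropy(seg_logits, seg_labels, weight=w)
    aux_loss = F.cross_entropy(aux_logits, seg_labels, weight=w)
    exist_loss = F.binary_cross_entropy(exist_probs, exist_labels)
    return main_loss + aux_weight * aux_loss + exist_weight * exist_loss


def fit_lane_points(prob_map, threshold=0.5, num_rows=18):
    """On sampled rows, keep the column with maximum probability if it exceeds
    the threshold; the lane is kept only when more than 2 points are found."""
    h, w = prob_map.shape
    points = []
    for y in torch.linspace(0, h - 1, num_rows).long().tolist():
        row = prob_map[y]
        x = int(torch.argmax(row))
        if row[x] > threshold:
            points.append((x, y))
    return points if len(points) > 2 else []


if __name__ == "__main__":
    # Toy example: a synthetic probability map with a bright diagonal "lane".
    prob = torch.zeros(288, 800)
    for y in range(288):
        prob[y, 300 + y] = 0.9
    print(fit_lane_points(prob)[:3])  # first few (x, y) coordinates of the fitted lane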
Keywords
