Multi-scenario lane line detection with auxiliary loss
2020, Vol. 25, No. 9, Pages: 1882-1893
Received: 2020-01-03; Revised: 2020-03-09; Accepted: 2020-03-16; Published in print: 2020-09-16
DOI: 10.11834/jig.190646
Objective
To address the poor real-time performance and accuracy of lane line detection caused by multi-scenario environmental factors, such as object occlusion, illumination changes, and shadow interference, during real-time vehicle driving, a lane line detection model that introduces an auxiliary loss is proposed.
Method
The model improves the efficient residual factorized network (ERFNet) by adding a lane prediction branch and an auxiliary training branch after the ERFNet encoder, so that the decoding stage runs in parallel with the lane prediction and auxiliary training branches. After the convolution layer of the auxiliary training branch, bilinear interpolation is used to match the resolution of the input image, and the four lane lines and the image background are classified. The auxiliary loss is computed and back-propagated with a certain weight together with the semantic segmentation loss and the lane prediction loss, which effectively mitigates the vanishing gradient problem. Semantic segmentation yields a probability distribution map for each lane line; on each map, the coordinates of the maximum point whose probability exceeds a specific threshold are found row by row, and the corresponding coordinate points are selected according to certain rules to form the fitted lane line.
Result
In tests on the CULane public dataset, the F1 score of the model in the normal scenario is 91.85%, a 1.25% improvement over the spatial convolutional neural network (SCNN) model, with improvements of 1%~7% in the other scenarios. The average F1 score over the nine scenarios is 73.76%, 1.96% higher than that of the currently best model, ResNet-101-self attention distillation (R-101-SAD) (71.80%). Tested on a single GPU, the average running time per image is reduced to 1/13 of the original, and the number of model parameters is reduced to 1/10 of the original. Compared with ENet-self attention distillation (ENet-SAD), the lane line detection model with the shortest average running time, the average running time per image is reduced by 2.3 ms.
Conclusion
In complex multi-scenario conditions such as object occlusion, illumination changes, and shadow interference, the proposed model offers high accuracy and good real-time performance for real-time vehicle driving.
Objective
In a real-time driving process, the vehicle must be positioned to complete the basic tasks of horizontal and vertical control. The premise of vehicle positioning is understanding road information. Road information includes all kinds of traffic signs, among which the lane line is an important piece of pavement information in the road scene. Such information is crucial for lane keeping, departure warning, and path planning; it is also important in research on advanced driving assistance systems. Therefore, lane line detection has become an important topic in real-time vehicle driving. Road scene images can be obtained using a vehicle camera, lidar, and other equipment, thus making it easy to obtain lane line images. However, lane line detection suffers from some difficulties. Traditional lane line detection methods usually design features manually. Starting from low-level features such as color, brightness, shape, and gray level, these methods process images via denoising, binarization, and graying. Then, the lane line features are extracted by combining edge detection, Hough transform, color threshold setting, perspective transform, and other methods. Afterward, the lane lines are fitted by straight-line or curve models. These methods are simple and easy to implement, but the accuracy of lane line detection is poor under the influence of multi-scene environmental conditions, such as object occlusion, light change, and shadow interference; moreover, the manual design of features is time consuming and thus cannot meet the real-time requirements of vehicle driving. To solve these problems, this study proposes a lane detection model named efficient residual factorized network-auxiliary loss (ERFNet-AL), which embeds an auxiliary loss.
Method
The model improves the ERFNet semantic segmentation network. After the encoder of ERFNet, a lane prediction branch and an auxiliary training branch are added so that the decoding phase runs in parallel with the lane prediction and auxiliary training branches. After the convolution layer of the auxiliary training branch, bilinear interpolation is used to match the resolution of the input images and to classify the four lane lines and the image background. The training set images are fed to the lane line detection model after preprocessing, such as clipping, rotating, scaling, and normalization. Features are extracted through the semantic segmentation network ERFNet, yielding the probability distribution of each lane line. The auxiliary training branch uses convolution operations to extract features, and bilinear interpolation replaces the deconvolution layer after the convolution layer to match the resolution of the input images and classify the four lane lines and the background. After convolution, batch normalization, dropout, and other operations, the lane prediction branch predicts the existence of real or virtual lane lines and outputs the probability of each lane line class; an output probability greater than 0.5 indicates that the corresponding lane line exists.
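For concreteness, a minimal PyTorch-style sketch of the two added branches is given below. The channel counts, dropout rate, and layer arrangement are illustrative assumptions rather than the paper's exact configuration; only the use of a convolution followed by bilinear interpolation (auxiliary branch), the 4-lanes-plus-background classification, and the 0.5 existence threshold come from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryBranch(nn.Module):
    """Auxiliary training branch: a convolution followed by bilinear
    upsampling to the input resolution, classifying 4 lanes + background.
    in_channels and the 1x1 kernel are assumptions for illustration."""
    def __init__(self, in_channels=128, num_classes=5):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feats, input_size):
        logits = self.conv(feats)
        # Bilinear interpolation in place of a deconvolution layer.
        return F.interpolate(logits, size=input_size,
                             mode='bilinear', align_corners=False)

class LanePredictionBranch(nn.Module):
    """Lane existence branch: conv + batch norm + dropout, then a small
    classifier that outputs one existence probability per lane."""
    def __init__(self, in_channels=128, num_lanes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Dropout2d(0.1),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_lanes)

    def forward(self, feats):
        x = self.features(feats).flatten(1)
        return torch.sigmoid(self.fc(x))  # > 0.5 means the lane exists
```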
If at least one lane line exists, the probability distribution map of each existing lane line is examined: the coordinates of the maximum point with a probability greater than a specific threshold are identified row by row, and the corresponding coordinate points are selected in accordance with the point-selection rules of the SCNN (spatial convolutional neural network) model. If more than two points are found, these points are connected to form a fitted lane line.
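As a small illustration of this row-wise point selection, the sketch below is a simplified stand-in for the SCNN selection rules; the threshold value and the number of sampled rows are assumptions.

```python
import numpy as np

def fit_lane_points(prob_map, threshold=0.5, num_rows=18):
    """Row-wise point extraction from one lane's probability map (H x W):
    in each sampled row, take the max-probability point and keep it only
    if its probability exceeds the threshold."""
    h, w = prob_map.shape
    points = []
    for y in np.linspace(0, h - 1, num_rows).astype(int):
        row = prob_map[y]
        x = int(row.argmax())      # max-probability point in this row
        if row[x] > threshold:     # keep it only if confident enough
            points.append((x, y))
    # Connect the points into a lane only if more than two were found.
    return points if len(points) > 2 else []
```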
During training, the cross-entropy loss between the predictions of the auxiliary training branch and the ground-truth labels is used as the auxiliary loss; each of the four lane lines has a weight of 1, and the background has a weight of 0.4. The auxiliary, semantic segmentation, and lane prediction losses are summed with certain weights, and the network parameters are updated via backpropagation, so the total loss consists of the main (segmentation), auxiliary, and lane prediction losses. The weights of ERFNet pretrained on the Cityscapes dataset are used to initialize training, and the model with the largest mean intersection over union is kept as the best model.
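A hedged sketch of how the three losses might be combined is shown below. Only the class weights (1 per lane, 0.4 for the background) come from the paper; the class ordering and the branch weights aux_weight and exist_weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Class weights: background first (assumed ordering), then the 4 lanes.
class_weights = torch.tensor([0.4, 1.0, 1.0, 1.0, 1.0])
seg_criterion = nn.CrossEntropyLoss(weight=class_weights)   # main loss
aux_criterion = nn.CrossEntropyLoss(weight=class_weights)   # auxiliary loss
exist_criterion = nn.BCELoss()                              # lane prediction loss

def total_loss(seg_logits, aux_logits, exist_probs,
               seg_target, exist_target,
               aux_weight=0.4, exist_weight=0.1):
    """Weighted sum of the three losses; aux_weight and exist_weight
    are illustrative values, not the paper's exact settings."""
    loss = seg_criterion(seg_logits, seg_target)
    loss = loss + aux_weight * aux_criterion(aux_logits, seg_target)
    loss = loss + exist_weight * exist_criterion(exist_probs, exist_target)
    return loss
```

Because the auxiliary branch is supervised at full resolution alongside the decoder, its gradient flows directly into the encoder, which is what mitigates the vanishing gradient problem described above.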
Result
After testing on the nine scenarios of the CULane public dataset, the F1 score of the model in the normal scenario is 91.85%, a 1.25% increase compared with that of the SCNN model (90.6%). Moreover, the F1 scores in seven scenarios, including crowded, night, no line, shadow, arrow, dazzle light, and curve, increase by 1%~7%. The overall average F1 score over the nine scenarios is 73.76%, which is 1.96% higher than that of the best ResNet-101-self attention distillation (R-101-SAD) model. The average running time per image is 11.1 ms, 11 times shorter than that of the SCNN model when tested on a single GeForce GTX 1080 GPU, and the model has only 2.49 MB of parameters, 7.3 times fewer than the SCNN model. On the CULane dataset, ENet with SAD (ENet-SAD) is the lane line detection model with the shortest average single-image running time, 13.4 ms, whereas our model takes 11.1 ms; compared with ENet-SAD, the average running time is thus reduced by 2.3 ms. When detecting lane lines in the crossroad scenario, the number of false positives is large, which may be because crossroads contain many lane lines, whereas only four lane lines are detected in our experiment.
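For reference, the F1 score reported here is the harmonic mean of precision and recall over matched lane predictions; a minimal sketch follows, assuming the IoU-based matching of predicted and ground-truth lanes that produces the counts happens upstream and is not shown.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```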
Conclusion
In various complex scenarios, such as object occlusion, lighting changes, and shadow interference, the model is minimally affected by the environment for real-time driving vehicles, and its accuracy and real-time performance are improved. Future work will aim to increase the number of detected lane lines, optimize the model, and improve its detection performance at crossroads.
References
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]
Chen C Y. 2018. Research of highway rainy weather detection based on deep semantic segmentation. Chengdu: Southwest Jiaotong University
Chen H S, Yao M H, Chen Z H and Yang Z. 2018. Efficient method of lane detection based on multi-frame blending and windows searching. Computer Science, 45(10): 255-260
Chen Z and Chen Z J. 2017. RBNet: a deep neural network for unified road and road boundary detection//Proceedings of the 24th International Conference on Neural Information Processing. Guangzhou: Springer: 677-687 [DOI: 10.1007/978-3-319-70087-8_70]
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S and Schiele B. 2016. The Cityscapes dataset for semantic urban scene understanding//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 3213-3223 [DOI: 10.1109/CVPR.2016.350]
Csurka G and Perronnin F. 2011. An efficient approach to semantic segmentation. International Journal of Computer Vision, 95(2): 198-212 [DOI: 10.1007/s11263-010-0344-8]
Gao F, Mei K C, Gao Y, Lu S F and Xiao G. 2016. Algorithm of intersection background extraction and driveway calibration. Journal of Image and Graphics, 21(6): 734-744 [DOI: 10.11834/jig.20160606]
Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V and Garcia-Rodriguez J. 2017. A review on deep learning techniques applied to semantic segmentation [EB/OL]. [2020-01-03]. https://arxiv.org/pdf/1704.06857.pdf
Ghafoorian M, Nugteren C, Baka N, Booij O and Hofmann M. 2018. EL-GAN: embedding loss driven generative adversarial networks for lane detection//Proceedings of 2018 European Conference on Computer Vision. Munich: Springer: 256-272 [DOI: 10.1007/978-3-030-11009-3_15]
Hou Y N, Ma Z, Liu C X and Loy C C. 2019. Learning lightweight lane detection CNNs by self attention distillation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE: 1013-1021 [DOI: 10.1109/ICCV.2019.00110]
Jiang L B and Tai Q L. 2019. The lane line detection in complex scene based on instance segmentation. Machine Design and Manufacturing Engineering, 48(5): 113-118 [DOI: 10.3969/j.issn.2095-509X.2019.05.027]
Ling S Y, Ma Y, Huang C R and Zhai W L. 2017. Research on improved urban environment road detection algorithm based on Hough transform. Machine Design and Manufacturing Engineering, 46(12): 71-75 [DOI: 10.3969/j.issn.2095-509X.2017.12.017]
Pan X G, Shi J P, Luo P, Wang X G and Tang X O. 2018. Spatial as deep: spatial CNN for traffic scene understanding [EB/OL]. [2020-01-03]. https://arxiv.org/pdf/1712.06080.pdf
Romera E, Álvarez J M, Bergasa L M and Arroyo R. 2017. Efficient ConvNet for real-time semantic segmentation [EB/OL]. [2020-01-03]. https://arxiv.org/pdf/1712.06080.pdf
Romera E, Álvarez J M, Bergasa L M and Arroyo R. 2018. ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 19(1): 263-272 [DOI: 10.1109/TITS.2017.2750080]
Shi Y T. 2018. Research on semantic segmentation of road scene based on deep neural networks. Chengdu: Southwest Jiaotong University
Tian X, Wang L and Ding Q. 2019. Review of image semantic segmentation based on deep learning. Journal of Software, 30(2): 440-468 [DOI: 10.13328/j.cnki.jos.005659]
Trinh T H, Dai A M, Luong M T and Le Q V. 2018. Learning longer-term dependencies in RNNs with auxiliary losses [EB/OL]. [2020-01-03]. https://arxiv.org/pdf/1803.00144.pdf
Xing Y, Lv C, Chen L, Wang H J, Wang H, Cao D P, Velenis E and Wang F Y. 2018. Advances in vision-based lane detection: algorithms, integration, assessment, and perspectives on ACP-based parallel vision. IEEE/CAA Journal of Automatica Sinica, 5(3): 645-661 [DOI: 10.1109/JAS.2018.7511063]
Zhao W M and Zhang H W. 2017. A research on lane maintenance assist system based on machine vision. Digital Technology and Application, (11): 63-64 [DOI: 10.19695/j.cnki.cn12-1369.2017.11.035]
Zhu W, Qu J Y and Wu R B. 2017. Straight convolutional neural networks algorithm based on batch normalization for image classification. Journal of Computer-Aided Design and Computer Graphics, 29(9): 1650-1657 [DOI: 10.3969/j.issn.1003-9775.2017.09.008]