Driving fatigue detection based on pseudo-3D convolutional neural network and attention mechanisms
2021, Vol. 26, No. 1, pp. 143-153
Received: 2020-03-20
Revised: 2020-05-13
Accepted: 2020-05-20
Published in print: 2021-01-16
DOI: 10.11834/jig.200079
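Objective
Fatigue driving detection in complex environments is a challenging technical problem. To make full use of the driver's facial feature information and temporal features, a driving fatigue detection method based on a pseudo-3D (P3D) convolutional neural network (CNN) and attention mechanisms is proposed.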
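Method
A pseudo-3D convolution module is used for spatiotemporal feature learning. A P3D-Attention module is proposed, which uses the P3D structure to fuse a dual-channel attention module and an adaptive spatial attention module, improving the correlation of important channel features, increasing the global correlation of feature maps, and fusing multi-layer deep convolutional features. The dual-channel attention module applies attention across video frames and across the channels of each frame, removing the interference of background and noise on recognition; the adaptive spatial attention module makes the model train faster and converge better. A 2D global average pooling layer replaces the 3D global average pooling layer to obtain more expressive features and thereby speed up network convergence. A softmax classification layer performs the classification.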
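The difference between 3D and 2D global average pooling can be illustrated with a minimal pure-Python sketch. It assumes one possible reading of the method, in which 2D pooling averages over height and width only and keeps one descriptor per frame; the abstract does not confirm this layout, and the function names `gap3d` and `gap2d` and the nested-list layout `clip[t][h][w][c]` are hypothetical.

```python
from statistics import fmean

def gap3d(clip):
    """clip[t][h][w][c] -> C values, each averaged over time, height and width."""
    channels = len(clip[0][0][0])
    return [
        fmean(px[c] for frame in clip for row in frame for px in row)
        for c in range(channels)
    ]

def gap2d(clip):
    """clip[t][h][w][c] -> T lists of C values, each averaged over height and width only."""
    channels = len(clip[0][0][0])
    return [
        [fmean(px[c] for row in frame for px in row) for c in range(channels)]
        for frame in clip
    ]

# Toy feature map (assumed values): T=2 frames, H=W=2, C=1.
# The second frame has a uniformly higher activation than the first.
clip = [
    [[[0.0], [0.0]], [[0.0], [0.0]]],
    [[[1.0], [1.0]], [[1.0], [1.0]]],
]
print(gap3d(clip))  # [0.5]
print(gap2d(clip))  # [[0.0], [1.0]]
```

In this toy case the 3D pooling collapses both frames into a single value, while the 2D pooling keeps the frame-to-frame difference that a later layer could still use.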
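Result
Comparative experiments were conducted on the public dataset YawDD (a yawning detection dataset): the F1-score detection accuracy of the proposed method on the test set reaches 99.89%, and the recall on the yawning category reaches 100%. On the UTA-RLDD dataset (University of Texas at Arlington real-life drowsiness dataset), the F1-score detection accuracy on the test set reaches 99.64%, and the recall on the drowsy category reaches 100%. Compared with the method that fuses Inception-V3 with LSTM (long short-term memory), the proposed model is 42.5 MB, about 1/9 of that model's size, and its prediction time is about 660 ms, roughly 11% of that method's.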
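For reference, F1-score and recall have the standard per-class definitions below, where TP, FP, and FN are the true-positive, false-positive, and false-negative counts for one class. The abstract does not say how per-class scores are averaged, so only the per-class form is shown.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```

A recall of 100% on a class therefore means FN = 0 for that class: no test sample of that class was missed.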
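Conclusion
A driving fatigue detection method based on a pseudo-3D convolutional neural network and attention mechanisms is proposed. The attention mechanisms are used to further analyze yawning, blinking, and characteristic head movements, and to clearly distinguish yawning from talking.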
Objective
Fatigue driving is one of the main causes of traffic accidents. Fatigued drivers have reduced alertness, a weakened ability to deal with abnormal events, and an inability to react to traffic control and dangerous events, which leads to accidents. Current techniques for detecting fatigue driving fall into three groups, based on physiological parameters, vehicle behavior, and facial feature analysis. Methods based on physiological parameters require various sensors. These sensors use physiological signals to detect the driver's drowsiness, but they need contact with the driver's body, rely on expensive equipment, and are invasive. Methods based on vehicle behavior use parameters such as lane departure, steering wheel angle, and yaw angle to detect driving fatigue, but they also depend on external factors such as road conditions. Methods based on facial feature analysis extract feature points from the driver's face and compare the driver's behavior in fatigued and normal conditions by detecting fatigue characteristics such as eye state, blinking, and yawning. Compared with the first two groups, this approach is noninvasive and easy to implement. In several current methods, however, spatiotemporal features are not well integrated, and the interference of background and noise on recognition is not removed. This paper proposes a driving fatigue detection method based on a pseudo-3D (P3D) convolutional neural network (CNN) and an attention mechanism to solve these problems.
Method
The dataset is cropped into short videos of around 5 s each. Each training clip spans 90 video frames, and the frame resolution is set to 80×80×3. First, the feature map of each frame is extracted through the P3D module to generate a fixed-size feature set. Second, the P3D structure uses a 1×3×3 convolution kernel and a 3×1×1 convolution kernel to simulate a 3×3×3 convolution, decoupling it into a spatial part and a temporal part. Building on this decoupling, a module named P3D-Attention is proposed. The 3D convolutional neural network and the attention mechanism are integrated to improve the correlation of important channel features, increase the global correlation of feature maps, and remove the interference of background and noise on recognition by translating 3D spatiotemporal features into 2D features and embedding them in the dual-channel and spatial attention modules. The dual-channel attention module applies attention across video frames and across the channels of each frame. For driving scenarios, convolution kernels of different sizes are selected to adapt to convolutional features of different depths, and the adaptive spatial attention module makes model training converge faster and better. Afterward, a 2D global average pooling layer is used instead of a 3D global average pooling layer to obtain more expressive features, improving network convergence speed. Finally, a softmax classification layer is used for classification.
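The saving from replacing a 3×3×3 convolution with a 1×3×3 and a 3×1×1 convolution can be checked with a small weight count. This is a sketch under stated assumptions, not the paper's configuration: the two factorized convolutions are applied in series, the intermediate tensor keeps the output channel count, biases are ignored, and the channel width of 64 is hypothetical. All function names are illustrative.

```python
def conv3d_weights(kt, kh, kw, c_in, c_out):
    """Weight count of a 3D convolution with a kt x kh x kw kernel (bias ignored)."""
    return kt * kh * kw * c_in * c_out

def full_3d(c_in, c_out):
    """Single 3x3x3 convolution."""
    return conv3d_weights(3, 3, 3, c_in, c_out)

def pseudo_3d(c_in, c_out):
    """Spatial 1x3x3 convolution followed by temporal 3x1x1 convolution.

    Assumption: the intermediate tensor has c_out channels.
    """
    spatial = conv3d_weights(1, 3, 3, c_in, c_out)
    temporal = conv3d_weights(3, 1, 1, c_out, c_out)
    return spatial + temporal

c = 64  # hypothetical channel width, not taken from the paper
print(full_3d(c, c))    # 110592
print(pseudo_3d(c, c))  # 49152

# Input size stated in the text: 90 frames of 80 x 80 x 3 values each.
print(90 * 80 * 80 * 3)  # 1728000
```

With equal input and output widths the factorized pair needs 12/27 of the weights of the full kernel, which is consistent with the smaller model size reported in the results, although the abstract does not attribute the saving to this factor alone.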
Result
A comparative test is performed on the public dataset YawDD (a yawning detection dataset). The detection accuracy of the proposed method reaches 98.75%, and the recall of the yawning category reaches 100%. On the University of Texas at Arlington real-life drowsiness dataset (UTA-RLDD), the F1-score detection accuracy of the proposed method reaches 99.64% on the test set, and the recall reaches 100% in the drowsy category. Compared with the long short-term memory (LSTM) fusion method that uses an ImageNet-trained Inception-v3 model, the proposed algorithm has a clear advantage in running time: predicting a 5 s video takes 660 ms on average, which is 11% of the baseline's time. In terms of unpruned model size, the Inception-v3 plus LSTM method takes 396.15 MB, whereas the proposed model takes 42.5 MB, about 1/9 of it.
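The size and timing ratios quoted above can be checked with a few lines of arithmetic. The baseline prediction time is not given in the abstract, so the value computed here is only what the stated 11% implies; the variable names are illustrative.

```python
model_mb = 42.5        # proposed model size, from the text
baseline_mb = 396.15   # Inception-v3 + LSTM size, from the text

size_ratio = model_mb / baseline_mb
print(round(size_ratio, 3))              # 0.107
print(round(baseline_mb / model_mb, 1))  # 9.3, i.e. roughly 1/9

predict_ms = 660       # average prediction time for a 5 s clip, from the text
share = 0.11           # stated share of the baseline's time

# Implied baseline time; an inference from the two stated figures, not a reported number.
implied_baseline_ms = predict_ms / share
print(round(implied_baseline_ms))        # 6000

# Fraction of the clip duration spent on prediction.
print(predict_ms / 5000)                 # 0.132
```

On these figures, prediction takes about 13% of the clip's duration, whereas the implied baseline time of roughly 6 s would exceed the 5 s clip length.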
Conclusion
A driving fatigue detection method based on a P3D convolutional neural network and an attention mechanism is proposed. Attention mechanisms are used to remove the interference of background and noise on recognition, improve the accuracy of driving fatigue detection, distinguish yawning from other mouth opening and closing behaviors such as talking, and analyze yawning, blinking, and characteristic head movements. Future work will 1) verify whether features can be extracted through a smaller network structure, design a more efficient network structure, and further reduce the size of the model; and 2) use 3D convolution to distinguish more complicated driving behaviors, because detecting distracted driving requires more than predicting the driver's fatigue status.