用于驾驶员分心行为识别的姿态引导实例感知学习
Pose-guided instance-aware learning for driver distraction recognition
2023年28卷第11期 页码:3550-3561
纸质出版日期: 2023-11-16
DOI: 10.11834/jig.220835
李少凡, 高尚兵, 张莹莹. 2023. 用于驾驶员分心行为识别的姿态引导实例感知学习. 中国图象图形学报, 28(11):3550-3561
Li Shaofan, Gao Shangbing, Zhang Yingying. 2023. Pose-guided instance-aware learning for driver distraction recognition. Journal of Image and Graphics, 28(11):3550-3561
目的
基于图像的驾驶员分心行为识别可认为是一种二级图像子分类问题,与传统的图像分类不同,驾驶员分心识别任务中的各类区别比较微小,如区分一幅图像是在弄头发还是打电话完全取决于驾驶员手上是否有手机这个物体,即图像中的较小区域就决定了该图像的类别。对于那些图像差异较小的类别,通常的图像分类方法无法高精度地区分。因此,为了能够学习到不同驾驶行为之间微小的表征区别,提出了一种姿态引导的实例感知学习网络用于驾驶员行为识别。
方法
首先利用人体检测器检测到人体框,利用人体姿态估计获取具有辨识性的手部相关区域,将人体和手部区域的特征作为实例级别的特征,以此设计一种实例感知学习模块充分获取不同层级的上下文语义信息。其次利用手部相关特征构建双通道交互模块来对关键空间信息进行表征的同时,对视觉特征进行优化,组建成一个多分支的深度神经网络。最后将不同分支的结果进行融合。
结果
实验结果表明,本文方法在AUC(American University in Cairo)数据集和自建三客一危数据集上的测试准确率分别达到96.17%和96.97%,相较于未使用实例感知模块和通道交互的模型,准确率显著改善,在复杂数据集下识别效果提升明显。
结论
本文提出的姿态引导的实例感知学习网络,在一定程度上降低了环境的干扰,准确度高,能辅助驾驶员安全行车,减少交通事故的发生。
Objective
Distracted driving is the main cause of traffic accidents. Data from the Department of Transportation show that about 2 million traffic accidents occur every year, of which over 80% are caused by distracted driving. In recent years, advanced driver assistance systems (ADAS) have been employed to collect data and to detect and recognize static and dynamic objects inside and outside vehicles. Driving behavior detection is the key technology in ADAS that effectively reminds drivers to avoid traffic accidents. Therefore, driver distraction recognition has broad research prospects in the fields of computer vision and autonomous driving. With the rapid development of deep learning and computer vision, many researchers have explored distracted driving detection in various ways, and deep learning has been widely applied to detecting driver distraction. Compared with traditional algorithms, deep learning methods show great improvements in performance and accuracy. Image-based driver distraction recognition can be considered a secondary image sub-classification problem. Unlike traditional image classification, the differences between classes in driver distraction recognition tasks are very small; for example, distinguishing hair fixing from phone use depends entirely on whether a phone is present in the driver's hand, which means that a small area in an image determines the action class of that image. To solve this problem, this paper proposes a pose-guided instance-aware network for driver behavior recognition.
Method
First, the human body is detected by the you only look once version 5 (YOLOv5) object detector, and the discriminative hand-related area is then obtained by using the human pose estimation high-resolution network (HRNet). The features of the human body and the hand area are used as instance-level features, and an instance-aware module is designed to fully obtain the contextual semantic information at different levels. Second, a dual-channel interaction module is constructed using the hand-related features to characterize key spatial information and to optimize visual features. In this way, a multi-branch deep neural network is formed, and the scores of the different branches are fused. ResNet-50 is used as the backbone network, and the backbone convolutional networks are initialized with pre-trained ImageNet weights. The input image is scaled to 224 × 224. Network training uses the cross-entropy loss function to update the weights of the network model, with a batch size of 64. The stochastic gradient descent (SGD) optimizer with a momentum of 0.99 is applied, using a multi-step learning rate schedule whose initial value of 0.01 is reduced by a factor of 0.1 after 20 epochs. The model is trained on an NVIDIA Tesla V100 (16 GB) under CentOS 8.0. The implementation is based on Python 3.8 and the PyTorch 1.8 toolbox.
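The multi-branch design and training setup described above can be sketched in PyTorch. This is an illustrative sketch, not the authors' implementation: the tiny convolutional `Branch` stands in for the paper's ResNet-50 branches, the instance-aware and dual-channel interaction modules are omitted, and averaging the per-branch logits is one plausible reading of "the scores of the different branches are fused".

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """Tiny stand-in for one ResNet-50 branch (illustration only)."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class MultiBranchNet(nn.Module):
    """Two branches (human region, hand-related region) whose class
    scores are fused by averaging the logits (late fusion)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.human_branch = Branch(num_classes)
        self.hand_branch = Branch(num_classes)

    def forward(self, human_crop, hand_crop):
        return (self.human_branch(human_crop) + self.hand_branch(hand_crop)) / 2

model = MultiBranchNet(num_classes=10)
criterion = nn.CrossEntropyLoss()
# Hyperparameters as stated in the text: SGD with momentum 0.99,
# initial lr 0.01, decayed by 0.1 after 20 epochs, batch size 64.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.99)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20], gamma=0.1)
```

In an actual training loop, each 224 × 224 input would be the YOLOv5 human crop paired with the HRNet-derived hand-region crop, and `scheduler.step()` would be called once per epoch.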
Result
Experimental results show that the proposed method achieves test accuracies of 96.17% and 96.97% on the American University in Cairo (AUC) dataset and the self-built large vehicle dataset, respectively. Compared with the model without the instance-aware module and channel interaction, the accuracy of the proposed method is significantly improved, particularly in complex environments. Several experiments are also performed to analyze the effectiveness of the components of this method on the two datasets. The highest accuracy is obtained when combining the human, hand, and spatial branches, with improvements of 7.5% on the self-built large vehicle driver dataset and more than 3% on the public AUC dataset. Results of the ablation study also confirm the effectiveness of the proposed components in improving recognition accuracy.
Conclusion
This study proposes a pose-guided instance-aware network for driver distraction recognition. Combined with object detection and human pose estimation, the human body and hand regions are treated as instance-level features, and an instance-aware module is established. A dual-channel interaction module is then constructed using the hand-related regions to characterize key spatial information. Experimental results show that the proposed method outperforms the other models in terms of accuracy on both self-built complex environment datasets and public datasets. Compared with the traditional RGB-based model, the pose-guided method shows significant improvements in its performance in complex environments and effectively reduces the impact of complex backgrounds, different viewpoints, illuminations, and changes in human body scales. This method also reduces the interference of complex environments, shows high accuracy, assists drivers in driving safely, and reduces the occurrence of traffic accidents.
分心检测；姿态估计；目标检测；实例特征；多流网络
distraction detection; pose estimation; object detection; instance-level feature; multi-stream network
Abouelnaga Y, Eraqi H M and Moustafa M N. 2018. Real-time distracted driver posture classification [EB/OL]. [2022-09-06]. https://arxiv.org/pdf/1706.09498.pdf
Alotaibi M and Alotaibi B. 2020. Distracted driver classification using deep learning. Signal, Image and Video Processing, 14(3): 617-624 [DOI: 10.1007/s11760-019-01589-z]
Arefin M R, Makhmudkhujaev F, Chae O and Kim J. 2019. Aggregating CNN and HOG features for real-time distracted driver detection//Proceedings of 2019 IEEE International Conference on Consumer Electronics. Las Vegas, USA: IEEE: #8661970 [DOI: 10.1109/icce.2019.8661970]
Behera A and Keidel A H. 2018. Latent body-pose guided DenseNet for recognizing driver’s fine-grained secondary activities//Proceedings of the 15th IEEE International Conference on Advanced Video and Signal Based Surveillance. Auckland, New Zealand: IEEE: #8639158 [DOI: 10.1109/avss.2018.8639158]
Cai C X, Gao S B, Zhou J and Huang Z H. 2020. Freeway anti-collision warning algorithm based on vehicle-road visual collaboration. Journal of Image and Graphics, 25(8): 1649-1657
蔡创新, 高尚兵, 周君, 黄子赫. 2020. 车路视觉协同的高速公路防碰撞预警算法. 中国图象图形学报, 25(8): 1649-1657 [DOI: 10.11834/jig.190633]
Chao Y W, Liu Y F, Liu X Y, Zeng H Y and Deng J. 2018. Learning to detect human-object interactions//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe, USA: IEEE: 381-389 [DOI: 10.1109/wacv.2018.00048]
Deng J, Dong W, Socher R, Li L J, Li K and Fei-Fei L. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/cvpr.2009.5206848]
Donahue J, Hendricks L A, Guadarrama S, Rohrbach M, Venugopalan S, Darrell T and Saenko K. 2015. Long-term recurrent convolutional networks for visual recognition and description//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 2625-2634 [DOI: 10.1109/CVPR.2015.7298878]
El Khatib A, Ou C J and Karray F. 2020. Driver inattention detection in the context of next-generation autonomous vehicles design: a survey. IEEE Transactions on Intelligent Transportation Systems, 21(11): 4483-4496 [DOI: 10.1109/tits.2019.2940874]
Eraqi H M, Abouelnaga Y, Saad M H and Moustafa M N. 2019. Driver distraction identification with an ensemble of convolutional neural networks. Journal of Advanced Transportation, 2019: #4125865 [DOI: 10.1155/2019/4125865]
Guo G D and Lai A. 2014. A survey on still image based human action recognition. Pattern Recognition, 47(10): 3343-3361 [DOI: 10.1016/j.patcog.2014.04.018]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/cvpr.2016.90]
Hu Y C, Lu M Q and Lu X B. 2019. Driving behaviour recognition from still images by using multi-stream fusion CNN. Machine Vision and Applications, 30(5): 851-865 [DOI: 10.1007/s00138-018-0994-z]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/cvpr.2017.243]
Koay H V, Chuah J H, Chow C O, Chang Y L and Rudrusamy B. 2021. Optimally-weighted image-pose approach (OWIPA) for distracted driver detection and classification. Sensors, 21(14): #4837 [DOI: 10.3390/s21144837]
Koesdwiady A, Bedawi S M, Ou C J and Karray F. 2017. End-to-end deep learning for driver distraction recognition//Proceedings of the 14th International Conference on Image Analysis and Recognition. Montreal, Canada: Springer: 11-18 [DOI: 10.1007/978-3-319-59876-5_2]
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90 [DOI: 10.1145/3065386]
Li P H, Yang Y F, Grosu R, Wang G D, Li R, Wu Y H and Huang Z. 2022. Driver distraction detection using octave-like convolutional neural network. IEEE Transactions on Intelligent Transportation Systems, 23(7): 8823-8833 [DOI: 10.1109/tits.2021.3086411]
Li X X, Luo J, Duan C, Zhi Y and Yin P P. 2021. Real-time detection of fatigue driving based on face recognition. Journal of Physics: Conference Series, 1802(2): #022044 [DOI: 10.1088/1742-6596/1802/2/022044]
LRD M, Mukhopadhyay A and Biswas P. 2022. Distraction detection in automotive environment using appearance-based gaze estimation//Proceedings of the 27th International Conference on Intelligent User Interfaces. Helsinki, Finland: ACM: 38-41 [DOI: 10.1145/3490100.3516463]
Moslemi N, Azmi R and Soryani M. 2019. Driver distraction recognition using 3D convolutional neural networks//Proceedings of the 4th International Conference on Pattern Recognition and Image Analysis. Tehran, Iran: IEEE: 145-151 [DOI: 10.1109/PRIA.2019.8786012]
Pan J K, Liu Z Q and Wang J C. 2021. Fatigue driving detection based on ocular self-quotient image and gradient image co-occurrence matrix. Journal of Image and Graphics, 26(1): 154-164
潘剑凯, 柳政卿, 王秋成. 2021. 基于眼部自商图—梯度图共生矩阵的疲劳驾驶检测. 中国图象图形学报, 26(1): 154-164 [DOI: 10.11834/jig.200258]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Sharma G, Jurie F and Schmid C. 2012. Discriminative spatial saliency for image classification//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 3506-3513 [DOI: 10.1109/cvpr.2012.6248093]
Simonyan K and Zisserman A. 2014. Two-stream convolutional networks for action recognition in videos//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 568-576
Singh T, Mohadikar M, Gite S, Patil S, Pradhan B and Alamri A. 2021. Attention span prediction using head-pose estimation with deep neural networks. IEEE Access, 9: 142632-142643 [DOI: 10.1109/ACCESS.2021.3120098]
Sun K, Xiao B, Liu D and Wang J D. 2019. Deep high-resolution representation learning for human pose estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 5686-5696 [DOI: 10.1109/cvpr.2019.00584]
Tian R R, Li L X, Chen M Y, Chen Y B and Witt G J. 2013. Studying the effects of driver distraction and traffic density on the probability of crash and near-crash events in naturalistic driving environment. IEEE Transactions on Intelligent Transportation Systems, 14(3): 1547-1555 [DOI: 10.1109/TITS.2013.2261988]
Tran D, Bourdev L, Fergus R, Torresani L and Paluri M. 2015. Learning spatiotemporal features with 3D convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 4489-4497 [DOI: 10.1109/ICCV.2015.510]
Tran D, Do H M, Lu J X and Sheng W H. 2020. Real-time detection of distracted driving using dual cameras//Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, USA: IEEE: 2014-2019 [DOI: 10.1109/iros45743.2020.9340921]
Wang C C, Gao S B, Cai C X and Chen H L. 2022. Vehicle-road visual cooperative driving safety early warning algorithm for expressway scenes. Journal of Image and Graphics, 27(10): 3058-3067
汪长春, 高尚兵, 蔡创新, 陈浩霖. 2022. 高速公路场景的车路视觉协同行车安全预警算法. 中国图象图形学报, 27(10): 3058-3067 [DOI: 10.11834/jig.210290]
Wu M Y, Zhang X, Shen L L and Yu H. 2021. Pose-aware multi-feature fusion network for driver distraction recognition//Proceedings of the 25th International Conference on Pattern Recognition. Milan, Italy: IEEE: 1228-1235 [DOI: 10.1109/icpr48806.2021.9413337]
Yan C, Coenen F and Zhang B L. 2014. Driving posture recognition by joint application of motion history image and pyramid histogram of oriented gradients. International Journal of Vehicular Technology, 2014: #719413 [DOI: 10.1155/2014/719413]
Zhuang Y and Qi Y. 2021. Driving fatigue detection based on pseudo 3D convolutional neural network and attention mechanisms. Journal of Image and Graphics, 26(1): 143-153
庄员, 戚湧. 2021. 伪3D卷积神经网络与注意力机制结合的疲劳驾驶检测. 中国图象图形学报, 26(1): 143-153 [DOI: 10.11834/jig.200079]