李少凡1,2, 高尚兵1,2, 张莹莹1,2(1. 淮阴工学院计算机与软件工程学院, 淮安 223001;
2. 江苏省物联网移动互联技术工程实验室, 淮安 223001)
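Objective Image-based driver distraction recognition can be regarded as a secondary image subclassification problem. Unlike conventional image classification, the differences between classes in driver distraction recognition are subtle. For example, whether an image shows the driver adjusting their hair or making a phone call depends entirely on whether a phone is in the driver's hand, so a small region of the image determines its class. General image classification methods cannot separate such visually similar classes with high accuracy. To learn the subtle representational differences between driving behaviors, a pose-guided instance-aware learning network is proposed for driver behavior recognition. Method First, a human detector locates the body bounding box, and human pose estimation is used to obtain discriminative hand-related regions. The features of the body and hand regions serve as instance-level features, on which an instance-aware learning module is designed to capture contextual semantic information at different levels. Second, hand-related features are used to build a dual-channel interaction module that represents key spatial information while refining the visual features, which yields a multi-branch deep neural network. Finally, the results of the different branches are fused. Result Experiments show that the proposed method reaches test accuracies of 96.17% and 96.97% on the AUC (American University in Cairo) dataset and the self-built large vehicle (三客一危) dataset, respectively. Compared with the model without the instance-aware module and channel interaction, accuracy improves markedly, with a clear gain on the complex dataset. Conclusion The proposed pose-guided instance-aware learning network reduces environmental interference to a certain extent, achieves high accuracy, and can help drivers drive safely and reduce traffic accidents.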
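The abstract does not spell out how the hand-related regions are cut from the pose estimate, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes the pose model returns keypoints in the COCO 17-point order (elbows at indices 7 and 8, wrists at 9 and 10), estimates a hand centre by extending the forearm past the wrist, and crops a square box sized by the forearm length. The names `hand_box` and `hand_boxes` and the `offset` and `scale` parameters are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

# Assumed COCO keypoint indices; adjust if the pose model uses another layout.
LEFT_ELBOW, RIGHT_ELBOW, LEFT_WRIST, RIGHT_WRIST = 7, 8, 9, 10


def _clip(value: float, upper: int) -> int:
    """Round to the nearest pixel and clamp to [0, upper]."""
    return min(upper, max(0, int(round(value))))


def hand_box(wrist: Point, elbow: Point, img_w: int, img_h: int,
             offset: float = 0.25, scale: float = 1.0) -> Box:
    """Square box around an estimated hand centre, clipped to the image.

    The centre is the wrist shifted along the elbow-to-wrist direction by
    `offset` forearm lengths; the side is `scale` forearm lengths.
    Both parameters are illustrative assumptions.
    """
    fx, fy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    forearm = (fx * fx + fy * fy) ** 0.5
    cx, cy = wrist[0] + offset * fx, wrist[1] + offset * fy
    half = max(1.0, 0.5 * scale * forearm)
    return (_clip(cx - half, img_w), _clip(cy - half, img_h),
            _clip(cx + half, img_w), _clip(cy + half, img_h))


def hand_boxes(keypoints: Sequence[Point], img_w: int, img_h: int) -> Dict[str, Box]:
    """Left and right hand boxes from one person's keypoints."""
    return {
        "left": hand_box(keypoints[LEFT_WRIST], keypoints[LEFT_ELBOW], img_w, img_h),
        "right": hand_box(keypoints[RIGHT_WRIST], keypoints[RIGHT_ELBOW], img_w, img_h),
    }
```

The resulting boxes could then be used to crop hand features alongside the detected body box. A real pipeline would also need to handle low-confidence or missing keypoints, which this sketch ignores.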
Pose-guided instance-aware learning for driver distraction recognition
Li Shaofan1,2, Gao Shangbing1,2, Zhang Yingying1,2(1. College of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian 223001, China;
2. Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaian 223001, China)
Objective Distracted driving is the main cause of traffic accidents. Data from the Department of Transportation show that about 2 million traffic accidents occur every year, of which over 80% are caused by distracted driving. In recent years, advanced driver assistance systems (ADAS) have been employed to collect data and to detect and recognize static and dynamic objects inside and outside vehicles. Driving behavior detection is a key technology in ADAS that reminds drivers in time to avoid traffic accidents. Driver distraction recognition therefore has broad research prospects in computer vision and autonomous driving. With the rapid development of deep learning and computer vision, many researchers have explored distracted driving detection, and deep learning is now widely used for this task. Compared with traditional algorithms, deep learning methods show great improvements in performance and accuracy. Image-based driver distraction recognition can be considered a secondary image subclassification problem. Unlike traditional image classification, the differences between classes in driver distraction recognition are very small, which means that a small area in an image determines the action class of that image. To solve this problem, this paper proposes a pose-guided instance-aware network for driver behavior recognition. Method First, the human body is detected by the you only look once version 5 (YOLOv5) object detector, and the discriminative hand-related area is then obtained using the high-resolution network (HRNet) for human pose estimation. The features of the human body and the hand area are used as instance-level features, and an instance-aware module is designed to fully capture contextual semantic information at different levels. Second, a dual-channel interaction module is constructed using the hand-related features to characterize key spatial information and to optimize visual features.
In this way, a multi-branch deep neural network is formed, and the scores of the different branches are fused. ResNet-50 is used as the backbone network, and the backbone convolutional layers are initialized with pre-trained ImageNet weights. Input images are resized to 224×224. The network is trained with the cross-entropy loss. The initial learning rate is set to 1E-2, and the batch size is 64. The stochastic gradient descent (SGD) optimizer with a momentum of 0.99 is used to minimize the cross-entropy loss, with a multi-step learning rate schedule that starts at 0.01 and is multiplied by a decay factor of 0.1 after 20 epochs. The model is trained on an NVIDIA Tesla V100 (16 GB) under CentOS 8.0. The implementation is based on Python 3.8 and PyTorch 1.8. Result Experimental results show that the proposed method achieves test accuracies of 96.17% and 96.97% on the American University in Cairo (AUC) dataset and the self-built large vehicle dataset, respectively. Compared with the model without the instance-aware module and channel interaction, the accuracy of the proposed method is significantly improved, particularly in complex environments. Several experiments are also performed on the two datasets to analyze the effectiveness of each component. The highest accuracy is obtained when the human, hand, and spatial branches are combined. The accuracy improves by 7.5% on the self-built large vehicle driver dataset and by more than 3% on the public AUC dataset. The ablation results also confirm that each proposed component improves recognition accuracy. Conclusion This study proposes a pose-guided instance-aware network for driver distraction recognition. By combining object detection with human pose estimation, the human body and hand regions are treated as instance-level features, and an instance-aware module is established.
A dual-channel interaction module is then constructed using the hand-related regions to characterize key spatial information. Experimental results show that the proposed method outperforms the other models in accuracy on both the self-built complex-environment dataset and the public dataset. Compared with the traditional RGB-based model, the pose-guided method performs significantly better in complex environments and effectively reduces the impact of complex backgrounds, different viewpoints, illumination changes, and variations in human body scale. The method thus reduces interference from complex environments, achieves high accuracy, and can help drivers drive safely and reduce traffic accidents.
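The training settings reported in the Method paragraph can be collected in a short sketch. The values below (input size, batch size, momentum, initial learning rate, decay at epoch 20) are taken from the text. Reading the schedule as a step decay that multiplies the learning rate by 0.1 at epoch 20 is an interpretation, and any further milestones are not given in the source. The names `TRAIN_CONFIG` and `multistep_lr` are hypothetical. In PyTorch, a schedule of this kind would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR` (`milestones`, `gamma`), but the sketch is kept in plain Python so it runs without dependencies.

```python
from typing import Sequence

# Hyperparameters as stated in the abstract; the dictionary layout is illustrative.
TRAIN_CONFIG = {
    "backbone": "ResNet-50 (ImageNet pre-trained)",
    "input_size": (224, 224),
    "batch_size": 64,
    "optimizer": "SGD",
    "momentum": 0.99,
    "base_lr": 0.01,
    "lr_milestones": (20,),  # only the epoch-20 step is reported
    "lr_gamma": 0.1,         # assumed to be a multiplicative decay factor
}


def multistep_lr(epoch: int, base_lr: float = 0.01,
                 milestones: Sequence[int] = (20,), gamma: float = 0.1) -> float:
    """Learning rate at a given epoch under a multi-step decay schedule.

    The rate is multiplied by `gamma` once for every milestone that has
    been reached (epoch >= milestone).
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    for epoch in (0, 19, 20, 30):
        lr = multistep_lr(epoch,
                          TRAIN_CONFIG["base_lr"],
                          TRAIN_CONFIG["lr_milestones"],
                          TRAIN_CONFIG["lr_gamma"])
        print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

Under this reading, the learning rate stays at 0.01 for epochs 0 to 19 and drops to 0.001 from epoch 20 onward.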