Safety Helmet Wearing Detection Fusing Environmental Features and Improved YOLOv4

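Ge Qingqing, Zhang Zhijie, Yuan Long, Li Xiumei, Sun Junmei (School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121)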

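Abstract
Objective On construction sites, the safety helmet is the most common and practical piece of personal protective equipment, and it effectively prevents or reduces head injuries caused by accidents. In helmet wearing detection on construction sites, however, small targets are often missed, and complex, changing environmental conditions often lower detection accuracy. To address these problems, a safety helmet wearing detection method that fuses environmental features with an improved YOLOv4 (you only look once version 4) is proposed. Method To recover features lost during convolution and pooling, the three output feature maps of different sizes produced by YOLOv4 are added to feature maps extracted from the original image, under the condition that the receptive fields of the two are consistent. This fuses high-level and low-level features and captures more detail. A 3×3 convolution is applied to the fused feature maps to reduce the aliasing effect of fusion and keep the features stable. To adapt to the varied environments of construction sites, several data augmentation methods are used to simulate environmental conditions, and adversarial training is used to strengthen the generalization ability and robustness of the model. Result The improved YOLOv4 is tested on the open-source safety helmet wearing dataset (SHWD), where the mean average precision (mAP) reaches 91.55%, an improvement over several currently popular object detection algorithms. Compared with YOLOv4, mAP increases by 5.2%. In addition, after data augmentation that fuses environmental features, the mAP of the improved YOLOv4 increases by 4.27%, and performance remains relatively stable when tested under a variety of real environmental conditions. Conclusion The proposed method, which fuses environmental features with an improved YOLOv4, raises the accuracy, generalization ability, and robustness of the model through model improvement and data augmentation, and provides effective support for safety helmet wearing detection.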
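The fusion step in the Method part can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The English abstract on this page gives a 608×608 input and head resolutions of 76×76, 38×38, and 19×19; the sketch assumes (the abstract does not say) that each 3×3 convolution block on the image branch uses stride 2 and padding 1. Batch normalization and ReLU are omitted, and the function names `conv_out_size`, `blocks_needed`, `add_maps`, and `conv3x3` are hypothetical.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def blocks_needed(input_size, target_size):
    """Count conv blocks needed to shrink input_size to target_size.

    ASSUMPTION: every block is a 3x3 conv with stride 2 and padding 1.
    """
    size, count = input_size, 0
    while size > target_size:
        size = conv_out_size(size)
        count += 1
    if size != target_size:
        raise ValueError(f"cannot reach {target_size} from {input_size}")
    return count


def add_maps(head_map, image_map):
    """Element-wise sum of two 2-D feature maps of identical shape."""
    if len(head_map) != len(image_map) or len(head_map[0]) != len(image_map[0]):
        raise ValueError("feature maps must have the same resolution")
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(head_map, image_map)]


def conv3x3(fmap, kernel):
    """Stride-1 3x3 convolution (cross-correlation) with zero padding.

    Stands in for the 3x3 convolution applied after fusion; the output
    keeps the resolution of the input map.
    """
    height, width = len(fmap), len(fmap[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        acc += fmap[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out
```

Under the stride-2 assumption, `blocks_needed(608, 76)`, `blocks_needed(608, 38)`, and `blocks_needed(608, 19)` give 3, 4, and 5 blocks respectively, and `conv3x3(add_maps(head, image), kernel)` mirrors the add-then-convolve order described above for one scale.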
Keywords
Safety helmet wearing detection method of fusing environmental features and improved YOLOv4

Ge Qingqing, Zhang Zhijie, Yuan Long, Li Xiumei, Sun Junmei(School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China)

Abstract
Objective National production safety data from 2019 showed that 95% of production safety accidents were caused by unsafe behavior of workers, including improper wearing of protective equipment. Safety helmet wearing detection therefore plays a vital role in production safety. An end-to-end detection algorithm with high accuracy and strong generalization ability is of great significance for protecting operators and reducing accidents. Safety helmet wearing detection is an object detection task. Early object detection algorithms relied mostly on manually constructed features. With the development of deep learning, object detectors have been divided into "two-stage" and "one-stage" detectors, and both families have greatly improved detection speed and accuracy. However, current deep learning algorithms cannot guarantee accurate detection of small targets and do not transfer well across scenarios, which results in poor generalization ability and weak anti-interference ability. To solve these problems, a safety helmet wearing detection method that combines environmental features with an improved you only look once version 4 (YOLOv4) is proposed to detect helmet wearing efficiently. Method The method is based on YOLOv4, with cross stage partial darknet53 (CSPDarknet53) as the backbone network and a path aggregation network (PANet) and spatial pyramid pooling (SPP) as the neck. YOLOv4 produces feature maps of three different sizes. With an input size of 608×608 pixels, the resolutions of the YOLO heads are 76×76, 38×38, and 19×19 pixels respectively. Because high-level and low-level feature maps carry very different information, features are extracted from the original input image until they reach the same resolution as each YOLO head. 
The input image passes through a 3×3 convolution, followed by a batch normalization (BN) layer and a ReLU activation function, which provides unilateral suppression and sparse activation. This block is repeated until the resolution of the feature map matches that of the corresponding YOLO head. Then, under the condition that the receptive fields are consistent, the three output feature maps of YOLOv4 are added to the feature maps extracted from the original image, which fuses high-level features with low-level features and captures more detailed information. A 3×3 convolution is then applied to the fused feature maps to reduce the aliasing effect of fusion, yielding outputs at three sizes. The feature map extracted from the original image corresponds to a shallow network with high resolution and more detailed features, which helps predict location, while the YOLO head corresponds to a deep network with low resolution and more semantic features, which helps decide the category. By combining the two feature maps, the model can achieve higher accuracy on both large and small targets. Moreover, data augmentation techniques are used to add small perturbations to the training data, including random cropping, CutMix, simulation of noise-corrupted environments, and adversarial training with adversarial samples. Augmenting the training data in this way improves the generalization ability and robustness of the model. Result The improved YOLOv4 is tested on the open-source safety helmet wearing dataset (SHWD). The mean average precision (mAP) reaches 91.55% and the recall reaches 98.62%. 
Compared with existing algorithms such as CornerNet-Lite, Faster region convolutional neural network (RCNN), YOLOv3, and YOLOv4, the proposed method achieves better mAP and recall. Compared with the original YOLOv4, the improved YOLOv4 increases mAP and recall by 5.2% and 5.79% respectively. The data augmentation methods adopted in this paper improve the mAP of CornerNet-Lite, Faster RCNN, YOLOv3, YOLOv4, and the improved YOLOv4 by 2% to 5%. For the improved YOLOv4, mAP rises by 4.27 percentage points, from 91.55% to 95.82%. In addition, after data augmentation the proposed method performs more stably when tested under different environmental conditions. For instance, detection on night images improves markedly, with mAP increasing from 67.73% to 84.10%. A comparison of experimental results shows that adding adversarial samples to the training set increases the recall of the proposed model by 0.29% and its mAP by 0.56%. Conclusion A method that fuses environmental features with an improved YOLOv4 is proposed for safety helmet wearing detection. The method uses convolutional neural networks to extract features, which greatly improves the efficiency of feature extraction and object detection. Fusing different feature maps combines high-level and low-level information effectively and improves the detection accuracy of small targets. Multiple data augmentation methods improve the robustness of the model in complex scenarios. The detection algorithm is trained end to end, and the accuracy, generalization ability, and robustness of the model are improved for effective detection of safety helmet wearing.
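Two of the augmentations named in the Method part, CutMix and noise corruption, can be sketched in pure Python as follows. This is an illustrative sketch, not the authors' pipeline: the abstract does not give the noise model, patch sampling rule, or any parameter values, so Gaussian noise, an explicitly supplied patch box, and the function names `cutmix` and `add_gaussian_noise` are all assumptions. The sketch works on single-channel images stored as nested lists and ignores bounding-box labels, which a detection pipeline would also have to carry over with the pasted patch.

```python
import random


def cutmix(img_a, img_b, box):
    """Paste a rectangular patch of img_b into a copy of img_a.

    box is (top, left, patch_height, patch_width). Returns the mixed image
    and the fraction of pixels that still come from img_a.
    """
    top, left, patch_h, patch_w = box
    height, width = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]  # copy so img_a is not modified
    for y in range(top, top + patch_h):
        mixed[y][left:left + patch_w] = img_b[y][left:left + patch_w]
    kept_fraction = 1.0 - (patch_h * patch_w) / (height * width)
    return mixed, kept_fraction


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise (ASSUMED noise model), clipped to [0, 255]."""
    return [[min(255.0, max(0.0, pixel + rng.gauss(0.0, sigma)))
             for pixel in row]
            for row in img]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    day = [[200] * 4 for _ in range(4)]
    night = [[20] * 4 for _ in range(4)]
    mixed, kept = cutmix(day, night, (1, 1, 2, 2))
    noisy = add_gaussian_noise(mixed, sigma=10.0, rng=rng)
    print(kept)
    for row in noisy:
        print([round(pixel, 1) for pixel in row])
```

Passing the patch box and the random generator explicitly keeps the sketch deterministic and easy to check; a training pipeline would normally sample both at random for every batch.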
Keywords
