Roadside pedestrian detection and location based on binocular machine vision and RetinaNet
2021, Vol. 26, No. 12, pp. 2941-2952
Received: 2020-04-13
Revised: 2020-10-12
Accepted: 2020-10-19
Published in print: 2021-12-16
DOI: 10.11834/jig.200109
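Objective
Pedestrian perception is an essential part of autonomous driving and a safeguard for driving safety. The conventional approach of combining lidar with monocular vision has a high hardware cost, and matching multi-source data easily introduces errors. To address this, this paper combines binocular machine vision with deep-learning image recognition to automatically perceive and precisely locate roadside pedestrians in public right-of-way environments.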
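Method
A binocular road intelligent perception system is used to collect road foreground images and build a pedestrian recognition training library covering four traffic environments. The RetinaNet deep learning model automatically recognizes target pedestrians. The semi-global block matching (SGBM) algorithm computes disparity values for pairs of road foreground images containing pedestrians. Disparity values are then counted along the U and V directions of the computed disparity map, and a ranging algorithm that combines the pedestrian recognition model with U-V disparity is proposed to locate the coordinates of target pedestrians.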
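Turning a disparity value into a pedestrian position rests on standard pinhole stereo geometry. The following is a minimal sketch, assuming a rectified camera pair; the function name and the focal length, baseline, and principal point are hypothetical illustration values, not the parameters of the system described here.

```python
def pixel_to_camera_xyz(u, v, disparity_px, focal_px, baseline_m, cx, cy):
    """Triangulate a pixel of a rectified stereo pair into camera coordinates (metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline_m / disparity_px   # depth along the optical axis
    x = (u - cx) * z / focal_px                # lateral offset, positive to the right
    y = (v - cy) * z / focal_px                # vertical offset, positive downwards
    return x, y, z


# Hypothetical calibration: 1000 px focal length, 0.12 m baseline,
# principal point at (640, 360).
x, y, z = pixel_to_camera_xyz(u=740, v=360, disparity_px=30.0,
                              focal_px=1000.0, baseline_m=0.12,
                              cx=640.0, cy=360.0)
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters more for distant pedestrians than for near ones.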
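Result
The experiment collected pedestrian recognition results over a 2.5 km continuous test section; compared with manual counts, the recall of the proposed algorithm is 96.27%. Compared with the YOLOv3 (you only look once) and Tiny-YOLOv3 methods under four traffic conditions, the average F-value is 96.42%, which is 0.9% and 3.03% higher than YOLOv3 and Tiny-YOLOv3, respectively. In addition, a calibration block was used indoors to capture 20 pairs of binocular images at distances of 3 m, 4 m, and 5 m to verify the ranging algorithm; the calculated standard deviations are all less than 0.01.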
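For reference, recall and the F-value can be computed from detection counts as sketched here. This assumes the F-value is the usual F1 score (the harmonic mean of precision and recall), which the text does not state explicitly; the counts are hypothetical and are not the paper's data.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from raw detection counts."""
    precision = true_pos / (true_pos + false_pos)   # share of detections that are real pedestrians
    recall = true_pos / (true_pos + false_neg)      # share of real pedestrians that were detected
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical counts: 90 correct detections, 10 false alarms, 5 missed pedestrians.
p, r, f = precision_recall_f(true_pos=90, false_pos=10, false_neg=5)
```

Recall alone ignores false alarms, which is why a detector comparison typically reports the F-value as well.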
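Conclusion
The proposed combination of the RetinaNet object recognition model and the improved U-V disparity algorithm can detect road pedestrians, can provide technical support for safety in autonomous driving, and has a certain application value.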
Objective
Deep learning has been widely used in computer vision, and target recognition based on feature extraction with convolutional neural networks (CNNs) has been applied in the driverless vehicle field. However, the road traffic environment is complex and changeable, so obstacle detection is difficult under actual traffic conditions. The variability of pedestrians in traffic makes pedestrian detection an especially prominent problem within road obstacle detection. 1) Currently, most pedestrian recognition models are trained and tested against simple backgrounds, and few studies have examined how well pedestrian targets are recognized in complex real-world road traffic. With the development of binocular stereo vision, image parallax has come into common use for target ranging. Image pairs are captured with binocular stereo cameras, the parallax between the left and right images is calculated with a stereo matching algorithm, and a depth map is then obtained from the disparity map. Ultimately, road obstacles are detected. 2) This approach avoids the difficulties of extracting, matching, and tracking feature points across image sequences and of reconstructing projected scenes. An algorithm has been proposed that extracts obstacle coordinate information from U-V histograms by counting disparity values in the U and V directions. Computing the U-V parallax image converts two-dimensional plane information in the original image into line-segment information in the U-V direction. Line extraction methods such as the least squares method and the Hough transform are then used to extract the line segments related to the road and to obstacles. 3) This type of method is simple to compute and conducive to real-time performance, but it is strongly affected by noise in complex environments. This paper proposes a method that combines deep learning with a modified U-V parallax algorithm to detect road pedestrians (both recognizing and locating them) and thereby improve the driving safety of vehicles on the road.
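The U-V counting step described above can be sketched as follows. This is a minimal pure-Python illustration, assuming an integer disparity map in which 0 marks an invalid pixel; the function name and the toy map are hypothetical.

```python
def uv_disparity(disparity, max_disp):
    """Build U- and V-disparity histograms from an integer disparity map.

    v_disp[row][d] counts the pixels in image row `row` that have disparity d.
    u_disp[d][col] counts the pixels in image column `col` that have disparity d.
    Disparity 0 is treated as invalid and skipped.
    """
    rows, cols = len(disparity), len(disparity[0])
    v_disp = [[0] * (max_disp + 1) for _ in range(rows)]
    u_disp = [[0] * cols for _ in range(max_disp + 1)]
    for r in range(rows):
        for c in range(cols):
            d = disparity[r][c]
            if 0 < d <= max_disp:
                v_disp[r][d] += 1
                u_disp[d][c] += 1
    return u_disp, v_disp


# Toy 4x6 map: a road-like background whose disparity grows with the row index,
# plus an upright obstacle of constant disparity 5 in columns 2-3 of rows 0-2.
disp = [
    [1, 1, 5, 5, 1, 1],
    [2, 2, 5, 5, 2, 2],
    [3, 3, 5, 5, 3, 3],
    [4, 4, 4, 4, 4, 4],
]
u_disp, v_disp = uv_disparity(disp, max_disp=5)
```

In this toy case the obstacle occupies a single disparity column of the V-disparity histogram across rows 0 to 2 (a vertical segment) and a single row of the U-disparity histogram across columns 2 to 3 (a horizontal segment), while the road traces a slanted line in V-disparity. These are the segments that a line extraction method would then pick out.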
Method
A binocular road intelligent perception system was used to collect road foreground images containing pedestrians, and a training dataset was built from data collected under four types of roadway. The RetinaNet model was used for pedestrian recognition. A deep residual network (ResNet) serves as the feature extraction network, and a feature pyramid network (FPN) forms multi-scale features to strengthen the feature network with multi-scale target information. Both feature networks are used, and two fully convolutional network (FCN) subnetworks with the same structure but different parameters perform target box classification and bounding box position regression, respectively. The pedestrian dataset was fed to the RetinaNet network for training and testing. After trials, the batch size was set to 24 and the learning rate to 0.000 1, and training ran for 100 epochs. In each training run, 400 samples were randomly chosen from the training samples as validation data to test model performance. The loss value was recorded in each epoch, and the model with the minimum loss was selected as the pedestrian recognition model. For stereo matching, horizontal gradient filtering is applied to the left image, and the Birchfield and Tomasi (BT) cost of the left and right images is then calculated. The cost values of the left and right images are fused, and, traversing pixel by pixel, the cost at each pixel is replaced by the sum of the cost values in the area around that pixel. The cost is then optimized with the semi-global matching (SGM) cost aggregation algorithm. The disparity with the lowest matching error is selected by winner takes all (WTA) to compute the image disparity. False parallax values are eliminated by a confidence check, and parallax holes are filled by sub-pixel interpolation. A left-right consistency check eliminates parallax errors caused by occlusion between the left and right views. Because of interference from the complicated road traffic environment, the disparity map is noisy. First, median filtering is applied to the disparity map for preliminary denoising. The range of the parallax statistics is narrowed to the inside of the bounding box to remove as much irrelevant parallax interference as possible. Next, all parallax values within the target pedestrian's rectangular bounding box are traversed to find the maximum parallax value, which replaces all other parallax values in the bounding box. The disparities in the U-V direction are then re-counted on the improved disparity map. Finally, the coordinate positions of pedestrians are obtained. The improved U-V parallax algorithm fills the parallax holes inside the bounding box and replaces noisy parallax with the maximum parallax value, improving the accuracy of pedestrian positioning.
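The bounding-box step of the improved U-V parallax algorithm can be illustrated with a small sketch. It assumes an integer disparity map that has already been median filtered, with 0 marking a hole; the function names, the toy map, the box, and the calibration values are all hypothetical and only illustrate the idea.

```python
from collections import Counter


def fill_box_with_max_disparity(disparity, box):
    """Return (copy of the map with the box set to its maximum disparity, that maximum).

    box = (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    peak = max(disparity[r][c] for r in range(top, bottom) for c in range(left, right))
    filled = [row[:] for row in disparity]          # leave the input map untouched
    for r in range(top, bottom):
        for c in range(left, right):
            filled[r][c] = peak
    return filled, peak


def box_disparity_counts(disparity, box):
    """Count how often each disparity value occurs inside the box."""
    left, top, right, bottom = box
    return Counter(disparity[r][c] for r in range(top, bottom) for c in range(left, right))


# Toy map: background disparity 2; a pedestrian at disparity 8
# with two holes (0) and one noisy low value (3) inside the box.
disp = [
    [2, 2, 2, 2, 2],
    [2, 8, 0, 8, 2],
    [2, 8, 8, 3, 2],
    [2, 0, 8, 8, 2],
    [2, 2, 2, 2, 2],
]
box = (1, 1, 4, 4)                                  # hypothetical detector output
before = box_disparity_counts(disp, box)
filled, peak = fill_box_with_max_disparity(disp, box)
after = box_disparity_counts(filled, box)

# Hypothetical calibration: 1000 px focal length, 0.12 m baseline.
distance_m = 1000.0 * 0.12 / peak
```

Before the replacement the box holds three distinct disparity values; afterwards it holds one, so the re-counted histogram has a single clean peak. Note that taking the maximum is sensitive to any spurious high value that survives the earlier filtering.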
Result
Pedestrian recognition results of the self-trained RetinaNet model were collected over a 2 500 m continuous test section. Compared with manual counts, the recall rate is 96.27%. In a comparison with the you only look once v3 (YOLOv3) and Tiny-YOLOv3 methods under four traffic conditions, the average F-value reaches 96.42%, which is 0.9% higher than YOLOv3 and 3.03% higher than Tiny-YOLOv3. To verify the distance measurement algorithm, a calibration block was used to shoot 20 pairs of binocular images in the laboratory at distances of 3 m, 4 m, and 5 m. The calculated standard deviations are all less than 0.01.
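A repeatability check of this kind can be sketched with Python's standard library. The measurements are hypothetical, and the choice of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics

# Hypothetical repeated distance estimates (metres) for a target placed at 3 m.
measured_m = [3.004, 2.997, 3.001, 2.995, 3.006, 3.002, 2.998, 2.999]

mean_m = statistics.mean(measured_m)
std_m = statistics.stdev(measured_m)    # sample standard deviation (n - 1 denominator)
bias_m = mean_m - 3.0                   # systematic offset from the known distance
```

The standard deviation measures how tightly repeated estimates cluster, not how close they are to the true distance, so the bias is worth reporting alongside it.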
Conclusion
In this study, a RetinaNet model combined with an improved U-V parallax algorithm is proposed to identify and locate pedestrians. The method detects pedestrians effectively in the traffic environment and is significant for the safety of driverless vehicles.