Embedded real-time human head-shoulder detection based on aggregated channel features
2019, Vol. 24, No. 4, pp. 523-535
Received: 2018-06-19; Revised: 2018-08-10; Published in print: 2019-04-16
DOI: 10.11834/jig.180387

Objective
Head-shoulder detection is widely used for human detection in complex scenes because of its strong robustness to occlusion and low computational cost. The motion-detection and hand-crafted template-matching methods commonly used for embedded head-shoulder detection suffer from low detection accuracy and adapt poorly to varied poses and human appearances. To address these problems, an embedded real-time human head-shoulder detection method based on aggregated channel features is proposed.
Method
Several pedestrian detection and human pose datasets are first analyzed to generate a head-shoulder sample set covering multiple poses and viewpoints. Based on the aggregated channel features of the images, the AdaBoost algorithm is then used in multiple training stages to obtain a head-shoulder image classifier built on boosted decision trees. Next, building on the fast feature pyramid algorithm, multi-core parallelism and single-instruction multiple-data (SIMD) techniques are applied on the ARM-Linux platform to accelerate the computation of the image feature pyramid. Finally, multi-threaded sliding-window detection is performed: the head-shoulder classifier scores every detection window, and the results are refined with non-maximum suppression (NMS).
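The NMS step that refines the detection results can be sketched as greedy suppression over scored windows. This is a minimal illustration, not the paper's implementation; the box format and the 0.5 overlap threshold are assumptions.

```python
# Greedy non-maximum suppression over detection windows: repeatedly keep
# the highest-scoring box and discard boxes that overlap it too much.
# Threshold 0.5 is illustrative; the paper does not state its value.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Return indices of kept boxes, best-scoring first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```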
Results
The head-shoulder targets in the INRIA validation set were re-annotated, and the head-shoulder classifier trained in this paper was evaluated by the miss rate, the false positives per image, and ROC (receiver operating characteristic) curves. The log-average miss rate on head-shoulder targets of height ≥ 50 pixels in the INRIA dataset is 16.61%. In addition, head-shoulder images of various poses and viewpoints were collected in different scenes to verify the adaptability of the classifier. The results show that it detects head-shoulder targets well across poses, viewpoints, occlusion, and illumination conditions. However, because the receptive field of the detector is limited to the head-shoulder region, a small number of image regions resembling head-shoulders are falsely detected. On the embedded platform (Raspberry Pi 3B), the optimized head-shoulder detection program takes about 213 ms for feature computation on a 640×480-pixel image and about 2 ms to classify a single detection window containing a positive sample. The overall detection speed meets the requirement of real-time detection on video streams.
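The log-average miss rate reported above is the Caltech-style summary metric: miss rates sampled at nine FPPI points evenly spaced in log space over [1e-2, 1e0], averaged geometrically. A minimal sketch of this computation, assuming FPPI-sorted ROC samples (the data below is illustrative, not the paper's):

```python
import math

# Log-average miss rate: sample the miss-rate curve at nine reference
# FPPI values spaced evenly in log10 between 1e-2 and 1e0, then take
# the geometric mean of those samples.

def log_average_miss_rate(fppi, miss_rate):
    """fppi and miss_rate are parallel lists of ROC samples, fppi ascending."""
    refs = [10 ** (-2 + 0.25 * i) for i in range(9)]  # 1e-2 .. 1e0
    samples = []
    for r in refs:
        # Miss rate at the largest FPPI not exceeding the reference point;
        # if the curve never reaches that low an FPPI, take its first point.
        below = [m for f, m in zip(fppi, miss_rate) if f <= r]
        samples.append(below[-1] if below else miss_rate[0])
    return math.exp(sum(math.log(max(m, 1e-12)) for m in samples) / len(samples))
```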
Conclusion
This paper performs human head-shoulder detection based on aggregated channel features. Using a rich, accurately annotated head-shoulder training set, the AdaBoost algorithm learns the aggregated channel features of head-shoulder images. The resulting classifier is highly adaptable and has low hardware requirements: it detects head-shoulder images well across viewpoints and poses, and can detect video streams in real time on embedded platforms, making it applicable to a wide range of scenarios.
Objective
Human body detection is a key subject of computer vision with important applications in areas such as intelligent video surveillance, autonomous driving, and intelligent robots. Head-shoulder detection is often used in embedded systems because of its strong robustness to occlusion, good pose adaptability, and low computational requirements. Commonly used embedded head-shoulder detection methods mainly include motion detection and template matching; however, these two methods have low detection accuracy and adapt poorly to different postures and human appearances. To improve head-shoulder detection accuracy, an embedded real-time human head-shoulder detection method based on aggregated channel features (ACFs) is proposed.
Method
A variety of pedestrian detection and human pose datasets, namely, the Caltech Pedestrian, INRIA Pedestrian, and MPII Human Pose datasets, are analyzed to generate human head-shoulder samples. Suitable samples in the MPII Human Pose dataset are filtered, head-shoulder areas are cropped accurately on the basis of the positions of the head and neck joints, and a human head-shoulder dataset with varied head-shoulder poses and perspectives, named MPII-HS, is generated. The MPII-HS dataset provides the positive training samples, and images from the Caltech and INRIA Pedestrian datasets that do not contain humans serve as the negative training samples. The ACFs consist of one gradient magnitude channel, six gradient orientation channels, and three channels in the YUV color space. The AdaBoost algorithm is used in multiple training stages to train a head-shoulder classifier for 40×40-pixel images based on these ACFs. The trained classifier is a boosted ensemble of 4 096 binary decision trees with a maximum depth of five, and its final score is the sum of the scores of all trees. Classification ends early if the partial score sum falls below a threshold, which speeds up detection. The image feature pyramid is computed with the fast feature pyramid algorithm; on the ARM-Linux platform, multi-core parallelism and single-instruction multiple-data (SIMD) instructions accelerate its calculation. Finally, sliding-window detection runs in multiple threads, each thread handling one row of detection windows. The trained head-shoulder image classifier identifies candidate head-shoulder targets in every detection window, and candidate detection results are merged via the non-maximum suppression algorithm.
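The fast feature pyramid avoids recomputing channel features at every scale: features are computed exactly at a sparse set of scales and approximated at intermediate scales with a power law, f(s) ≈ f(s0)·(s/s0)^(-λ). A minimal sketch under that assumption (the λ value below is illustrative; in practice it is estimated per channel type):

```python
# Scale approximation at the heart of the fast feature pyramid. Instead of
# computing channels at all scales, a channel statistic measured at a
# nearby "real" scale is extrapolated by a power law in the scale ratio.

def pyramid_scales(n_per_octave=8, n_octaves=4):
    """All pyramid scales: n_per_octave geometric steps per halving of size."""
    return [2 ** (-i / n_per_octave) for i in range(n_per_octave * n_octaves)]

def approx_feature(feature_at_real_scale, real_scale, target_scale, lam=0.11):
    """Approximate a channel statistic at target_scale from one measured
    at real_scale via f(s) ~ f(s0) * (s / s0) ** -lam."""
    return feature_at_real_scale * (target_scale / real_scale) ** (-lam)
```

Only one scale per octave needs exact computation; the remaining scales in between are filled in with `approx_feature`, which is what makes multi-scale detection tractable on an embedded CPU.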
Result
To estimate the accuracy of the proposed head-shoulder detector, the head-shoulder targets in the validation set of the INRIA Pedestrian dataset are re-labeled, forming INRIA-HS, and the trained head-shoulder image classifier is applied to it. The detection results are evaluated by the miss rate (MR) and the false positives per image (FPPI) on the receiver operating characteristic curve. The log-average MR for head-shoulder targets with a height of ≥ 50 pixels in the INRIA-HS dataset is 16.61%, and the MR is below 20% at an FPPI of 0.1. In addition, head-shoulder images of various poses and perspectives were collected in actual scenes to verify the adaptability of the proposed classifier. Results show that the classifier can detect multi-pose, multi-perspective, and occluded head-shoulder targets under different illumination conditions. However, the receptive field of the classifier is limited to the head-shoulder area; thus, some image regions that resemble a head-shoulder but not a full human body may be misclassified as positive, and the FPPI of the proposed head-shoulder detector is slightly higher than that of an ACF classifier trained for full-body detection. Nevertheless, the head-shoulder classifier is well suited to occluded humans in indoor and crowded scenes. On the embedded platform, a Raspberry Pi 3B with a quad-core 1.4 GHz ARM Cortex-A53, the optimized head-shoulder detection program takes approximately 178 ms for a 640×480-pixel image, and classifying a single detection window containing a positive sample takes approximately 2 ms. The overall detection speed can satisfy the demands of real-time detection of video streams.
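The roughly 2 ms per-window classification benefits from the early-termination rule described in the method: the window score is a running sum of tree scores, and evaluation stops as soon as the partial sum drops below a rejection threshold. A minimal sketch, with tree evaluation abstracted to a list of per-tree scores and an illustrative threshold:

```python
# Early-termination scoring for a boosted tree ensemble: accumulate tree
# scores and reject as soon as the running sum falls below a threshold,
# so most negative windows are discarded after evaluating few trees.

def cascade_score(tree_scores, reject_threshold=-1.0):
    """Return (score, number_of_trees_evaluated)."""
    total = 0.0
    for n, s in enumerate(tree_scores, start=1):
        total += s
        if total < reject_threshold:
            return total, n  # rejected early without evaluating the rest
    return total, len(tree_scores)
```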
Conclusion
Human head-shoulders are detected on the basis of ACFs. The generated head-shoulder dataset MPII-HS contains rich and varied head-shoulder samples with accurate annotations. The AdaBoost algorithm is used to learn the ACFs of head-shoulder images, and the trained classifier adapts well to different human poses and appearances. Thanks to the structure of the classifier, its hardware performance requirements are low. These advantages make accurate human head-shoulder detection possible on embedded platforms in a wide range of applications.