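Lu Zezao, Peng Gang, He Dingxin (Key Laboratory of Image Processing and Intelligent Control of Ministry of Education, School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China)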
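Objective Head-shoulder detection is often used for human detection in complex scenes because of its strong robustness to occlusion and low computational requirements. Motion detection and handcrafted model matching, the methods commonly used for embedded head-shoulder detection, have low detection accuracy and adapt poorly to different poses and human appearances. To address this problem, an embedded real-time human head-shoulder detection method using aggregated channel features is proposed. Method First, several pedestrian detection and human pose datasets are analyzed, and a multi-pose, multi-view human head-shoulder sample set is generated from them. Then, based on the aggregated channel features of images, the AdaBoost algorithm is used over multiple training stages to obtain a head-shoulder image classifier based on boosted decision trees. Next, building on the fast feature pyramid algorithm, multi-core parallelism and single-instruction multiple-data techniques are used to accelerate the computation of the image feature pyramid on the ARM-Linux platform. Finally, multi-threaded sliding-window detection is performed: the head-shoulder image classifier classifies each detection window, and the detection results are refined by the non-maximum suppression (NMS) algorithm. Result The head-shoulder samples in the INRIA validation dataset are re-annotated and detected with the head-shoulder image classifier trained in this paper; detection performance is evaluated by the sample miss rate, the average number of false positives per image, and the ROC (receiver operating characteristic) curve. The log-average miss rate for head-shoulder targets with a height of ≥ 50 pixels in the INRIA dataset is 16.61%. In addition, head-shoulder images in various poses and views are collected from different scenes to verify the adaptability of the classifier; the results show that the classifier detects multi-pose, multi-view, and occluded head-shoulder targets well under different illumination conditions. However, because the receptive field of the detector is limited to the head-shoulder region, a small number of image regions that resemble head-shoulder samples are falsely detected. On the embedded platform (Raspberry Pi 3B), the optimized head-shoulder detection program takes approximately 213 ms for feature computation on a 640×480 pixel image and approximately 2 ms to classify a single detection window containing a positive sample. The overall detection efficiency meets the requirements of real-time detection in video streams. Conclusion This paper performs human head-shoulder detection based on aggregated channel features. With diverse and accurately annotated head-shoulder training samples, the AdaBoost algorithm learns the aggregated channel features of head-shoulder images. The resulting head-shoulder image classifier is highly adaptable, has low hardware performance requirements, detects multi-view and multi-pose human head-shoulder images well, and is capable of real-time video stream detection on an embedded platform, which makes it applicable to a wide range of scenarios.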
Embedded real-time human head-shoulder detection based on aggregated channel features
Lu Zezao,Peng Gang,He Dingxin(Key Laboratory of Image Processing and Intelligent Control of Ministry of Education, School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China)
Objective Human body detection is a key subject of computer vision and has important research relevance in areas such as intelligent video surveillance, unmanned driving, and intelligent robots. Head-shoulder detection is often used in embedded systems because of its robustness to occlusion, its adaptability to different poses, and its low computational requirements. Commonly used embedded head-shoulder detection methods mainly include motion detection and handcrafted model matching; however, these two methods have low detection accuracy and adapt poorly to different postures and human appearances. To improve head-shoulder detection accuracy, an embedded real-time human head-shoulder detection method based on aggregated channel features (ACFs) is proposed. Method A variety of pedestrian detection and human pose datasets, namely, the Caltech Pedestrian dataset, the INRIA Pedestrian dataset, and the MPII Human Pose dataset, are analyzed to generate human head-shoulder samples. Suitable samples are selected from the MPII Human Pose dataset. Then, head-shoulder areas are cropped accurately on the basis of the positions of the head and neck joints, and a human head-shoulder dataset with varied head-shoulder poses and perspectives, named MPII-HS, is generated. The MPII-HS dataset is used as the set of positive training samples. Images from the Caltech and INRIA Pedestrian datasets that do not contain humans are used as negative training samples. The AdaBoost algorithm is run over multiple training stages to train a head-shoulder classifier for 40×40 pixel images based on ACFs, which consist of one channel of gradient magnitude, six channels of gradient orientation, and three channels in YUV color space. The trained classifier is a boosted decision-tree ensemble composed of 4 096 binary decision trees with a maximum depth of five. The final score of the classifier is the sum of the scores of all binary decision trees. To speed up detection, classification ends early if the running score sum falls below a lower threshold.
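The early-termination scoring described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `cascade_score` and `make_stump`, the rejection threshold of -1.0, and the use of one-level stumps in place of the depth-five trees are all assumptions made for brevity.

```python
from typing import Callable, List, Sequence, Tuple

# A weak learner maps a feature vector to a signed score (hypothetical type).
WeakLearner = Callable[[Sequence[float]], float]


def cascade_score(
    features: Sequence[float],
    learners: List[WeakLearner],
    reject_threshold: float = -1.0,  # assumed value, not taken from the paper
) -> Tuple[float, int, bool]:
    """Sum weak-learner scores, stopping once the running sum drops below
    reject_threshold. Returns (score, learners_evaluated, accepted)."""
    total = 0.0
    for count, learner in enumerate(learners, start=1):
        total += learner(features)
        if total < reject_threshold:
            # Early rejection: the remaining learners are never evaluated.
            return total, count, False
    return total, len(learners), True


def make_stump(index: int, threshold: float, weight: float) -> WeakLearner:
    """Hypothetical one-level tree: +weight if the feature exceeds the
    threshold, otherwise -weight."""
    def stump(features: Sequence[float]) -> float:
        return weight if features[index] > threshold else -weight
    return stump


learners = [
    make_stump(0, 0.5, 0.6),
    make_stump(1, 0.2, 0.5),
    make_stump(2, 0.7, 0.4),
]

# A window that looks like background is rejected before the last learner runs.
background = cascade_score([0.1, 0.1, 0.9], learners)
# A window with strong responses is scored by every learner.
candidate = cascade_score([0.9, 0.9, 0.9], learners)
```

Because most sliding windows contain background, rejecting them after a few weak learners is what keeps the average per-window cost far below the cost of evaluating the full ensemble.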
The image feature pyramid is calculated with the fast feature pyramid algorithm. For the ARM-Linux platform, multi-core parallelism and single-instruction multiple-data (SIMD) instructions are used to accelerate the calculation of the image feature pyramid. Finally, sliding-window detection is applied in multiple threads, where each thread handles one row of detection windows. The trained head-shoulder image classifier identifies candidate head-shoulder targets in every detection window, and candidate detection results are merged via the non-maximum suppression algorithm. Result To estimate the accuracy of the proposed head-shoulder detector, head-shoulder targets in the validation set of the INRIA Pedestrian dataset are re-labeled, and the resulting set is named INRIA-HS. The trained head-shoulder image classifier is applied to it. The detection results are evaluated by miss rate (MR) and false positives per image (FPPI) in the receiver operating characteristic curve. The log-average MR for head-shoulder targets with a height of ≥ 50 pixels in the INRIA-HS dataset is 16.61%, and the MR is lower than 20% at an FPPI of 0.1. In addition, head-shoulder images of various poses and perspectives are collected in different real scenes to verify the adaptability of the proposed classifier. Results show that the proposed classifier can detect multi-pose, multi-perspective, and occluded head-shoulder targets under different illumination conditions. However, the receptive field of the proposed classifier is limited to the head-shoulder area; thus, some image regions that resemble a head-shoulder but do not belong to a human body may be misclassified as positive. Consequently, the FPPI of the proposed head-shoulder detector is slightly higher than that of an ACF classifier trained for full-body detection. Nevertheless, the proposed head-shoulder classifier is well suited to detecting occluded humans in indoor and crowded scenes.
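For readers unfamiliar with the log-average miss rate reported above, the following sketch shows one common way to compute it. It assumes the Caltech pedestrian benchmark convention (nine FPPI reference points evenly spaced in log space between 0.01 and 1, combined by a geometric mean); the abstract does not state which convention was used, and the function name and sample values are hypothetical.

```python
import math
from typing import Sequence


def log_average_miss_rate(
    fppi: Sequence[float],
    miss_rate: Sequence[float],
    n_points: int = 9,  # assumed: Caltech benchmark convention
) -> float:
    """Geometric mean of the miss rate sampled at n_points FPPI values
    evenly spaced in log space over [1e-2, 1e0].

    fppi must be sorted in ascending order; miss_rate[i] is the miss rate
    measured at fppi[i]."""
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")
    refs = [10 ** (-2 + 2 * k / (n_points - 1)) for k in range(n_points)]
    log_sum = 0.0
    for ref in refs:
        # Use the curve point with the largest FPPI not exceeding ref;
        # if the curve never reaches that FPPI, count a miss rate of 1.
        sampled = 1.0
        for x, y in zip(fppi, miss_rate):
            if x > ref:
                break
            sampled = y
        log_sum += math.log(max(sampled, 1e-10))
    return math.exp(log_sum / n_points)


# Hypothetical curve with a constant 20% miss rate over the whole FPPI range.
flat_curve = log_average_miss_rate([0.001, 0.1, 1.0], [0.2, 0.2, 0.2])
```

Averaging in log space weights the low-FPPI end of the curve as heavily as the high-FPPI end, so a detector cannot obtain a good score only by tolerating many false positives.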
On the embedded platform (Raspberry Pi 3B with a quad-core ARM Cortex-A53 at 1.4 GHz), the proposed optimized head-shoulder detection program takes approximately 178 ms for a 640×480 pixel image. For a single detection window containing a positive sample, the classification takes approximately 2 ms. The overall detection speed can satisfy the demands of real-time detection in video streams. Conclusion Human head-shoulders are detected based on ACFs. The generated head-shoulder dataset MPII-HS has rich and varied head-shoulder samples with accurate annotations. The AdaBoost algorithm is used to learn the ACFs of head-shoulder images. The trained head-shoulder image classifier adapts well to different human poses and appearances. Owing to the structure of the classifier, its hardware performance requirements are low. These advantages make accurate real-time human head-shoulder detection possible on embedded platforms across a wide range of applications.