Visible-infrared cross-modal pedestrian detection: a summary
Bie Qian1,2, Wang Xiao1,2, Xu Xin1,2, Zhao Qijun3, Wang Zheng4, Chen Jun4, Hu Ruimin4(1.School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China;2.Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan 430065, China;3.National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China;4.Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China)
Pedestrian detection aims to locate pedestrian instances in a given input image. However, visible images are sensitive to illumination changes, so their quality degrades under low-visibility conditions such as poor lighting and extreme weather. Visible-image-based pedestrian detection is therefore insufficient on its own for all-day applications such as autonomous driving and video surveillance. Infrared images, in contrast, can provide clear pedestrian contours in such low-visibility scenes, because they capture the temperature difference between the human body and its surroundings. Under sufficient lighting, visible images in turn provide information that infrared images lack, such as hair, faces, and other fine details. Visible and infrared images thus supply complementary visual information. The key challenge is to exploit this complementarity while coping with the differences and the modality-specific noise of the two modalities: the visible image carries color information in three red, green, and blue (RGB) channels, whereas the infrared image encodes temperature information in a single channel, and the two sensors also image different wavelength ranges. Cross-modal pedestrian detection methods based on deep learning have developed rapidly in recent years. This survey reviews and analyzes representative recent work on cross-modal pedestrian detection, which can be grouped into two categories: 1) problems caused by the differences between the two modalities and 2) problems in applying cross-modal detectors to real scenes. The application problems fall into three types: the cost of data annotation, real-time detection, and the cost of deployment. The modality-difference problems fall into two types: misalignment and insufficient fusion.
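Because a visible image has three RGB channels while an infrared image has only one, the fusion stages the survey distinguishes (image, feature, and decision level) are easiest to compare in terms of array shapes. The following is a minimal sketch, not a method from any surveyed paper: it assumes NumPy, pixel-aligned inputs of equal size, and hypothetical helper names (`image_level_fusion`, `feature_level_fusion`, `decision_level_fusion`). Concatenation and a weighted score average are simply the plainest operators for each stage; real detectors learn the feature extractors and fusion weights.

```python
import numpy as np

# Hypothetical helpers illustrating three fusion stages. They assume the
# visible/infrared pair is already pixel-aligned and resized to equal size.


def image_level_fusion(rgb: np.ndarray, thermal: np.ndarray) -> np.ndarray:
    """Early fusion: stack an RGB image (H, W, 3) and a single-channel
    thermal image (H, W) or (H, W, 1) into one 4-channel input (H, W, 4)."""
    if thermal.ndim == 2:
        thermal = thermal[..., np.newaxis]  # add the missing channel axis
    if rgb.shape[:2] != thermal.shape[:2]:
        raise ValueError("image-level fusion needs aligned images of equal size")
    return np.concatenate([rgb, thermal], axis=-1)


def feature_level_fusion(feat_rgb: np.ndarray, feat_thermal: np.ndarray) -> np.ndarray:
    """Feature-level fusion: concatenate per-modality feature maps of shape
    (C, H', W') along the channel axis; a real detector would follow this
    with learned layers."""
    if feat_rgb.shape[1:] != feat_thermal.shape[1:]:
        raise ValueError("feature maps must share the same spatial size")
    return np.concatenate([feat_rgb, feat_thermal], axis=0)


def decision_level_fusion(scores_rgb, scores_thermal, w_rgb: float = 0.5) -> np.ndarray:
    """Decision-level fusion: weighted average of the confidence scores that
    two single-modality detectors assign to the same candidate boxes."""
    scores_rgb = np.asarray(scores_rgb, dtype=float)
    scores_thermal = np.asarray(scores_thermal, dtype=float)
    if scores_rgb.shape != scores_thermal.shape:
        raise ValueError("both detectors must score the same candidates")
    return w_rgb * scores_rgb + (1.0 - w_rgb) * scores_thermal


if __name__ == "__main__":
    rgb = np.zeros((4, 5, 3))
    thermal = np.ones((4, 5))
    print(image_level_fusion(rgb, thermal).shape)  # (4, 5, 4)
```

Note how image-level fusion only works when every pixel of the two images corresponds, which is why misalignment and fusion are treated as linked problems.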
Misalignment refers to the requirement that visible-infrared image pairs be strictly aligned, so that features from the two modalities correspond at the same spatial positions. Insufficient fusion concerns how to maximize the complementary benefit of the two modalities. Early work on insufficient fusion studied the fusion stage (when to fuse), while later work focused on the fusion method (how to fuse). Fusion stages are divided into three levels: image, feature, and decision. Similarly, fusion methods are divided into three categories: image, feature, and detection. We then introduce commonly used cross-modal pedestrian detection datasets, including the Korea Advanced Institute of Science and Technology (KAIST) dataset, the forward-looking infrared (FLIR) dataset, the Computer Vision Center-14 (CVC-14) dataset, and the low-light visible-infrared paired (LLVIP) dataset. Next, we introduce evaluation metrics for cross-modal pedestrian detectors, including miss rate (MR), mean average precision (mAP), and speed, i.e., the time taken to process one pair of visible and thermal images. Finally, we summarize the open challenges in cross-modal pedestrian detection and discuss likely future directions. 1) In the real world, the two sensors have different parallax and fields of view, so misalignment between visible and infrared features is a major concern. Misaligned modality features can degrade detector performance, prevent the use of unaligned data in datasets, and to some extent hinder the deployment of dual-sensor systems in real life. Solving the alignment of the two modalities is therefore a key research direction.
2) At present, cross-modal pedestrian detection datasets are all captured in fine weather, and current state-of-the-art methods address only all-day (day and night) detection in fine weather. Realizing an all-day, all-weather cross-modal pedestrian detection system requires data beyond daytime and nighttime fine-weather scenes, in particular data captured under extreme weather conditions. 3) Recent studies on cross-modal pedestrian detection focus on datasets captured by vehicle-mounted cameras. Compared with datasets captured from a surveillance viewpoint, vehicle-mounted datasets cover more varied scenes, which effectively suppresses overfitting. However, their nighttime images may be brighter than those of surveillance datasets because of headlight illumination. We therefore expect datasets from multiple viewpoints to be used jointly to train cross-modal pedestrian detectors, which can both improve model robustness in darker scenes and suppress overfitting to any single scene. 4) Autonomous driving and robot systems require fast detection responses. Although many models run inference quickly on a graphics processing unit (GPU), their inference speed on actual deployment devices still needs optimization, so real-time detection will remain an ongoing direction of cross-modal pedestrian detection. 5) There is still a large performance gap in detecting small-scale pedestrians and partially or heavily occluded pedestrians. Yet in driver-assistance systems, small distant pedestrians and occluded pedestrians are very common, and detecting them is necessary to warn drivers to slow down in advance. Detection of small-scale and occluded pedestrians is therefore a direction for future research.
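Of the evaluation metrics named in this survey, MR is commonly reported on KAIST as the log-average miss rate, following the Caltech pedestrian benchmark protocol: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space over [10^-2, 10^0] and then averaged geometrically. The following is a simplified sketch with a hypothetical helper name (`log_average_miss_rate`). It assumes the detector's operating points are already computed and sorted by increasing FPPI, and it treats a reference FPPI that the curve never reaches as a miss rate of 1.0, a conservative choice that official evaluation code may handle differently.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float]) -> float:
    """Log-average miss rate over FPPI in [1e-2, 1e0].

    `fppi` and `miss_rate` list one detector's operating points,
    sorted by increasing FPPI (i.e. by decreasing score threshold).
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")
    # Nine reference FPPI values: 10^-2, 10^-1.75, ..., 10^0
    refs = [10 ** (-2 + 2 * i / 8) for i in range(9)]
    sampled = []
    for ref in refs:
        # Miss rate at the largest FPPI not exceeding the reference value;
        # if no operating point is that low, count it as a full miss (1.0).
        candidates = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        sampled.append(candidates[-1] if candidates else 1.0)
    # Geometric mean of the sampled miss rates; clamp to avoid log(0).
    return math.exp(sum(math.log(max(m, 1e-10)) for m in sampled) / len(sampled))
```

For example, a curve whose miss rate stays at 0.2 across the whole FPPI range yields 0.2, since the geometric mean of equal values is that value. Lower is better, and the log spacing weights the low-FPPI region as heavily as the high-FPPI region.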