Visible-infrared cross-modal pedestrian detection: a summary
Vol. 28, Issue 5, Pages 1287-1307 (2023)
Published: 16 May 2023
DOI: 10.11834/jig.220670
别倩, 王晓, 徐新, 赵启军, 王正, 陈军, 胡瑞敏. 2023. 红外—可见光跨模态的行人检测综述. 中国图象图形学报, 28(05):1287-1307
Bie Qian, Wang Xiao, Xu Xin, Zhao Qijun, Wang Zheng, Chen Jun, Hu Ruimin. 2023. Visible-infrared cross-modal pedestrian detection: a summary. Journal of Image and Graphics, 28(05):1287-1307
可见光图像在光照充足的条件下可以提供一系列辅助检测行人的信息,如颜色和纹理等信息,但在低照度场景下表现并不理想。红外图像虽然不能提供颜色和纹理信息,但红外图像根据热辐射差异成像而不依赖于光照条件这一特性,使其可以在低照度场景下有效区分行人区域与背景区域并提供清晰的行人轮廓信息。由于红外和可见光两种模态之间直观的互补性,同时使用红外和可见光图像的行人检测任务被认为是一个很有前景的研究方向,受到了广泛关注,大幅促进了在安防(如安全监控和自动驾驶)和疫情防控等领域应用的发展。本文对红外—可见光跨模态的行人检测工作进行全面梳理,并对未来方向进行深入思考。首先,该课题具有独特性质。可见光图像对应三通道的颜色信息而红外图像对应单通道的温差信息,如何在两种模态存在本质差异的前提下,充分利用二者的互补性是红外—可见光跨模态行人检测领域的核心挑战和主要任务。其次,近几年红外—可见光跨模态行人检测研究针对的问题可分为两类,即模态差异大和实际应用难。针对模态差异大的问题,可分为图像未对准和融合不充分两类问题。针对实际应用难的问题,又分为标注成本、实时检测和硬件成本3类问题。本文依次对跨模态行人检测的主要研究方向展开细致且全面的描述并进行相应的总结。然后,详细地介绍与跨模态行人检测相关的数据集和评价指标,并以不同的评价指标对相关方法在不同层面上进行比较。最后,对跨模态行人检测领域存在的且尚未解决的问题进行讨论,并提出对未来相关工作方向的一些思考。
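The abstract above groups fusion research by stage: image level, feature level, and decision level. A minimal sketch of the three stages, using NumPy arrays as stand-ins for images, network feature maps, and per-box scores (the function names and fusion rules here are illustrative baselines, not the exact method of any surveyed paper):

```python
import numpy as np

def image_level_fusion(vis_img, ir_img, alpha=0.5):
    """Image-level (early) fusion: blend the two inputs before detection.
    Here a simple weighted average; real methods use richer fusion rules."""
    return alpha * vis_img + (1 - alpha) * ir_img

def feature_level_fusion(vis_feat, ir_feat):
    """Feature-level (halfway) fusion: concatenate the two modality
    feature maps along the channel axis inside the network."""
    return np.concatenate([vis_feat, ir_feat], axis=0)

def decision_level_fusion(vis_score, ir_score):
    """Decision-level (late) fusion: run one detector per modality and
    merge the per-box confidence scores, e.g. by averaging."""
    return (vis_score + ir_score) / 2

# Toy data: 8x8 single-channel images; 3-channel visible and
# 1-channel infrared feature maps.
vis_img, ir_img = np.ones((8, 8)), np.zeros((8, 8))
fused_img = image_level_fusion(vis_img, ir_img)   # every pixel is 0.5
fused_feat = feature_level_fusion(np.random.rand(3, 8, 8),
                                  np.random.rand(1, 8, 8))
print(fused_img[0, 0], fused_feat.shape, decision_level_fusion(0.9, 0.7))
```

The "when to fuse" question is exactly the choice among these three entry points; the later "how to fuse" literature replaces the simple average or concatenation with learned attention and gating modules.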
Pedestrian detection aims to locate every pedestrian instance in a given input image. Visible images are sensitive to illumination changes, so visible-only detectors degrade in low-visibility conditions such as nighttime and extreme weather, which limits their use in around-the-clock applications like autonomous driving and video surveillance. Infrared images, formed from the temperature difference between the human body and its surroundings, provide clear pedestrian silhouettes in precisely those low-visibility scenes; under sufficient light, visible images in turn supply information that infrared images lack, such as hair, faces, and other fine appearance cues. The two modalities are therefore complementary. The key challenge is how to exploit this complementarity while coping with the intrinsic differences between the modalities and their modality-specific noise: a visible image carries color information in three channels (red, green, and blue, RGB), whereas an infrared image encodes temperature information in a single channel, and the two imaging mechanisms operate over different wavelength ranges. Driven by deep learning, cross-modal pedestrian detection has developed rapidly. This survey reviews and analyzes representative research of recent years, organized into two categories: 1) handling the difference between the two modalities and 2) applying cross-modal detectors to real scenes. The modality-difference problem divides into misalignment and inadequate fusion; the practical-application problem divides into annotation cost, real-time detection, and hardware cost.
Misalignment arises because visible-infrared image pairs are assumed to be strictly aligned, that is, features from the two modalities are expected to match at corresponding positions. Inadequate fusion concerns how to maximize the mutual benefit between the two modalities: early studies focused on the fusion stage (when to fuse), which can be image-level, feature-level, or decision-level, while later studies focus on the fusion method (how to fuse). We then introduce the commonly used cross-modal pedestrian detection datasets, including KAIST (Korea Advanced Institute of Science and Technology), FLIR (forward-looking infrared), CVC-14 (Computer Vision Center-14), and LLVIP (low-light visible-infrared paired), together with the standard evaluation metrics: miss rate (MR), mean average precision (mAP), and inference speed on a visible-thermal image pair. Using these metrics, we compare the relevant methods at different levels. Finally, we discuss the open problems in cross-modal pedestrian detection and offer our views on future directions. 1) In the real world, parallax and field-of-view differences between the two sensors make misalignment of visible-infrared features a pressing concern: unaligned modality features can degrade detector performance, hinder the use of unaligned data in existing datasets, and to some extent limit the deployment of dual-sensor systems in practice.
Resolving the positional relationship between the two modalities is therefore a key research direction. 2) Current cross-modal pedestrian detection datasets were all captured in clear weather, so state-of-the-art methods only realize all-day detection on clear days. A system that works around the clock and in all weather must go beyond clear-weather day-and-night data and also cover extreme weather conditions. 3) Recent studies concentrate on datasets captured by vehicle-mounted cameras. Compared with surveillance-view datasets, vehicle-mounted scenes are more varied, which helps suppress overfitting; on the other hand, their nighttime images may be brighter than surveillance footage because of vehicle headlights. We therefore expect that training cross-modal detectors on datasets from multiple viewpoints simultaneously can both improve robustness in darker scenes and suppress overfitting to any single scene. 4) Autonomous driving and robotic systems require fast responses to detection results. Although many models infer quickly on a GPU (graphics processing unit), their speed on real devices still needs optimization, so real-time detection will remain an ongoing direction. 5) A large performance gap remains for small-scale pedestrians and for partially or severely occluded pedestrians. Yet distant (hence small) and occluded pedestrians are common in driver-assistance scenarios, where detecting them early allows drivers to be alerted to slow down in advance.
Detecting small-scale and occluded targets is thus forecast to be a direction of future research in cross-modal pedestrian detection.
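The IoU-matching core underlying the miss rate (MR) metric mentioned above can be sketched as follows; note this is only the per-image matching step, whereas benchmarks such as KAIST report a log-average miss rate over false positives per image:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def miss_rate(gt_boxes, det_boxes, iou_thr=0.5):
    """Fraction of ground-truth pedestrians not matched by any detection
    at the given IoU threshold."""
    missed = sum(1 for g in gt_boxes
                 if not any(iou(g, d) >= iou_thr for d in det_boxes))
    return missed / len(gt_boxes)

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
det = [(1, 1, 10, 10)]          # overlaps the first pedestrian only
print(miss_rate(gt, det))       # 0.5: one of two pedestrians is missed
```

mAP is computed from the same IoU matching by ranking detections by confidence and averaging precision over recall levels; speed is simply inference time per visible-thermal image pair.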
跨模态行人检测；可见光图像；红外图像；深度学习；行人检测
cross-modal pedestrian detection; visible image; infrared image; deep learning; pedestrian detection