Survey on the fusion of point clouds and images for environmental object detection

Jia Mingda; Yang Jinming; Meng Weiliang; Guo Jianwei; Zhang Jiguang; Zhang Xiaopeng

doi:10.11834/jig.240030

Scene recognition and cross-modal learning


2024, Vol. 29, No. 6, pp. 1765-1784
Received: 2024-01-11
Revised: 2024-02-11
Print publication date: 2024-06-16

Jia Mingda, Yang Jinming, Meng Weiliang, Guo Jianwei, Zhang Jiguang, Zhang Xiaopeng. 2024. Survey on the fusion of point clouds and images for environmental object detection. Journal of Image and Graphics (中国图象图形学报), 29(06): 1765-1784. DOI: 10.11834/jig.240030.

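Summary

In applications of digital simulation technology, and particularly in the development of autonomous driving, object detection is a crucial step: it concerns the perception of objects in the surrounding environment and provides key information for the decision-making and planning of intelligent equipment. In recent years, with advances in sensor technology, images and point clouds have become the two main sources of perception data, and each offers distinct advantages in research on deep-learning-based object detection. To study existing object detection methods based on point clouds and images more comprehensively, this paper systematically reviews and summarizes three classes of object detection algorithms: those based on images, those based on point clouds, and those based on both jointly. It aims to explore how the two data sources can be fused to improve the accuracy, stability, and robustness of object detection, and it offers an outlook on future directions for environmental object detection that fuses point clouds and images.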

Abstract

In applications of digital simulation technology, especially in the development of autonomous driving, object detection is a crucial component. It involves perceiving objects in the surrounding environment and provides essential information for the decision-making and planning of intelligent systems. Traditional object detection methods typically involve steps such as feature extraction, object classification, and position regression on images. However, these methods are limited by manually designed features and by classifier performance, which restricts their effectiveness in complex scenes and for objects with significant variations. With the advent of deep learning, object detection methods based on deep neural networks have been widely adopted, and the convolutional neural network (CNN) has emerged as one of the most prominent approaches in this field. By stacking multiple layers of convolution and pooling operations, CNNs can automatically extract meaningful feature representations from image data. In addition to image data, light detection and ranging (LiDAR) data play a crucial role in object detection tasks, particularly 3D object detection. LiDAR data represent objects as a set of unordered, discrete points on their surfaces. Accurately detecting the point cloud clusters that represent objects, and estimating their poses from these unordered points, is a challenging task. LiDAR data offer high-precision obstacle detection and distance measurement, which supports the perception of surrounding roadways, vehicles, and pedestrians. In real-world autonomous driving and related environmental perception scenarios, relying on a single modality often presents numerous challenges.
For instance, although image data provide a wide variety of high-resolution visual information such as color, texture, and shape, they are susceptible to lighting conditions. In addition, models may struggle to handle occlusion, because a camera's viewpoint cannot see past objects that block the view. LiDAR, by contrast, performs well in challenging lighting conditions and accurately localizes objects in space across diverse and harsh weather scenarios. However, it has limitations of its own. Specifically, the low resolution of LiDAR input data results in sparse point clouds when detecting distant targets, and extracting semantic information from LiDAR data is more challenging than from image data. Thus, an increasing number of researchers are turning to multimodal environmental object detection. A robust multimodal perception algorithm can offer richer feature information, better adaptability to diverse environments, and improved detection accuracy, which enables the perception system to deliver reliable results across various environmental conditions. Multimodal object detection algorithms nevertheless face limitations and pressing challenges. One is the difficulty of data annotation: annotating point cloud and image data is relatively complex and time consuming, particularly for large-scale datasets, and accurately labeling point cloud data is challenging because of their sparsity and the presence of noisy points. Addressing these issues is crucial for further advances in multimodal object detection. Moreover, point clouds and images, as two distinct perception modalities, differ significantly in data structure and feature representation. Current research therefore focuses on effectively integrating the information from the two modalities and extracting accurate and comprehensive features from it.
Furthermore, processing large-scale point cloud data is equally challenging. Point cloud data typically comprise a large number of 3D coordinates, which places greater demands on computing resources and algorithmic efficiency than pure image data does. This study summarizes and refines existing approaches to help researchers gain a deeper and more efficient understanding of object detection algorithms that integrate images and point clouds. It classifies object detection algorithms into three categories: those based on images, those based on point clouds, and those based on the fusion of both. We analyze the strengths and weaknesses of the various methods and discuss potential solutions. We also provide a comprehensive review of the development of object detection algorithms that fuse point clouds and images, considering aspects such as data collection, representation, and model design. Finally, we give a perspective on future directions for environmental object detection, with the goal of enhancing the overall capabilities of autonomous systems.
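The abstract describes the integration of the two modalities only at a high level. As one hedged illustration of what such integration can involve, the sketch below projects LiDAR points into a camera image with a pinhole model and attaches the pixel value found at each projection to the point. The intrinsics `fx`, `fy`, `cx`, `cy`, the helper names `project_point` and `decorate_points`, and the toy image are assumptions made for illustration and are not taken from any surveyed method. Points are assumed to be already expressed in the camera frame (x right, y down, z forward).

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_point(
    p: Point, fx: float, fy: float, cx: float, cy: float
) -> Optional[Tuple[int, int]]:
    """Project a camera-frame 3D point to integer pixel coordinates (u, v).

    Returns None for points at or behind the camera plane (z <= 0).
    """
    x, y, z = p
    if z <= 0.0:
        return None
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return int(round(u)), int(round(v))


def decorate_points(
    points: Sequence[Point],
    image: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[tuple]:
    """Append the image value at each point's projection (hypothetical helper).

    `image` is a row-major list of rows. Points that project outside the
    image, or that lie behind the camera, are skipped.
    """
    height, width = len(image), len(image[0])
    fused = []
    for p in points:
        uv = project_point(p, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:
            fused.append((*p, image[v][u]))
    return fused


if __name__ == "__main__":
    # Toy 4x4 single-channel "image": value = 10 * row + column.
    image = [[10 * r + c for c in range(4)] for r in range(4)]
    points = [(0.0, 0.0, 2.0), (1.0, 1.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
    for row in decorate_points(points, image, fx=1.0, fy=1.0, cx=1.0, cy=1.0):
        print(row)
```

In a real pipeline the single pixel value would typically be replaced by a learned image feature or a semantic score, and the projection would include the LiDAR-to-camera extrinsic transform. Both are omitted here to keep the sketch minimal.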

references

Ahmad J and Del Bue A . 2023 . mmFUSION： multimodal fusion for 3D objects detection ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2311.04058.pdf https://arxiv.org/pdf/2311.04058.pdf

Aldoma A ， Marton Z C ， Tombari F ， Wohlkinger W ， Potthast C ， Zeisl B ， Rusu R B ， Gedikli S and Vincze M . 2012 . Tutorial： point cloud library： three-dimensional object recognition and 6DOF pose estimation . IEEE Robotics and Automation Magazine ， 19 （ 3 ）： 80 - 91 ［ DOI： 10.1109/MRA.2012.2206675 http://dx.doi.org/10.1109/MRA.2012.2206675 ］

Ali W ， Abdelkarim S ， Zidan M ， Zahran M and Sallab A E . 2018 . YOLO3D： end-to-end real-time 3D oriented object bounding box detection from LiDAR point cloud // Proceedings of 2018 European Conference on Computer Vision . Munich， Germany ： Springer： 716 - 728 ［ DOI： 10.1007/978-3-030-11015-4_54 http://dx.doi.org/10.1007/978-3-030-11015-4_54 ］

Arnold E ， Al-Jarrah O Y ， Dianati M ， Fallah S ， Oxtoby D and Mouzakitis A . 2019 . A survey on 3D object detection methods for autonomous driving applications . IEEE Transactions on Intelligent Transportation Systems ， 20 （ 10 ）： 3782 - 3795 ［ DOI： 10.1109/TITS.2019.2892405 http://dx.doi.org/10.1109/TITS.2019.2892405 ］

Arora H ， Loeff N ， Forsyth D A and Ahuja N . 2007 . Unsupervised segmentation of objects using efficient learning // Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition . Minneapolis， USA ： IEEE： 1 - 7 ［ DOI： 10.1109/CVPR.2007.383011 http://dx.doi.org/10.1109/CVPR.2007.383011 ］

Bao W T ， Xu B and Chen Z Z . 2020 . MonoFENet： monocular 3D object detection with feature enhancement networks . IEEE Transactions on Image Processing ， 29 ： 2753 - 2765 ［ DOI： 10.1109/TIP.2019.2952201 http://dx.doi.org/10.1109/TIP.2019.2952201 ］

Beltr􀅡n J ， Guindel C ， Moreno F M ， Cruzado D ， García F and De La Escalera A . 2018 . BirdNet： a 3D object detection framework from LiDAR information // Proceedings of the 21st International Conference on Intelligent Transportation Systems （ITSC） . Maui， USA ： IEEE： 3517 - 3523 ［ DOI： 10.1109/ITSC.2018.8569311 http://dx.doi.org/10.1109/ITSC.2018.8569311 ］

Bewley A ， Sun P ， Mensink T ， Anguelov D and Sminchisescu C . 2020 . Range conditioned dilated convolutions for scale invariant 3D object detection // Proceedings of the 4th Conference on Robot Learning . Cambridge， USA ： PMLR： 627 - 641

Bochkovskiy A ， Wang C Y and Liao H Y M . 2020 . YOLOv4： optimal speed and accuracy of object detection ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2004.10934.pdf https://arxiv.org/pdf/2004.10934.pdf

Brazil G and Liu X M . 2019 . M 3 D-RPN： monocular 3D region proposal network for object detection // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 9286 - 9295 ［ DOI： 10.1109/ICCV.2019.00938 http://dx.doi.org/10.1109/ICCV.2019.00938 ］

Brazil G ， Pons-Moll G ， Liu X M and Schiele B . 2020 . Kinematic 3D object detection in monocular video // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 135 - 152 ［ DOI： 10.1007/978-3-030-58592-1_9 http://dx.doi.org/10.1007/978-3-030-58592-1_9 ］

Caesar H ， Bankiti V ， Lang A H ， Vora S ， Liong V E ， Xu Q ， Krishnan A ， Pan Y ， Baldan G and Beijbom O . 2020 . nuScenes： a multimodal dataset for autonomous driving // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 11618 - 11628 ［ DOI： 10.1109/CVPR42600.2020.01164 http://dx.doi.org/10.1109/CVPR42600.2020.01164 ］

Cai H X ， Zhang Z Y ， Zhou Z Y ， Li Z Y ， Ding W B and Zhao J H . 2023 . BEVFusion 4 D： learning LiDAR-camera fusion under bird’s-eye-view via cross-modality guidance and temporal aggregation ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2303.17099.pdf https://arxiv.org/pdf/2303.17099.pdf

Cao J L ， Li Y L ， Sun H Q ， Xie J ， Huang K Q and Pang Y W . 2022 . A survey on deep learning based visual object detection . Journal of Image and Graphics ， 27 （ 6 ）： 1697 - 1722

曹家乐，李亚利，孙汉卿，谢今，黄凯奇，庞彦伟 . 2022 . 基于深度学习的视觉目标检测技术综述 . 中国图象图形学报， 27 （ 6 ）： 1697 - 1722 ［ DOI： 10.11834/jig.220069 http://dx.doi.org/10.11834/jig.220069 ］

Catmull E . 1998 . Computer display of curved surfaces . Seminal graphics ： pioneering efforts that shaped the field ， 1 ： 35 - 41 ［ DOI： 10.1145/280811.280920 http://dx.doi.org/10.1145/280811.280920 ］

Chang J L and Wetzstein G . 2019 . Deep optics for monocular depth estimation and 3D object detection // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 10192 - 10201 ［ DOI： 10.1109/ICCV.2019.01029 http://dx.doi.org/10.1109/ICCV.2019.01029 ］

Charles R Q ， Su H ， Kaichun M and Guibas L J . 2017 . PointNet： deep learning on point sets for 3D classification and segmentation // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Honolulu， USA ： IEEE： 77 - 85 ［ DOI： 10.1109/CVPR.2017.16 http://dx.doi.org/10.1109/CVPR.2017.16 ］

Chen X Z ， Ma H M ， Wan J ， Li B and Xia T . 2017 . Multi-view 3D object detection network for autonomous driving // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Honolulu， USA ： IEEE： 6526 - 6534 ［ DOI： 10.1109/CVPR.2017.691 http://dx.doi.org/10.1109/CVPR.2017.691 ］

Chen Y ， Liu S ， Shen X and Jia J . 2019 . Fast point R-CNN // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 9774 - 9783 ［ DOI： 10.1109/ICCV.2019.00987 http://dx.doi.org/10.1109/ICCV.2019.00987 ］

Cui Y D ， Chen R ， Chu W B ， Chen L ， Tian D X ， Li Y and Cao D P . 2022 . Deep learning for image and point cloud fusion in autonomous driving： a review . IEEE Transactions on Intelligent Transportation Systems ， 23 （ 2 ）： 722 - 739 ［ DOI： 10.1109/TITS.2020.3023541 http://dx.doi.org/10.1109/TITS.2020.3023541 ］

Dalal N and Triggs B . 2005 . Histograms of oriented gradients for human detection // Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition （CVPR’05） . San Diego， USA ： IEEE： 886 - 893 ［ DOI： 10.1109/CVPR.2005.177 http://dx.doi.org/10.1109/CVPR.2005.177 ］

Deng J J ， Shi S S ， Li P W ， Zhou W G ， Zhang Y Y and Li H Q . 2021 . Voxel R-CNN： towards high performance voxel-based 3D object detection //Proceedings of the 35th AAAI Conference on Artificial Intelligence. ［s.l.］： AAAI： 1201 - 1209 ［ DOI： 10.1609/aaai.v35i2.16207 http://dx.doi.org/10.1609/aaai.v35i2.16207 ］

Ding M Y ， Huo Y Q ， Yi H W ， Wang Z ， Shi J P ， Lu Z W and Luo P . 2020a . Learning depth-guided convolutions for monocular 3D object detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 11669 - 11678 ［ DOI： 10.1109/CVPR42600.2020.01169 http://dx.doi.org/10.1109/CVPR42600.2020.01169 ］

Ding Z Z ， Hu Y H ， Ge R Z ， Huang L ， Chen S J ， Wang Yand Liao J . 2020b . 1st place solution for Waymo open dataset challenge—3D detection and domain adaptation ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2006.15505.pdf https://arxiv.org/pdf/2006.15505.pdf

Dong Z C ， Ji H ， Huang X F ， Zhang W K ， Zhan X and Chen J B . 2023 . PeP： a Point enhanced Painting method for unified point cloud tasks ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2310.07591.pdf https://arxiv.org/pdf/2310.07591.pdf

Duan K W ， Bai S ， Xie L X ， Qi H G ， Huang Q M and Tian Q . 2019 . CenterNet： keypoint triplets for object detection // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 6568 - 6577 ［ DOI： 10.1109/ICCV.2019.00667 http://dx.doi.org/10.1109/ICCV.2019.00667 ］

Engelcke M ， Rao D ， Wang D Z ， Tong C H and Posner I . 2017 . Vote3Deep： fast object detection in 3D point clouds using efficient convolutional neural networks // Proceedings of 2017 IEEE International Conference on Robotics and Automation （ICRA） . Singapore， Singapore ： IEEE： 1355 - 1361 ［ DOI： 10.1109/ICRA.2017.7989161 http://dx.doi.org/10.1109/ICRA.2017.7989161 ］

Fan L ， Xiong X ， Wang F ， Wang N Y and Zhang Z X . 2021 . RangeDet： in defense of range view for LiDAR-based 3D object detection // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision （ICCV） . Montreal， Canada ： IEEE： 2898 - 2907 ［ DOI： 10.1109/ICCV48922.2021.00291 http://dx.doi.org/10.1109/ICCV48922.2021.00291 ］

Feng D ， Haase-Schütz C ， Rosenbaum L ， Hertlein H ， Gläser C ， Timm F ， Wiesbeck W and Dietmayer K . 2021 . Deep multi-modal object detection and semantic segmentation for autonomous driving： datasets， methods， and challenges . IEEE Transactions on Intelligent Transportation Systems ， 22 （ 3 ）： 1341 - 1360 ［ DOI： 10.1109/TITS.2020.2972974 http://dx.doi.org/10.1109/TITS.2020.2972974 ］

Fu H ， Gong M M ， Wang C H ， Batmanghelich K and Tao D C . 2018 . Deep ordinal regression network for monocular depth estimation // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 2002 - 2011 ［ DOI： 10.1109/CVPR.2018.00214 http://dx.doi.org/10.1109/CVPR.2018.00214 ］

Geiger A ， Lenz P and Urtasun R . 2012 . Are we ready for autonomous driving？ The KITTI vision benchmark suite // Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition . Providence， USA ： IEEE： 3354 - 3361 ［ DOI： 10.1109/CVPR.2012.6248074 http://dx.doi.org/10.1109/CVPR.2012.6248074 ］

Girshick R . 2015 . Fast R-CNN // Proceedings of 2015 IEEE International Conference on Computer Vision （ICCV） . Santiago， Chile ： IEEE： 1440 - 1448 ［ DOI： 10.1109/ICCV.2015.169 http://dx.doi.org/10.1109/ICCV.2015.169 ］

Girshick R ， Donahue J ， Darrell T and Malik J . 2014 . Rich feature hierarchies for accurate object detection and semantic segmentation // Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition . Columbus， USA ： IEEE： 580 - 587 ［ DOI： 10.1109/CVPR.2014.81 http://dx.doi.org/10.1109/CVPR.2014.81 ］

Godard C ， Aodha O M and Brostow G J . 2017 . Unsupervised monocular depth estimation with left-right consistency // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Honolulu， USA ： IEEE： 6602 - 6611 ［ DOI： 10.1109/CVPR.2017.699 http://dx.doi.org/10.1109/CVPR.2017.699 ］

Guo Y L ， Wang H Y ， Hu Q Y ， Liu H ， Liu L and Bennamoun M . 2021 . Deep learning for 3D point clouds： a survey . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 12 ）： 4338 - 4364 ［ DOI： 10.1109/TPAMI.2020.3005434 http://dx.doi.org/10.1109/TPAMI.2020.3005434 ］

He C H ， Zeng H ， Huang J Q ， Hua X S and Zhang L . 2020 . Structure aware single-stage 3D object detection from point cloud // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 11870 - 11879 ［ DOI： 10.1109/CVPR42600.2020.01189 http://dx.doi.org/10.1109/CVPR42600.2020.01189 ］

He K M ， Gkioxari G ， Doll􀅡r P and Girshick R . 2017 . Mask R-CNN // Proceedings of 2017 IEEE International Conference on Computer Vision （ICCV） . Venice， Italy ： IEEE： 2980 - 2988 ［ DOI： 10.1109/ICCV.2017.322 http://dx.doi.org/10.1109/ICCV.2017.322 ］

He T and Soatto S . 2019 . Mono3D++： monocular 3D vehicle detection with two-scale 3D hypotheses and task priors // Proceedings of the 33rd AAAI Conference on Artificial Intelligence . Honolulu， USA ： AAAI： 8409 - 8416 ［ DOI： 10.1609/aaai.v33i01.33018409 http://dx.doi.org/10.1609/aaai.v33i01.33018409 ］

Hu H T ， Wang F Y ， Su J W ， Hu L F ， Feng T P ， Zhang Z K and Zhang W Z . 2023 . EA-BEV： edge-aware bird’s-eye-view projector for 3D object detection ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2303.17895.pdf https://arxiv.org/pdf/2303.17895.pdf

Huang T T ， Liu Z ， Chen X W and Bai X . 2020 . EPNet： enhancing point features with image semantics for 3d object detection // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 35 - 52 ［ DOI： 10.1007/978-3-030-58555-6_3 http://dx.doi.org/10.1007/978-3-030-58555-6_3 ］

Huang Z ， Wang Y C and Li D Y . 2023 . A survey of 3D object detection algorithms . Chinese Journal of Intelligent Science and Technology ， 5 （ 1 ）： 7 - 31

黄哲，王永才，李德英 . 2023 . 3D目标检测方法研究综述 . 智能科学与技术学报， 5 （ 1 ）： 7 - 31 ［ DOI： 10.11959/j.issn.2096-6652.202312 http://dx.doi.org/10.11959/j.issn.2096-6652.202312 ］

Jin S ， Li X P ， Yang F and Zhang W G . 2023 . 3D object detection in road scenes by pseudo-LiDAR point cloud augmentation . Journal of Image and Graphics ， 28 （ 11 ）： 3520 - 3535

晋帅，李煊鹏，杨凤，张为公 . 2023 . 伪激光点云增强的道路场景三维目标检测 . 中国图象图形学报， 28 （ 11 ）： 3520 - 3535 ［ DOI： 10.11834/jig.220986 http://dx.doi.org/10.11834/jig.220986 ］

Krispel G ， Opitz M ， Waltner G ， Possegger H and Bischof H . 2020 . FuseSeg： LiDAR point cloud segmentation fusing multi-modal data // Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision （WACV） . Snowmass， USA ： IEEE： 1863 - 1872 ［ DOI： 10.1109/WACV45572.2020.9093584 http://dx.doi.org/10.1109/WACV45572.2020.9093584 ］

Ku J ， Mozifian M ， Lee J ， Harakeh A and Waslander S L . 2018 . Joint 3D proposal generation and object detection from view aggregation // Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems （IROS） . Madrid， Spain ： IEEE： 1 - 8 ［ DOI： 10.1109/IROS.2018.8594049 http://dx.doi.org/10.1109/IROS.2018.8594049 ］

Ku J ， Pon A D and Waslander S L . 2019 . Monocular 3D object detection leveraging accurate proposals and shape reconstruction // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 11859 - 11868 ［ DOI： 10.1109/CVPR.2019.01214 http://dx.doi.org/10.1109/CVPR.2019.01214 ］

Lang A H ， Vora S ， Caesar H ， Zhou L B ， Yang J and Beijbom O . 2019 . PointPillars： fast encoders for object detection from point clouds // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 12689 - 12697 ［ DOI： 10.1109/CVPR.2019.01298 http://dx.doi.org/10.1109/CVPR.2019.01298 ］

Law H and Deng J . 2020 . CornerNet： detecting objects as paired keypoints . International Journal of Computer Vision ， 128 （ 3 ）： 642 - 656 ［ DOI： 10.1007/s11263-019-01204-1 http://dx.doi.org/10.1007/s11263-019-01204-1 ］

Li B . 2017 . 3D fully convolutional network for vehicle detection in point cloud // Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems （IROS） . Vancouver， Canada ： IEEE： 1513 - 1518 ［ DOI： 10.1109/IROS.2017.8205955 http://dx.doi.org/10.1109/IROS.2017.8205955 ］

Li B Y， Ouyang W L， Sheng L， Zeng X Y and Wang X G. 2019 . G S3D ： an efficient 3D object detection framework for autonomous driving //Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Long Beach， USA： IEEE： 1019 - 1028 ［ DOI： 10.1109/CVPR.2019.00111 http://dx.doi.org/10.1109/CVPR.2019.00111 ］

Li P X ， Zhao H C ， Liu P F and Cao F D . 2020 . RTM3D： real-time monocular 3d detection from object keypoints for autonomous driving // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 644 - 660 ［ DOI： 10.1007/978-3-030-58580-8_38 http://dx.doi.org/10.1007/978-3-030-58580-8_38 ］

Li X ， Ma T ， Hou Y N ， Shi B T ， Yang Y C ， Liu Y Q ， Wu X J ， Chen Q ， Li Y K ， Qiao Y and He L . 2023a . LoGoNet： towards accurate 3D object detection with local-to-global cross- modal fusion // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Vancouver， Canada ： IEEE： 17524 - 17534 ［ DOI： 10.1109/CVPR52729.2023.01681 http://dx.doi.org/10.1109/CVPR52729.2023.01681 ］

Li Z Q ， Wang W H ， Li H Y ， Xie E Z ， Sima C H ， Lu T ， Qiao Y and Dai J F . 2022 . BEVFormer： learning bird’s-eye-view representation from multi-camera images via spatiotemporal Transformers // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 1 - 18 ［ DOI： 10.1007/978-3-031-20077-9_1 http://dx.doi.org/10.1007/978-3-031-20077-9_1 ］

Liang M ， Yang B ， Chen Y ， Hu R and Urtasun R . 2019 . Multi-task multi-sensor fusion for 3D object detection // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 7337 - 7345 ［ DOI： 10.1109/CVPR.2019.00752 http://dx.doi.org/10.1109/CVPR.2019.00752 ］

Liang M ， Yang B ， Wang S L and Urtasun R . 2018 . Deep continuous fusion for multi-sensor 3D object detection // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer： 663 - 678 ［ DOI： 10.1007/978-3-030-01270-0_39 http://dx.doi.org/10.1007/978-3-030-01270-0_39 ］

Liang Z D ， Zhang M ， Zhang Z H ， Zhao X and Pu S L . 2020 . RangeRCNN： towards fast and accurate 3D object detection with range image representation ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2009.00206.pdf https://arxiv.org/pdf/2009.00206.pdf

Liang Z D ， Zhang Z H ， Zhang M ， Zhao X and Pu S L . 2021 . RangeIoUDet： range image based real-time 3D object detector optimized by intersection over union // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 7136 - 7145 ［ DOI： 10.1109/CVPR46437.2021.00706 http://dx.doi.org/10.1109/CVPR46437.2021.00706 ］

Lin T Y ， Goyal P ， Girshick R ， He K M and Doll􀅡r P . 2017 . Focal loss for dense object detection // Proceedings of 2017 IEEE International Conference on Computer Vision （ICCV） . Venice， Italy ： IEEE： 2999 - 3007 ［ DOI： 10.1109/ICCV.2017.324 http://dx.doi.org/10.1109/ICCV.2017.324 ］

Liu L J ， Lu J W ， Xu C J ， Tian Q and Zhou J . 2019 . Deep fitting degree scoring network for monocular 3D object detection // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 1057 - 1066 ［ DOI： 10.1109/CVPR.2019.00115 http://dx.doi.org/10.1109/CVPR.2019.00115 ］

Liu W ， Anguelov D ， Erhan D ， Szegedy C ， Reed S ， Fu C Y and Berg A C . 2016 . SSD： single shot MultiBox detector // Proceedings of the 14th European Conference on Computer Vision . Amsterdam， the Netherlands ： Springer： 21 - 37 ［ DOI： 10.1007/978-3-319-46448-0_2 http://dx.doi.org/10.1007/978-3-319-46448-0_2 ］

Liu X ， Li H ， Cheng Y Z ， Kong X Z and Chen S M . 2024 . 3D multi-object tracking based on image and point cloud multi-information perception association . Journal of Image and Graphics ， 29 （ 1 ）： 163 - 178

刘祥，李辉，程远志，孔祥振，陈双敏 . 2024 . 图像与点云多重信息感知关联的三维多目标跟踪 . 中国图象图形学报， 29 （ 1 ）： 163 - 178 ［ DOI： 10.11834/jig.221003 http://dx.doi.org/10.11834/jig.221003 ］

Liu Z C ， Wu Z Z and Tóth R . 2020 . SMOKE： single-stage monocular 3D object detection via keypoint estimation // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops （CVPRW） . Seattle， USA ： IEEE： 4289 - 4298 ［ DOI： 10.1109/CVPRW50498.2020.00506 http://dx.doi.org/10.1109/CVPRW50498.2020.00506 ］

Liu Z J ， Tang H T ， Amini A ， Yang X Y ， Mao H Z ， Rus D L and Han S . 2023 . BEVFusion： multi-task multi-sensor fusion with unified bird’s-eye view representation // Proceedings of 2023 IEEE International Conference on Robotics and Automation （ICRA） . London， UK ： IEEE： 2774 - 2781 ［ DOI： 10.1109/ICRA48891.2023.10160968 http://dx.doi.org/10.1109/ICRA48891.2023.10160968 ］

Lu H H ， Chen X S ， Zhang G Y ， Zhou Q H ， Ma Y B and Zhao Y . 2019 . Scanet： spatial-channel attention network for 3D object detection // Proceedings of ICASSP 2019—2019 IEEE International Conference on Acoustics， Speech and Signal Processing （ICASSP） . Brighton， UK ： IEEE： 1992 - 1996 ［ DOI： 10.1109/ICASSP.2019.8682746 http://dx.doi.org/10.1109/ICASSP.2019.8682746 ］

Lu Y ， Ma X Z ， Yang L ， Zhang T Z ， Liu Y T ， Chu Q ， Yan J J and Ouyang W L . 2021 . Geometry uncertainty projection network for monocular 3D object detection // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision （ICCV） . Montreal， Canada ： IEEE： 3091 - 3101 ［ DOI： 10.1109/ICCV48922.2021.00310 http://dx.doi.org/10.1109/ICCV48922.2021.00310 ］

Luo S J ， Dai H ， Shao L and Ding Y . 2021 . M 3 DSSD： monocular 3D single stage object detector // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 6141 - 6150 ［ DOI： 10.1109/CVPR46437.2021.00608 http://dx.doi.org/10.1109/CVPR46437.2021.00608 ］

Ma T ， Yang X M ， Zhou H B ， Li X ， Shi B T ， Liu J J ， Yang Y C ， Liu Z Z ， He L ， Qiao Y ， Li Y K and Li H S . 2023 . DetZero： rethinking offboard 3D object detection with long-term sequential point clouds // Proceedings of 2023 IEEE/CVF International Conference on Computer Vision （ICCV） . Paris， France ： IEEE： 6713 - 6724 ［ DOI： 10.1109/ICCV51070.2023.00620 http://dx.doi.org/10.1109/ICCV51070.2023.00620 ］

Ma X Z ， Wang Z H ， Li H J ， Zhang P B ， Ouyang W L and Fan X . 2019 . Accurate monocular 3D object detection via color-embedded 3D reconstruction for autonomous driving // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 6850 - 6859 ［ DOI： 10.1109/ICCV.2019.00695 http://dx.doi.org/10.1109/ICCV.2019.00695 ］

Manhardt F ， Kehl W and Gaidon A . 2019 . ROI-10D： monocular lifting of 2D detection to 6D pose and metric shape // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 2064 - 2073 ［ DOI： 10.1109/CVPR.2019.00217 http://dx.doi.org/10.1109/CVPR.2019.00217 ］

Mao J G ， Shi S S ， Wang X G and Li H S . 2022 . 3 D object detection for autonomous driving： a comprehensive survey ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2206.09474.pdf https://arxiv.org/pdf/2206.09474.pdf

Meyer G P ， Charland J ， Pandey S ， Laddha A ， Gautam S ， Vallespi-Gonzalez C and Wellington C K . 2021 . LaserFlow： efficient and probabilistic object detection and motion forecasting . IEEE Robotics and Automation Letters ， 6 （ 2 ）： 526 - 533 ［ DOI： 10.1109/LRA.2020.3047793 http://dx.doi.org/10.1109/LRA.2020.3047793 ］

Meyer G P ， Laddha A ， Kee E ， Vallespi-Gonzalez C and Wellington C K . 2019 . LaserNet： an efficient probabilistic 3D object detector for autonomous driving // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 12669 - 12678 ［ DOI： 10.1109/CVPR.2019.01296 http://dx.doi.org/10.1109/CVPR.2019.01296 ］

Naiden A ， Paunescu V ， Kim G ， Jeon B M and Leordeanu M . 2019 . Shift R-CNN： deep monocular 3d object detection with closed-form geometric constraints // Proceedings of 2019 IEEE international conference on image processing （ICIP） . Taipei， China ： IEEE： 61 - 65 ［ DOI： 10.1109/ICIP.2019.8803397 http://dx.doi.org/10.1109/ICIP.2019.8803397 ］

Ng P C and Henikoff S . 2003 . SIFT： predicting amino acid changes that affect protein function . Nucleic Acids Research ， 31 （ 13 ）： 3812 - 3814 ［ DOI： 10.1093/nar/gkg509 http://dx.doi.org/10.1093/nar/gkg509 ］

Ngiam J ， Caine B ， Han W ， Yang B ， Chai Y N ， Sun P ， Zhou Y ， Yi X ， Alsharif O ， Nguyen P ， Chen Z F ， Shlens J and Vasudevan V . 2019 . StarNet： targeted computation for object detection in point clouds ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/1908.11069.pdf https://arxiv.org/pdf/1908.11069.pdf

Pan X R ， Xia Z F ， Song S J ， Li L E and Huang G . 2021 . 3D object detection with pointformer // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 7459 - 7468 ［ DOI： 10.1109/CVPR46437.2021.00738 http://dx.doi.org/10.1109/CVPR46437.2021.00738 ］

Qi C R ， Liu W ， Wu C X ， Su H and Guibas L J . 2018 . Frustum PointNets for 3D object detection from RGB-D data // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 918 - 927 ［ DOI： 10.1109/CVPR.2018.00102 http://dx.doi.org/10.1109/CVPR.2018.00102 ］

Qi C R ， Yi L ， Su H and Guibas L J . 2017 . Pointnet++： deep hierarchical feature learning on point sets in a metric space // Proceedings of the 31st International Conference on Neural Information Processing Systems . Long Beach， USA ： Curran Associates Inc.： 5105 - 5114

Qin Z Y ， Wang J L and Lu Y . 2022 . MonoGRNet： a general framework for monocular 3D object detection . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 44 （ 9 ）： 5170 - 5184 ［ DOI： 10.1109/TPAMI.2021.3074363 http://dx.doi.org/10.1109/TPAMI.2021.3074363 ］

Ren S Q ， He K M ， Girshick R and Sun J . 2017 . Faster R-CNN： towards real-time object detection with region proposal networks . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 39 （ 6 ）： 1137 - 1149 ［ DOI： 10.1109/TPAMI.2016.2577031 http://dx.doi.org/10.1109/TPAMI.2016.2577031 ］

Roddick T ， Kendall A and Cipolla R . 2019 . Orthographic feature transform for monocular 3D object detection // Proceedings of the 30th British Machine Vision Conference 2019 . Cardiff， UK ： BMVC： #285

Shi S S ， Guo C X ， Jiang L ， Wang Z ， Shi J P ， Wang X G and Li H S . 2020a . PV-RCNN： point-voxel feature set abstraction for 3D object detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 10526 - 10535 ［ DOI： 10.1109/CVPR42600.2020.01054 http://dx.doi.org/10.1109/CVPR42600.2020.01054 ］

Shi S S ， Wang X G and Li H S . 2019 . PointRCNN： 3D object proposal generation and detection from point cloud // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 770 - 779 ［ DOI： 10.1109/CVPR.2019.00086 http://dx.doi.org/10.1109/CVPR.2019.00086 ］

Shi W J and Rajkumar R . 2020 . Point-GNN： graph neural network for 3D object detection in a point cloud // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 1708 - 1716 ［ DOI： 10.1109/CVPR42600.2020.00178 http://dx.doi.org/10.1109/CVPR42600.2020.00178 ］

Shi X P ， Ye Q ， Chen X Z ， Chen C R ， Chen Z X and Kim T K . 2021 . Geometry-based distance decomposition for monocular 3D object detection // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision （ICCV） . Montreal， Canada ： IEEE： 15152 - 15161 ［ DOI： 10.1109/ICCV48922.2021.01489 http://dx.doi.org/10.1109/ICCV48922.2021.01489 ］

Shin K ， Kwon Y P and Tomizuka M . 2019 . RoarNet： a robust 3D object detection based on RegiOn approximation refinement // Proceedings of 2019 IEEE Intelligent Vehicles Symposium （IV） . Paris， France ： IEEE： 2510 - 2515 ［ DOI： 10.1109/IVS.2019.8813895 http://dx.doi.org/10.1109/IVS.2019.8813895 ］

Simonelli A ， Buló S R ， Porzi L ， Ricci E and Kontschieder P . 2020 . Towards generalization across depth for monocular 3D object detection // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 767 - 782 ［ DOI： 10.1007/978-3-030-58542-6_46 http://dx.doi.org/10.1007/978-3-030-58542-6_46 ］

Sindagi V A ， Zhou Y and Tuzel O . 2019 . MVX-Net： multimodal VoxelNet for 3D object detection // Proceedings of 2019 International Conference on Robotics and Automation （ICRA） . Montreal， Canada ： IEEE： 7276 - 7282 ［ DOI： 10.1109/ICRA.2019.8794195 http://dx.doi.org/10.1109/ICRA.2019.8794195 ］

Sun P ， Kretzschmar H ， Dotiwalla X ， Chouard A ， Patnaik V ， Tsui P ， Guo J ， Zhou Y ， Chai Y N ， Caine B ， Vasudevan V ， Han W ， Ngiam J ， Zhao H ， Timofeev A ， Ettinger S ， Krivokon M ， Gao A ， Joshi A ， Zhang Y ， Shlens J ， Chen Z F and Anguelov D . 2020 . Scalability in perception for autonomous driving： Waymo open dataset // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 2443 - 2451 ［ DOI： 10.1109/CVPR42600.2020.00252 http://dx.doi.org/10.1109/CVPR42600.2020.00252 ］

Sun P ， Wang W Y ， Chai Y N ， Elsayed G ， Bewley A ， Zhang X ， Sminchisescu C and Anguelov D . 2021 . RSN： range sparse net for efficient， accurate LiDAR 3D object detection // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 5721 - 5730 ［ DOI： 10.1109/CVPR46437.2021.00567 http://dx.doi.org/10.1109/CVPR46437.2021.00567 ］

Tan M X ， Pang R M and Le Q V . 2020 . EfficientDet： scalable and efficient object detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 10778 - 10787 ［ DOI： 10.1109/CVPR42600.2020.01079 http://dx.doi.org/10.1109/CVPR42600.2020.01079 ］

Tian H ， Chen Y ， Dai J ， Zhang Z and Zhu X . 2021 . Unsupervised object detection with LiDAR clues // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 5958 - 5968 ［ DOI： 10.1109/CVPR46437.2021.00590 http://dx.doi.org/10.1109/CVPR46437.2021.00590 ］

Tian Z ， Shen C H ， Chen H and He T . 2019 . FCOS： fully convolutional one-stage object detection // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 9626 - 9635 ［ DOI： 10.1109/ICCV.2019.00972 http://dx.doi.org/10.1109/ICCV.2019.00972 ］

Viola P and Jones M . 2001 . Rapid object detection using a boosted cascade of simple features // Proceedings of 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition . Kauai， USA ： IEEE： I-I ［ DOI： 10.1109/CVPR.2001.990517 http://dx.doi.org/10.1109/CVPR.2001.990517 ］

Vora S ， Lang A H ， Helou B and Beijbom O . 2020 . PointPainting： sequential fusion for 3D object detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 4603 - 4611 ［ DOI： 10.1109/CVPR42600.2020.00466 http://dx.doi.org/10.1109/CVPR42600.2020.00466 ］

Wang C H ， Chen H W and Fu L C . 2021a . VPFNet： voxel-pixel fusion network for multi-class 3D object detection ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2111.00966.pdf

Wang C W ， Ma C ， Zhu M and Yang X K . 2021b . PointAugmenting： cross-modal augmentation for 3D object detection // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 11789 - 11798 ［ DOI： 10.1109/CVPR46437.2021.01162 http://dx.doi.org/10.1109/CVPR46437.2021.01162 ］

Wang D Z and Posner I . 2015 . Voting for voting in online point cloud object detection // Robotics： Science and Systems ［ DOI： 10.15607/RSS.2015.XI.035 http://dx.doi.org/10.15607/RSS.2015.XI.035 ］

Wang G J ， Tian B ， Zhang Y C ， Chen L ， Cao D P and Wu J . 2020 . Multi-view adaptive fusion network for 3D object detection ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2011.00652.pdf

Wang L ， Du L ， Ye X Q ， Fu Y W ， Guo G D ， Xue X Y ， Feng J F and Zhang L . 2021c . Depth-conditioned dynamic message propagation for monocular 3D object detection // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 454 - 463 ［ DOI： 10.1109/CVPR46437.2021.00052 http://dx.doi.org/10.1109/CVPR46437.2021.00052 ］

Wang L ， Zhang X Y ， Song Z Y ， Bi J F ， Zhang G X ， Wei H Y ， Tang L Y ， Yang L ， Li J ， Jia C Y and Zhao L J . 2023 . Multi-modal 3D object detection in autonomous driving： a survey and taxonomy . IEEE Transactions on Intelligent Vehicles ， 8 （ 7 ）： 3781 - 3798 ［ DOI： 10.1109/TIV.2023.3264658 http://dx.doi.org/10.1109/TIV.2023.3264658 ］

Wang T ， Zhu X E ， Pang J M and Lin D H . 2021d . FCOS3D： fully convolutional one-stage monocular 3D object detection // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops . Montreal， Canada ： IEEE： 913 - 922 ［ DOI： 10.1109/ICCVW54120.2021.00107 http://dx.doi.org/10.1109/ICCVW54120.2021.00107 ］

Wang Y ， Chao W L ， Garg D ， Hariharan B ， Campbell M and Weinberger K Q . 2019a . Pseudo-LiDAR from visual depth estimation： bridging the gap in 3D object detection for autonomous driving // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 8437 - 8445 ［ DOI： 10.1109/CVPR.2019.00864 http://dx.doi.org/10.1109/CVPR.2019.00864 ］

Wang Z X and Jia K . 2019 . Frustum ConvNet： sliding frustums to aggregate local point-wise features for Amodal 3D object detection // Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems （IROS） . Macau， China ： IEEE： 1742 - 1749 ［ DOI： 10.1109/IROS40897.2019.8968513 http://dx.doi.org/10.1109/IROS40897.2019.8968513 ］

Wu H ， Deng J H ， Wen C L ， Li X ， Wang C and Li J . 2022a . CasA： a cascade attention network for 3-D object detection from LiDAR point clouds . IEEE Transactions on Geoscience and Remote Sensing ， 60 ： # 5704511 ［ DOI： 10.1109/TGRS.2022.3203163 http://dx.doi.org/10.1109/TGRS.2022.3203163 ］

Wu H ， Wen C L ， Li W ， Li X ， Yang R G and Wang C . 2023a . Transformation-equivariant 3D object detection for autonomous driving // Proceedings of the 37th AAAI Conference on Artificial Intelligence . Washington， USA ： AAAI： 2795 - 2802 ［ DOI： 10.1609/aaai.v37i3.25380 http://dx.doi.org/10.1609/aaai.v37i3.25380 ］

Wu H ， Wen C L ， Shi S S ， Li X and Wang C . 2023b . Virtual sparse convolution for multimodal 3D object detection // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Vancouver， Canada ： IEEE： 21653 - 21662 ［ DOI： 10.1109/CVPR52729.2023.02074 http://dx.doi.org/10.1109/CVPR52729.2023.02074 ］

Wu X P ， Peng L ， Yang H H ， Xie L ， Huang C X ， Deng C Q ， Liu H F and Cai D . 2022b . Sparse fuse dense： towards high quality 3D detection with depth completion // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . New Orleans， USA ： IEEE： 5408 - 5417 ［ DOI： 10.1109/CVPR52688.2022.00534 http://dx.doi.org/10.1109/CVPR52688.2022.00534 ］

Xiao W P ， Peng Y ， Liu C ， Gao J T ， Wu Y Q and Li X M . 2023 . Balanced sample assignment and objective for single-model multi-class 3D object detection . IEEE Transactions on Circuits and Systems for Video Technology ， 33 （ 9 ）： 5036 - 5048 ［ DOI： 10.1109/TCSVT.2023.3248656 http://dx.doi.org/10.1109/TCSVT.2023.3248656 ］

Xie E Z ， Yu Z D ， Zhou D Q ， Philion J ， Anandkumar A ， Fidler S ， Luo P and Alvarez J M . 2022 . M²BEV： multi-camera joint 3D detection and segmentation with unified birds-eye view representation ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2204.05088.pdf

Xie L ， Xiang C ， Yu Z X ， Xu G D ， Yang Z ， Cai D and He X F . 2020 . PI-RCNN： an efficient multi-sensor 3D object detector with point-based attentive cont-conv fusion module // Proceedings of 2020 AAAI Conference on Artificial Intelligence . New York， USA ： AAAI： 12460 - 12467 ［ DOI： 10.1609/aaai.v34i07.6933 http://dx.doi.org/10.1609/aaai.v34i07.6933 ］

Xu B and Chen Z Z . 2018 . Multi-level fusion based 3D object detection from monocular images // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 2345 - 2353 ［ DOI： 10.1109/CVPR.2018.00249 http://dx.doi.org/10.1109/CVPR.2018.00249 ］

Xu D F ， Anguelov D and Jain A . 2018 . PointFusion： deep sensor fusion for 3D bounding box estimation // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 244 - 253 ［ DOI： 10.1109/CVPR.2018.00033 http://dx.doi.org/10.1109/CVPR.2018.00033 ］

Xu H T ， Xu J M and Xu W W . 2019 . Survey of 3D modeling using depth cameras . Virtual Reality and Intelligent Hardware ， 1 （ 5 ）： 483 - 499 ［ DOI： 10.1016/j.vrih.2019.09.003 http://dx.doi.org/10.1016/j.vrih.2019.09.003 ］

Yan J J ， Liu Y F ， Sun J J ， Jia F ， Li S L ， Wang T C and Zhang X Y . 2023 . Cross modal Transformer： towards fast and robust 3D object detection ［EB/OL］. ［ 2023-12-21 ］. https://arxiv.org/pdf/2301.01283.pdf

Yan Y ， Mao Y X and Li B . 2018 . SECOND： sparsely embedded convolutional detection . Sensors （Basel）， 18 （ 10 ）：# 3337 ［ DOI： 10.3390/s18103337 http://dx.doi.org/10.3390/s18103337 ］

Yang B ， Luo W J and Urtasun R . 2018b . PIXOR： real-time 3D object detection from point clouds // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 7652 - 7660 ［ DOI： 10.1109/CVPR.2018.00798 http://dx.doi.org/10.1109/CVPR.2018.00798 ］

Yang C Y ， Chen Y T ， Tian H ， Tao C X ， Zhu X Z ， Zhang Z X ， Huang G ， Li H Y ， Qiao Y ， Lu L W ， Zhou J and Dai J F . 2022 . BEVFormer v2： adapting modern image backbones to bird’s-eye-view recognition via perspective supervision // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Vancouver， Canada ： IEEE： 17830 - 17839 ［ DOI： 10.1109/CVPR52729.2023.01710 http://dx.doi.org/10.1109/CVPR52729.2023.01710 ］

Yang Z T ， Sun Y N ， Liu S and Jia J Y . 2020 . 3DSSD： point-based 3D single stage object detector // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 11037 - 11045 ［ DOI： 10.1109/CVPR42600.2020.01105 http://dx.doi.org/10.1109/CVPR42600.2020.01105 ］

Yang Z T ， Sun Y N ， Liu S ， Shen X Y and Jia J Y . 2018c . IPOD： intensive point-based object detector for point cloud ［EB/OL］. ［ 2018-12-13 ］. https://arxiv.org/pdf/1812.05276.pdf

Yang Z T ， Sun Y N ， Liu S ， Shen X Y and Jia J Y . 2019 . STD： sparse-to-dense 3D object detector for point cloud // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 1951 - 1960 ［ DOI： 10.1109/ICCV.2019.00204 http://dx.doi.org/10.1109/ICCV.2019.00204 ］

Yin T W ， Zhou X Y and Krähenbühl P . 2021 . Center-based 3D object detection and tracking // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 11779 - 11788 ［ DOI： 10.1109/CVPR46437.2021.01161 http://dx.doi.org/10.1109/CVPR46437.2021.01161 ］

Yoo J H ， Kim Y ， Kim J and Choi J W . 2020 . 3D-CVF： generating joint camera and LiDAR features using cross-view spatial feature fusion for 3D object detection // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 720 - 736 ［ DOI： 10.1007/978-3-030-58583-9_43 http://dx.doi.org/10.1007/978-3-030-58583-9_43 ］

Zeng Y M ， Hu Y ， Liu S C ， Ye J ， Han Y H ， Li X W and Sun N H . 2018 . RT3D： real-time 3-D vehicle detection in LiDAR point cloud for autonomous driving . IEEE Robotics and Automation Letters ， 3 （ 4 ）： 3434 - 3440 ［ DOI： 10.1109/LRA.2018.2852843 http://dx.doi.org/10.1109/LRA.2018.2852843 ］

Zhao X ， Liu Z ， Hu R L and Huang K Q . 2019 . 3D object detection using scale invariant and feature reweighting networks // Proceedings of the 33rd AAAI Conference on Artificial Intelligence . Honolulu， USA ： AAAI： 9267 - 9274 ［ DOI： 10.1609/aaai.v33i01.33019267 http://dx.doi.org/10.1609/aaai.v33i01.33019267 ］

Zhou X Y ， Zhuo J C and Krähenbühl P . 2019 . Bottom-up object detection by grouping extreme and center points // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 850 - 859 ［ DOI： 10.1109/CVPR.2019.00094 http://dx.doi.org/10.1109/CVPR.2019.00094 ］

Zhou Y and Tuzel O . 2018 . VoxelNet： end-to-end learning for point cloud based 3D object detection // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 4490 - 4499 ［ DOI： 10.1109/CVPR.2018.00472 http://dx.doi.org/10.1109/CVPR.2018.00472 ］

Zhou Y S ， He Y ， Zhu H Z ， Wang C ， Li H Y and Jiang Q H . 2021 . Monocular 3D object detection： an extrinsic parameter free approach // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 7552 - 7562 ［ DOI： 10.1109/CVPR46437.2021.00747 http://dx.doi.org/10.1109/CVPR46437.2021.00747 ］

Zhu H Q ， Deng J J ， Zhang Y ， Ji J M ， Mao Q Y ， Li H Q and Zhang Y Y . 2023 . VPFNet： improving 3D object detection with virtual point based LiDAR and stereo data fusion . IEEE Transactions on Multimedia ， 25 ： 5291 - 5304 ［ DOI： 10.1109/TMM.2022.3189778 http://dx.doi.org/10.1109/TMM.2022.3189778 ］
