Research progress of unmanned mobile vision technology for complex dynamic scenes
Pages: 1-41 (2024)
Published Online: 23 December 2024
DOI: 10.11834/jig.240458
Zhang Yanning, Wang Haoyu, Yan Qingsen, et al. Research progress of unmanned mobile vision technology for complex dynamic scenes[J]. Journal of Image and Graphics,
As the scope of human activity keeps expanding and national interests continue to develop, new-domain, new-quality unmanned systems have become the commanding heights of technological strategic competition among the world's major powers and a key force for winning the future. Unmanned mobile vision technology is one of the core means by which unmanned systems help humans thoroughly perceive and understand the physical world; it aims to accurately perceive and understand complex dynamic scenes and target characteristics from the visual data captured by unmanned mobile platforms. Deep neural networks, with their powerful nonlinear fitting and discriminative capabilities, have become the benchmark model for unmanned mobile vision technology. In practical applications, however, unmanned systems typically face complex and dynamic imaging environments, high-speed maneuvering and camouflaged adversarial targets, and diverse imaging task requirements. These conditions sharply degrade the imaging quality of deep-neural-network-based unmanned mobile vision models and significantly reduce the accuracy of scene reconstruction and interpretation as well as target recognition and analysis, severely constraining the ability of unmanned systems to perceive and interpret the physical world in complex dynamic scenes and limiting their application prospects. To address this challenge, this paper examines the state of development of unmanned mobile vision technology for complex dynamic scenes across five key technologies: image enhancement, 3D reconstruction, scene segmentation, object detection and recognition, and anomaly detection and behavior analysis. For each technology, it introduces the basic research ideas and current status, analyzes the strengths and weaknesses of representative algorithms, examines the problems and challenges that remain, and outlines future research directions, laying a foundation for the long-term development and deployment of unmanned mobile vision technology for complex dynamic scenes.
In today's era of advancing automation and intelligence, unmanned systems are rapidly emerging as a new focal point of technological strategic competition among major global powers. These new-domain, new-quality unmanned systems are not only key to supporting national security and strategic interests but also a core force driving future technological innovation and application development, redefining the boundaries of national security and the substance of strategic advantage. As a key component of unmanned systems, unmanned mobile vision technology is demonstrating immense potential in helping humans deeply understand the physical world. Its progress not only provides unmanned systems with richer and more precise perceptual capabilities but also offers new perspectives from which humans can observe, analyze, and ultimately master the complex and ever-changing physical environment. In the early stages of the development of unmanned mobile vision technology, researchers relied mainly on traditional learning methods built around manual feature extraction, which depended on the experience and knowledge of domain experts. For instance, feature descriptors such as the Scale-Invariant Feature Transform (SIFT) and the Histogram of Oriented Gradients (HOG) played significant roles in image matching and target detection tasks. Although traditional visual analysis methods still have value in specific situations, their reliance on manual feature extraction and expert knowledge limits the efficiency and accuracy of analysis. With the rise of deep neural networks, unmanned mobile vision technology has undergone revolutionary progress. Through automatic feature extraction and hierarchical structures, deep neural networks can learn feature representations from simple to complex.
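To make the idea of a hand-crafted descriptor concrete, the following is a minimal, self-contained sketch of one HOG cell — not code from the surveyed works, and `hog_cell` with its parameters is an illustrative simplification (a full HOG pipeline adds block normalization and cell grids):

```python
import math

def hog_cell(patch, n_bins=9):
    """Histogram of Oriented Gradients for one cell (a 2D list of
    grayscale values): a simplified sketch of the descriptor idea."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradients.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180), binned evenly.
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang // (180.0 / n_bins)) % n_bins] += mag
    # L2-normalize so the descriptor is robust to uniform contrast changes.
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

# A patch with a vertical edge: all gradient energy falls in the 0-degree bin.
patch = [[0, 0, 100, 100]] * 4
print(hog_cell(patch))
```

Such descriptors encode expert intuitions (edges, orientations) explicitly, which is precisely what deep networks replace with learned hierarchical features.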
This enables them to capture local image features while also understanding and interpreting higher-level semantic information, significantly enhancing the fitting and discriminative capabilities of the models and demonstrating advantages that traditional methods cannot match; deep neural networks have therefore become the benchmark model for unmanned mobile vision technology. However, in practical applications, unmanned systems often face complex, diverse, and dynamically changing scenarios, which pose great challenges to the application of deep learning. First, the complexity and dynamics of the imaging environment are issues that unmanned systems must confront. Drastic changes in lighting, uncertain weather conditions, and interference from other moving objects in the scene can all degrade image quality and thereby hamper subsequent processing and analysis. Second, the high-speed maneuvering and the camouflage and concealment behaviors of imaging targets place higher demands on unmanned mobile vision systems: rapid target motion makes stable tracking difficult, while camouflage and concealment make target detection harder. Together, these factors significantly reduce the precision of scene reconstruction and interpretation and of target identification and analysis in deep-neural-network-based unmanned mobile vision models. In addition, the diversity of imaging tasks also challenges unmanned mobile vision technology. Different tasks may require different visual processing strategies and analysis methods, so the system needs enough flexibility and adaptability to meet varied task requirements. However, current deep neural network models are often optimized for specific tasks by design, and their adaptability to diverse tasks is limited.
The uncertainty and unpredictability of environmental factors place extremely demanding requirements on unmanned mobile vision technology: it must provide precise perception and in-depth analysis to support decision making in automated systems, enabling them to respond quickly and accurately to environmental changes and improving system efficiency and reliability. In response to the visual challenges faced by unmanned systems in complex dynamic scenes, this article examines the current state of development of unmanned mobile vision technology across five key technical areas: image enhancement, 3D reconstruction, scene segmentation, object detection, and anomaly detection. Image enhancement is the first step in improving the quality of visual data; by improving the contrast, clarity, and color of images, it provides more reliable input for subsequent analysis and processing, thereby enhancing the performance of unmanned systems under varied environmental conditions. 3D reconstruction recovers three-dimensional structure from two-dimensional images, enabling unmanned systems to understand the depth and spatial layout of a scene and thus enhancing their understanding of, and adaptability to, complex environments. Scene segmentation divides an image into multiple semantically meaningful regions or objects, providing a basis for precise environmental perception and target recognition. Object detection is a core task in unmanned mobile vision technology, enabling the system to locate and recognize specific targets in images or videos. Anomaly detection focuses on identifying abnormal objects or events in the scene, giving unmanned systems the ability to identify and respond to potential threats in a timely manner.
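As a toy illustration of the contrast-improvement role that image enhancement plays — a minimal sketch, not any specific method from the surveyed literature — a linear contrast stretch maps a dim, low-contrast frame onto the full intensity range (`stretch_contrast` is a hypothetical helper name):

```python
def stretch_contrast(img, lo=0, hi=255):
    """Linear contrast stretch: map the darkest pixel to `lo` and the
    brightest to `hi`, widening the usable dynamic range."""
    flat = [p for row in img for p in row]
    p_min, p_max = min(flat), max(flat)
    if p_max == p_min:          # flat image: nothing to stretch
        return [[lo] * len(row) for row in img]
    scale = (hi - lo) / (p_max - p_min)
    return [[round(lo + (p - p_min) * scale) for p in row] for row in img]

# A dim frame with intensities squeezed into 60..100.
frame = [[60, 70], [90, 100]]
print(stretch_contrast(frame))  # → [[0, 64], [191, 255]]
```

Real enhancement pipelines for unmanned platforms are far more sophisticated (learned, content-adaptive, noise-aware), but they pursue the same goal: handing downstream reconstruction, segmentation, and detection modules a cleaner signal.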
For each of these key technologies, this article explores the underlying research ideas, the current status, and the advantages and disadvantages of typical algorithms, analyzing their performance in practical applications. The integration and collaboration of these technologies have significantly enhanced the visual perception capabilities of unmanned systems in complex dynamic scenes, enabling them to perform tasks more intelligently and autonomously. Although existing research has made notable progress, unmanned mobile vision technology still faces many problems in practical applications in complex dynamic scenes. This review aims to provide a comprehensive perspective, systematically reviewing and analyzing the latest research progress in unmanned mobile vision technology for complex dynamic scenes, and it examines the advantages and limitations of the above key tasks in practical applications. In addition, this article discusses the gaps and challenges in current research and proposes possible future research directions. Through in-depth exploration of these directions, unmanned mobile vision technology will continue to advance, providing more powerful and flexible solutions to the challenges of complex dynamic scenes and laying a solid foundation for the long-term development and practical application of unmanned systems in automation and intelligence.
unmanned mobile vision; complex dynamic scenes; image enhancement; 3D reconstruction; scene segmentation; object detection; anomaly detection
Abuolaim A, Brown M S. Defocus deblurring using dual-pixel data[C]//Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X 16. Springer International Publishing, 2020: 111-126.
Achlioptas P , Diamanti O , Mitliagkas I. Learning Representations and Generative Models for 3D Point Clouds[J]. 2017.DOI:10.48550/arXiv.1707.02392http://dx.doi.org/10.48550/arXiv.1707.02392.
Aiger, Dror, Mitra, Niloy J, Cohen-Or, Daniel. 2008. 4-points congruent sets for robust pairwise surface registration. In ACM SIGGRAPH 2008papers: 1-10.
Aoki, Yasuhiro, Goforth, Hunter, Srivatsan, Rangaprasad Arun, Lucey, Simon. 2019. PointNetLK: Robust & efficient point cloud registration using PointNet. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 7163-7172.
Bai Yanfeng, Wang Libiao, Gao Weidong, Ma Yinglong. 2024. Multi-modal hierarchical classification for power equipment defect detection. Journal of Image and Graphics, 29(07):2011-2023.
白艳峰, 王立彪, 高卫东, 马应龙. 2024. 面向电力设备缺陷检测的多模态层次化分类. 中国图象图形学报, 29(07):2011-2023[DOI: 10.11834/jig.230269http://dx.doi.org/10.11834/jig.230269]
Barnes C, Shechtman E, Finkelstein A, et al. PatchMatch: A randomized correspondence algorithm for structural image editing[J]. ACM Trans. Graph., 2009, 28(3): 24.
Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: Optimal Speed and Accuracy of Object Detection
Bouaziz, Sofien, Tagliasacchi, Andrea, Pauly, Mark. 2013. Sparse iterative closest point. In Computer Graphics Forum, 32(5): 113-123. Wiley Online Library.
Bozcan I. and Kayacan,E. 2021.Context-dependent anomaly detection for low altitude traffic surveillance//2021 IEEE International Conference on Robotics and Automation. Xi'an, China: IEEE: 224-230[DOI: 10.1109/ICRA48506.2021.9562043http://dx.doi.org/10.1109/ICRA48506.2021.9562043]
Brüggemann D, Sakaridis C, Truong P and Van Gool L. 2023. Refign: Align and refine for adaptation of semantic segmentation to adverse conditions//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Waikoloa: IEEE: 3174-3184 [DOI: 10.1109/WACV56688.2023.00319http://dx.doi.org/10.1109/WACV56688.2023.00319]
Cai Z, Vasconcelos N (2018) Cascade R-CNN: Delving Into High Quality Object Detection. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, pp 6154–6162
Campbell N D F, Vogiatzis G, Hernández C, et al. Using multiple hypotheses to improve depth-maps for multi-view stereo[C]//Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part I 10. Springer Berlin Heidelberg, 2008: 766-779.
Canelhas D R , Schaffernicht E , Stoyanov T. An Eigenshapes Approach to Compressed Signed Distance Fields and Their Utility in Robot Mapping[J]. 2016.DOI:10.3390/robotics6030015http://dx.doi.org/10.3390/robotics6030015.
Cao J, Leng H, Lischinski D, Cohen-Or D, Lischinski D, Tu C and Li Y. 2021. ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE: 7068-7077 [DOI: 10.1109/ICCV48922.2021.00700http://dx.doi.org/10.1109/ICCV48922.2021.00700]
Cao J, Leng H, Lischinski D, Cohen-Or D, Tu C and Li Y. 2021. ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE: 7068-7077 [DOI: 10.1109/ICCV48922.2021.00700http://dx.doi.org/10.1109/ICCV48922.2021.00700]
Cavagnero N, Rosi G, Cuttano C, Pistilli F, Ciccone M, Averta G and Cermelli F. 2024. Pem: Prototype-based efficient maskformer for image segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, Washington: IEEE: 15804-15813 [DOI: 10.48550/arXiv.2402.19422http://dx.doi.org/10.48550/arXiv.2402.19422]
Chang L., Feng X., Li X., et al. (2016). A fusion estimation method based on fractional Fourier transform. Digital Signal Processing, 59, 66-75.[ DOI: 10.1016/j.dsp.2016.07.007]
Chen C, Qi J, Liu X, et al (2024a) Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp 26836–26845
Chen H, Gu J, Liu Y, et al. Masked image training for generalizable deep image denoising[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 1692-1703.
Chen J, Yang Z, Chan T N, et al. Attention-guided progressive neural texture fusion for high dynamic range image restoration[J]. IEEE Transactions on Image Processing, 2022, 31: 2661-2672.
Chen K, Xie E, Chen Z, et al (2024b) GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2017. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4): 834-848 [DOI:10.1109/TPAMI.2017.2699184http://dx.doi.org/10.1109/TPAMI.2017.2699184]
Chen N, Li Y, Yang Z, et al (2023) LODNU: lightweight object detection network in UAV vision. J Supercomput 79:10117–10138. https://doi.org/10.1007/s11227-023-05065-xhttps://doi.org/10.1007/s11227-023-05065-x
Chen Q, Su X, Zhang X, et al (2024c) LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
Chen W , Litalien J , Gao J. DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer[J].arXiv e-prints, 2021.DOI:10.48550/arXiv.2111.00140http://dx.doi.org/10.48550/arXiv.2111.00140.
Chen X, Lin K Y, Wang J, Wu W, Qian C, Li H and Zeng G. 2020. Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation//Computer Vision – ECCV 2020: 16th European Conference, Glasgow: Springer-Verlag: 561-577 [DOI: 10.1007/978-3-030-58621-8_33http://dx.doi.org/10.1007/978-3-030-58621-8_33]
Chen X, Wang X, Zhou J, et al. Activating more pixels in image super-resolution transformer[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 22367-22377.
Chen Y-T, Shi J, Ye Z, et al (2022) Multimodal Object Detection via Probabilistic Ensembling. In: AvidanS, BrostowG, CisséM,等, eds Computer Vision – ECCV 2022. Springer Nature Switzerland, Cham, pp 139–158
Chen Yang, Medioni, Gérard. 1992. Object modelling by registration of multiple range images. Image and Vision Computing, 10(3): 145-155.
Chen L., X. Lu, J. Zhang, X. Chu and C. Chen. 2021. HINet: Half instance normalization network for image restoration//Proceeding of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 182-192. [DOI: 10.1109/CVPRW53098.2021.00027http://dx.doi.org/10.1109/CVPRW53098.2021.00027]
Cheng B W, Collins M D, Zhu Y K, Liu T, Huang T S, Adam H and Chen L. 2020. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE: 12472-12482 [DOI:10.1109/CVPR42600.2020.01249http://dx.doi.org/10.1109/CVPR42600.2020.01249]
Cheng B W, Misra I, Schwing A G, Kirillov A and Girdhar R. 2022. Masked-attention mask transformer for universal image segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE: 1290-1299 [DOI:10.1109/CVPR52688.2022.00135http://dx.doi.org/10.1109/CVPR52688.2022.00135]
Cheng B W, Schwing A and Kirillov A. 2021. Per-Pixel Classification is Not All You Need for Semantic Segmentation[EB/OL].[2021-07-13]. https://arxiv.org/pdf/2107.06278.pdfhttps://arxiv.org/pdf/2107.06278.pdf
Cheng S, Xu Z, Zhu S, et al. Deep stereo using adaptive thin volume representation with uncertainty awareness[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 2524-2534.
Cho S.-J., S.-W. Ji, J.-P. Hong, S.-W. Jung and S.-J. Ko. 2021. Rethinking coarse-to-fine approach in single image deblurring//Proceedings of the IEEE/CVF International Conference on Computer Vision 4641-4650. [DOI: 10.1109/ICCV48922.2021.00460http://dx.doi.org/10.1109/ICCV48922.2021.00460]
Choy C B , Xu D , Gwak J Y. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction[J].Springer International Publishing, 2016.DOI:10.1007/978-3-319-46484-8_38http://dx.doi.org/10.1007/978-3-319-46484-8_38.
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S and Schiele B. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 3213-3223.[DOI:10.1109/CVPR.2016.350http://dx.doi.org/10.1109/CVPR.2016.350]
Cores D, Brea VM, Mucientes M, et al (2023) Downsampling GAN for Small Object Data Augmentation. In: Tsapatsoulis N, Lanitis A, Pattichis M,等, (eds) Computer Analysis of Images and Patterns. Springer Nature Switzerland, Cham, pp 89–98
Cornillere V., Djelouah A., Yifan W., et al. 2019. Blind Image Super-Resolution with Spatially Variant Degradations. ACM Transactions on Graphics, 38(6), 166.1-166.13. [DOI: 10.1145/3355089.3356575http://dx.doi.org/10.1145/3355089.3356575]
Curless B , Levoy M .A Volumetric Method for Building Complex Models from Range Images[J].ACM, 1996.DOI:10.1145/237170.237269http://dx.doi.org/10.1145/237170.237269.
Dabov K., Foi A., Katkovnik V., & Egiazarian K. 2007. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080-2095. [DOI: 10.1109/TIP.2007.901238http://dx.doi.org/10.1109/TIP.2007.901238]
Dasgupta K, Das A, Das S, et al (2022) Spatio-Contextual Deep Network-Based Multimodal Pedestrian Detection for Autonomous Driving. Trans Intell Transp Sys 23:15940–15950. https://doi.org/10.1109/TITS.2022.3146575https://doi.org/10.1109/TITS.2022.3146575
De Geus D, Meletis P and Dubbelman G. 2020. Fast Panoptic Segmentation Network. IEEE Robotics and Automation Letters,5(2): 1742-1749 [DOI:10.1109/LRA.2020.2969919http://dx.doi.org/10.1109/LRA.2020.2969919]
Deng X, Wang P, Lian X and Newsam S. 2022. NightLab: A dual-level architecture with hardness detection for segmentation at night//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE:16938-16948 [DOI: 10.1109/CVPR52688.2022.01643http://dx.doi.org/10.1109/CVPR52688.2022.01643]
Deng Y. P., Liu Q., & Ikenaga T. 2020. Multi-scale contextual attention based HDR reconstruction of dynamic scenes. In Proceedings of SPIE (Vol. 11519, p. 115191F). [DOI: 10.1117/12.2574011http://dx.doi.org/10.1117/12.2574011]
Ding Y, Yuan W, Zhu Q, et al. Transmvsnet: Global context-aware multi-view stereo network with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 8585-8594.
Dong N, Zhang Y, Ding M, Lee GH (2022) Open World DETR: Transformer based Open World Object Detection
Dong S H, Zhou W J, Qian X H and Yu L. 2022. GEBNet: Graph-Enhancement Branch Network for RGB-T Scene Parsing. IEEE Signal Processing Letters,29:2273-2277[DOI:10.1109/LSP.2022.3219350http://dx.doi.org/10.1109/LSP.2022.3219350]
Dong X and Yokoya N. 2024. Understanding dark scenes by contrasting multi-modal observations//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Waikoloa: IEEE:840-850 [DOI: 10.1109/WACV57701.2024.00089http://dx.doi.org/10.1109/WACV57701.2024.00089]
Dou M , Khamis S , Degtyarev Y. Fusion4D: Real-time Performance Capture of Challenging Scenes[J].Acm Transactions on Graphics, 2016, 35(4).DOI:10.1145/2897824.2925969http://dx.doi.org/10.1145/2897824.2925969.
Dou M, Davidson P, Fanello S R, et al. Motion2fusion: Real-time volumetric performance capture[J]. ACM Transactions on Graphics (ToG), 2017, 36(6): 1-16.
Du B, Huang Y, Chen J, Huang D (2023) Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Vancouver, BC, Canada, pp 13435–13444
Erkan U., Engınoğlu S., & Thanh D.N. H. 2019. A recursive mean filter for image denoising. //International Artificial Intelligence and Data Processing Symposium (IDAP). IEEE: 1-5. [DOI: 10.1109/IDAP.2019.8875955http://dx.doi.org/10.1109/IDAP.2019.8875955]
Esteban C H, Schmitt F. Silhouette and stereo fusion for 3D object modeling[J]. Computer Vision and Image Understanding, 2004, 96(3): 367-392.
Fan H, Su H, Guibas L J. A point set generation network for 3d object reconstruction from a single image[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 605-613.
Fan M Y, Lai S Q, Huang J S, Wei X M, Chai Z H, Luo J F and Wei X L. 2021. Rethinking bisenet for real-time semantic segmentation//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Nashville, TN, USA: IEEE: 9716-9725 [DOI: 10.1109/CVPR46437.2021.00959http://dx.doi.org/10.1109/CVPR46437.2021.00959]
Fang H, Han B, Zhang S, et al (2024) Data Augmentation for Object Detection via Controllable Diffusion Models. In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp 1246–1255
Fang J, Qiao J, Xue J and Li Z. 2023. Vision-based traffic accident detection and anticipation: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 34(4): 1983-1999[DOI: 10.1109/TCSVT.2023.3307655http://dx.doi.org/10.1109/TCSVT.2023.3307655]
Feng C, Zhong Y, Jie Z, et al (2024) InstaGen: Enhancing Object Detection by Training on Synthetic Dataset. In: Proceedings of the IEEE / CVF Computer Vision and Pattern Recognition
Fridovich-Keil S, Meanti G, Warburg F R. K-planes: Explicit radiance fields in space, time, and appearance[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 12479-12488.
Fu J, Liu J, Wang Y H, Li Y, Bao Y J, Tang J H and Lu H Q. 2019. Adaptive context network for scene parsing//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul Korea: ICCV: 6748-6757 [DOI: 10.1109/ICCV.2019.00685http://dx.doi.org/10.1109/ICCV.2019.00685]
Fu Y P, Chen Q Q and Zhao H F. 2022. CGFNet: cross-guided fusion network for RGB-thermal semantic segmentation. The Visual Computer38(9):1432-2315[DOI:10.1007/s00371-022-02559-2http://dx.doi.org/10.1007/s00371-022-02559-2]
Fu Z, Yang Y, Tu X, et al. Learning a simple low-light image enhancer from paired low-light instances[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 22252-22261.
Furukawa Y, Ponce J. Accurate, dense, and robust multiview stereopsis[J]. IEEE transactions on pattern analysis and machine intelligence, 2009, 32(8): 1362-1376.
Galliani S, Lasinger K, Schindler K. Massively parallel multiview stereopsis by surface normal diffusion[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 873-881.
Gao H, Guo J, Wang G and Zhang Q. 2022. Cross-domain correlation distillation for unsupervised domain adaptation in nighttime semantic segmentation//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE: 9913-9923 [DOI: 10.1109/CVPR52688.2022.00968http://dx.doi.org/10.1109/CVPR52688.2022.00968]
Gao W , Tedrake R .SurfelWarp: Efficient Non-Volumetric Single View Dynamic Reconstruction.2019.DOI:10.48550/arXiv.1904.13073http://dx.doi.org/10.48550/arXiv.1904.13073.
Gao H., Tao X., Shen X., et al. 2019. Dynamic scene deblurring with parameter selective sharing and nested skip connections. //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3848-3856. [DOI: 10.1109/CVPR.2019.00397http://dx.doi.org/10.1109/CVPR.2019.00397]
Ge Z, Liu S, Wang F, et al (2021) YOLOX: Exceeding YOLO Series in 2021
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV)
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 580–587
Gong D., Tan M., Zhang Y., van den Hengel A., & Shi Q. 2016. Blind Image Deconvolution by Automatic Gradient Activation. //IEEE Conference on Computer Vision and Pattern Recognition (CVPR).[DOI: 10.1109/CVPR.2016.180http://dx.doi.org/10.1109/CVPR.2016.180]
Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Commun ACM 63:139–144. https://doi.org/10.1145/3422622https://doi.org/10.1145/3422622
Gou J P, Zhou X B, Du L, Zhan Y B, Chen W and Yi Z. 2024. Difference-Aware Distillation for Semantic Segmentation. IEEE Transactions on Multimedia:1-12[DOI:10.1109/TMM.2024.3405619http://dx.doi.org/10.1109/TMM.2024.3405619]
Gu J, Lu H, Zuo W, et al. Blind super-resolution with iterative kernel correction[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 1604-1613.
Gu X, Fan Z, Zhu S, et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 2495-2504.
Guo C, Li C, Guo J, et al. Zero-reference deep curve estimation for low-light image enhancement[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 1780-1789.
Guo M H, Lu C Z, Hou Q, Liu Z, Cheng M M and Hu S M. 2022. Segnext: Rethinking convolutional attention design for semantic segmentation. Advances in Neural Information Processing Systems, 35:1140-1156 [DOI: 10.48550/arXiv.2209.08575http://dx.doi.org/10.48550/arXiv.2209.08575]
Ha Q S, Watanabe K, Karasawa T, Ushiku Y and Harada T. MFNet: Towards real-time semantic segmentation for autonomous vehicles with multispectral scenes//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver: IEEE: 2017: 5108-5115. [DOI: 10.1109/IROS.2017.8206396http://dx.doi.org/10.1109/IROS.2017.8206396]
Ha Q, Watanabe K, Karasawa T, Ushiku Y and Harada T. 2017. MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes//Proceedings of the RSJ International Conference on Intelligent Robots and Systems (IROS). Vancouver:IEEE:5108-5115 [DOI:10.1109/IROS.2017.8206396http://dx.doi.org/10.1109/IROS.2017.8206396]
Han D., Li L., Guo X. J., & Ma J. Y. 2022. Multi-exposure image fusion via deep perceptual enhancement. Information Fusion, 79:248-262. [DOI: 10.1016/j.inffus.2021.10.006http://dx.doi.org/10.1016/j.inffus.2021.10.006]
Hasselgren J, Hofmann N, Munkberg J. Shape, light, and material decomposition from images using monte carlo rendering and denoising[J]. Advances in Neural Information Processing Systems, 2022, 35: 22856-22869.
Hazirbas C, Ma L, Domokos C and Cremers D. 2017. FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture//Proceedings of the Computer Vision - ACCV 2016: 13th Asian Conference on Computer Vision. Taipei:Springer International Publishing:213-228 [DOI:10.1007/978-3-319-54181-5_14http://dx.doi.org/10.1007/978-3-319-54181-5_14]
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision. pp 2961–2969
Hernández C, Vogiatzis G, Cipolla R. Probabilistic visibility for multi-view stereo[C]//2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007: 1-8.
Hong W X, Guo Q P, Zhang W, Chen J D, Chu W. 2021. LPSNet: A lightweight solution for fast panoptic segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE: 16741-16749 [DOI:10.1109/CVPR46437.2021.01647http://dx.doi.org/10.1109/CVPR46437.2021.01647]
Hou R, Li J, Bhargava A, Raventos A, Guizilini V, Fang C, Lynch J and Gaidon A. 2020. Real-Time Panoptic Segmentation From Dense Detections//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE: 8520-8529 [DOI:10.1109/CVPR42600.2020.00855http://dx.doi.org/10.1109/CVPR42600.2020.00855]
Hu J, Gallo O, Pulli K, et al. HDR deghosting: How to deal with saturation?[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2013: 1163-1170.
Hu J, Huang L Y, Ren T H, Zhang S C, Ji R R and Cao L J. 2023. You Only Segment Once: Towards Real-Time Panoptic Segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE: 17819-17829 [DOI:10.1109/CVPR52729.2023.01709http://dx.doi.org/10.1109/CVPR52729.2023.01709]
Hu X X, Yang K L, Fei L, Wang K W. 2019. ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation//Proceedings of the IEEE International Conference on Image Processing (ICIP). Taipei:IEEE:1440-1444 [DOI:10.1109/ICIP.2019.8803025http://dx.doi.org/10.1109/ICIP.2019.8803025]
Hu Y. T., Zhen R. W., & Sheikh H. 2019. CNN-based deghosting in high dynamic range imaging.//IEEE International Conference on Image Processing (ICIP). New York: IEEE Press,4360-4364. [DOI: 10.1109/ICIP.2019.8803421http://dx.doi.org/10.1109/ICIP.2019.8803421]
Huang Y, Li S, Wang L, et al. Unfolding the alternating optimization for blind super resolution[J]. Advances in Neural Information Processing Systems, 2020, 33: 5632-5643.
Huang Y-X, Liu H-I, Shuai H-H, Cheng W-H (2024) DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
Ji D J, Wang H R, Tao M Y, Huang J Q, Hua X S, Lu H T. 2022. Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans:IEEE:16855-16864 [DOI:10.1109/CVPR52688.2022.01637http://dx.doi.org/10.1109/CVPR52688.2022.01637]
Joseph KJ, Khan S, Khan FS, Balasubramanian VN (2021) Towards Open World Object Detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)
Junos MH, Mohd Khairuddin AS, Dahari M (2022) Automated object detection on aerial images for limited capacity embedded device using a lightweight CNN model. Alex Eng J 61:6023–6041. https://doi.org/10.1016/j.aej.2021.11.027https://doi.org/10.1016/j.aej.2021.11.027
Kalantari N K, Ramamoorthi R. Deep high dynamic range imaging of dynamic scenes[J]. ACM Trans. Graph., 2017, 36(4): 144:1-144:12.
Kar A , Hne C , Malik J .Learning a Multi-View Stereo Machine[J]. 2017.DOI:10.48550/arXiv.1708.05375http://dx.doi.org/10.48550/arXiv.1708.05375.
Kerbl B, Kopanas G, Leimkühler T. 3d gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 1-14.
Kuhn A, Lin S, Erdler O. Plane completion and filtering for multi-view stereo reconstruction[C]//Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Dortmund, Germany, September 10–13, 2019, Proceedings 41. Springer International Publishing, 2019: 18-32.
Kumari R., & Mustafi A. 2022. Denoising of images using fractional Fourier transform.//2nd International Conference on Emerging Frontiers in Electrical and Electronic Technologies (ICEFEET).IEEE:1-6.[DOI:10.1109/ICEFEET53083.2022.9845527http://dx.doi.org/10.1109/ICEFEET53083.2022.9845527]
Lee H, Kang S, Chung K (2022) Robust Data Augmentation Generative Adversarial Network for Object Detection. Sensors 23:157. https://doi.org/10.3390/s23010157https://doi.org/10.3390/s23010157
Lee J, Son H, Rim J, et al. Iterative filter adaptive network for single image defocus deblurring[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 2034-2042.
Li C, Li L, Jiang H, et al (2022a) YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
Li F, Zhang H, Liu S, et al (2022b) Dn-detr: Accelerate detr training by introducing query denoising. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 13619–13627
Li G, Wang Y, Liu Z, Zhang X and Zeng D. 2023. RGB-T Semantic Segmentation With Location, Activation, and Sharpening. IEEE Transactions on Circuits and Systems for Video Technology, 33(3): 1223-1235 [DOI: 10.1109/TCSVT.2022.3208833]
Li S, Zhang G, Luo Z, et al. From general to specific: Online updating for blind super-resolution. Pattern Recognition, 2022, 127 [DOI: 10.1016/j.patcog.2022.108613]
Li X, Li B, Jin X, et al. Learning distortion invariant representation for image restoration from a causality perspective[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 1714-1724.
Li L H, Zhang P, Zhang H, et al (2022) Grounded Language-Image Pre-training. In: CVPR
Li H, Wu X J and Durrani T. 2020. NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Transactions on Instrumentation and Measurement, 69(12): 9645-9656 [DOI: 10.1109/TIM.2020.3005230]
Liang Z, Li C, Zhou S, et al. Iterative prompt learning for unsupervised backlit image enhancement[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 8094-8103.
Liao J, Ding Y, Shavit Y, et al. Wt-mvsnet: window-based transformers for multi-view stereo[J]. Advances in Neural Information Processing Systems, 2022, 35: 8564-8576.
Lin T-Y, Dollár P, Girshick R, et al (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2117–2125
Lin T. Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., ... & Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 (pp. 740-755). Springer International Publishing.
Lingfeng Tang, Huan Huang, Yafei Zhang, et al. Spatial aware channel attention guided high dynamic image reconstruction[J]. Journal of Image and Graphics, 2022,27(12):3581-3595.
唐凌峰, 黄欢, 张亚飞, 等. 空间感知通道注意力引导的高动态图像重建[J]. 中国图象图形学报, 2022, 27(12): 3581-3595 [DOI: 10.11834/jig.211039]
Liu J, Liu Y, Lin J, Li J, Sun P, Hu B, Song L, Boukerche A and Leung V. 2024. Networking Systems for Video Anomaly Detection: A Tutorial and Survey[EB/OL]. [2024-05-15]. http://arxiv.org/abs/2405.10347
Liu R, Ma L, Zhang J, et al. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 10561-10570.
Liu S, Li F, Zhang H, et al (2022) DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Liu W, Anguelov D, Erhan D, et al (2016) Ssd: Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, pp 21–37
Liu Y F, Chen K, Liu C, Qin Z C, Luo Z B and Wang J D. 2019. Structured knowledge distillation for semantic segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE: 2604-2613 [DOI: 10.1109/CVPR.2019.00271]
Liu J Y, Fan X, Huang Z B, Wu G Y, Liu R S, Zhong W and Luo Z X. 2022. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 5802-5811 [DOI: 10.1109/CVPR52688.2022.00570]
Long J, Shelhamer E, Darrell T. 2015. Fully Convolutional Networks for Semantic Segmentation//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965]
Lu X, Li B, Yue Y, et al (2019) Grid r-cnn. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 7363–7372
Lugmayr A, Danelljan M, Timofte R. Ntire 2020 challenge on real-world image super-resolution: Methods and results[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020: 494-495.
Luo K, Guan T, Ju L, et al. Attention-aware multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1590-1599.
Ma L, Ma T, Liu R, et al. Toward fast, flexible, and robust low-light image enhancement[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 5637-5646.
Ma J Y, Tang L F, Xu M L, Zhang H and Xiao G B. 2021. STDFusionNet: An infrared and visible image fusion network based on salient target detection. IEEE Transactions on Instrumentation and Measurement, 70: 1-13 [DOI: 10.1109/TIM.2021.3075747]
Mahaur B, Mishra KK, Kumar A (2023) An improved lightweight small object detection framework applied to real-time autonomous driving. Expert Syst Appl 234:121036. https://doi.org/10.1016/j.eswa.2023.121036
Mao X, Liu Y, Liu F, et al. Intriguing findings of frequency selection for image deblurring[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023, 37(2): 1905-1913.
Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings eighth IEEE international conference on computer vision. ICCV 2001. IEEE, 2001, 2: 416-423.
Mellado, Nicolas, Aiger, Dror, Mitra, Niloy J. 2014. Super 4pcs fast global pointcloud registration via smart indexing. Computer Graphics Forum, 33(5): 205-215.
Meng D, Chen X, Fan Z, et al (2021) Conditional DETR for Fast Training Convergence. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, Montreal, QC, Canada, pp 3631–3640
Mescheder L, Oechsle M, Niemeyer M, et al. Occupancy Networks: Learning 3D Reconstruction in Function Space[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. [DOI: 10.1109/CVPR.2019.00459]
Mildenhall B, Srinivasan P P, Tancik M, et al. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2020. [DOI: 10.48550/arXiv.2003.08934]
Munkberg J, Hasselgren J, Shen T, et al. Extracting Triangular 3D Models, Materials, and Lighting From Images[EB/OL]. 2021. [DOI: 10.48550/arXiv.2111.12503]
Murali V, Sudeep P V. Image denoising using DnCNN: An exploration study[C]//Advances in Communication Systems and Networks: Select Proceedings of ComNet 2019. Springer Singapore, 2020: 847-859.
Nah S, Kim T H and Lee K M. 2017. Deep multi-scale convolutional neural network for dynamic scene deblurring//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 3883-3891 [DOI: 10.1109/CVPR.2017.413]
Newcombe R A, Izadi S, Hilliges O, et al. KinectFusion: Real-time dense surface mapping and tracking[C]//IEEE International Symposium on Mixed and Augmented Reality. IEEE, 2011. [DOI: 10.1109/ISMAR.2011.6092378]
Nguyen K, Fookes C, Sridharan S, Tian Y, Liu F, Liu X and Ross A. 2022. The state of aerial surveillance: A survey[EB/OL]. [2022-06-13]. https://arxiv.org/pdf/2201.03080.pdf
Nguyen RM, Kim SJ, Brown MS (2014) Illuminant aware gamut-based color transfer. In: Computer Graphics Forum. Wiley Online Library, pp 319–328
Niu Y, Wu J, Liu W, et al. Hdr-gan: Hdr image reconstruction from multi-exposed ldr images with large motions[J]. IEEE Transactions on Image Processing, 2021, 30: 3885-3896.
Pan L, Chowdhury S, Hartley R, et al. Dual pixel exploration: Simultaneous depth estimation and image restoration[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 4340-4349.
Pang J, Chen K, Shi J, et al (2019) Libra R-CNN: Towards Balanced Learning for Object Detection. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA, USA, pp 821–830
Park J J, Florence P, Straub J, et al. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. [DOI: 10.1109/CVPR.2019.00025]
Park K, Sinha U, Barron J T, et al. Nerfies: Deformable neural radiance fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 5865-5874.
Park M, Tran DQ, Jung D, Park S (2020) Wildfire-Detection Method Using DenseNet and CycleGAN Data Augmentation-Based Remote Camera Imagery. Remote Sens 12:3715. https://doi.org/10.3390/rs12223715
Paszke A, Chaurasia A, Kim S and Culurciello E. 2016. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL]. [2016-06-07]. https://arxiv.org/pdf/1606.02147.pdf
Pumarola A, Corona E, Pons-Moll G, et al. D-NeRF: Neural Radiance Fields for Dynamic Scenes[EB/OL]. 2020. [DOI: 10.48550/arXiv.2011.13961]
Qi C R, Su H, Mo K and Guibas L J. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 652-660 [DOI: 10.1109/CVPR.2017.16]
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You Only Look Once: Unified, Real-Time Object Detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Las Vegas, NV, USA, pp 779–788
Redmon J, Farhadi A (2016) YOLO9000: Better, Faster, Stronger
Redmon J, Farhadi A (2018) YOLOv3: An Incremental Improvement
Ren C, Xu Q, Zhang S, et al. Hierarchical prior mining for non-local multi-view stereo[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 3611-3620.
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Salhaoui M, Guerrero-González A, Arioua M, Ortiz F J, El Oualkadi A and Torregrosa C L. 2019. Smart industrial IoT monitoring and control system based on UAV and cloud computing applied to a concrete plant. Sensors, 19(15): 3316 [DOI: 10.3390/s19153316]
Santhosh K K, Dogra D P and Roy P P. 2020. Anomaly detection in road traffic using visual surveillance: A survey. ACM Computing Surveys, 53(6): 1-26 [DOI: 10.1145/3417989]
Sarode, Vinit, Li, Xueqian, Goforth, Hunter, Aoki, Yasuhiro, Srivatsan, Rangaprasad Arun, Lucey, Simon, Choset, Howie. 2019. PCRNet: Point Cloud Registration Network using PointNet Encoding. arXiv preprint arXiv:1908.07906.
Satapathy L M, Das P, Shatapathy A, et al. 2019. Bio-medical image denoising using wavelet transform. International Journal of Recent Technology and Engineering, 8(1): 2874-2879 [DOI: 10.35940/ijrte.A1659.098119]
Schnabel, Ruwen, Wahl, Roland, Klein and Reinhard. 2007. Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum, 26(2): 214-226.
Sen P, Kalantari N K, Yaesoubi M, et al. Robust patch-based hdr reconstruction of dynamic scenes[J]. ACM Trans. Graph., 2012, 31(6): 203:1-203:11.
Sharp, Gregory C, Lee, Sang W, Wehe, David K. 2002. ICP registration using invariant features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1): 90-102.
Shen S. Accurate multiple view 3d reconstruction using patch-based stereo for large-scale scenes[J]. IEEE transactions on image processing, 2013, 22(5): 1901-1914.
Shi M, Lin S W, Yi Q M, Weng J, Luo A W and Zhou Y C. 2024. Lightweight context-aware network using partial-channel transformation for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems: 1-16 [DOI: 10.1109/TITS.2023.3348631]
Shivakumar S S, Rodrigues N, Zhou A, Miller I D, Kumar V J, Taylor C J. 2020. PST900: RGB-Thermal Calibration, Dataset and Segmentation Network//Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Paris: IEEE [DOI: 10.1109/ICRA40945.2020.9196831]
Shocher A, Cohen N, Irani M. “zero-shot” super-resolution using deep internal learning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 3118-3126.
Shu C Y, Liu Y F, Gao J F, Yan Z and Shen C H. 2021. Channel-wise knowledge distillation for dense prediction//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, QC, Canada: IEEE: 5311-5320 [DOI: 10.1109/ICCV48922.2021.00526]
Silberman N, Hoiem D, Kohli P and Fergus R. 2012. Indoor Segmentation and Support Inference from RGBD Images//Proceedings of the 12th European Conference on Computer Vision. Florence: Springer: 746-760 [DOI: 10.1007/978-3-642-33715-4_54]
Simard PY, Steinkraus D, Platt JC, et al (2003) Best practices for convolutional neural networks applied to visual document analysis. In: ICDAR. Edinburgh
Singh A, Patil D and Omkar S N. 2018. Eye in the sky: Real-time drone surveillance system (DSS) for violent individuals identification using scatternet hybrid deep learning network//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 1629-1637
Song L, Chen A, Li Z, et al. Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29(5): 2732-2742.
Song S, Lichtenberg S P and Xiao J. 2015. SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 567-576 [DOI: 10.1109/CVPR.2015.7298655http://dx.doi.org/10.1109/CVPR.2015.7298655]
Srivastava A, Badal T, Garg A, Vidyarthi A and Singh R. 2021. Recognizing human violent action using drone surveillance within real-time proximity. Journal of Real-Time Image Processing, 18: 1851-1863 [DOI: 10.1007/s11554-021-01171-2]
Srivastava A, Badal T, Saxena P, Vidyarthi A and Singh R. 2022. UAV surveillance for violence detection and individual identification. Automated Software Engineering, 29(1): 28 [DOI: 10.1007/s10515-022-00323-3]
Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
Stoyanov, Todor, Magnusson, Martin, Andreasson, Henrik, Lilienthal, Achim J. 2012. Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations. The International Journal of Robotics Research, 31(12): 1377-1393.
Stutz D, Geiger A. Learning 3d shape completion from laser scan data with weak supervision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 1955-1964.
Sun Y X, Zuo W X and Liu M. 2019. RTFNet: RGB-Thermal Fusion Network for Semantic Segmentation of Urban Scenes. IEEE Robotics and Automation Letters, 4(3): 2576-2583 [DOI: 10.1109/LRA.2019.2904733]
Sun Y, Cao B, Zhu P, Hu Q (2022) Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning. IEEE Trans Circuits Syst Video Technol 32:6700–6713
Sun Y, Zheng W, Du X, Yan Z (2023) Underwater Small Target Detection Based on YOLOX Combined with MobileViT and Double Coordinate Attention. J Mar Sci Eng 11:1178. https://doi.org/10.3390/jmse11061178
Sun Y X, Zuo W X, Yun P, Wang H L and Liu M. 2021. FuseSeg: Semantic Segmentation of Urban Scenes Based on RGB and Thermal Data Fusion. IEEE Transactions on Automation Science and Engineering, 18(3): 1000-1011 [DOI: 10.1109/TASE.2020.2993143]
Tao X, Gao H, Shen X, et al. 2018. Scale-recurrent network for deep image deblurring//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 8174-8182 [DOI: 10.1109/CVPR.2018.00853]
Tatarchenko M, Dosovitskiy A, Brox T. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017. [DOI: 10.1109/ICCV.2017.230]
Testolina P, Barbato F, Michieli U, Giordani M, Zanuttigh P and Zorzi M. 2023. SELMA: Semantic large-scale multimodal acquisitions in variable weather, daytime and viewpoints. IEEE Transactions on Intelligent Transportation Systems, 24(7): 7012-7024 [DOI: 10.1109/TITS.2023.3257086]
Tikhonov, A. N.1963. On the solution of ill-posed problems and the method of regularization. //Doklady akademii nauk: volume 151. Russian Academy of Sciences:501-504
Trinidad M C, Martin-Brualla R, Kainz F, et al. 2019. Multiview image fusion//2019 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE Press: 4100-4109 [DOI: 10.1109/ICCV.2019.00420]
Tulsiani S, Efros A A, Malik J. Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018. [DOI: 10.1109/CVPR.2018.00306]
Tulsiani S, Zhou T, Efros A A. Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017. [DOI: 10.1109/CVPR.2017.30]
Vaswani A, Shazeer N, Parmar N, et al (2017) Attention Is All You Need. In: Advances in Neural Information Processing Systems
Wang A, Chen H, Liu L, et al (2024a) YOLOv10: Real-Time End-to-End Object Detection
Wang C, He W, Nie Y, et al (2023a) Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. In: Oh A, Naumann T, Globerson A, et al (eds) Advances in Neural Information Processing Systems. Curran Associates, Inc., pp 51094–51112
Wang C-Y, Bochkovskiy A, Liao H-YM (2023b) YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Vancouver, BC, Canada, pp 7464–7475
Wang C-Y, Yeh I-H, Liao H-YM (2024b) YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Wang F, Galliani S, Vogel C, et al. Patchmatchnet: Learned multi-view patchmatch stereo[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 14194-14203.
Wang H, Chen Y, Cai Y, Chen L, Li Y, Sotelo M A and Li Z. 2022. SFNet-N: An improved SFNet algorithm for semantic segmentation of low-light autonomous driving road scenes. IEEE Transactions on Intelligent Transportation Systems, 23(11): 21405-21417 [DOI: 10.1109/TITS.2022.3177615]
Wang H, Wang C, Fu Q, et al (2024c) Cross-Modal Oriented Object Detection of UAV Aerial Images Based on Image Feature. IEEE Trans Geosci Remote Sens 62:1–21. https://doi.org/10.1109/TGRS.2024.3367934
Wang N, Zhang Y, Li Z, et al. Pixel2mesh: Generating 3d mesh models from single rgb images[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 52-67.
Wang P S, Liu Y, Guo Y X. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis. ACM Transactions on Graphics, 2017, 36(4): 72 [DOI: 10.1145/3072959.3073608]
Wang S, Xu X, Ma X, Jiang K and Wang Z. 2023. Informative Classes Matter: Towards Unsupervised Domain Adaptive Nighttime Semantic Segmentation//Proceedings of the 31st ACM International Conference on Multimedia. Ottawa: ACM: 163-172 [DOI: 10.1145/3581783.3611956]
Wang T X, Jiang Z, Jiang D G, Li B L, Guo C L. 2022. Improved ICP laser point cloud registration algorithm integrating PCA. Remote Sensing Information, 37(2): 70-76.
Wang W Y and Neumann U. 2018. Depth-aware CNN for RGB-D segmentation//Proceedings of the Computer Vision–ECCV 2018: 15th European Conference. Munich: Springer International Publishing: 144-161 [DOI: 10.1007/978-3-030-01252-6_9]
Wang Wenjing, Yang Wenhan, Fang Yuming, Huang Hua, Liu Jiaying. 2024. Visual perception and understanding in degraded scenarios. Journal of Image and Graphics, 29(06):1667-1684
汪文靖, 杨文瀚, 方玉明, 黄华, 刘家瑛. 2024. 恶劣场景下视觉感知与理解综述. 中国图象图形学报, 29(06): 1667-1684 [DOI: 10.11834/jig.240041]
Wang X, Xie L, Dong C, et al. Real-esrgan: Training real-world blind super-resolution with pure synthetic data[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 1905-1914.
Wang Y K, Zhou W, Jiang T, Bai X and Xu Y C. 2020. Intra-class feature variation distillation for semantic segmentation//Proceedings of the European Conference on Computer Vision. Glasgow, UK: Springer, Cham, Part VII: 346-362 [DOI: 10.1007/978-3-030-58571-6_21]
Wang Y, Chen X, Cao L, Huang W, Sun F and Wang Y. 2022. Multimodal Token Fusion for Vision Transformers//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE: 12176-12185 [DOI: 10.1109/CVPR52688.2022.01187]
Wang Y, Gao R, Chen K, et al (2024d) DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Wang Y, Sun F, Huang W, He F and Tao D. 2023. Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5): 5481-5496 [DOI: 10.1109/TPAMI.2022.3211086]
Wang Y, Zeng Z, Guan T, et al. Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 1621-1630.
Wang Z, Li Y, Chen X, et al (2023c) Detecting everything in the open world: Towards universal object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 11433–11443
Wang Z, Li Y, Chen X, et al (2024e) UniDetector: Towards Universal Object Detection with Heterogeneous Supervision. IEEE Trans Pattern Anal Mach Intell 1–18. https://doi.org/10.1109/TPAMI.2024.3411595
Wang, Yue, Solomon, Justin M. 2019. Deep Closest Point: Learning Representations for Point Cloud Registration. In Proceedings of the IEEE/CVF international conference on computer vision, 3523-3532.
Wang Z, Cun X, Bao J, et al. 2022. Uformer: A general U-shaped transformer for image restoration//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 17683-17693 [DOI: 10.1109/CVPR52688.2022.01722]
Wei P, Xie Z, Lu H, et al. 2020. Component Divide-and-Conquer for Real-World Image Super-Resolution//Proceedings of the European Conference on Computer Vision (ECCV): 1-17 [DOI: 10.1007/978-3-030-58598-3_7]
Wittner C, Schauerte B, Stiefelhagen R. What's the point? Frame-wise Pointing Gesture Recognition with Latent-Dynamic Conditional Random Fields[EB/OL]. 2015. [DOI: 10.48550/arXiv.1510.05879]
Wu P, Liu J, Shi Y, Sun Y, Shao F, Wu Z and Yang Z. 2020. Not only look, but also listen: Learning multimodal violence detection under weak supervision//Computer Vision–ECCV 2020: 16th European Conference. Glasgow, UK: 322-339 [DOI: 10.1007/978-3-030-58577-8_20]
Wu P, Liu X and Liu J. 2022. Weakly supervised audio-visual violence detection. IEEE Transactions on Multimedia, 25: 1674-1685 [DOI: 10.1109/TMM.2022.3147369]
Wu R, Nie J H, G H, Liu Y, Lu H T. 2022. SingleMatch: a point cloud coarse registration method with single match point and deep-learning describer. Multimedia Tools and Applications, 81(12): 16967-16986.
Wu S, Xu J, Tai Y W, et al. Deep high dynamic range imaging with large foreground motions[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 117-132.
Wu Y X, Zhang G W, Gao Y M, Deng X J, Gong K, Liang X D and Lin L. 2020. Bidirectional Graph Reasoning Network for Panoptic Segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE: 9077-9086 [DOI: 10.1109/CVPR42600.2020.00910]
Wu Yue, Yuan Yongzhe, Xiang Benhua, Sheng Jinlong, Lei Jiayi, Hu Congying, Gong Maoguo, Ma Wenping, Miao Qiguang. 2023. Overview of the computational intelligence method in 3D point cloud registration. Journal of Image and Graphics, 28(09):2763-2787
武越, 苑咏哲, 向本华, 绳金龙, 雷佳熠, 胡聪颖, 公茂果, 马文萍, 苗启广. 2023. 三维点云配准中的计算智能方法综述. 中国图象图形学报, 28(09): 2763-2787 [DOI: 10.11834/jig.220727]
Wu Z, Song S, Khosla A, et al. 3D ShapeNets: A Deep Representation for Volumetric Shapes[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015. [DOI: 10.1109/CVPR.2015.7298801]
Xia R, Zhao C, Zheng M, Wu Z, Sun Q and Tang Y. 2023. CMDA: Cross-modality domain adaptation for nighttime semantic segmentation//Proceedings of the IEEE International Conference on Computer Vision. Paris: IEEE: 21572-21581 [DOI: 10.1109/ICCV51070.2023.01972]
Xie H, Yao H, Sun X, et al. Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019. [DOI: 10.1109/ICCV.2019.00278]
Xie Z, Wang S, Xu K, Zhang Z, Tan X, Xie Y and Ma L. 2023. Boosting night-time scene parsing with learnable frequency. IEEE Transactions on Image Processing, 32: 2386-2398 [DOI: 10.1109/TIP.2023.3267044]
Xiong Y W, Liao R J, Zhao H S, Hu R, Bai M, Yumer E and Urtasun R. 2019. UPSNet: A Unified Panoptic Segmentation Network//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE: 8810-8818 [DOI: 10.1109/CVPR.2019.00902]
Xu J C, Xiong Z X and Bhattacharyya S P. 2023. PIDNet: A real-time semantic segmentation network inspired by PID controllers//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE: 19529-19539 [DOI: 10.1109/CVPR52729.2023.01871]
Xu Ke, Liu Xinpu, Wang Hanyun, Wan Jianwei, Guo Yulan. 2024. Infrared-visible image object detection algorithm using feature dynamic selection. Journal of Image and Graphics, 29(08):2350-2363.
许可, 刘心溥, 汪汉云, 万建伟, 郭裕兰. 2024. 红外与可见光图像特征动态选择的目标检测网络. 中国图象图形学报, 29(08): 2350-2363 [DOI: 10.11834/jig.230495]
Xu Q, Kong W, Tao W, et al. Multi-scale geometric consistency guided and planar prior assisted multi-view stereo[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(4): 4945-4963.
Xu Q, Tao W. Multi-scale geometric consistency guided multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 5483-5492.
Xu Q, Tao W. Planar prior assisted patchmatch multi-view stereo[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(07): 12516-12523.
Xu S, Wang X, Lv W, et al (2022) PP-YOLOE: An evolved version of YOLO
Xu Y, Zhang M, Fu C, et al (2023) Multi-modal Queried Object Detection in the Wild. In: Thirty-seventh Conference on Neural Information Processing Systems
Xu Z, Liu Y, Shi X, et al. Marmvs: Matching ambiguity reduced multiple view stereo for efficient large scale scene reconstruction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 5981-5990.
Yan Q, Zhang L, Liu Y, et al. Deep HDR imaging via a non-local network[J]. IEEE Transactions on Image Processing, 2020, 29: 4308-4322.
Yan Q, Gong D, Shi Q, et al. 2019. Attention-guided network for ghost-free high dynamic range imaging//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 1751-1760 [DOI: 10.1109/CVPR.2019.00182]
Yang C G, Zhou H L, An Z L, Jiang X, Xu Y J and Zhang Q. 2022. Cross-Image Relational Knowledge Distillation for Semantic Segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans: IEEE: 12309-12318 [DOI: 10.1109/CVPR52688.2022.01200]
Yang C, Lin Z, Lan Z, et al (2024) Evolutionary channel pruning for real-time object detection. Knowl-Based Syst 287:111432. https://doi.org/10.1016/j.knosys.2024.111432
Yang J L, Li H D, Campbell D and Jia Y D. 2015. Go-ICP: A globally optimal solution to 3D ICP point-set registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11): 2241-2254.
Yang J, Mao W, Alvarez J M, et al. Cost volume pyramid based depth inference for multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 4877-4886.
Yang M J, Zhong Y Z, Guo B, Tian S Y. 2022. Point cloud registration based on improved grey wolf optimization algorithm. Computer Simulation, 39(12): 513-518.
Yang Y, Pan L, Liu L, et al. K3dn: Disparity-aware kernel estimation for dual-pixel defocus deblurring[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 13263-13272.
Yang Z, Gao X, Zhou W, et al. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction[J]. arXiv preprint arXiv:2309.13101, 2023.
Yao Y, Luo Z, Li S, et al. Mvsnet: Depth inference for unstructured multi-view stereo[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 767-783.
Yao Y, Luo Z, Li S, et al. Recurrent mvsnet for high-resolution multi-view stereo depth inference[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 5525-5534.
Yao Z, Ai J, Li B and Zhang C. 2021. Efficient DETR: Improving end-to-end object detector with dense prior. arXiv preprint arXiv:2104.01318.
Ye T, Qin W, Zhao Z, et al (2023) Real-Time Object Detection Network in UAV-Vision Based on CNN and Transformer. IEEE Trans Instrum Meas 72:1–13. https://doi.org/10.1109/TIM.2023.3241825
Ye X R, Li Z P and Xu C. 2022. Ghost-free multi-exposure image fusion technology based on the multi-scale block LBP operator. Electronics, 11(19): 3129 [DOI: 10.3390/electronics11193129]
Yu C Q, Gao C X, Wang J B, Yu G, Shen C H and Sang N. 2021. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. International Journal of Computer Vision, 129: 3051-3068 [DOI: 10.1007/s11263-021-01515-2]
Yu C Q, Wang J B, Peng C, Gao C X, Yu G and Sang N. 2018. BiSeNet: Bilateral segmentation network for real-time semantic segmentation//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer, Cham: 325-341 [DOI: 10.1007/978-3-030-01261-8_20]
Yu Q H, Wang H Y, Qiao S Y, Collins M, Zhu Y K, Adam H, Yuille A and Chen L. 2022. k-means Mask Transformer//Proceedings of the Computer Vision–ECCV 2022: 17th European Conference. Berlin: Springer-Verlag: 288-307 [DOI: 10.1007/978-3-031-19818-2_17]
Yuan J, Zhou W and Luo T. 2019. DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation. IEEE Access, 7: 169350-169358 [DOI: 10.1109/ACCESS.2019.2955101]
Yuan M, Shi X, Wang N, et al (2024) Improving RGB-infrared object detection with cascade alignment-guided transformer. Inf Fusion 105:102246. https://doi.org/10.1016/j.inffus.2024.102246
Yuan M, Wei X (2024) C2Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection. IEEE Trans Geosci Remote Sens 62:1–12. https://doi.org/10.1109/TGRS.2024.3376819
Yun S, Han D, Oh SJ, et al (2019) Cutmix: Regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF international conference on computer vision. pp 6023–6032
Zamir S W, Arora A, Khan S, et al. 2022. Restormer: efficient transformer for high-resolution image restoration//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 5728-5739 [DOI: 10.1109/CVPR52688.2022.00564]
Zeng A and Xiao J. 2017. 3DMatch: learning local geometric descriptors from RGB-D reconstructions//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition [DOI: 10.1109/CVPR.2017.29]
Zhang H, Cisse M, Dauphin Y N and Lopez-Paz D. 2018. mixup: beyond empirical risk minimization//International Conference on Learning Representations
Zhang H, Li F, Liu S, et al. 2022. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605
Zhang J, Liu H, Yang K, Hu X, Liu R and Stiefelhagen R. 2023. CMX: cross-modal fusion for RGB-X semantic segmentation with Transformers. IEEE Transactions on Intelligent Transportation Systems, 24(12): 14679-14694 [DOI: 10.1109/TITS.2023.3300537]
Zhang K, Liang J, Van Gool L, et al. 2021. Designing a practical degradation model for deep blind image super-resolution//Proceedings of the IEEE/CVF International Conference on Computer Vision: 4791-4800
Zhang L, Liu Z, Zhu X, et al. 2024. Weakly aligned feature fusion for multimodal object detection. IEEE Transactions on Neural Networks and Learning Systems: 1-15 [DOI: 10.1109/TNNLS.2021.3105143]
Zhang L, Wu X, Buades A, et al. 2011. Color demosaicking by local directional interpolation and nonlocal adaptive thresholding. Journal of Electronic Imaging, 20(2): 023016
Zhang N, Nex F, Kerle N and Vosselman G. 2021. Towards learning low-light indoor semantic segmentation with illumination-invariant features. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLIII-B2-2021: 427-432 [DOI: 10.5194/isprs-archives-XLIII-B2-2021-427-2021]
Zhang P, Zhong Y and Li X. 2019. SlimYOLOv3: narrower, faster and better for real-time UAV applications//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul: IEEE: 37-45
Zhang Q, Zhao S L, Luo Y J, Zhang D W, Huang N C and Han J G. 2021. ABMDRNet: adaptive-weighted bi-directional modality difference reduction network for RGB-T semantic segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE: 2633-2642 [DOI: 10.1109/CVPR46437.2021.00266]
Zhang R, Isola P and Efros A A. 2016. Colorful image colorization//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer: 649-666
Zhang S, Xie T, Wang Y, et al. 2023. SF-YOLO: RGB-T fusion object detection in UAV scenes//Proceedings of the 8th International Conference on Image, Vision and Computing (ICIVC): 51-59
Zhang W, Huang Z, Luo G, Chen T, Wang X, Liu W, Yu G and Shen C. 2022. TopFormer: token pyramid transformer for mobile semantic segmentation//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE: 12083-12093 [DOI: 10.1109/CVPR52688.2022.01177]
Zhang X Y, Yang J Q, Zhang S K and Zhang Y N. 2023. A maximum-clique-based 3D registration method. Science and Technology Frontier
Zhang Y, Li K, Li K, et al. 2018. Image super-resolution using very deep residual channel attention networks//Proceedings of the European Conference on Computer Vision (ECCV): 286-301
Zhang Y, Song Z and Li W. 2021. Unsupervised data augmentation for object detection
Zhang K, Zuo W, Chen Y, Meng D and Zhang L. 2017. Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7): 3142-3155 [DOI: 10.1109/TIP.2017.2662206]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 2881-2890 [DOI: 10.1109/CVPR.2017.660]
Zhao H S, Zhang Y, Liu S, Shi J P, Loy C C, Lin D H and Jia J Y. 2018. PSANet: point-wise spatial attention network for scene parsing//Proceedings of the European Conference on Computer Vision. Munich: Springer: 267-283 [DOI: 10.1007/978-3-030-01240-3_17]
Zhao T, Yuan M, Jiang F, et al. 2024a. Removal and selection: improving RGB-infrared object detection via coarse-to-fine fusion
Zhao X, Xia Y, Zhang W, et al. 2023. YOLO-ViT-based method for unmanned aerial vehicle infrared vehicle target detection. Remote Sensing, 15: 3778 [DOI: 10.3390/rs15153778]
Zhao Y, Lv W, Xu S, et al. 2024b. DETRs beat YOLOs on real-time object detection//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 16965-16974
Zhao T, Zhang C, Ren W, et al. 2018. Unsupervised degradation learning for single image super-resolution. arXiv preprint arXiv:1812.04240 [DOI: 10.48550/arXiv.1812.04240]
Zhong L, Cho S, Metaxas D, Paris S and Wang J. 2013. Handling noise in single image deblurring using directional filters//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 612-619 [DOI: 10.1109/CVPR.2013.85]
Zhou B L, Zhao H, Puig X, Fidler S, Barriuso A and Torralba A. 2017. Scene parsing through ADE20K dataset//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 5122-5130 [DOI: 10.1109/CVPR.2017.544]
Zhou H, Qi L, Wan Z, Huang H and Yang X. 2021. RGB-D co-attention network for semantic segmentation//Proceedings of the Asian Conference on Computer Vision. Kyoto: Springer [DOI: 10.1007/978-3-030-69525-5_31]
Zhou Q, Shi H, Xiang W, et al. 2024. DPNet: dual-path network for real-time object detection with lightweight attention. IEEE Transactions on Neural Networks and Learning Systems: 1-15 [DOI: 10.1109/TNNLS.2024.3376563]
Zhou W J, Lin X Y, Lei J S, Yu L and Hwang J N. 2022. MFFENet: multiscale feature fusion and enhancement network for RGB-thermal urban road scene parsing. IEEE Transactions on Multimedia, 24: 2526-2538 [DOI: 10.1109/TMM.2021.3086618]
Zhou W J, Liu J F, Lei J S, Yu L and Hwang J N. 2021. GMNet: graded-feature multilabel-learning network for RGB-thermal urban scene semantic segmentation. IEEE Transactions on Image Processing, 30: 7790-7802 [DOI: 10.1109/TIP.2021.3109518]
Zhou W J, Yue Y C, Fang M X, Qian X H, Yang R W and Yu L. 2023. BCINet: bilateral cross-modal interaction network for indoor scene understanding in RGB-D images. Information Fusion, 94: 32-42 [DOI: 10.1016/j.inffus.2023.01.016]
Zhou W, Dong S H, Fang M X and Yu L. 2024. CACFNet: cross-modal attention cascaded fusion network for RGB-T urban scene parsing. IEEE Transactions on Intelligent Vehicles, 9(1): 1919-1929 [DOI: 10.1109/TIV.2023.3314527]
Zhou W, Yuan J, Lei J and Luo T. 2021. TSNet: three-stream self-attention network for RGB-D indoor semantic segmentation. IEEE Intelligent Systems, 36(4): 73-78 [DOI: 10.1109/MIS.2020.2999462]
Zhou Q Y, Park J and Koltun V. 2016. Fast global registration//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer: 766-782
Zhu X, Su W, Lu L, et al. 2021. Deformable DETR: deformable transformers for end-to-end object detection//International Conference on Learning Representations
Zuo W, Ren D, Gu S, et al. 2015. Discriminative learning of iteration-wise priors for blind deconvolution//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 3232-3240 [DOI: 10.1109/CVPR.2015.7298942]