Visual Perception and Understanding in Degraded Scenarios

Wang Wenjing1, Yang Wenhan2, Fang Yuming3, Huang Hua4, Liu Jiaying1 (1. Peking University; 2. Peng Cheng Laboratory; 3. Jiangxi University of Finance and Economics; 4. Beijing Normal University)

Abstract
Images and videos captured in adverse scenarios suffer from complex visual degradation, which both diminishes visual presentation and perceptual experience and poses great difficulties for visual analysis and understanding. This report systematically reviews recent domestic and international progress in visual perception and understanding under adverse scenarios, covering image/video and degradation modeling, visual enhancement for adverse scenarios, and visual analysis and understanding in adverse scenarios. The part on visual data and degradation modeling discusses methods for modeling images, videos, and degradation processes in different scenarios, including noise modeling, downsampling modeling, illumination modeling, and rain and fog modeling. The part on traditional visual enhancement reviews early non-deep-learning enhancement algorithms, including histogram equalization, Retinex theory, and filtering methods. The part on enhancement based on deep learning models is organized from the perspective of architectural innovation, covering convolutional neural networks, Transformer models, and diffusion models. Unlike traditional enhancement, whose goal is to comprehensively improve human visual perception of images and videos, the new generation of enhancement and analysis methods also considers the machine-vision understanding performance on degraded images and videos. The part on visual understanding in adverse scenarios discusses understanding datasets for such scenarios, deep-learning-based understanding methods, and the collaborative computation of enhancement and understanding. The report surveys the challenges of the above research in detail, traces the development of techniques at home and abroad, and finally, based on this analysis, outlines future directions for visual perception and understanding in adverse scenarios.
Keywords

Visual Perception and Understanding in Degraded Scenarios

Wang Wenjing1, Yang Wenhan2, Fang Yuming3, Huang Hua4, Liu Jiaying1 (1. Peking University; 2. Peng Cheng Laboratory; 3. Jiangxi University of Finance and Economics; 4. Beijing Normal University)

Abstract
Visual media such as images and videos are crucial means for humans to acquire, express, and convey information. With the widespread application of foundational technologies such as artificial intelligence and big data, systems for perceiving and understanding images and videos are gradually being integrated into all aspects of production and daily life. However, large-scale deployment also brings challenges: in open environments, diverse applications generate vast amounts of heterogeneous data, leading to complex visual degradation in images and videos. For instance, adverse weather such as heavy fog reduces visibility and causes loss of detail; data captured in rain or snow exhibits structured noise, with raindrops deforming the appearance of objects and people; and low-light conditions cause severe loss of detail and structural information. Visual degradation not only diminishes the visual presentation and perceptual experience of images and videos but also significantly reduces the usability and effectiveness of existing visual analysis and understanding systems. In today's era of intelligence and informatization, with visual media data growing explosively, visual perception and understanding technologies for challenging scenarios hold significant scientific and practical value. Traditional visual enhancement techniques can be divided into two categories: spatial-domain and frequency-domain methods. Spatial-domain methods process the two-dimensional image data directly, including grayscale transformation, histogram transformation, and spatial filtering. Frequency-domain methods transform the data into the frequency domain, for example via the Fourier transform, process it there, and then transform it back to the spatial domain.
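As a concrete illustration of the spatial-domain methods mentioned above, the following is a minimal histogram-equalization sketch in NumPy (the function name and parameters are illustrative, not taken from the surveyed literature):

```python
import numpy as np

def histogram_equalize(img: np.ndarray) -> np.ndarray:
    """Spread a uint8 grayscale image's intensity histogram over the
    full [0, 255] range by mapping gray levels through the cumulative
    distribution function (CDF) of the histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first occupied gray level
    # Normalized CDF as a lookup table over all 256 gray levels.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]
```

A frequency-domain method would instead apply `np.fft.fft2`, attenuate or boost selected frequency bands, and invert the transform; the spatial lookup-table approach above is typically far cheaper for global contrast adjustment.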
With the development of computer vision, more carefully designed and robust visual enhancement algorithms have emerged, such as dehazing based on the dark channel prior. The rapid advance of artificial intelligence has further produced many enhancement methods based on deep learning models. These methods can not only reconstruct damaged visual information but also further improve the visual presentation, comprehensively enhancing the perceptual experience of images and videos captured in challenging scenarios. On the other hand, as computer vision becomes more widespread, intelligent visual analysis and understanding are penetrating many aspects of society, such as face recognition and autonomous driving. In traditional digital image processing frameworks, however, visual enhancement focuses mainly on improving visual effects and overlooks its impact on high-level analysis tasks, which severely reduces the usability and effectiveness of existing visual understanding systems. In recent years, a series of visual understanding datasets for challenging scenarios have been established, spurring the development of numerous visual analysis and understanding algorithms for these scenarios. To further reduce reliance on such datasets, domain transfer methods from ordinary scenes to challenging scenes are gaining attention. How to coordinate and jointly optimize the two different task objectives of visual perception (for machines) and visual presentation (for humans) is also an important research question in visual computing. Addressing the development needs of visual computing in challenging scenarios, this paper extensively reviews the challenges of the aforementioned research, outlines developmental trends, and explores cutting-edge dynamics.
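The dark channel prior mentioned above rests on the observation that, in haze-free outdoor images, most local patches contain some pixel that is dark in at least one color channel. A minimal sketch of the dark channel and the resulting transmission estimate (a naive sliding window; efficient implementations use a min-filter/erosion, and variable names here are our own):

```python
import numpy as np

def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """Dark channel of an RGB image in [0, 1]: per-pixel minimum over
    the color channels, then a minimum over a local patch."""
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode='edge')
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(img: np.ndarray, atmosphere: np.ndarray,
                          omega: float = 0.95, patch: int = 15) -> np.ndarray:
    """Haze transmission estimate t = 1 - omega * dark(I / A), where A is
    the (per-channel) atmospheric light; omega keeps a trace of haze."""
    return 1.0 - omega * dark_channel(img / atmosphere, patch)
```

Dehazing then recovers the scene radiance by inverting the haze imaging model J = (I - A) / max(t, t0) + A, with t0 a small floor to avoid division by near-zero transmission.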
Specifically, this paper reviews technologies related to visual degradation modeling, visual enhancement, and visual analysis and understanding in challenging scenarios. The section on visual data and degradation modeling discusses methods for modeling image and video degradation processes in different degradation scenarios, including noise modeling, downsampling modeling, illumination modeling, and rain and fog modeling. The section on traditional visual enhancement explores early non-deep-learning enhancement algorithms, including histogram equalization, Retinex theory, and filtering methods. The section on visual enhancement based on deep learning models is organized from the perspective of architectural innovation, discussing convolutional neural networks, Transformer models, and diffusion models. The section on visual understanding in challenging scenarios covers visual understanding datasets for such scenarios and deep-learning-based understanding methods, and also explores the collaborative computation of visual enhancement and understanding. Finally, based on this analysis, prospects for the future development of visual perception and understanding in adverse scenarios are given.
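The degradation models enumerated above (noise, downsampling, illumination) are often composed into a synthetic pipeline to generate paired training data for enhancement networks. A toy sketch of such a composition on a float image in [0, 1] (all parameter values are illustrative assumptions, not settings from the surveyed work):

```python
import numpy as np

def degrade(img: np.ndarray, scale: int = 2, gamma: float = 2.5,
            sigma: float = 0.03, seed: int = 0) -> np.ndarray:
    """Toy degradation pipeline: global dimming via a gamma curve
    (illumination), block averaging (downsampling), then additive
    Gaussian noise (sensor/read noise)."""
    rng = np.random.default_rng(seed)
    dark = img ** gamma  # gamma > 1 darkens mid-tones
    h, w = dark.shape[:2]
    h2, w2 = h // scale * scale, w // scale * scale
    # Average each scale x scale block: a crude downsampling kernel.
    low = dark[:h2, :w2].reshape(
        h2 // scale, scale, w2 // scale, scale, -1).mean(axis=(1, 3))
    noisy = low + rng.normal(0.0, sigma, low.shape)
    return np.clip(noisy, 0.0, 1.0)
```

Rain and fog are typically modeled separately, e.g. by compositing rain streak layers or by the atmospheric scattering model I = J*t + A*(1 - t), and can be chained after the steps above.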
Keywords
