
Wenjing Wang¹, Wenhan Yang², Yuming Fang³, Hua Huang⁴, Jiaying Liu¹ (1. Peking University; 2. Peng Cheng Laboratory; 3. Jiangxi University of Finance and Economics; 4. Beijing Normal University)

Abstract
Visual perception and understanding in degraded scenarios


Visual media such as images and videos are crucial means for humans to acquire, express, and convey information. With the widespread application of foundational technologies such as artificial intelligence and big data, systems for perceiving and understanding images and videos are gradually being integrated into all aspects of production and daily life. However, the emergence of massive applications also brings challenges: in open environments, diverse applications generate vast amounts of heterogeneous data, leading to complex visual degradation in images and videos. For instance, adverse weather such as heavy fog reduces visibility and causes loss of detail; data captured in rain or snow may exhibit deformation of objects or people and structured noise caused by raindrops; and low-light conditions can cause severe loss of detail and structural information. Visual degradation not only diminishes the visual presentation and perceptual experience of images and videos but also significantly reduces the usability and effectiveness of existing visual analysis and understanding systems. In today's era of intelligence and informatization, with the explosive growth of visual media data, visual perception and understanding technologies for challenging scenarios therefore hold great scientific significance and practical value.

Traditional visual enhancement techniques can be divided into two categories: spatial-domain and frequency-domain methods. Spatial-domain methods process two-dimensional pixel data directly, including grayscale transformation, histogram transformation, and spatial filtering. Frequency-domain methods transform the data into the frequency domain through transforms such as the Fourier transform, process it there, and then map the result back to the spatial domain.
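The histogram transformation mentioned above can be illustrated with histogram equalization, a classic spatial-domain enhancement method. The following is a minimal pure-Python sketch operating on a flat list of 8-bit gray values; production implementations work on 2-D arrays (e.g., with NumPy or OpenCV):

```python
# Minimal sketch of histogram equalization on a flat list of 8-bit
# grayscale values. Low-contrast inputs are stretched over [0, 255]
# by remapping each gray level through the normalized cumulative
# histogram (CDF).

def equalize_histogram(pixels, levels=256):
    """Map each gray level through the normalized cumulative histogram."""
    n = len(pixels)
    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (CDF).
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:              # constant image: nothing to stretch
        return list(pixels)
    # Standard equalization mapping: spread the CDF over [0, levels-1].
    def remap(p):
        return round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
    return [remap(p) for p in pixels]

# A low-contrast patch (values clustered in 100..103) is stretched
# across the full 0..255 range.
patch = [100, 100, 101, 101, 102, 102, 103, 103]
print(equalize_histogram(patch))
```

Spreading the CDF in this way flattens the output histogram, which is why equalization improves contrast in under- or over-exposed regions.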
With the development of computer vision technology, more carefully designed and robust visual enhancement algorithms have emerged, such as dehazing based on the dark channel prior. With the rapid advancement of artificial intelligence, many visual enhancement methods based on deep learning models have appeared. These methods can not only reconstruct damaged visual information but also further improve the visual presentation, comprehensively enhancing the perceptual experience of images and videos captured in challenging scenarios. On the other hand, as computer vision becomes more widespread, intelligent visual analysis and understanding are penetrating many aspects of society, such as face recognition and autonomous driving. In traditional digital image processing frameworks, however, visual enhancement focuses mainly on improving visual effects while overlooking its impact on high-level analysis tasks, which severely reduces the usability and effectiveness of existing visual understanding systems. In recent years, a series of visual understanding datasets for challenging scenarios have been established, driving the development of numerous visual analysis and understanding algorithms for these scenarios. To further reduce the reliance on such datasets, domain transfer methods from normal scenes to challenging scenes are attracting attention. How to coordinate and jointly optimize the two different task objectives of visual perception and visual presentation is also an important research question in visual computing. Addressing the development needs of the visual computing field in challenging scenarios, this paper systematically reviews the challenges of the research above, outlines developmental trends, and surveys cutting-edge advances.
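The dark channel prior mentioned above rests on the observation that, in haze-free outdoor images, most local patches contain at least one pixel that is dark in some color channel; in hazy regions the dark channel brightens, which allows transmission to be estimated. Below is a compact sketch of the prior and the resulting transmission estimate; the tiny nested-list image, the patch size, and the omega value are illustrative assumptions, not the exact pipeline of the original algorithm:

```python
# Sketch of the dark channel prior used by classical dehazing methods.
# An RGB image is represented as nested lists of (r, g, b) tuples
# in [0, 1]; patch size and omega follow common illustrative choices.

def dark_channel(img, patch=3):
    """Per-pixel min over RGB, then min over a local square patch."""
    h, w = len(img), len(img[0])
    chan_min = [[min(img[i][j]) for j in range(w)] for i in range(h)]
    r = patch // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = min(chan_min[y][x]
                            for y in range(max(0, i - r), min(h, i + r + 1))
                            for x in range(max(0, j - r), min(w, j + r + 1)))
    return out

def estimate_transmission(img, airlight, omega=0.95, patch=3):
    """t(x) = 1 - omega * dark_channel(I(x) / A): hazier regions
    (brighter dark channel) receive lower transmission."""
    normed = [[tuple(c / a for c, a in zip(px, airlight)) for px in row]
              for row in img]
    dc = dark_channel(normed, patch)
    return [[1.0 - omega * v for v in row] for row in dc]

# Example on a 2x2 toy image with a white airlight estimate.
hazy = [[(0.9, 0.9, 0.9), (0.2, 0.3, 0.4)],
        [(0.1, 0.2, 0.3), (0.8, 0.8, 0.8)]]
print(estimate_transmission(hazy, (1.0, 1.0, 1.0)))
```

The scene radiance is then recovered by inverting the scattering model with this transmission map; the full algorithm additionally refines t(x), e.g., with soft matting or a guided filter.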
Specifically, this paper reviews technologies related to visual degradation modeling, visual enhancement, and visual analysis and understanding in challenging scenarios. The section on visual data and degradation modeling discusses methods for modeling image and video degradation processes in different scenarios, including noise modeling, downsampling modeling, illumination modeling, and rain and fog modeling. The section on traditional visual enhancement surveys early non-deep-learning enhancement algorithms, including histogram equalization, Retinex theory, and filtering methods. The section on visual enhancement based on deep learning models is organized by model architecture, covering convolutional neural networks, Transformer models, and diffusion models. The section on visual understanding in challenging scenarios discusses datasets for such scenarios, visual understanding methods based on deep learning models, and the collaborative computation of visual enhancement and understanding. Finally, based on this analysis, prospects for the future development of visual perception and understanding in degraded scenarios are provided.
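The rain and fog modeling mentioned above is commonly built on the atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)) with t(x) = exp(-beta * d(x)), which is also how synthetic hazy training data is generated from clear images and depth maps. A toy sketch follows; the scalar grayscale pixels and per-pixel scalar depths are simplifying assumptions (real pipelines use RGB images and dense depth maps):

```python
# Toy synthesis of hazy data with the atmospheric scattering model
#   I(x) = J(x) * t(x) + A * (1 - t(x)),  t(x) = exp(-beta * d(x)).
# Pixels are scalar grayscale values in [0, 1] for illustration.

import math

def synthesize_haze(clear, depth, airlight=1.0, beta=1.0):
    """Blend each clear pixel toward the airlight by its transmission."""
    hazy = []
    for j, d in zip(clear, depth):
        t = math.exp(-beta * d)          # transmission falls with depth
        hazy.append(j * t + airlight * (1.0 - t))
    return hazy

# Distant pixels (larger depth) are washed out toward the airlight.
clear = [0.2, 0.2, 0.2]
depth = [0.0, 1.0, 5.0]
print(synthesize_haze(clear, depth))
```

Running the model forward like this turns paired clear/depth data into degraded training samples, while dehazing methods such as the dark channel prior attempt to invert it.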