A Survey of Machine-Learning-Based Foveated Rendering Methods

Li Yingqun1, Hu Xiao1, Xu Xiang2, Xu Yanning1, Wang Lu1 (1. Shandong University; 2. Shandong University of Finance and Economics)

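Abstract
Achieving real-time, photorealistic rendering on large high-resolution displays and head-mounted displays remains one of the main challenges in computer graphics. Foveated rendering exploits the limitations of the human visual system and adjusts image rendering quality according to the gaze point, greatly increasing rendering speed without degrading the quality perceived by the user. With the widespread application of machine learning in rendering in recent years, a large number of new machine-learning-based foveated rendering methods have emerged. This paper reviews the latest methods in foveated rendering from a machine learning perspective. First, it outlines the background knowledge of human visual perception. Next, it briefly introduces the most representative non-machine-learning methods in foveated rendering, including adaptive resolution, geometric simplification, shading simplification, and hardware implementation, and summarizes their advantages and disadvantages. It then describes the criteria used to evaluate the different machine learning methods, including commonly used evaluation metrics for foveated images and for gaze prediction. After that, it subdivides the machine learning methods in foveated rendering into super-resolution, denoising, image completion, image synthesis, gaze prediction, and image applications, and gives a detailed overview and summary of each. Finally, it presents the problems and challenges that machine learning methods currently face. This discussion of machine learning methods in foveated rendering shows in more detail the research prospects and development directions of machine learning in this field, and offers a useful reference for later researchers in choosing research directions and designing network architectures.
Keywords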
Machine-learning-based foveated rendering: a review

Li Yingqun1, Hu Xiao1, Xu Xiang2, Xu Yanning1, Wang Lu1 (1. Shandong University; 2. Shandong University of Finance and Economics)

Abstract
The widespread adoption of Virtual Reality (VR) and Augmented Reality (AR) technologies across sectors such as healthcare, education, military, and entertainment has brought Head-Mounted Displays (HMDs) with high resolution and wide fields of view to the forefront of display devices. However, attaining a satisfactory level of immersion and interactivity remains a primary challenge in virtual reality, because latency can cause user discomfort in the form of dizziness and nausea. Multiple studies have underscored that a highly realistic VR experience that also maintains user comfort requires raising the screen's image refresh rate to 1800 Hz and keeping latency below 3-40 ms. Achieving real-time, photorealistic rendering at high resolution and low latency is therefore a formidable objective. Foveated rendering addresses these issues by adjusting the rendering quality across the image based on gaze position, maintaining high quality in the foveal region while reducing quality in the periphery. This technique yields significant computational savings and faster rendering without a perceptible loss in visual quality. Previous reviews have examined technical approaches to foveated rendering, but they focused on categorizing the implementation techniques, and a comprehensive review from a machine learning perspective is still lacking. With the ongoing advances of machine learning in the rendering field, combining machine learning with foveated rendering is considered a promising research area, especially in post-processing, where machine learning methods have great potential. Non-machine learning methods inevitably introduce artifacts.
In contrast, machine learning methods are widely applicable in the post-processing stage of rendering, where they can optimize and improve foveated rendering results and enhance the realism and immersion of foveated images in ways that non-machine learning approaches cannot. Therefore, this work presents a comprehensive overview of foveated rendering from a machine learning perspective. We first provide an overview of the background knowledge of human visual perception, including the human visual system, contrast sensitivity functions, visual acuity models, and visual crowding. Subsequently, we briefly describe the most representative non-machine learning methods for foveated rendering, including adaptive resolution, geometric simplification, shading simplification, and hardware implementation, and summarize these methods' features, advantages, and disadvantages. Additionally, we describe the criteria employed for method evaluation in this review, including evaluation metrics for foveated images and for gaze prediction, e.g., FWQI, FovVideoVDP, NSS, and KLD. Next, we subdivide machine learning methods into super-resolution, denoising, image reconstruction, image synthesis, gaze prediction, and image application, and summarize them in detail in terms of four aspects: result quality, network speed, user experience, and the ability to handle objects. Among them, super-resolution methods commonly use more neural blocks in the foveal region and fewer in the peripheral region, resulting in super-resolution quality that varies by region. Similarly, foveated denoising usually performs fine denoising in the fovea and coarse denoising in the periphery, but denoising has yet to receive extensive attention. The initial attempt to integrate image reconstruction with gaze used generative adversarial networks (GANs) and yielded promising outcomes.
Later, some researchers combined direct prediction and kernel prediction for image reconstruction, which is the current state of the art in this field. Gaze prediction is a significant development direction for future virtual reality rendering and is mostly combined with saliency detection to predict the gaze location. There is a large amount of work in this area, but only a small portion of it runs in real time. Finally, we present the current problems and challenges that machine learning methods face. Our review of machine learning approaches in foveated rendering not only elucidates the research prospects and development directions but also provides insights for future researchers in choosing research directions and designing network architectures.
Keywords
