A Survey of Virtual-Real Occlusion Handling Techniques for Augmented Reality

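Wu Yuhui, Li Xiaojuan, Liu Yue (Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology)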

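Abstract
With the rapid development of software technology and the continuous updating of hardware devices, augmented reality technology has gradually matured and is widely applied in many fields. In augmented reality, virtual-real occlusion handling is a prerequisite for seamlessly fusing the virtual and real worlds, and it is of significant research value for improving users' sense of immersion and realism. The technique establishes occlusion relationships between virtual and real objects that are as accurate as possible in order to present a realistic virtual-real fusion effect, so that users can correctly perceive the spatial relationship between virtual and real objects, which improves the interactive experience. This paper first introduces the background, concepts, and overall processing flow of virtual-real occlusion. Then, in light of the different characteristics of rigid and non-rigid objects, it summarizes the principles, representative work, and applicability to rigid and non-rigid objects of the three existing categories of occlusion handling methods: depth-based, image analysis-based, and model-based. On this basis, it compares existing methods in terms of real-time performance, degree of automation, support for dynamic scenes, and scope of application, and it summarizes the specific workflow, difficulties, and limitations of each of the three categories. Finally, addressing the problems in related work, it presents the challenges currently facing virtual-real occlusion technology and possible future research directions, in the hope of providing a reference for subsequent research.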
Keywords
A survey of virtual-real occlusion handling technologies in augmented reality

Wu Yuhui, Li Xiaojuan, Liu Yue (Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology)

Abstract
With the rapid development of software technology and the continuous updating of hardware devices, augmented reality technology has gradually matured and been widely used in fields such as the military, medicine, gaming, industry, and education. Accurate depth perception is crucial in augmented reality, and simply overlaying virtual objects onto video sequences no longer meets user demands. In many augmented reality scenarios, users need to interact with virtual objects constantly, and without accurate depth perception it is difficult for augmented reality to provide a seamless interactive experience. Virtual-real occlusion handling is one of the key factors in achieving this goal. It presents a realistic virtual-real fusion effect by establishing accurate occlusion relationships, so that the fused scene correctly reflects the spatial relationship between virtual and real objects, thereby enhancing the user's sense of immersion and realism. This paper first introduces the background, concepts, and overall processing flow of virtual-real occlusion handling. Existing occlusion handling methods can be divided into three categories: depth-based, image analysis-based, and model-based. By analyzing the distinct characteristics of rigid and non-rigid objects, we summarize the principles, representative research, and applicability to rigid and non-rigid objects of these three categories of methods. The shape and size of rigid objects remain unchanged under motion or applied force, and occlusion handling for them mainly relies on two categories of methods: depth-based and model-based.
The depth-based methods have evolved from the early use of stereo vision algorithms to the use of depth sensors for indoor depth image acquisition, and further to the prediction of the depth of moving objects from outdoor map data and the densification of sparse SLAM (simultaneous localization and mapping) depth in monocular mobile augmented reality. Further research should focus on depth image restoration algorithms and on balancing real-time performance against accuracy in dense scene depth computation for mobile augmented reality. The model-based methods have developed from constructing partial 3D models, either by segmenting object contours in video key frames or by directly using modeling software, to dense reconstruction of static indoor scenes from depth images and to the construction of approximate 3D models of outdoor scenes from geospatial information. Model-based methods already have a relatively well-established processing flow, but how to improve real-time performance while preserving tracking and occlusion accuracy still needs further exploration. Unlike rigid objects, non-rigid objects are prone to irregular deformation during movement. Typical non-rigid objects in augmented reality are the user's hands or the bodies of other users. For non-rigid objects, there is related research on all three categories of virtual-real occlusion handling methods. Depth-based methods focus on depth image restoration algorithms, which aim to repair depth image noise while ensuring precise alignment between the depth and RGB images, especially in extreme cases such as a foreground and background of similar color. Image analysis-based methods focus on foreground segmentation algorithms and on the means of judging occlusion relationships. Foreground segmentation algorithms have evolved from early color models and background subtraction techniques to deep learning-based segmentation networks.
The means of judging occlusion relationships have shifted from user specification to the use of depth information to assist the judgment. The key challenge for image analysis-based methods lies in coping with the irregular deformation of non-rigid objects, obtaining accurate foreground segmentation masks, and tracking continuously. Model-based methods initially used LeapMotion combined with customized hand parameters to fit a hand model, but reconstructing hand models with deep learning networks has now become mainstream. Model-based methods should improve both the speed and the accuracy of hand reconstruction. Building on this summary of virtual-real occlusion handling methods for rigid and non-rigid objects, we also compare existing methods from several perspectives, including real-time performance, level of automation, support for viewpoint or scene changes, and scope of application. In addition, we summarize the specific workflows, difficulties, and limitations of the three categories of methods. Finally, addressing the problems in related research, we discuss the challenges facing current virtual-real occlusion technology and propose potential future research directions: 1) Occlusion handling for moving non-rigid objects. Obtaining accurate depth or 3D models of non-rigid objects is the key to solving this problem. The accuracy and robustness of hand segmentation need further improvement. Additionally, simpler monocular depth estimation and the rapid reconstruction of non-rigid objects other than the user's hands need further exploration. 2) Occlusion handling for outdoor dynamic scenes. Existing depth cameras have a limited working range, which makes them ineffective in outdoor scenes. Sparse 3D models obtained from geographic information systems have low precision and cannot be applied to dynamic objects such as automobiles.
Therefore, further research is needed on virtual-real occlusion handling for dynamic objects in large outdoor scenes. 3) Registration algorithms for depth and RGB images. It is important to improve the accuracy of edge alignment between depth and color images without consuming excessive computing resources.
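As an illustration of the depth-based category summarized above, the following is a minimal sketch of a per-pixel depth test, not a method taken from any of the surveyed works. The function name `composite_with_occlusion`, the nested-list image layout, and the use of `None` for missing depth are assumptions made for this example. It also assumes the depth and RGB images are already registered, which the abstract identifies as an open problem in its own right.

```python
def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Fuse a real frame and a rendered virtual layer with a per-pixel depth test.

    All four inputs are row-major nested lists of the same size (hypothetical
    layout). Depths are distances from the camera; None in virt_depth means
    "no virtual content here", None in real_depth means "sensor hole".
    """
    fused = []
    for y, real_row in enumerate(real_rgb):
        fused_row = []
        for x, real_pixel in enumerate(real_row):
            vd = virt_depth[y][x]
            rd = real_depth[y][x]
            # The virtual pixel is shown only if it exists and is closer than
            # the real surface. Assumption: a missing real depth is treated as
            # "far away", so the virtual pixel wins at sensor holes.
            if vd is not None and (rd is None or vd < rd):
                fused_row.append(virt_rgb[y][x])
            else:
                fused_row.append(real_pixel)
        fused.append(fused_row)
    return fused


# One-row example: a virtual object at 2.0 m in front of a scene whose real
# surfaces lie at 1.0 m, 3.0 m, and an unmeasured depth.
frame = composite_with_occlusion(
    real_rgb=[["R", "R", "R"]],
    real_depth=[[1.0, 3.0, None]],
    virt_rgb=[["V", "V", "V"]],
    virt_depth=[[2.0, 2.0, 2.0]],
)
print(frame)
```

In this sketch the real surface at 1.0 m occludes the virtual object, while the virtual object occludes the real surface at 3.0 m. Treating depth holes as far away is only one possible policy. The depth image restoration algorithms mentioned in the abstract exist precisely because such holes and noisy edges otherwise produce visible artifacts.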
Keywords
