A survey of virtual-real occlusion handling technologies in augmented reality

Wu Yuhui, Li Xiaojuan, Liu Yue (Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology)

Abstract

With the rapid development of software and the continuous upgrading of hardware, augmented reality has gradually matured and is now widely used in fields such as the military, medicine, gaming, industry, and education. Accurate depth perception is crucial in augmented reality, and simply overlaying virtual objects onto video sequences no longer meets user demands. In many augmented reality scenarios, users need to interact constantly with virtual objects, and without accurate depth perception it is difficult for augmented reality to provide a seamless interactive experience. Virtual-real occlusion handling is one of the key factors in achieving this goal. By establishing accurate occlusion relationships, it presents a realistic virtual-real fusion effect in which the fused scene correctly reflects the spatial relationship between virtual and real objects, thereby enhancing the user's sense of immersion and realism. This paper first introduces the background, concepts, and overall processing flow of virtual-real occlusion handling. Existing occlusion handling methods can be divided into three categories: depth-based, image analysis-based, and model-based. By analyzing the distinct characteristics of rigid and non-rigid objects, we summarize the principles and representative research of these three categories and their applicability to rigid and non-rigid objects. The shape and size of rigid objects remain unchanged under motion or applied force, and occlusion handling for them mainly relies on two categories of methods: depth-based and model-based.
The depth-based methods have evolved from the early use of stereo vision algorithms to the use of depth sensors for indoor depth image acquisition, and further to the prediction of moving objects' depth using outdoor map data and the densification of sparse SLAM (simultaneous localization and mapping) depth in monocular mobile augmented reality. Further research should focus on depth image restoration algorithms and on balancing real-time performance against accuracy in dense scene depth computation for mobile augmented reality. The model-based methods have developed from constructing partial 3D models, either by segmenting object contours in video key frames or by directly using modeling software, to achieving dense reconstruction of indoor static scenes from depth images and constructing approximate 3D models of outdoor scenes by incorporating geospatial information. Model-based methods already have a relatively well-established processing flow, but further exploration is still needed on how to enhance real-time performance while ensuring tracking and occlusion accuracy. Unlike rigid objects, non-rigid objects are prone to irregular deformations during movement. Typical non-rigid objects in augmented reality are the user's hands or the bodies of other users. For non-rigid objects, there is related research on all three categories of virtual-real occlusion handling methods. Depth-based methods focus on depth image restoration algorithms, which aim to repair depth image noise while ensuring precise alignment between the depth and RGB images, especially in extreme scenarios such as when the foreground and background have similar colors. Image analysis-based methods focus on foreground segmentation algorithms and on the means of judging occlusion relationships. Foreground segmentation algorithms have evolved from early color models and background subtraction techniques to deep learning-based segmentation networks.
The means of judging occlusion relationships have transitioned from user specification to the use of depth information to assist the judgment. The key challenge in image analysis-based methods lies in overcoming the irregular deformations of non-rigid objects, obtaining accurate foreground segmentation masks, and tracking continuously. Model-based methods initially used LeapMotion combined with customized hand parameters to fit a hand model, but reconstructing hand models with deep learning networks has now become mainstream. Model-based methods should improve both the speed and the accuracy of hand reconstruction. Building on this summary of virtual-real occlusion handling methods for rigid and non-rigid objects, we also conduct a comparative analysis of existing methods from several angles, including real-time performance, automation level, support for viewpoint or scene changes, and application scope. In addition, we summarize the specific workflows, difficulties, and limitations of the three categories of methods. Finally, addressing the problems in existing research, we explore the challenges faced by current virtual-real occlusion technology and propose potential future research directions: 1) Occlusion handling for moving non-rigid objects. Obtaining accurate depth or 3D models of non-rigid objects is the key to solving this problem. The accuracy and robustness of hand segmentation need to be further improved. Additionally, simpler monocular depth estimation and the rapid reconstruction of non-rigid objects other than the user's hands need to be further explored. 2) Occlusion handling for outdoor dynamic scenes. Existing depth cameras have a limited working range, which makes them ineffective in outdoor scenes. Sparse 3D models obtained from geographic information systems have low precision and cannot be applied to dynamic objects such as automobiles.
Therefore, further research on virtual-real occlusion handling for dynamic objects in large outdoor scenes is needed. 3) Registration algorithms for depth and RGB images. It is important to improve the accuracy of edge alignment between depth and color images without consuming excessive computing resources.
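
The core idea behind the depth-based category can be illustrated with a per-pixel depth comparison: a virtual pixel is drawn only where the virtual object is nearer to the camera than the real scene. The following is a minimal sketch under stated assumptions, not a method taken from any surveyed work. The function name `composite_with_occlusion`, the use of nested lists for images, the convention that `None` marks a missing value, and the choice to treat missing real depth as "far away" are all hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]           # (r, g, b)
ColorImage = List[List[Color]]
DepthImage = List[List[Optional[float]]]  # metres; None = no value


def composite_with_occlusion(
    real_rgb: ColorImage,
    real_depth: DepthImage,
    virtual_rgb: ColorImage,
    virtual_depth: DepthImage,
) -> ColorImage:
    """Blend a rendered virtual layer into a camera frame with a per-pixel depth test.

    Assumption: all four images are already registered to the same pixel grid.
    """
    output: ColorImage = []
    for y, real_row in enumerate(real_rgb):
        out_row: List[Color] = []
        for x, real_pixel in enumerate(real_row):
            v_depth = virtual_depth[y][x]
            r_depth = real_depth[y][x]
            if v_depth is None:
                # No virtual content at this pixel: keep the camera image.
                out_row.append(real_pixel)
            elif r_depth is None or v_depth < r_depth:
                # Virtual object is in front (or real depth is unknown,
                # which this sketch assumes means "far away").
                out_row.append(virtual_rgb[y][x])
            else:
                # Real object is in front and occludes the virtual one.
                out_row.append(real_pixel)
        output.append(out_row)
    return output


# A 1x3 frame: the real surface is nearer at pixel 0 and farther at pixel 1;
# pixel 2 carries no virtual content.
real_rgb = [[(10, 10, 10), (20, 20, 20), (30, 30, 30)]]
real_depth = [[1.0, 3.0, None]]
virtual_rgb = [[(255, 0, 0), (255, 0, 0), (255, 0, 0)]]
virtual_depth = [[2.0, 2.0, None]]

result = composite_with_occlusion(real_rgb, real_depth, virtual_rgb, virtual_depth)
print(result)
```

In this toy frame, only the middle pixel shows the virtual color, because that is the only place where the virtual depth (2.0) is smaller than the real depth (3.0). The sketch also makes clear why the abstract stresses depth image restoration and depth-RGB registration: noisy or misaligned values in `real_depth` would flip the comparison along object edges.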