Abstract
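With the rise of the Metaverse concept, a new generation of interactive media technology, represented by six degree of freedom (6DoF) video, has attracted wide attention from industry and academia. 6DoF video belongs to the field of multimedia communication. Through computational reconstruction, it offers users media interaction and content changes along multiple dimensions, including viewing angle, lighting, focal length, and field of view, so that users thousands of miles away feel present in the scene and each user receives a personalized experience. This overlaps closely with the technical features of the Metaverse, such as perception, computing, reconstruction, collaboration, and interaction. The technical system covered by 6DoF video can therefore serve as an alternative technical framework for realizing the Metaverse. This annual progress report poses 40 scientific questions in 10 areas of 6DoF video and summarizes the end-to-end 6DoF video technology chain as three macro stages: generation, distribution, and presentation. It then reviews domestic and international research progress around these three stages, covering content acquisition and pre-processing, coding compression and transmission optimization, and interaction and presentation. For the content acquisition and pre-processing stage, it covers three topics: joint multi-view acquisition, joint multi-view and depth acquisition, and pre-processing of depth maps and point clouds. For the video compression and transmission stage, it covers six topics: multi-view video coding, multi-view plus depth video coding, light field image compression, focal stack image compression, point cloud coding and compression, and 6DoF video transmission optimization. For the interaction and display stage, it covers two topics: post-decoding filtering enhancement and virtual view synthesis. Finally, the report discusses the current challenges and future trends in the field.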
Research Progress of Six Degree of Freedom (6DoF) Video Technology

Wang Xu, Liu Qiong, Peng Zongju, Hou Junhui, Yuan Hui, Zhao Tiesong, Qin Yi, Wu Kejun, Liu Wenyu, Yang You (Huazhong University of Science and Technology)

Abstract: Six degree of freedom (6DoF) video is characterized by interaction between video content and users. The six degrees of freedom correspond to the user's translational motion along three axes (forward/backward, left/right, and up/down) and rotational motion about them (pitch, yaw, and roll). Users can thus change multiple audio-visual dimensions, including viewing perspective, lighting condition or direction, focal length or focus point, and field of view, through computational reconstruction or synthesis of content based on content captured from real viewpoints. 6DoF video can fundamentally change the traditional way people watch video, in which user-video interaction is limited to switching channels and has no relation to the video content. With 6DoF, users can watch video content the way they see the world in daily life: what they see changes according to their motion, which is highly immersive. In this sense, 6DoF video is an epoch-making type of video for both academia and industry. At the same time, with the flourishing of the Metaverse, 6DoF video has been regarded as a new generation of interactive media technology; it is one candidate among the fundamental technologies for the Metaverse and has begun to receive extensive attention from the community. All these features make users feel immersed from thousands of miles away while providing different viewing experiences to different people. This closely matches the technical features of the Metaverse, such as perception, computing, reconstruction, collaboration, and interaction. 6DoF video originated from the framework of a typical multimedia communication system, and it follows the basic multimedia communication pipeline from video capture, content processing, video compression, transmission, decoding, and display to intelligent human-terminal interaction. 
It brings a new look to the traditional 3D video communication system, and its requirements for interaction range and intelligence are far more comprehensive than before. New techniques are therefore needed to support this new type of video. In this annual progress report, we reorganize the technical framework of the 6DoF multimedia communication system into three parts: generation, distribution, and visualization of 6DoF video content. We first pose 40 scientific and technical challenges in this field and categorize them into 10 directions. These challenges form the research framework of 6DoF video, and all of them remain open for the community. Based on this formulation, we survey in depth the research progress in content acquisition and pre-processing, coding compression and transmission optimization, and interaction and presentation, following these 10 directions. For content generation, we focus on techniques for capturing video content as multiple views, multiview video plus depth, and point clouds. These are the basic data representations of 6DoF video. Capture systems fall into two types, multiview systems and multiview plus depth systems, and each yields different types of content. Originally, multiview color videos were captured without any auxiliary information describing the 3D structure of the captured scene, which makes subsequent data processing quite difficult. Multiview plus depth systems were later proposed to handle this problem, and the captured data can take two forms: color plus depth, and point clouds. Regardless of which type of system captures the data, data volume is the most important challenge for these data representations, and we therefore turn to video compression techniques once the video content is available. 
State-of-the-art compression techniques for multiview video, multiview video plus depth, light fields, and point clouds are discussed in detail, including their origins, mechanisms, performance, and application in domestic and international standards. Transmission techniques for 6DoF video, which apply once the video bitstream is obtained, are also presented in this report. Techniques such as bit allocation and interaction-oriented transmission, as well as standards and protocols, are discussed in detail. Finally, computational models for quality enhancement and for novel view synthesis in user-terminal interaction are discussed. With the techniques discussed above, a comprehensive 6DoF video system can be built from capture to display. Pixel-based methods for virtual view synthesis have been studied over the past 15 years, but their computational cost is an unavoidable challenge for applications. Learning-based methods, on the other hand, have played an important role in recent years in terminal-oriented applications, especially view synthesis. However, challenges remain in dynamic scenarios. None of the above approaches can therefore completely satisfy all the requirements of practical applications, and all 40 scientific and technical challenges remain open for the community. Finally, we discuss the current challenges and future trends in this field; a long period of exploration is still needed to reach the final goal.