Multi-mode 3D video shape coding

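Zhu Zhongjie1, Wang Yuer1, Jiang Gangyi2 (1. Ningbo Key Laboratory of DSP, Zhejiang Wanli University, Ningbo 315100; 2. Institute of Circuits and Systems, Ningbo University, Ningbo 315211)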

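Abstract
Objective 3D video, which offers depth perception and a high degree of realism, is attracting growing attention from academia and industry and has broad application prospects in 3D film and television, machine vision, telemedicine, military and aerospace, and other fields. Object-based 3D video is an important trend in future 3D video technology, and efficient shape coding is a key problem in object-based 3D video applications. However, existing shape coding methods mainly target image and video objects, and few shape coding algorithms have been designed for 3D video. Therefore, based on the application requirements of object-based 3D video, an efficient multi-mode 3D video shape coding method based on contour and chain code representation is proposed. Method For a given 3D video shape sequence, object contours are extracted and preprocessed frame by frame, and contour activity is then analyzed to classify the shape images into intra-mode coded images and inter-prediction-mode coded images. Intra-coded images are encoded efficiently using the direction constraints and linearity of the chain codes within each contour. Inter-coded images are encoded with one of several chain-code-based modes, namely contour-based motion-compensated prediction, disparity-compensated prediction, and joint motion- and disparity-compensated prediction, so as to fully exploit the inter-frame temporal correlation of object contours within a view and the spatial correlation of object contours across views, thereby achieving efficient coding. Result Simulation results show that the proposed algorithm outperforms classic and state-of-the-art methods of the same kind, with average compression efficiency gains ranging from 9.3% to 64.8%. Conclusion The proposed multi-mode 3D video shape coding method effectively removes the inter-frame and inter-view redundancy of object contours and achieves efficient compression. It outperforms existing comparable methods and can be widely applied to object-based coding, object-based retrieval, and object-based content analysis and understanding.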
Keywords
Multi-mode shape coding for 3D video

Zhu Zhongjie1, Wang Yuer1, Jiang Gangyi2(1.Ningbo Key Lab. of DSP, Zhejiang Wanli University, Ningbo 315100, China;2.Institute of Technology, Ningbo University, Ningbo 315211, China)

Abstract
Objective Three-dimensional (3D) video has attracted considerable attention from the image processing community because of its satisfactory performance in various applications, including 3D television, free-view television, free-view video, and immersive teleconferencing. Compared with traditional block-based techniques, object-based methods offer flexible interactivity and efficient resource usage and are thus favored in many practical applications. Object-based 3D video is therefore an important development trend, and efficient shape coding is a key technique for its practical application. Shape coding has been studied extensively, and many methods have been proposed. However, most existing methods target image or video shape coding, and few address 3D video shape coding. A straightforward approach to encoding the shapes of 3D video objects is to reuse the techniques developed for image or video objects. However, this approach does not fully exploit the redundancy within 3D shape video and thus yields poor coding efficiency. Strong inter-frame redundancy exists across the time and view directions in a 3D video sequence. Therefore, most existing 3D video coding schemes jointly adopt motion-compensated prediction (MCP) and disparity-compensated prediction (DCP) to achieve high coding efficiency. 3D video and 3D shape video share certain similarities; thus, correlations may also exist among object contours across the time and view directions, and these may be exploited in shape coding to improve coding efficiency. Based on this observation and the requirements of practical object-based 3D video applications, an efficient multi-mode 3D video shape coding scheme is proposed in this study. 
This scheme is based on contour and chain code representation and exploits the correlation among object contours across the time and view directions to achieve high coding efficiency. Method For a given 3D shape video sequence, the contours of visual objects are first extracted and preprocessed frame by frame so that they are exactly one pixel wide. That is, the object contours are 8-connected and only one path exists between any two neighboring contour points. A new metric called shape activity is then applied to assess the shape variation of objects within each frame. On the basis of this assessment, all frames are classified into two categories: intra-coding and predictive inter-coding frames. If the shape activity within a frame is large, then intra-coding is used; otherwise, inter-coding is used. An intra-coding frame is encoded on the basis of linearity and direction constraints within chain links to achieve high coding efficiency. An inter-coding frame is encoded using one of three coding modes, namely contour-based MCP, DCP, or joint MCP and DCP, to exploit the intra-view temporal correlation and the inter-view spatial correlation among object contours and improve coding efficiency. The principles of MCP and DCP for 3D shape video are similar to those for 3D video. However, the correlation among object contours differs from that among video textures. In 3D video, textures are two-dimensional, whereas an object contour is one-dimensional. Video textures can generally be viewed as rigid, whereas the shape of an object contour often changes irregularly. A small variation of an object contour in a frame may considerably decrease the correlation between consecutive frames. In addition, the correlation among contours may decrease more quickly than that among textures as the time interval increases. Hence, conventional prediction techniques for 3D video are unsuitable for 3D shape video. 
In our coding scheme, a new prediction structure is developed to exploit the intra-view temporal correlation and the inter-view spatial correlation among object contours and thus encode 3D shape video efficiently. Result Experiments are conducted to evaluate the performance of the proposed scheme, and partial results of a comparison with several well-known methods are presented. The experimental results show that our scheme outperforms classic and state-of-the-art methods, with average compression efficiency improved by 9.3% to 64.8%. Conclusion The proposed scheme can effectively remove the intra-view temporal redundancy and the inter-view spatial redundancy among object contours. It achieves higher coding efficiency than existing methods and has potential in many object-based image and video applications, such as object-based coding, editing, and retrieval.
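The chain code representation and the intra/inter decision described in the Method part can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the shape activity metric or its threshold, so `shape_activity` (a symmetric-difference ratio) and the `threshold` value of 0.5 are hypothetical stand-ins, and the standard Freeman 8-direction chain code is assumed for the contour representation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, col) of a contour pixel

# Freeman 8-direction codes: list index = code, value = (d_row, d_col).
DIRECTIONS: List[Point] = [
    (0, 1), (-1, 1), (-1, 0), (-1, -1),
    (0, -1), (1, -1), (1, 0), (1, 1),
]


def chain_code(contour: Sequence[Point]) -> List[int]:
    """Convert an ordered, 8-connected contour into Freeman direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS.index(step))
    return codes


def differential_code(codes: Sequence[int]) -> List[int]:
    """Keep the first code absolute, then store each turn modulo 8.

    Straight runs become runs of zeros, which is one simple way to
    expose the linearity of a contour to an entropy coder.
    """
    if not codes:
        return []
    return [codes[0]] + [(b - a) % 8 for a, b in zip(codes, codes[1:])]


def shape_activity(current: Sequence[Point], reference: Sequence[Point]) -> float:
    """HYPOTHETICAL activity measure: share of contour pixels that differ."""
    cur, ref = set(current), set(reference)
    union = cur | ref
    if not union:
        return 0.0
    return len(cur ^ ref) / len(union)


def choose_mode(activity: float, threshold: float = 0.5) -> str:
    """Large activity -> intra-coding; otherwise predictive inter-coding.

    The threshold is an assumed value, not one taken from the paper.
    """
    return "intra" if activity > threshold else "inter"
```

For example, the contour `[(0, 0), (0, 1), (0, 2), (1, 3), (2, 3)]` yields the chain code `[0, 0, 7, 6]`. A full encoder would additionally choose among MCP, DCP, and joint prediction for inter-coded frames, which this sketch leaves out.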
Keywords
