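1. Ningbo Key Laboratory of DSP, Zhejiang Wanli University, Ningbo 315100, China;
2. Institute of Circuits and Systems, Ningbo University, Ningbo 315211, China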
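Received: 2017-10-17; Revised: 2018-01-26
Funding: National Natural Science Foundation of China (61671412); Zhejiang Provincial Natural Science Foundation (LY14F010006, LY14F040002); Ningbo Natural Science Foundation (2013A61006); Ningbo Science and Technology Program for Public Benefit (2017C50011); Program for Innovative Research Teams in Science and Technology in Universities of Henan Province (18IRTSTHN016); Innovation and Consulting Project of State Grid Ningbo Electric Power Company
About the authors: Zhu Zhongjie (first author), born in 1976, male, professor, received his Ph.D. in electronic science and technology from Zhejiang University in 2004; his main research interests are 2D and 3D video coding and transmission. E-mail: zhongjiezhu@yeah.net; Wang Yuer, female, assistant researcher; her main research interests are video coding and information hiding. E-mail: 365401628@qq.com; Jiang Gangyi, male, professor; his main research interests include digital video coding and communication, multi-view video signal processing, and digital watermarking and information hiding. E-mail: gangyijiang@126.com.
CLC number: TP391; Document code: A; Article ID: 1006-8961(2018)07-0953-08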

# Keywords

3D video; shape coding; multi-mode coding; predictive coding; chain code

Multi-mode shape coding for 3D video
Zhu Zhongjie1, Wang Yuer1, Jiang Gangyi2
1. Ningbo Key Lab. of DSP, Zhejiang Wanli University, Ningbo 315100, China;
2. Institute of Technology, Ningbo University, Ningbo 315211, China
Supported by: National Natural Science Foundation of China(61671412)

# Abstract

Objective Three-dimensional (3D) video has attracted considerable attention from the image processing community because of its performance in applications such as 3D television, free-view television, free-view video, and immersive teleconferencing. Compared with traditional block-based techniques, object-based methods offer flexible interactivity and efficient resource usage and are thus favored in many practical applications. Hence, object-based 3D video, for which efficient shape coding is a key technique, is an important development trend. Shape coding has been studied extensively, and many methods have been proposed. However, most existing methods target image or video shape coding and seldom address 3D video shape coding. A straightforward approach to encoding the shapes of 3D video objects is to apply the same techniques used for image or video objects. However, this approach does not fully exploit the redundancy among 3D shape videos and thus results in poor coding efficiency. Strong inter-frame redundancy exists across the time and view directions in a 3D video sequence. Therefore, most existing 3D video coding schemes jointly adopt motion-compensated prediction (MCP) and disparity-compensated prediction (DCP) to achieve high coding efficiency. 3D video and 3D shape video share certain similarities; thus, correlations may also exist among object contours across the time and view directions, and these may be exploited in shape coding to improve coding efficiency. On the basis of this conjecture and the requirements of practical object-based 3D video applications, an efficient multi-mode 3D video shape coding scheme is proposed in this study. The scheme is based on contour and chain-code representation and exploits the correlation among object contours across the time and view directions to achieve high coding efficiency.
Method For a given 3D shape image, the contours of visual objects are first extracted and preprocessed frame by frame so that they have a perfect single-pixel width. That is, the object contours are 8-connected, and only one path exists between any two neighboring contour points. A new metric called shape activity is then applied to assess the shape variation of objects within each frame. On the basis of this assessment, all frames are classified into two categories: intra-coding frames and predictive inter-coding frames. If the shape activity within a frame is large, then intra-coding is used; otherwise, inter-coding is used. An intra-coding frame is encoded on the basis of linearity and direction constraints within chain links to achieve high coding efficiency. An inter-coding frame is encoded using one of three coding modes, namely, contour-based MCP, contour-based DCP, or joint MCP and DCP, to remove the intra-view temporal correlation and the inter-view spatial correlation among object contours and thus improve coding efficiency. The principles of MCP and DCP for 3D shape video are similar to those for 3D video. However, the correlation among object contours differs from that among video textures. In 3D video, textures are two dimensional, whereas an object contour is one dimensional. Video textures can generally be viewed as rigid, whereas the shape of an object contour often changes irregularly. A small variation of an object contour in one frame may considerably decrease the correlation between consecutive frames. In addition, the correlation among contours may decrease more quickly than that among textures as the time interval increases. Hence, conventional prediction techniques for 3D video are unsuitable for 3D shape video. In our coding scheme, a new prediction structure is developed to exploit the intra-view temporal correlation and the inter-view spatial correlation among object contours and to encode 3D shape video efficiently.
Result Experiments are conducted to evaluate the performance of the proposed scheme, and partial comparison results with several well-known methods are presented. The experimental results show that our scheme outperforms classic and state-of-the-art methods and that the average compression efficiency can be improved by 9.3% to 64.8%. Conclusion The proposed scheme can effectively remove the intra-view temporal correlation and the inter-view spatial correlation among object contours. It achieves higher coding efficiency than existing methods and has potential in many object-based image and video applications, such as object-based coding, editing, and retrieval.

# Key words

three-dimensional video; shape coding; multi-mode coding; predictive coding; chain code

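# 0 Introduction

3D video is a new type of video that provides information from multiple viewpoints and enables stereoscopic perception. In recent years, as 3D technology has matured and the 3D video industry has grown rapidly, 3D video, with its stereoscopic and highly realistic presentation, has been becoming a mainstream visual experience and has drawn increasing attention from academia and industry. It is expected to find wide application in 3D film and television, machine vision, telemedicine, military, and aerospace fields [1]. Meanwhile, object-based processing offers better semantic understanding, representation, and interactivity and is increasingly used in image and video applications, such as object-based coding, object-based retrieval, and object-based content analysis and understanding [2-3]. Object-based 3D video is therefore an important development trend for future 3D video technology. In object-based 3D video applications, shape is the key information for defining, representing, and processing visual objects, so efficient shape coding is a core problem.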

# 1 Contour extraction and shape activity analysis

 $sa_j = \frac{\left\| \boldsymbol{Z}_j - \boldsymbol{Z}_{j-1} \right\|}{\left\| \boldsymbol{Z}_j \right\|}$ (1)

 $\boldsymbol{Z}_j = \left( \frac{1}{M^j} \sum\limits_{i = 0}^{M^j - 1} N_i^j,\; M^j \right)$ (2)

 $\boldsymbol{Z}_{j-1} = \left( \frac{1}{M^{j-1}} \sum\limits_{i = 0}^{M^{j-1} - 1} N_i^{j-1},\; M^{j-1} \right)$ (3)
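
As a rough illustration of Eqs. (1) to (3), the following Python sketch computes the shape activity of a frame from per-contour counts. It rests on several assumptions that this section does not state: $N_i^j$ is read as the number of points on the $i$-th contour of frame $j$, $M^j$ as the number of contours in frame $j$, and the norm as the Euclidean norm. The function names and the threshold value are hypothetical and serve only to show how a large shape activity could trigger intra-coding.

```python
import math


def shape_feature(contour_lengths):
    """Z_j = (mean of N_i^j, M^j), with N_i^j assumed to be contour lengths."""
    m = len(contour_lengths)
    if m == 0:
        return (0.0, 0)
    return (sum(contour_lengths) / m, m)


def shape_activity(curr_lengths, prev_lengths):
    """sa_j = ||Z_j - Z_{j-1}|| / ||Z_j||, using the Euclidean norm (assumption)."""
    z_curr = shape_feature(curr_lengths)
    z_prev = shape_feature(prev_lengths)
    diff = math.hypot(z_curr[0] - z_prev[0], z_curr[1] - z_prev[1])
    norm = math.hypot(z_curr[0], z_curr[1])
    if norm == 0:
        # Empty current frame: no activity if the previous frame is empty too.
        return 0.0 if diff == 0 else float("inf")
    return diff / norm


def choose_frame_type(sa, threshold=0.2):
    """Large activity -> intra-coding; the threshold is a hypothetical value."""
    return "intra" if sa > threshold else "inter"


if __name__ == "__main__":
    sa = shape_activity([100, 120], [98, 121])
    print(sa, choose_frame_type(sa))
```

Identical consecutive frames give a shape activity of zero under this reading, so they would be routed to predictive inter-coding.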

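# 2 Contour-based prediction and compensation

1) Global matching. The purpose of global matching is to find, in the reference frame ${f^r}\left( {x, y} \right)$, the sub-segment $\mathit{\boldsymbol{C}}_p^r$ that is most similar to ${\mathit{\boldsymbol{C}}_i}$ and to use it as the reference sub-segment. Let $\{ \mathit{\boldsymbol{C}}_k^r\} \;(k = 0, \ldots, {M^r}-1)$ denote the set of all contour sub-segments in ${f^r}\left( {x, y} \right)$, where ${M^r}$ is the number of sub-segments. Then, for a given sub-segment ${\mathit{\boldsymbol{C}}_i}$, its reference matching sub-segment is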

 $\mathit{\boldsymbol{C}}_p^r = \mathop {{\rm{arg}}\;{\rm{min}}}\limits_{\mathit{\boldsymbol{C}}_k^r,\; k \in [0, {M^r}-1]} \mathit{\boldsymbol{J}}(\mathit{\boldsymbol{C}}_k^r, {\mathit{\boldsymbol{C}}_i})$ (4)
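
The arg-min search in Eq. (4) can be sketched in Python as an exhaustive scan over the reference sub-segments. The cost $J$ is not defined in this section, so `chain_cost` below is a hypothetical stand-in (the number of mismatching chain links plus the length difference), not the cost used in the paper. Representing sub-segments as lists of chain-code symbols is likewise an assumption made for illustration.

```python
def chain_cost(candidate, target):
    """Hypothetical stand-in for J: mismatching links plus length difference."""
    mismatches = sum(a != b for a, b in zip(candidate, target))
    return mismatches + abs(len(candidate) - len(target))


def global_match(ref_segments, target, cost=chain_cost):
    """Return (p, C_p^r): the index and sub-segment minimizing the cost
    over k = 0, ..., M^r - 1, where M^r = len(ref_segments)."""
    if not ref_segments:
        raise ValueError("reference frame has no contour sub-segments")
    p = min(range(len(ref_segments)),
            key=lambda k: cost(ref_segments[k], target))
    return p, ref_segments[p]


if __name__ == "__main__":
    # Sub-segments as lists of 8-direction chain-code symbols (illustrative).
    reference = [[0, 1, 2, 3], [0, 0, 1, 1], [7, 7, 7]]
    current = [0, 0, 1, 2]
    print(global_match(reference, current))
```

Passing the cost as a parameter keeps the search loop independent of the particular similarity measure, so a different definition of $J$ can be substituted without changing `global_match`.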

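# 5 Conclusion

3D video is a new type of video that provides information from multiple viewpoints and enables stereoscopic perception, and it is expected to find wide application in 3D film and television, machine vision, telemedicine, military, and aerospace fields. Object-based processing offers better semantic understanding, representation, and interactivity; object-based 3D video is therefore an important development trend for future 3D video technology. Building on our earlier work on image and video shape coding, this paper studies an efficient 3D video shape coding method based on contour and chain-code representation and proposes contour-based motion-compensated prediction (CB-MCP) and contour-based disparity-compensated prediction (CB-DCP), which exploit the intra-view and inter-view temporal and spatial correlations among 3D object contours for efficient compression. Experimental results show that its coding efficiency exceeds that of existing comparable methods. The proposed algorithm can be widely applied in image and video applications such as object-based coding, object-based retrieval, and object interaction. However, the proposed prediction structure is relatively simple, and its prediction performance leaves room for improvement. Future work will therefore focus on improving and optimizing the prediction structure and prediction method to further raise coding efficiency.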

