Published: 2018-07-16
DOI: 10.11834/jig.170533
2018 | Volume 23 | Number 7

Image processing and coding

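Multi-mode shape coding for 3D video

Zhu Zhongjie¹, Wang Yuer¹, Jiang Gangyi²

1. Ningbo Key Laboratory of DSP, Zhejiang Wanli University, Ningbo 315100, China;

2. Institute of Circuits and Systems, Ningbo University, Ningbo 315211, China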

Received: 2017-10-17; revised: 2018-01-26

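Funding: National Natural Science Foundation of China (61671412); Natural Science Foundation of Zhejiang Province (LY14F010006, LY14F040002); Natural Science Foundation of Ningbo (2013A61006); Ningbo Science and Technology Project for Public Benefit (2017C50011); Science and Technology Innovation Team Project of Henan Provincial Universities (18IRTSTHN016); Innovation Consulting Project of State Grid Ningbo Electric Power Company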

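About the first author: Zhu Zhongjie, born in 1976, male, professor. He received his Ph.D. in electronic science and technology from Zhejiang University in 2004. His main research interests are 2D and 3D video coding and transmission. E-mail: zhongjiezhu@yeah.net;
Wang Yuer, female, assistant researcher. Her main research interests are video coding and information hiding. E-mail: 365401628@qq.com;
Jiang Gangyi, male, professor. His main research interests include digital video coding and communication, multi-view video signal processing, and digital watermarking and information hiding. E-mail: gangyijiang@126.com.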

CLC number: TP391

Document code: A

Article ID: 1006-8961(2018)07-0953-08

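Abstract

Objective 3D video, which offers depth perception and a strong sense of realism, is attracting growing attention from academia and industry and has broad application prospects in 3D film and television, machine vision, telemedicine, military, and aerospace. Object-based 3D video is an important development trend for future 3D video technology, and efficient shape coding is a key problem in object-based 3D video applications. However, existing shape coding methods mainly target image and video objects, and few algorithms address 3D video shape coding. Therefore, motivated by the requirements of object-based 3D video applications, an efficient multi-mode 3D video shape coding method based on contour and chain code representation is proposed. Method For a given 3D video shape sequence, object contours are extracted and preprocessed frame by frame, and contour activity analysis then classifies each shape image as either an intra-coded image or an inter-predicted image. Intra-coded images are encoded efficiently using the direction constraints between chain links within a contour and the linear features of the contour. Inter-coded images are encoded with several chain-code-based modes, including contour-based motion-compensated prediction, disparity-compensated prediction, and joint motion- and disparity-compensated prediction, to fully exploit the intra-view temporal correlation and the inter-view spatial correlation of object contours. Result Simulation results show that the proposed algorithm outperforms classic and recent methods of the same kind, with average compression efficiency gains ranging from 9.3% to 64.8%. Conclusion The proposed multi-mode 3D video shape coding method effectively removes the inter-frame and inter-view redundancy of object contours, achieves efficient compression, outperforms existing methods of the same kind, and can be widely applied to object-based coding, object-based retrieval, and object-based content analysis and understanding.

Key words

3D video; shape coding; multi-mode coding; predictive coding; chain code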

Multi-mode shape coding for 3D video

Zhu Zhongjie¹, Wang Yuer¹, Jiang Gangyi²

1. Ningbo Key Lab. of DSP, Zhejiang Wanli University, Ningbo 315100, China;

2. Institute of Technology, Ningbo University, Ningbo 315211, China

Supported by: National Natural Science Foundation of China (61671412)

Abstract

Objective Three-dimensional (3D) video has attracted considerable attention from the image processing community due to its satisfactory performance in various applications, including 3D television, free-view television, free-view video, and immersive teleconference. Compared with traditional block-based techniques, object-based methods have the merits of flexible interactivity and efficient resource usage and are thus favored in many practical applications. Hence, object-based 3D video, whose efficient shape coding is a key technique in practical applications, is an important developing trend. Shape coding has been studied extensively, and many methods have been proposed. However, most existing shape coding methods target image or video shape coding, and few address 3D video shape coding. A straightforward approach to encoding the shapes of 3D video objects is to reuse the techniques developed for image or video objects. However, this approach does not fully exploit the redundancy among 3D shape videos and thus results in poor coding efficiency. Strong inter-frame redundancy exists across the time and view directions in a 3D video sequence. Therefore, most existing 3D video coding schemes jointly adopt motion-compensated prediction (MCP) and disparity-compensated prediction (DCP) to achieve high coding efficiency. 3D video and 3D shape video share certain similarities; thus, correlations may also exist among object contours across the time and view directions, which may be exploited in shape coding to improve coding efficiency. Hence, on the basis of this observation and the requirements of practical object-based 3D video applications, an efficient multi-mode 3D video shape coding scheme is proposed in this study. The scheme is based on contour and chain code representation, and it exploits the correlation among object contours across the time and view directions to achieve high coding efficiency.
Method For a given 3D shape image, the contours of visual objects are first extracted and preprocessed frame by frame to obtain contours of strictly single-pixel width. That is, the object contours are 8-connected, and only one path exists between any two neighboring contour points. A new metric called shape activity is then applied to assess the shape variation of objects within each frame. On the basis of this assessment, all frames are classified into two categories: intra-coding frames and predictive inter-coding frames. If the shape activity within a frame is large, then intra-coding is applied; otherwise, inter-coding is applied. An intra-coding frame is encoded on the basis of linearity and direction constraints within chain links to achieve high coding efficiency. An inter-coding frame is encoded using one of three coding modes, namely, contour-based MCP, DCP, or joint MCP and DCP, to efficiently remove the intra-view temporal correlation and the inter-view spatial correlation among object contours and improve coding efficiency. The principles of MCP and DCP for 3D shape video are similar to those for 3D video. However, the correlation among object contours differs from that between video textures. In 3D video, the textures are two dimensional, whereas an object contour is one dimensional. Video textures can generally be viewed as rigid, whereas the shape of an object contour often changes irregularly. A small variation of an object contour in a frame may considerably decrease the correlation between consecutive frames. In addition, contour correlations may decrease more quickly than texture correlations as the time interval increases. Hence, conventional prediction techniques for 3D video are unsuitable for 3D shape video. In our coding scheme, a new prediction structure is developed to effectively exploit the intra-view temporal correlation and the inter-view spatial correlation among object contours and efficiently encode 3D shape video.
Result Experiments are conducted to evaluate the performance of the proposed scheme, and comparisons with several well-known methods are presented. The experimental results show that our scheme outperforms classic and state-of-the-art methods and that the average compression efficiency can be improved by 9.3% to 64.8%. Conclusion The proposed scheme can effectively remove the intra-view temporal correlation and the inter-view spatial correlation among object contours. The scheme achieves higher coding efficiency than existing methods and has potential in many object-based image and video applications, such as object-based coding, editing, and retrieval.

Key words

three dimensional video; shape coding; multi-mode coding; predictive coding; chain code

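0 Introduction

3D video is a new type of video that provides information from multiple viewpoints and enables stereoscopic perception. In recent years, as 3D technology has matured and the 3D video industry has grown rapidly, 3D video with depth perception and a strong sense of realism has become a mainstream demand for visual experience. It is attracting growing attention from academia and industry and has broad application prospects in 3D film and television, machine vision, telemedicine, military, and aerospace^[1]. Meanwhile, object-based processing offers better semantic understanding, representation, and interactivity and is increasingly used in image and video applications such as object-based coding, object-based retrieval, and object-based content analysis and understanding^[2-3]. Object-based 3D video is therefore an important development trend for future 3D video technology. Because shape is the key information for defining, representing, and processing visual objects, efficient shape coding is a core problem in object-based 3D video applications.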

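By principle, shape coding methods can be divided into bitmap-based and contour-based methods. Bitmap-based methods generally represent the object shape as a binary mask image, so coding the shape is equivalent to coding the binary mask. Typical examples are JBIG (joint bi-level image experts group), JBIG2 (joint bi-level image experts group, version 2), and the context-based arithmetic encoding of MPEG-4 (moving picture experts group 4), known as MPEG-4 CAE^[4-6]. Contour-based shape coding, in contrast, first extracts the object contour and then encodes the contour curve, for example with chain codes or curve fitting. In recent years, many researchers have studied shape coding and reported innovative results. For example, Ref. [7] proposed an arithmetic-coding-based shape coding method for binary images, which uses the locally linear edges in the object contour to improve the accuracy of context modeling for arithmetic coding and thereby the coding efficiency. Ref. [8] proposed a quad-tree-structured shape coding scheme based on context arithmetic coding. Lai et al. studied edge selection and optimal coding in rate-distortion-optimized shape coding and proposed edge coding schemes based on 8 and 16 sectors, which reduce the number of vertices to be encoded and thus improve coding efficiency^[9]. Ref. [10] proposed an efficient image-dependent shape coding method that exploits the correlation between image content and object shape. In Ref. [11], we analyzed the spatial correlation and linear features of object contour chain codes and proposed an efficient shape coding method based on contour and chain code representation. For a given shape image, the object contour is extracted, thinned to strictly single-pixel width, and converted into a chain code. The chain is then split according to direction correlation into sub-segments that each contain at most two basic direction codes, so that each link needs only 1 bit. Linearity detection is also applied to separate the long linear sub-segments of the contour, which are compressed efficiently by run-length coding. Experiments showed a large gain in compression efficiency over comparable methods. Building on this work, we further proposed a shape coding scheme with spatio-temporal prediction, which improves coding efficiency by jointly exploiting the intra-frame spatial correlation and the inter-frame temporal correlation of contour chain codes^[12].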

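Current shape coding mainly targets image and video objects, and few algorithms address 3D video shape coding. Compared with image and video objects, 3D video objects exhibit not only the spatial correlation of contours within a frame and the temporal correlation of contours between frames but also the spatial correlation between contours in different views. 3D video shape coding can therefore exploit all of these correlations to improve coding efficiency. To this end, building on our earlier work on image and video shape coding, this paper proposes an efficient multi-mode 3D video shape coding algorithm based on contour and chain code representation. The focus is on contour-based motion-compensated prediction (CB-MCP) and contour-based disparity-compensated prediction (CB-DCP), which exploit the temporal and spatial correlations of 3D object contours within and between views for efficient compression. Fig. 1 shows the overall block diagram of the proposed algorithm. For a given 3D video shape sequence, object contours are first extracted frame by frame and preprocessed by contour thinning, contour segmentation, and chain code conversion. Activity analysis is then applied to the object contours of each frame: if the contours of a frame change considerably relative to the previous frame, the frame is intra coded; otherwise it is coded in inter-prediction mode. For an intra-coded image, each contour segment is split according to direction correlation into chain code sub-segments that each contain at most two basic direction codes, so that each link needs only 1 bit. Each sub-segment is also checked for straight lines to separate long straight sub-segments, and ordinary sub-segments and sub-segments containing straight lines are entropy coded differently to improve coding efficiency. For an inter-predicted image, compensation and prediction are performed with CB-MCP, CB-DCP, or joint CB-MCP and CB-DCP to remove the inter-frame correlation of object contours and thus improve coding efficiency.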

Fig. 1 Diagram of the proposed scheme

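1 Contour extraction and shape activity analysis

Because the proposed algorithm is contour based, object contours are first extracted frame by frame and thinned to strictly single-pixel width. The object contours within each frame are then split into sub-segments at junction points and converted into chain codes, using a method similar to that of Ref. [11]. For a given shape image ${f_j}\left( {x, y} \right)$, after contour extraction, thinning, sub-segment splitting, and chain code conversion, contour activity analysis decides whether the frame is subsequently intra coded or inter-prediction coded. Let $\{ \mathit{\boldsymbol{C}}_i^j\} (i = 0, \ldots, {M^j}-1)$ denote the set of sub-segments in ${f_j}\left( {x, y} \right)$, where ${M^j}$ is the number of sub-segments. The set of links in sub-segment $\mathit{\boldsymbol{C}}_i^j$ is denoted by $ \{ {l_{in}}\} \;(n = 0, \ldots, N_i^j-1)$, that is, $\mathit{\boldsymbol{C}}_i^j = \{ {l_{in}}\} $, where ${l_{in}}$ is the $\mathit{n}$-th link and $N_i^j$ is the total number of links in $\mathit{\boldsymbol{C}}_i^j$. Let $s{a_j}$ denote the contour activity of ${f_j}\left( {x, y} \right)$, defined as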

$ s{a_j} = \frac{{\left\| {{\mathit{\boldsymbol{Z}}_j}-{\mathit{\boldsymbol{Z}}_{j-1}}} \right\|}}{{\left\| {{\mathit{\boldsymbol{Z}}_j}} \right\|}} $

(1)

$ {\mathit{\boldsymbol{Z}}_j} = \left( {\frac{1}{{{\mathit{M}^j}}}\sum\limits_{i = 0}^{{\mathit{M}^j}-1} {\mathit{N}_i^j}, {\mathit{M}^j}} \right) $

(2)

$ {\mathit{\boldsymbol{Z}}_{j-1}} = \left( {\frac{1}{{{\mathit{M}^{j-1}}}}\sum\limits_{i = 0}^{{\mathit{M}^{j-1}} - 1} {\mathit{N}_i^{j - 1}}, {\mathit{M}^{j - 1}}} \right) $

(3)

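where ${M^{j-1}}$ is the number of sub-segments in ${f_{j-1}}\left( {x, y} \right)$, the frame preceding ${f_j}\left( {x, y} \right)$ in the same view, and ${\mathit{N}_i^{j-1}}$ is the number of links in sub-segment $\mathit{\boldsymbol{C}}_i^{j-1}$ of ${f_{j-1}}\left( {x, y} \right)$. In the proposed 3D video coding scheme, if $s{a_j}$ exceeds a preset threshold $ T$, ${f_j}\left( {x, y} \right)$ is intra coded; otherwise it is inter-prediction coded. $T$ is set to 0.5 in the experiments.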
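The mode decision of Eqs. (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, each frame is represented only by the list of link counts $N_i^j$ of its sub-segments, and the Euclidean norm is assumed because the text does not state which norm is used.

```python
import math

def shape_feature(segment_lengths):
    """Z_j of Eq. (2): (mean number of links per sub-segment, number of sub-segments)."""
    m = len(segment_lengths)
    if m == 0:
        return (0.0, 0)
    return (sum(segment_lengths) / m, m)

def shape_activity(curr_lengths, prev_lengths):
    """sa_j of Eq. (1); the Euclidean norm is an assumption of this sketch."""
    z_cur = shape_feature(curr_lengths)
    z_prev = shape_feature(prev_lengths)
    diff = math.hypot(z_cur[0] - z_prev[0], z_cur[1] - z_prev[1])
    norm = math.hypot(z_cur[0], z_cur[1])
    # An empty current frame has no defined activity; treat it as "changed" (assumed).
    return diff / norm if norm > 0 else float("inf")

def choose_mode(curr_lengths, prev_lengths, threshold=0.5):
    """Intra coding if sa_j > T, inter-prediction coding otherwise (T = 0.5 in the paper)."""
    return "intra" if shape_activity(curr_lengths, prev_lengths) > threshold else "inter"
```

For example, a frame whose contour breaks from two 40-link sub-segments into eight 10-link sub-segments changes both components of $\mathit{\boldsymbol{Z}}_j$ and is sent to intra coding, whereas an unchanged segment structure gives $s{a_j} = 0$ and inter-prediction coding.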

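2 Contour-based prediction and compensation

The prediction structure used in this paper is shown in Fig. 2. For a given 3D shape video with multiple views, several views are first selected as main views, and the remaining views serve as auxiliary views. Main views are coded with CB-MCP, and auxiliary views are coded jointly with CB-MCP and CB-DCP. As in multi-view video coding, the multi-view shape video is first divided into several MOPs, which are then coded one by one.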

Fig. 2 Inter-frame prediction structure based on contour and chain representation

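For a frame $f\left( {x, y} \right)$ in a MOP, if contour activity analysis selects inter-prediction coding, the frame then undergoes either motion estimation and compensated prediction or disparity estimation and compensated prediction, both based on contour and chain code representation. The two work on the same principle and differ only in the choice of reference frame. Let ${f^r}\left( {x, y} \right)$ denote the reference image of $f\left( {x, y} \right)$. For a sub-segment ${\mathit{\boldsymbol{C}}_i} = \{ {l_{in}}\} \;\;(n = 0, \ldots, {N_i}-1)$ of $f\left( {x, y} \right)$, motion/disparity estimation and compensated prediction consists of two key steps: global matching and local matching.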

1) Global matching. Global matching finds the sub-segment $\mathit{\boldsymbol{C}}_p^r$ in the reference frame ${f^r}\left( {x, y} \right)$ that is most similar to ${\mathit{\boldsymbol{C}}_i}$ and uses it as the reference sub-segment. Let $\{ \mathit{\boldsymbol{C}}_k^r\} \;(k = 0, \ldots, {M^r}-1)$ denote the set of all contour sub-segments in ${f^r}\left( {x, y} \right)$, where ${M^r}$ is the number of sub-segments. For a given sub-segment ${\mathit{\boldsymbol{C}}_i}$, the matched reference sub-segment is

$ \mathit{\boldsymbol{C}}_p^r = \mathop {{\rm{arg}}\;{\rm{min}}}\limits_{\mathit{\boldsymbol{C}}_k^r, k \in [0, {M^r}-1]} \mathit{\boldsymbol{J}}(\mathit{\boldsymbol{C}}_k^r, {\mathit{\boldsymbol{C}}_i}) $

(4)

where $\mathit{\boldsymbol{J}}\left( {\mathit{\boldsymbol{C}}_k^r, {\mathit{\boldsymbol{C}}_i}} \right)$ is the objective function that measures the similarity between ${\mathit{\boldsymbol{C}}_k^r}$ and ${{\mathit{\boldsymbol{C}}_i}}$, defined as

$ \begin{array}{l} \mathit{\boldsymbol{J}}\left( {\mathit{\boldsymbol{C}}_k^r, {\mathit{\boldsymbol{C}}_i}} \right) = \left| {N_k^r-{N_i}} \right| + \alpha \sum\limits_{l = 0}^7 {\left| {P\left( {l|\mathit{\boldsymbol{C}}_k^r} \right)-P\left( {l|{\mathit{\boldsymbol{C}}_i}} \right)} \right|} + \\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\beta \cdot(\left\| {{\mathit{\boldsymbol{X}}_S}(\mathit{\boldsymbol{C}}_k^r)-{\mathit{\boldsymbol{X}}_S}({\mathit{\boldsymbol{C}}_i})} \right\| + \\ {\rm{ }}\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\left\| {{\mathit{\boldsymbol{X}}_E}(\mathit{\boldsymbol{C}}_k^r) - {\mathit{\boldsymbol{X}}_E}({\mathit{\boldsymbol{C}}_i})} \right\|) \end{array} $

(5)

where ${N_k^r}$ and ${{N_i}}$ are the lengths of ${\mathit{\boldsymbol{C}}_k^r}$ and ${{\mathit{\boldsymbol{C}}_i}}$, $\alpha $ and $\beta $ are weighting coefficients, ${\mathit{\boldsymbol{X}}_S}\left( {} \right)$ and ${\mathit{\boldsymbol{X}}_E}\left( {} \right)$ are the coordinate vectors of the start point and the end point, and ${P\left( {l|\mathit{\boldsymbol{C}}_k^r} \right)}$ and ${P\left( {l|{\mathit{\boldsymbol{C}}_i}} \right)}$ are the histograms of ${\mathit{\boldsymbol{C}}_k^r}$ and ${{\mathit{\boldsymbol{C}}_i}}$ over the link values $l \in \left\{ {0, 1, \ldots, 7} \right\}$, defined as

$ P(l|\mathit{\boldsymbol{C}}_k^r) = \frac{{\sum\limits_{n = 0}^{N_k^r-1} {\delta ({l_{kn}}^r-l)} }}{{N_k^r}}, l \in \left\{ {0, 1, \ldots, 7} \right\} $

(6)

$ P(l|{\mathit{\boldsymbol{C}}_i}) = \frac{{\sum\limits_{n = 0}^{{N_i}-1} {\delta ({l_{in}}-l)} }}{{{N_i}}}, l \in \left\{ {0, 1, \ldots, 7} \right\} $

(7)

where $\{ {l_{kn}}^r\} (n = 0, \ldots, N_k^r-1)$ is the set of all links in $\mathit{\boldsymbol{C}}_k^r$ and $\delta \left( n \right)$ is the unit impulse function.
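The global matching step of Eqs. (4) to (7) can be sketched as follows. This is an illustrative reading rather than the authors' code: each sub-segment is a hypothetical dictionary with the keys `chain`, `start`, and `end`, the endpoint distances use the Euclidean norm, and the default weights `alpha = beta = 1.0` are placeholders because the paper does not report the values of $\alpha$ and $\beta$.

```python
import math

def direction_histogram(chain):
    """P(l | C) of Eqs. (6)-(7): relative frequency of each of the 8 link values."""
    hist = [0.0] * 8
    for code in chain:
        hist[code] += 1.0
    n = len(chain)
    return [h / n for h in hist] if n else hist

def matching_cost(ref, cur, alpha=1.0, beta=1.0):
    """J(C_k^r, C_i) of Eq. (5); alpha and beta are placeholder weights (assumed)."""
    length_term = abs(len(ref["chain"]) - len(cur["chain"]))
    h_ref = direction_histogram(ref["chain"])
    h_cur = direction_histogram(cur["chain"])
    hist_term = sum(abs(a - b) for a, b in zip(h_ref, h_cur))
    # Euclidean distance between start points and between end points (assumed norm).
    end_term = math.dist(ref["start"], cur["start"]) + math.dist(ref["end"], cur["end"])
    return length_term + alpha * hist_term + beta * end_term

def global_match(ref_segments, cur, alpha=1.0, beta=1.0):
    """Index p of the reference sub-segment that minimises J, as in Eq. (4)."""
    return min(range(len(ref_segments)),
               key=lambda k: matching_cost(ref_segments[k], cur, alpha, beta))

cur = {"chain": [0, 0, 1, 1], "start": (0, 0), "end": (4, 2)}
refs = [
    {"chain": [4, 4, 4, 4, 4, 4], "start": (10, 10), "end": (4, 10)},
    {"chain": [0, 0, 1, 1], "start": (1, 0), "end": (5, 2)},
]
best = global_match(refs, cur)
```

In this example the second reference sub-segment has the same length and direction histogram as the current one and is displaced by one pixel, so only the endpoint term contributes to its cost and it is selected.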

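2) Local matching. In the local matching stage, ${\mathit{\boldsymbol{C}}_i}$ is first divided into prediction units $\{ p{u_j}\} (j = 0, \ldots, {N^u}-1)$, where ${{u_l}}$ is the length of a prediction unit and ${N^u}$ is the number of prediction units in ${\mathit{\boldsymbol{C}}_i}$, that is,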

$ {N^u} = \left\lceil {{N_i}/{u_l}} \right\rceil $

(8)

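where $\left\lceil {{N_i}/{u_l}} \right\rceil $ is the smallest integer not less than ${{N_i}/{u_l}}$. For each prediction unit $p{u_j}$ in ${\mathit{\boldsymbol{C}}_i}$, one-dimensional motion/disparity estimation is then performed with $\mathit{\boldsymbol{C}}_p^r$ as the reference to find the link sub-sequence most similar to $p{u_j}$, and the motion/disparity displacement $s{d_j}$ is recorded, that is,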

$ s{d_j} = \mathop {{\rm{argmax}}}\limits_{p \in \left[{-a, a} \right]} \sum\limits_{k = -\frac{{{u_l} -1}}{2}}^{\frac{{{u_l} -1}}{2}} {{e_{jk}}\left( p \right)}, 0 \le j \le {N^u} - 1 $

(9)

$ \begin{array}{l} {e_{jk}}\left( p \right) = \\ \left\{ {\begin{array}{*{20}{c}} 1&{{l_{i\left( {j\cdot{u_l} + \frac{{{u_l}-1}}{2} + k} \right)}} = l_{i\left( {_{j\cdot\left\lceil {N_i^r/{N^u}} \right\rceil + \left\lceil {\frac{1}{2}\cdot\left\lceil {N_i^r/{N^u}} \right\rceil } \right\rceil + k + p}\;\;\;} \right)}^r}\\ 0&{{l_{i\left( {j\cdot{u_l} + \frac{{{u_l}-1}}{2} + k} \right)}} \ne l_{i\left( {_{j\cdot\left\lceil {N_i^r/{N^u}} \right\rceil + \left\lceil {\frac{1}{2}\cdot\left\lceil {N_i^r/{N^u}} \right\rceil } \right\rceil + k + p}\;\;\;} \right)}^r} \end{array}} \right. \end{array} $

(10)

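where $\left[{-a, a} \right]$ is the search range.

After motion/disparity estimation has been completed for all prediction units of ${\mathit{\boldsymbol{C}}_i}$, the prediction error of ${\mathit{\boldsymbol{C}}_i}$ after motion/disparity compensation is obtained as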

$ \mathit{\boldsymbol{C}}_i^e = \{ l_{in}^e\}, \;\;0 \le n \le {N_i}-1 $

(11)

$ \begin{array}{l} l_{in}^e = l_{i\left( {j\cdot{u_l} + \frac{{{u_l}-1}}{2} + k} \right)}^e = \\ \left\{ \begin{array}{l} 0\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;{e_{jk}}(s{d_j}) = 1\\ {l_{i\left( {j\cdot{u_l} + \frac{{{u_l}-1}}{2} + k} \right)}}\;\;\;\;\;\;\;{e_{jk}}(s{d_j}) = 0 \end{array} \right. \end{array} $

(12)

where $n = j{u_l} + \frac{{{u_l}-1}}{2} + k\;, \;0 \le j \le {N^u}-1\;, \;-\frac{{{u_l} - 1}}{2} \le k \le \frac{{{u_l} - 1}}{2}$.
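The local matching of Eqs. (8) to (12) can be sketched as below. The sketch follows the index expressions of Eq. (10) literally and makes several assumptions that the text does not settle: $N_i^r$ is read as the length of the reference sub-segment, positions outside either chain count as mismatches, ties between displacements resolve to the smallest $|p|$, and the function name is hypothetical. The residual stores 0 for a matched link as in Eq. (12), so the example uses link values 1 to 7 to keep that marker distinguishable.

```python
import math

def predict_segment(cur, ref, unit_len=9, search=32):
    """Return (displacements sd_j, residual links) for chain `cur` predicted from `ref`."""
    if unit_len % 2 == 0:
        raise ValueError("unit_len must be odd so that each unit has a centre link")
    half = (unit_len - 1) // 2
    n_units = math.ceil(len(cur) / unit_len)                  # Eq. (8)
    # Spacing and centre offset of the units on the reference chain, as in Eq. (10).
    ref_step = math.ceil(len(ref) / n_units) if n_units else 0
    ref_centre = math.ceil(ref_step / 2)

    def matches(j, k, p):
        """e_jk(p) of Eq. (10); out-of-range positions count as mismatches (assumed)."""
        ci = j * unit_len + half + k
        ri = j * ref_step + ref_centre + k + p
        return 0 <= ci < len(cur) and 0 <= ri < len(ref) and cur[ci] == ref[ri]

    # Try small displacements first so that ties resolve to the smallest |p| (assumed).
    candidates = sorted(range(-search, search + 1), key=abs)
    displacements, residual = [], list(cur)
    for j in range(n_units):
        def score(p, j=j):
            return sum(matches(j, k, p) for k in range(-half, half + 1))
        best_p = max(candidates, key=score)                   # Eq. (9)
        displacements.append(best_p)
        for k in range(-half, half + 1):
            if matches(j, k, best_p):
                residual[j * unit_len + half + k] = 0         # Eq. (12): matched link
    return displacements, residual

cur = [1, 2, 3, 4, 5, 6, 7, 1, 2]
ref = [1, 2, 3, 4, 7, 6, 7, 1, 2]        # differs from cur only at position 4
sd, res = predict_segment(cur, ref, unit_len=3, search=2)
```

Here every unit is aligned at $p = -1$ rather than 0, because the reference centre in Eq. (10) is computed with a ceiling while the centre of the current unit is $({u_l}-1)/2$. Only the link that differs survives in the residual.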

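3 Multi-mode coding

After shape activity analysis, each frame of the 3D shape video is assigned to either intra coding or inter-prediction coding. For an intra-coded image, after contour extraction, thinning, and contour segmentation, straight-line detection is applied, and any long straight segment in a contour segment is separated and coded independently. Straight sub-segments and ordinary non-straight sub-segments are then entropy coded differently, using the same coding strategy as Ref. [11].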
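The idea that a sub-segment with at most two basic direction codes needs only 1 bit per link can be illustrated with a short sketch. The greedy left-to-right splitting rule and the function names below are assumptions made for illustration; the actual segmentation and entropy coding rules are those of Ref. [11] and are not reproduced here.

```python
def split_two_direction_runs(chain):
    """Greedily split a chain into sub-segments with at most two distinct direction codes.

    Returns a list of (codes, links) pairs, where `codes` holds the one or two
    direction codes used by the sub-segment. The greedy rule is an assumption.
    """
    subsegments, current, codes = [], [], []
    for link in chain:
        if link not in codes and len(codes) == 2:
            # A third direction appears: close the current sub-segment.
            subsegments.append((tuple(codes), current))
            current, codes = [], []
        if link not in codes:
            codes.append(link)
        current.append(link)
    if current:
        subsegments.append((tuple(codes), current))
    return subsegments

def to_bits(codes, links):
    """Map each link to 0 or 1 according to its position in `codes`."""
    return [codes.index(link) for link in links]

parts = split_two_direction_runs([0, 0, 1, 0, 1, 2, 2, 3, 2])
bits = [to_bits(codes, links) for codes, links in parts]
```

The nine links above fall into two sub-segments, one using directions 0 and 1 and one using directions 2 and 3. Apart from the side information that identifies the two directions of each sub-segment, every link is then a single binary symbol.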

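For an inter-predicted image, after contour extraction and preprocessing, motion/disparity estimation and compensated prediction is performed according to Fig. 2, using prediction modes that include MCP, DCP, bidirectional MCP, bidirectional DCP, and joint MCP and DCP, depending on the spatio-temporal position of the frame. For a contour segment ${\mathit{\boldsymbol{C}}_i}$ in an inter-coded image, once prediction and compensation are complete, coding the segment is equivalent to coding the set of motion/disparity vectors $\{ s{d_j}\} (j = 0, \ldots, {N^u}-1)$ and the prediction residual $\mathit{\boldsymbol{C}}_i^e$. Because ${\mathit{\boldsymbol{C}}_i}$ is correlated with its reference segment $\mathit{\boldsymbol{C}}_p^r$, most links in $\mathit{\boldsymbol{C}}_i^e$ are 0 after compensation, and runs of 0 links tend to be long. To improve coding efficiency, the 0 links in $\mathit{\boldsymbol{C}}_i^e$ are therefore run-length coded, and the non-zero links are coded with the same strategy as ordinary sub-segments in intra coding, which exploits the constraints and spatial correlation between consecutive links. The set $\{ s{d_j}\} $ is coded losslessly with classic Huffman coding.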
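The two residual-side steps described above can be sketched as follows: separating the runs of 0 links from the non-zero links of $\mathit{\boldsymbol{C}}_i^e$, and deriving Huffman code lengths for the displacements $\{ s{d_j}\}$. The symbol format and the function names are hypothetical, and the sketch stops at the symbol level; it does not reproduce the bitstream syntax or the coding of non-zero links from the paper.

```python
import heapq
from collections import Counter

def split_zero_runs(residual):
    """Turn a residual chain into ('run', length) and ('link', value) symbols."""
    symbols, run = [], 0
    for link in residual:
        if link == 0:
            run += 1
            continue
        if run:
            symbols.append(("run", run))
            run = 0
        symbols.append(("link", link))
    if run:
        symbols.append(("run", run))
    return symbols

def huffman_code_lengths(values):
    """Code length in bits of each distinct value under a Huffman code."""
    counts = Counter(values)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, tie-breaker, {value: depth so far}).
    heap = [(c, i, {v: 0}) for i, (v, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {v: depth + 1 for v, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

symbols = split_zero_runs([0, 0, 0, 0, 5, 0, 0, 0, 0])
lengths = huffman_code_lengths([-1, -1, -1, -1, 0, 0, 1])
```

When most displacements share one value, as is typical for slowly moving contours, that value receives the shortest code and the long zero runs collapse into a few run symbols.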

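4 Experimental results

Simulations were carried out to evaluate the proposed algorithm. Three 3D video sequences with two views each were used; their parameters are listed in Table 1. For each 3D shape video sequence, the left view is the main view and the right view is the auxiliary view. The sequence is first divided into groups of pictures, which are coded one by one. For each original binary shape image in a group of pictures, object contours are first extracted and preprocessed by contour thinning, chain code conversion, and contour segmentation. Activity analysis of the object contours then determines the coding mode, either intra coding or inter-prediction coding. For an intra-coded image, straight-line detection separates long straight sub-segments, and straight and ordinary non-straight sub-segments are entropy coded differently to improve coding efficiency. For an inter-predicted image, motion/disparity estimation and compensated prediction based on contour and chain code representation first removes the correlation between object contours across frames, and the resulting motion/disparity vectors and prediction residuals are then coded.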

Table 1 Test sequences

Sequence	Frames	Format
Boy	1~100	540×480
Car	1~100	540×480
Lakeside	1~100	540×480

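Fig. 3 shows the shape images of the first frame of each original test sequence and the final extracted contour edge images. To fully exploit the spatial and temporal correlation between object contours, the algorithm requires the extracted contours to be strictly one pixel wide: except for start points and contour junctions, every edge point on an object contour has exactly two neighboring contour edge points in its 8-neighborhood. The contours in Fig. 3 have strictly single-pixel width after edge extraction and thinning.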
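The strict single-pixel condition stated above is easy to check programmatically. The following sketch assumes the contour is given as a set of integer pixel coordinates and that start points and junctions are passed in explicitly; the function name and this representation are illustrative choices, not part of the paper.

```python
def is_strictly_single_pixel(contour, exempt=()):
    """Check that every non-exempt contour pixel has exactly two 8-neighbours on the contour.

    contour: set of (x, y) pixel coordinates.
    exempt:  start points and junction points, which the condition does not cover.
    """
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    exempt = set(exempt)
    for (x, y) in contour:
        if (x, y) in exempt:
            continue
        neighbours = sum((x + dx, y + dy) in contour for dx, dy in offsets)
        if neighbours != 2:
            return False
    return True

diamond = {(1, 0), (2, 1), (1, 2), (0, 1)}     # closed, 8-connected, one pixel wide
thick = diamond | {(1, 1)}                     # filling the centre breaks the condition
path = {(0, 0), (1, 0), (2, 0)}                # open path with two end points
```

A closed diamond passes the check, the same diamond with its centre filled fails, and an open path passes only when its two end points are declared exempt.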

Fig. 3 Original shape images and the extracted final object contours of the first image pairs ((a) left and right shape images and extracted object contours of Boy sequence; (b) left and right shape images and extracted object contours of Car sequence; (c) left and right shape images and extracted object contours of Lakeside sequence)

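Motion/disparity estimation and compensation is the core of the proposed algorithm, and the prediction unit length and the search range directly affect the average code length after coding. Fig. 4 shows the coding results of the algorithm for different prediction unit lengths and search ranges. In general, the larger the search range, the shorter the average code length and the higher the coding efficiency. Coding efficiency is not, however, a monotonic function of the prediction unit length. Longer prediction units reduce the number of units and hence the amount of motion vector data, but units that are too long may enlarge the prediction residual and increase the bits spent on coding it.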

Fig. 4 Comparison results with different prediction unit lengths and search scopes ((a) Boy sequence; (b) Car sequence; (c) Lakeside sequence)

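Table 2 compares the proposed algorithm with the main existing shape coding methods of the same kind, namely the classic MPEG-4 CAE Inter^[6] and our earlier methods in Ref. [11] and Ref. [12]. Table 2 shows that the proposed algorithm compresses more efficiently than all three methods. To quantify the compression performance of the proposed algorithm relative to the existing methods, the relative compression bit rate is defined as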

$ \gamma = \frac{{{\eta _p}}}{{{\eta _i}}} $

(13)

Table 2 Comparison results against the existing methods (${u_l} = 9, a = 32$)

Sequence	CAE Inter	Ref. [11]	Ref. [12]	Proposed
Boy	2322	1026	879	758
Car	1445	345	278	249
Lakeside	3857	2367	2234	2151

Table 3 Comparison results against the existing methods with respect to $\gamma $ (%)

Sequence	CAE Inter	Ref. [11]	Ref. [12]
Boy	32.6	73.9	86.2
Car	17.2	72.2	89.6
Lakeside	55.8	90.9	96.3
Average	35.2	79.0	90.7

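where $\gamma $ is the relative compression bit rate, and ${\eta _p}$ and ${\eta _i}$ are the bit rates after compression by the proposed algorithm and by an existing method, respectively, in bits per pixel. Table 3 lists the $\gamma $ values of the proposed algorithm against the three methods above. Compared with the existing methods, the proposed algorithm reduces the average bit rate of the compressed shape images to 35.2% of that of CAE Inter, 79.0% of that of Ref. [11], and 90.7% of that of Ref. [12].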
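As a worked check, the $\gamma $ values of Table 3 can be recomputed from Table 2 with Eq. (13). The sketch below treats the Table 2 entries as the quantities ${\eta _p}$ and ${\eta _i}$; this is safe for the ratio because all methods are measured on the same sequences, so any common scale factor cancels. The dictionary layout and the function names are illustrative.

```python
# Values copied from Table 2 (u_l = 9, a = 32).
TABLE2 = {
    "Boy":      {"CAE Inter": 2322, "Ref. [11]": 1026, "Ref. [12]": 879,  "Proposed": 758},
    "Car":      {"CAE Inter": 1445, "Ref. [11]": 345,  "Ref. [12]": 278,  "Proposed": 249},
    "Lakeside": {"CAE Inter": 3857, "Ref. [11]": 2367, "Ref. [12]": 2234, "Proposed": 2151},
}

def relative_bitrates(table, proposed="Proposed"):
    """gamma = eta_p / eta_i of Eq. (13), in percent, for every sequence and method."""
    return {
        seq: {m: 100.0 * row[proposed] / v for m, v in row.items() if m != proposed}
        for seq, row in table.items()
    }

def column_means(gamma):
    """Average gamma of each existing method over all sequences."""
    methods = next(iter(gamma.values())).keys()
    return {m: sum(g[m] for g in gamma.values()) / len(gamma) for m in methods}

gamma = relative_bitrates(TABLE2)
means = column_means(gamma)
```

Rounded to one decimal place, the results reproduce Table 3, including the averages of 35.2%, 79.0%, and 90.7%. The corresponding bit-rate savings of 64.8% and 9.3% are the range quoted in the abstract.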

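5 Conclusion

3D video is a new type of video that provides information from multiple viewpoints and enables stereoscopic perception, and it has broad application prospects in 3D film and television, machine vision, telemedicine, military, and aerospace. Object-based processing offers better semantic understanding, representation, and interactivity, so object-based 3D video is an important development trend for future 3D video technology. Building on our earlier work on image and video shape coding, this paper studied efficient 3D video shape coding based on contour and chain code representation and proposed contour-based motion-compensated prediction (CB-MCP) and contour-based disparity-compensated prediction (CB-DCP), which exploit the temporal and spatial correlations of 3D object contours within and between views for efficient compression. Experimental results show that the coding efficiency of the proposed method is higher than that of existing methods of the same kind. The algorithm can be widely applied in image and video applications such as object-based coding, object-based retrieval, and object interaction. The proposed prediction structure is relatively simple, however, and its prediction performance leaves room for improvement. Future work will therefore focus on improving and optimizing the prediction structure and prediction method to further increase coding efficiency.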

References

[1] Gao Y, Wang M, Tao D C, et al. 3-D object retrieval and recognition with hypergraph analysis[J]. IEEE Transactions on Image Processing, 2012, 21(9): 4290–4303. [DOI:10.1109/TIP.2012.2199502]

[2] Zhu Z J, Wang Y E, Jiang G Y. Unsupervised segmentation of natural images based on statistical modeling[J]. Neurocomputing, 2017, 252: 95–101. [DOI:10.1016/j.neucom.2016.03.117]

[3] Zhu Z J, Wang Y E, Jiang G Y. On multi-view video segmentation for object-based coding[J]. Digital Signal Processing, 2012, 22(6): 954–960. [DOI:10.1016/j.dsp.2012.05.006]

[4] ISO/IEC JTC1/SC29. ISO/IEC-11544 Coded representation of picture and audio information-progressive bi-level image compression[S]. Japan: ISO/IEC, 1993.

[5] ISO/IEC JTC1/SC29. ISO/IEC-14492 Coded representation of picture and audio information-lossy/lossless coding of bi-Level images (JBIG2)[S]. Japan: ISO/IEC, 2000.

[6] ISO/IEC JTC1/SC29. ISO/IEC-14496-2 Information technology-coding of audio-visual objects-part 2: visual[S]. Japan: ISO/IEC, 1999.

[7] Aghito S M, Forchhammer S. Context-based coding of bilevel images enhanced by digital straight line analysis[J]. IEEE Transactions on Image Processing, 2006, 15(8): 2120–2130. [DOI:10.1109/TIP.2006.875168]

[8] Shen Z L, Frater M R, Arnold J F. Quad-tree block-based binary shape coding[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2008, 18(6): 845–850. [DOI:10.1109/TCSVT.2008.919086]

[9] Lai Z Y, Zhang F, Lin W S. Operational rate-distortion shape coding with dual error regularization[C]//Proceedings of 2014 IEEE International Conference on Image Processing. Paris, France: IEEE, 2014, 5547-5550.[DOI:10.1109/ICIP.2014.7026122]

[10] Luo H T. Image-dependent shape coding and representation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2005, 15(3): 345–354. [DOI:10.1109/TCSVT.2004.842596]

[11] Zhu Z J, Wang Y E, Jiang G Y. High efficient shape coding based on the representation of contour and chain code[J]. Journal on Communications, 2014, 35(8): 8–14. [朱仲杰, 王玉儿, 蒋刚毅. 基于轮廓和链码表示的高效形状编码[J]. 通信学报, 2014, 35(8): 8–14. ] [DOI:10.3969/j.issn.1000-436x.2014.08.002]

[12] Zhu Z J, Wang Y E, Jiang G Y. Spatio-temporal shape prediction and efficient coding[J]. Journal of Image and Graphics, 2016, 21(1): 1–7. [朱仲杰, 王玉儿, 蒋刚毅. 空时形状预测与高效编码[J]. 中国图象图形学报, 2016, 21(1): 1–7. ] [DOI:10.11834/jig.20160101]