Published: 2019-03-16
DOI: 10.11834/jig.180476
2019 | Volume 24 | Number 3

Image Analysis and Recognition





Image co-segmentation with progressive foreground updating and hierarchical region correlation
Yao Tuozhong1, Zuo Wenhui2, An Peng1, Song Jiatao1
1. School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016, China;
2. College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Supported by: Young Scientists Fund of National Natural Science Foundation of China (61502256)

Abstract

Objective As a hotspot in computer vision, image co-segmentation is a research branch of the classic image segmentation problem that uses multiple images to separate foreground objects from background regions. It has been widely used in many fields, such as image classification, object recognition, and 3D object reconstruction. Image co-segmentation is an ill-posed and challenging problem because of many factors, such as viewpoint changes and the intraclass diversity of foreground objects. Most current image co-segmentation algorithms are limited in performance and work efficiently only on images with dramatic background changes and minimal foreground changes. Method This study proposes a new unsupervised algorithm that optimizes the foreground/background estimation progressively. Our algorithm has three advantages: 1) it is unsupervised and needs no sample learning; 2) it can co-segment multiple images simultaneously or an image with multiple foreground objects; and 3) it adapts to dramatic intraclass variations better than previous algorithms. The main steps of our algorithm are as follows. A classic hierarchical segmentation is first applied to generate a multiscale superpixel set. Separate Gaussian mixture models are then used to estimate the foreground and background distributions on the basis of classic color and texture descriptors at the superpixel level. A Markov random field (MRF) model estimates the annotation of each superpixel by solving a traditional energy minimization problem. In our MRF model, each node represents a superpixel or pixel. The two unary potentials denote the probabilities of a superpixel or pixel belonging to the foreground or background, and the pairwise potential penalizes inconsistent annotations of corresponding superpixels in different images. This energy minimization can be solved by a classic graph cut. Unlike most image co-segmentation algorithms, ours progressively estimates the foreground and background models, starting from the initial superpixel annotation given by a pre-learned object detector. The annotation obtained in the current step updates the superpixel annotation, and thus the foreground and background distributions, for the next step until these distributions no longer change significantly. Intra- and inter-image similarity correlations at different superpixel levels are integrated into our iterative framework to increase the robustness of the foreground and background model estimation. Each image is divided into a series of segmentation levels by hierarchical segmentation, and three matrices model the semantic correlations among regions. An affinity matrix $\mathit{\boldsymbol{A}}$ defines the relationship among neighboring superpixels inside one image. A constraint matrix $\mathit{\boldsymbol{C}}$ describes the hierarchical relation among different segmentation levels. Another affinity matrix $\mathit{\boldsymbol{M}}$ defines the relationship among superpixels in different images. A normalized affinity matrix $\mathit{\boldsymbol{P}}$ is then computed from $\mathit{\boldsymbol{A}}$, and a matrix $\mathit{\boldsymbol{Q}}$ created from $\mathit{\boldsymbol{C}}$ projects $\mathit{\boldsymbol{P}}$ into the solution space. The optimal annotation of superpixel pairs inside one image and across images is obtained by classic normalized cuts.
Thus, a new pairwise potential is added to our MRF model to penalize corresponding superpixel pairs with different annotations in different images. Result In our experiments, the iCoseg and MSRC datasets are used to compare the performance of our algorithm with those of several state-of-the-art algorithms. Experimental results demonstrate that our algorithm achieves the highest segmentation accuracy and mean segmentation accuracy in most object classes, which implies that it does not need large foreground and background differences and can handle generalized images with dramatic foreground changes and multiple foreground objects. In some object classes, such as "Skating" and "Panda", however, our algorithm is less effective because the initial distribution estimation from the dated object detector is inaccurate, and our iterative framework cannot always pull the distribution estimation out of a local minimum. Nonetheless, our algorithm could be improved considerably by using state-of-the-art deep learning-based object detectors, such as Mask R-CNN. Conclusion This study proposes a novel unsupervised image co-segmentation algorithm that iteratively estimates the appearance distribution of each superpixel obtained by hierarchical image segmentation to distinguish the foreground from the background. Regional semantic correlations inside one image and across images are incorporated as a new pairwise potential in the MRF model to increase the consistency of the foreground and background distributions. Our detailed experiments show that our algorithm performs more robustly than state-of-the-art algorithms and can co-segment multiple images with dramatic foreground changes and multiple foreground objects.

Key words

image co-segmentation; hierarchical image segmentation; progressive foreground estimation; hierarchical region correlation; normalized cut

0 Introduction

As a major research hotspot in computer vision, image segmentation has broad application prospects and has developed rapidly over the past two decades. Image co-segmentation is a branch of this field whose goal is to separate the foreground objects from the background regions given two or more images. However, factors such as viewpoint and pose changes and the intraclass diversity of foreground objects make this ill-posed problem challenging to solve. To date, researchers have applied image co-segmentation to many popular fields, including image retrieval [1], visual summarization [2], medical image analysis [3], 3D object reconstruction [4], and object recognition [5].

Rother et al. [1] first proposed the concept of image co-segmentation in 2006 by incorporating an inter-image consistency constraint into a Markov random field energy function. Unlike the L1-regularized method of [1], Mukherjee et al. [6] used L2 regularization to measure the similarity between foreground histograms. Hochbaum et al. [7] proposed a reward model that satisfies the submodularity condition, so that the co-segmentation problem can be solved efficiently with the classic graph cut. However, these early methods can only extract nearly identical foreground objects from different backgrounds.

Recent research on image co-segmentation has focused on the following aspects. First, methods have evolved from co-segmenting only two images to co-segmenting multiple images. Joulin et al. [8] accommodated larger appearance variations by casting co-segmentation as a discriminative clustering problem. Batra et al. [2] proposed an interactive co-segmentation algorithm that achieves more accurate foreground/background separation through simple manual intervention. Collins et al. [9] applied random walks to co-segmentation under the constraint that the foreground histograms must match each other, and Lee et al. [10] extended [9] with a multiple-random-walker strategy that obtains co-segmentation results from stationary distributions through mutual interaction. Meanwhile, some co-segmentation algorithms focus on capturing the correlations among foreground regions. Chang et al. [11] used a saliency model to exclude pixel regions that seldom appear across images. Vicente et al. [12] generated candidate object segments from each image and scored each candidate segment pair with a random forest classifier. Wang et al. [13] used functional maps to represent consistent appearance correlations between image pairs and jointly optimized the segmentation confidence maps of all images. Second, handling multiple foreground objects simultaneously has become a trend. Kim et al. [14] proved that the system temperature is submodular under linear anisotropic diffusion, so co-segmentation can be solved by maximizing the temperature with a greedy algorithm; this algorithm scales well but requires the number of foreground objects to be set manually. Kim et al. [15] and Ma et al. [16] both addressed co-segmentation of image sets in which multiple foregrounds appear repeatedly, but their methods apply only when the background is highly uniform and the foreground barely changes. In addition, although some supervised [12] and semi-supervised [17] co-segmentation techniques exist, most mainstream techniques are unsupervised [18-20] and require no time-consuming manual annotation.

1 Algorithm description

This study proposes a new fully automatic image co-segmentation algorithm with four main features: 1) it is unsupervised and requires no prior sample learning; 2) it can process two or more images simultaneously; 3) it handles multiple foreground objects; and 4) it adapts well to variations within the foreground object class.

Our algorithm adopts a traditional Markov random field framework built on the superpixel nodes obtained by hierarchical image segmentation. Node indices are written as $ \nu \in {\mathit{\boldsymbol{\nu }}_{\rm{r}}} \cup {\mathit{\boldsymbol{\nu }}_{\rm{p}}}$, where ${\mathit{\boldsymbol{\nu }}_{\rm{r}}}\left( k \right)$ and ${\mathit{\boldsymbol{\nu }}_{\rm{p}}}\left( k \right)$ denote the superpixels and pixels of the $k$th image, respectively. The Markov random field is composed of a set of Boolean random variables $\mathit{\boldsymbol{X}} = \{ {X_i}, i \in {\mathit{\boldsymbol{\nu }}_{\rm{r}}}\} \cup \{ {X_j}, j \in {\mathit{\boldsymbol{\nu }}_{\rm{p}}}\} $, where $\mathit{\boldsymbol{X}} = \mathit{\boldsymbol{0}}$ and $\mathit{\boldsymbol{X}} = \mathit{\boldsymbol{1}}$ represent the background and foreground, respectively. We regard image co-segmentation as the minimization of the energy function $E(\mathit{\boldsymbol{X}})$, solved by the classic graph cut algorithm:

$ E\left( \mathit{\boldsymbol{X}} \right) = {\lambda _1}{E_{\rm{p}}} + {\lambda _2}{E_{\rm{r}}} + {\lambda _3}{E_{\rm{m}}} $ (1)

where the unary terms ${E_{\rm{p}}}$ and ${E_{\rm{r}}}$ represent the probabilities of pixels and superpixels belonging to the foreground or background and are weighted by $λ_{1}$ and $λ_{2}$, and the pairwise term ${E_{\rm{m}}}$ constrains the annotation consistency of superpixels across images with weight $λ_{3}$. Our algorithm progressively optimizes the co-segmentation result in a recursive manner: the energy function $E(\mathit{\boldsymbol{X}})$ is minimized on the basis of the current foreground/background estimates, and the resulting pixel/superpixel annotations are used to update the foreground/background estimates until convergence.

1.1 Progressive foreground/background estimation

Each image in the set $ \mathit{\boldsymbol{I}}=\{\mathit{\boldsymbol{I}}_{1}, \mathit{\boldsymbol{I}}_{2}, \cdots, \mathit{\boldsymbol{I}}_{n}\}$ is segmented into superpixels at levels $ l=1, \cdots, L(L=4)$ using the unsupervised hierarchical image segmentation method gPb-owt-ucm [20], which yields a multiscale superpixel set $ \mathit{\boldsymbol{R}}$, as shown in Fig. 1. The superpixel set ${\mathit{\boldsymbol{\nu }}_{\rm{r}}}$ contains every superpixel in $ \mathit{\boldsymbol{R}}$: ${\mathit{\boldsymbol{\nu }}_{\rm{r}}} = {\mathit{\boldsymbol{R}}_k}, \forall {\mathit{\boldsymbol{I}}_k} \in \mathit{\boldsymbol{I}}$.

Fig. 1 Using hierarchical image segmentation to generate a multiscale superpixel set

We model the foreground and background distributions separately, so that the assumption of a similar foreground and varying background is replaced by image-content-dependent foreground and background distributions, which are progressively optimized within our recursive framework. Stable co-segmentation results can thus be obtained even when the foreground varies dramatically.

Because the likelihood of a pixel/superpixel belonging to the foreground/background cannot be computed directly in an unsupervised co-segmentation framework, these distributions cannot be obtained directly. We therefore borrow the classic object segmentation algorithm [21] to obtain an initial foreground/background annotation of the image pixels. Traditional color and texture features are then used to build the foreground and background distributions. First, the color distribution of pixels is estimated with Gaussian mixture models (GMMs); the foreground and background GMMs of image $k$ are denoted by $ H_k^{\rm{f}}$ and $ H_k^{\rm{b}}$, respectively. Second, color SIFT [22] is adopted as the texture descriptor to train the superpixel appearance model; the texture classifier of image $k$ is denoted by $ F_{k}$. The unary term of the energy function in Eq. (1) for pixel nodes is then defined as the logarithm of the color likelihood $ P(C|H)$:

$ \begin{array}{l} {E_{\rm{p}}} = - \mathop \sum \limits_k^N \mathop \sum \limits_{j \in {\mathit{\boldsymbol{\nu }}_{\rm{p}}}\left( k \right)} {\log _2}(P({C_j}|H_k^{\rm{f}}){X_j} + \\ \;\;\;\;\;\;\;\;\;\;\;\;\;P({C_j}|H_k^{\rm{b}}){{\bar X}_j}) \end{array} $ (2)

where $ C_{j}$ is the RGB color value of pixel $ j$, and $ {{\bar X}_j}$ denotes the negation of $ X_{j}$. $ H_k^{\rm{f}}$ is the GMM trained on the foreground pixels of every image except image $k$; similarly, $ H_k^{\rm{b}}$ is the GMM trained on the corresponding background pixels.
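
For concreteness, the following minimal Python sketch shows one way the color GMMs and the pixel unary term of Eq. (2) could be computed with scikit-learn. The fixed component count and the helper names are our assumptions; the paper determines the number of GMM components automatically by hierarchical clustering (see Section 2).

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_color_gmm(rgb_pixels, n_components=5):
        # rgb_pixels: (N, 3) array of RGB values; H^f/H^b are fit on the
        # foreground/background pixels of every image except image k.
        return GaussianMixture(n_components=n_components,
                               covariance_type='full').fit(rgb_pixels)

    def pixel_unary(rgb_pixels, labels, H_f, H_b):
        # E_p contribution of one image (Eq. (2)): use the foreground
        # log-likelihood where X_j = 1 and the background one where X_j = 0.
        log_pf = H_f.score_samples(rgb_pixels) / np.log(2.0)  # ln -> log2
        log_pb = H_b.score_samples(rgb_pixels) / np.log(2.0)
        return -np.sum(np.where(labels == 1, log_pf, log_pb))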

The unary term for superpixel nodes is defined as the logarithm of the probability estimated from the texture descriptor:

$ {E_{\rm{r}}} = - \mathop \sum \limits_k^N \mathop \sum \limits_{i \in {\mathit{\boldsymbol{\nu }}_{\rm{r}}}\left( k \right)} {\log _2}(P_k^{\rm{f}}({T_i}){X_i} + P_k^{\rm{b}}({T_i}){{\bar X}_i}) $ (3)

where $ P_k^{\rm{f}}$($ T_{i}$) is the foreground probability of superpixel $ i$ output by classifier $ F_{k}$, which is a linear support vector machine trained on the texture descriptors $ T_{i}$, and $ P_k^{\rm{b}}$($ T_{i}$) is the corresponding background probability.
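
A matching sketch for the texture term of Eq. (3) follows. The paper specifies only a linear SVM over color-SIFT descriptors; obtaining the probabilities $ P_k^{\rm{f}}$ and $ P_k^{\rm{b}}$ via Platt scaling (probability=True) is our assumption.

    import numpy as np
    from sklearn.svm import SVC

    def train_texture_classifier(T, X):
        # T: (N, d) color-SIFT descriptors of superpixels; X: 0/1 labels.
        return SVC(kernel='linear', probability=True).fit(T, X)

    def superpixel_unary(F_k, T, X):
        # E_r contribution of one image (Eq. (3)).
        proba = F_k.predict_proba(T)          # columns follow sorted class labels
        p_b, p_f = proba[:, 0], proba[:, 1]   # class 0 = background, 1 = foreground
        return -np.sum(np.log2(np.where(X == 1, p_f, p_b)))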

The progressive foreground estimation algorithm minimizes the energy function in Eq. (1) using $ H_k^{\rm{f}}$, $ H_k^{\rm{b}}$, and $ F_{k}$ to obtain the pixel/superpixel annotation $ {\mathit{\boldsymbol{X}}^*}$, and then recursively re-estimates $ H_k^{\rm{f}}$ and $ H_k^{\rm{b}}$ from the $ {\mathit{\boldsymbol{X}}^*}$ of the previous iteration until convergence.

The algorithm proceeds as follows:

Initialize: $ X_{i}$, $ X_{j}$ ← initial annotation, $ \forall i \in {\mathit{\boldsymbol{\nu }}_{\rm{r}}}, \forall j \in {\mathit{\boldsymbol{\nu }}_{\rm{p}}}$

Repeat:

    Estimate $ H_k^{\rm{f}}$ ← ${\rm{GMM}}({X_j} = 1), \forall {\mathit{\boldsymbol{I}}_k} \in \mathit{\boldsymbol{I}}$

    Estimate $ H_k^{\rm{b}}$ ← ${\rm{GMM}}({X_j} = 0), \forall {\mathit{\boldsymbol{I}}_k} \in \mathit{\boldsymbol{I}}$

    Train classifier ${F_k} \leftarrow {X_i}, \forall {\mathit{\boldsymbol{I}}_k} \in \mathit{\boldsymbol{I}} $

    Estimate annotation: $ {\mathit{\boldsymbol{X}}^*}$ ← $\mathop {\arg \min }\limits_\mathit{\boldsymbol{X}} E\left( \mathit{\boldsymbol{X}} \right) $

    Update annotations: $ X_{i}$, $ X_{j}$ ← $ {\mathit{\boldsymbol{X}}^*}$

Until the percentage of pixels whose annotation changed from the previous iteration is below 5%
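
In Python, this loop could be sketched as follows, building on the sketches above. Every helper here (fg_pixels_except, bg_pixels_except, superpixel_data, minimize_energy, fraction_changed) is hypothetical glue around the steps listed; the graph-cut minimization of Eq. (1) would come from a max-flow library.

    def cosegment(images, init_labels, max_iter=20, tol=0.05):
        labels = init_labels                    # initial pixel/superpixel 0/1 maps
        for _ in range(max_iter):
            n = len(images)
            H_f = [fit_color_gmm(fg_pixels_except(images, labels, k))
                   for k in range(n)]
            H_b = [fit_color_gmm(bg_pixels_except(images, labels, k))
                   for k in range(n)]
            F = [train_texture_classifier(*superpixel_data(images, labels, k))
                 for k in range(n)]
            new_labels = minimize_energy(images, H_f, H_b, F)  # graph cut on E(X)
            changed = fraction_changed(labels, new_labels)
            labels = new_labels
            if changed < tol:                   # fewer than 5% of pixels changed
                break
        return labels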

1.2 Hierarchical region correlation

Mining the consistency constraints between images is valuable for solving the co-segmentation problem. Therefore, we consider not only the correlations among superpixels within each image at multiple scales but also the correlations among superpixels across different images, and we model these regional correlations with a hierarchical graph model.

We implement the region correlation with the idea of hierarchical image clustering [23]. Let ${\mathit{\boldsymbol{I}}^{1 \cdots N}}$ be the set of $ N$ images to be co-segmented, and define $ \mathit{\boldsymbol{P}}$ as the segmentation matrix and the block matrices $ \mathit{\boldsymbol{A}}$ and $ \mathit{\boldsymbol{C}}$ as the affinity matrix and the constraint matrix, respectively:

$ \begin{array}{l} \mathit{\boldsymbol{P}} = \left( {\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{P}}^1}}\\ \vdots \\ {{\mathit{\boldsymbol{P}}^N}} \end{array}} \right)\;\;\mathit{\boldsymbol{A}} = \left( {\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{A}}^1}}& \cdots &\mathit{\boldsymbol{M}}\\ \vdots &{}& \vdots \\ {{\mathit{\boldsymbol{M}}^{\rm{T}}}}& \cdots &{{\mathit{\boldsymbol{A}}^N}} \end{array}} \right)\\ {\mathit{\boldsymbol{P}}^i} = \left( {\begin{array}{*{20}{c}} {\mathit{\boldsymbol{P}}_1^i}\\ \vdots \\ {\mathit{\boldsymbol{P}}_L^i} \end{array}} \right)\;\;\mathit{\boldsymbol{C}} = \left( {\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{C}}^1}}& \cdots &\mathit{\boldsymbol{0}}\\ \vdots &{}& \vdots \\ {{\mathit{\boldsymbol{0}}^{\rm{T}}}}& \cdots &{{\mathit{\boldsymbol{C}}^N}} \end{array}} \right)\\ \;\;\;\;\;\;{\mathit{\boldsymbol{A}}^i} = \left( {\begin{array}{*{20}{c}} {\mathit{\boldsymbol{A}}_1^i}& \cdots &\mathit{\boldsymbol{0}}\\ \vdots &{}& \vdots \\ \mathit{\boldsymbol{0}}& \cdots &{\mathit{\boldsymbol{A}}_L^i} \end{array}} \right) \end{array} $ (4)

where the off-diagonal block $ \mathit{\boldsymbol{M}}$ of $ \mathit{\boldsymbol{A}}$ is a sparse matrix. At level $ l$ $ (l = 1, \cdots , L)$, $ {P_l} \in {\left\{ {0, 1} \right\}^{{N_l} \times 2}}$; that is, $ {P_l}\left( {i, c} \right) = 1$ if superpixel node $i \in {\mathit{\boldsymbol{V}}_c}$, and $ P_{l}(i, c)=0$ otherwise.

超像素$ i$$ j$之间的相似度系数$ A_{l}(i, j)$可通过计算直方图距离$ D$和超像素$ i$$ j$之间的共同边界$ w(i, j)$来得到,即

$ {A_l}\left( {i, j} \right) = \frac{{w\left( {i, j} \right)}}{{\mathop \sum \limits_{k \in {N_i}} w\left( {i, k} \right)}} \cdot {{\rm{e}}^{ - {{\left\| {D\left( {{H_i}, {H_j}} \right)} \right\|}^2}/{\alpha _H}}} $ (5)

式中,$ D$$ w(i, j)$的定义如下,即

$ \begin{array}{l} D = {\chi ^2}({H_X}, {H_Y}) = \frac{1}{2}\mathop \sum \limits_k \frac{{{{({H_X}\left( k \right) - {H_Y}\left( k \right))}^2}}}{{{H_X}\left( k \right) + {H_Y}\left( k \right)}}\\ w\left( {i, j} \right) = \mathop \sum \limits_{p = 1}^n \mathop \sum \limits_{q = 1}^n \left\{ {\begin{array}{*{20}{l}} 1&{{G_{p, q}} = i, {G_{p + 1, q + 1}} = j}\\ 0&{{\rm{otherwise}}} \end{array}} \right. \end{array} $ (6)

When computing $ D$, five types of histograms serve as superpixel features: color, color SIFT [22], textons [24], curvature [25], and edge orientation. $ w(i, j)$ is obtained by counting over a binarized co-occurrence matrix $ \mathit{\boldsymbol{G}}$ of the same size as the image, and $ N_{i}$ denotes the neighboring superpixels of superpixel $ i$.
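
A minimal sketch of the intra-image affinity of Eqs. (5) and (6) follows, assuming numpy; hist (mapping a superpixel id to its concatenated feature histogram), w (shared-boundary counts), and neighbors are assumed inputs.

    import numpy as np

    def chi2(h_x, h_y, eps=1e-12):
        # Chi-squared histogram distance of Eq. (6).
        return 0.5 * np.sum((h_x - h_y) ** 2 / (h_x + h_y + eps))

    def affinity(i, j, hist, w, neighbors, alpha_H=1.0):
        # A_l(i, j) of Eq. (5): shared-boundary weight times appearance similarity.
        d = chi2(hist[i], hist[j])
        w_norm = w[i, j] / sum(w[i, k] for k in neighbors[i])
        return w_norm * np.exp(-d ** 2 / alpha_H)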

The constraint matrix $ \mathit{\boldsymbol{C}}$ in Eq. (4) describes the annotation consistency between superpixels with parent-child relations in different segmentation levels. The parent-child relation is defined as follows: the children of superpixel $ i$ are $ d∈\mathit{\boldsymbol{D}}_{i}$, where $ i$ and $ d$ lie in adjacent segmentation levels, i.e., $ l_{d}=l_{i}-1$. By construction of the gPb-owt-ucm algorithm, a superpixel in a lower segmentation level belongs to exactly one parent superpixel in the higher level; that is, the boundaries of the superpixels in $ \mathit{\boldsymbol{D}}_{i}$ coincide with the boundary of superpixel $ i$. We define the relation between two adjacent segmentation levels with a constraint matrix $ \mathit{\boldsymbol{C}}_{l, l+1}$ of size $ N_{l+1}×N_{l}$:

$ \begin{array}{l} {\mathit{\boldsymbol{C}}^i} = \left( {\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{C}}_{1, 2}}}&{ - {\mathit{\boldsymbol{I}}_2}}&\mathit{\boldsymbol{0}}\\ \vdots&\vdots&\vdots \\ \mathit{\boldsymbol{0}}&{{\mathit{\boldsymbol{C}}_{L - 1, L}}}&{ - {\mathit{\boldsymbol{I}}_L}} \end{array}} \right)\\ {\mathit{\boldsymbol{C}}_{l, l + 1}}\left( {f, c} \right) = \left\{ {\begin{array}{*{20}{l}} {{S_{\rm{f}}}/{S_{\rm{c}}}}&{f \in {\mathit{\boldsymbol{D}}_{\rm{c}}}}\\ 0&{{\rm{otherwise}}} \end{array}} \right. \end{array} $ (7)

where ${S_{\rm{f}}}$ and ${S_{\rm{c}}}$ denote the areas of the child and parent superpixels, respectively.
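
The following sketch builds one block $ \mathit{\boldsymbol{C}}_{l, l+1}$ under one consistent reading of Eq. (7): each parent row holds the area fractions of its children, so multiplying by a child-level label matrix yields area-weighted parent labels. parent_of and the area arrays are assumed to come from the hierarchical segmentation.

    from scipy import sparse

    def constraint_block(parent_of, area_child, area_parent):
        # C[p, f] = S_f / S_p when child f belongs to parent p, 0 elsewhere,
        # so C @ P_l gives the area-weighted average of the children's labels.
        C = sparse.lil_matrix((len(area_parent), len(area_child)))
        for f, p in enumerate(parent_of):   # parent_of[f] = unique parent of f
            C[p, f] = area_child[f] / area_parent[p]
        return C.tocsr()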

The sparse matrix $ \mathit{\boldsymbol{M}}$ in Eq. (4) describes the relations between images:

$ \begin{array}{l} M\left( {i, j} \right) = \beta \left( {i, j} \right) \cdot {{\rm{e}}^{ - \frac{{{{\left\| {D\left( {{H_i}, {H_j}} \right)} \right\|}^2}}}{{{\sigma _H}}} - \frac{{{{\left\| {D\left( {{L_i}, {L_j}} \right)} \right\|}^2}}}{{{\sigma _S}}} - V(i, j)}}\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;i \in \mathit{\boldsymbol{I}}, j \in \mathit{\boldsymbol{J}}\\ \beta \left( {i, j} \right) = \beta \left( {\mathit{\boldsymbol{I}}, \mathit{\boldsymbol{J}}} \right) = \left\{ {\begin{array}{*{20}{l}} {\frac{{\sum {\mathit{\boldsymbol{A}}^I} + \sum {\mathit{\boldsymbol{A}}^J}}}{{{\mathit{\boldsymbol{I}}_{{N_4}}} \times {\mathit{\boldsymbol{J}}_{{N_4}}}}}}&{\beta \left( {\mathit{\boldsymbol{I}}, \mathit{\boldsymbol{J}}} \right) > t}\\ 0&{{\rm{otherwise}}} \end{array}} \right. \end{array} $ (8)

where $ N_{4}$ is the total number of superpixels in the lowest segmentation level. The five types of histograms from Eq. (6) again serve as superpixel features, and the histogram distance is computed with the ${\chi ^2}$ metric. The vector $ \mathit{\boldsymbol{V}}$ consists of the $ x$ and $ y$ coordinates of the superpixel center and its eccentricity, and similarity is measured by the Euclidean distance $ V\left( {i, j} \right) = {\left\| {{\mathit{\boldsymbol{V}}_i} - {\mathit{\boldsymbol{V}}_j}} \right\|^2}/{\sigma _F}$. The symmetric strength coefficient $ β(i, j)$ is obtained by dividing the sum of the affinity coefficients ${\mathit{\boldsymbol{A}}^I}$ and $ {\mathit{\boldsymbol{A}}^J}$ of images $ I$ and $ J$ by the product of the numbers of lowest-level superpixels in the two images.

Fig. 2 illustrates the intra- and inter-image region correlations. Each image is divided into a series of segmentation levels. The matrix $ \mathit{\boldsymbol{A}}$, which describes superpixel similarity within an image, is defined over neighboring superpixels and weighted by the length of their shared boundary (marked in red). The correlations between segmentation levels are drawn as yellow segments and described by the constraint matrix $ \mathit{\boldsymbol{C}}$. The matrix $ \mathit{\boldsymbol{M}}$, which describes superpixel similarity across images, is built from a fully connected graph between the lowest segmentation levels of the images and drawn as green segments.

Fig. 2 Hierarchical region correlations among superpixels within one image and between image pairs

To solve for the binary segmentation matrix $ \mathit{\boldsymbol{X}}$ that describes the final co-segmentation result, we define the normalized affinity matrix $\mathit{\boldsymbol{P}} = {\mathit{\boldsymbol{D}}^{ - 1/2}}\mathit{\boldsymbol{A}}{\mathit{\boldsymbol{D}}^{ - 1/2}} $, where the diagonal matrix $D\left( {i, i} \right) = \mathop \sum \limits_j A\left( {i, j} \right) $. A matrix $ \mathit{\boldsymbol{Q}}$ is then created to incorporate the constraint matrix $ \mathit{\boldsymbol{C}}$ and project $ \mathit{\boldsymbol{P}}$ into the solution space:

$ \mathit{\boldsymbol{Q}} = \mathit{\boldsymbol{I}} - {\mathit{\boldsymbol{D}}^{ - 1/2}}{\mathit{\boldsymbol{C}}^{\rm{T}}}{(\mathit{\boldsymbol{C}}{\mathit{\boldsymbol{D}}^{ - 1}}{\mathit{\boldsymbol{C}}^{\rm{T}}})^{ - 1}}\mathit{\boldsymbol{C}}{\mathit{\boldsymbol{D}}^{ - 1/2}} $ (9)

According to the classic Rayleigh-Ritz theorem, the first $ K=2$ eigenvectors $\mathit{\boldsymbol{\nu }}$ of the matrix $ \mathit{\boldsymbol{QPQ}}$ can be computed. After normalizing $ \mathit{\boldsymbol{\nu }}$, the optimal discrete solution $ \mathit{\boldsymbol{X}}$ is sought such that

$ \begin{array}{l} \;\;\max \varepsilon \left( \mathit{\boldsymbol{x}} \right) = \frac{1}{2}\mathop \sum \limits_{c = 1}^2 \frac{{\mathit{\boldsymbol{X}}_c^{\rm{T}}\mathit{\boldsymbol{A}}{\mathit{\boldsymbol{X}}_c}}}{{\mathit{\boldsymbol{X}}_c^{\rm{T}}\mathit{\boldsymbol{D}}{\mathit{\boldsymbol{X}}_c}}}\\ {\rm{s}}{\rm{.t}}.\;\;\;\mathit{\boldsymbol{X}} \in {\left\{ {0, 1} \right\}^{N \times 2}}, \mathit{\boldsymbol{X}}{\mathit{\boldsymbol{1}}_{2 \times 1}} = {\mathit{\boldsymbol{1}}_N} \end{array} $ (10)

where $ \mathit{\boldsymbol{1}}_{N}$ is an $ N×1$ all-ones vector. $ \mathit{\boldsymbol{X}}$ can be solved from this model by the normalized cut method [26].
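
A dense sketch of the constrained spectral step of Eqs. (9) and (10) follows, assuming numpy/scipy. A practical implementation would keep $ \mathit{\boldsymbol{A}}$ and $ \mathit{\boldsymbol{C}}$ sparse, use a sparse eigensolver, and discretize $ \mathit{\boldsymbol{X}}$ following [26].

    import numpy as np
    from scipy.linalg import eigh

    def constrained_spectral(A, C, K=2):
        d = A.sum(axis=1)
        D_isqrt = np.diag(1.0 / np.sqrt(d))
        D_inv = np.diag(1.0 / d)
        P = D_isqrt @ A @ D_isqrt                               # normalized affinity
        mid = np.linalg.inv(C @ D_inv @ C.T)                    # (C D^-1 C^T)^-1
        Q = np.eye(len(d)) - D_isqrt @ C.T @ mid @ C @ D_isqrt  # Eq. (9)
        vals, vecs = eigh(Q @ P @ Q)                            # ascending eigenvalues
        V = vecs[:, -K:]                                        # top-K eigenvectors
        V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12   # row-normalize
        return V   # to be discretized into the binary X of Eq. (10)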

Based on the solved $ \mathit{\boldsymbol{X}}$, the pairwise term of the energy function in Eq. (1) penalizes matched superpixels that carry different annotations in different images. It is written as

$ {E_{\rm{m}}} = \mathop \sum \limits_{\left( {i, j} \right) \in \mathit{\boldsymbol{\varepsilon }}} \left| {{X_i} - {X_j}} \right| $ (11)

where $\mathit{\boldsymbol{\varepsilon }}$ is the set of matched region pairs; the penalty equals 0 when a matched superpixel pair has the same foreground/background annotation and 1 otherwise.
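
For Boolean labels, this reduces to a disagreement count over the matched pairs, as in this sketch (matched_pairs is an assumed list of index pairs):

    def pairwise_term(matched_pairs, X):
        # E_m of Eq. (11): 1 for each matched superpixel pair (i, j)
        # whose labels disagree, 0 otherwise.
        return sum(abs(X[i] - X[j]) for i, j in matched_pairs)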

2 Experimental results and analysis

To evaluate the performance of our algorithm, the iCoseg dataset [2], built at Carnegie Mellon University and Cornell University, and the MSRC dataset [27] from Microsoft Research Cambridge serve as experimental datasets, and segmentation accuracy (SA) and mean segmentation accuracy (MSA) measure the co-segmentation performance.
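
The paper does not spell out the metric formulas; assuming the common definitions (SA as the fraction of correctly labeled pixels and MSA as the class average of SA), they amount to:

    import numpy as np

    def segmentation_accuracy(pred, gt):
        # SA: fraction of pixels whose predicted binary label matches ground truth.
        return float(np.mean(pred == gt))

    def mean_segmentation_accuracy(sa_values):
        # MSA: mean of the per-class segmentation accuracies.
        return float(np.mean(list(sa_values)))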

In the experiments, the parameters of our algorithm are set as follows:

1) The number of GMM components is determined automatically by hierarchical clustering;

2) The unary weights in Eq. (1) are set by cross-validation to $ λ_{1}=λ_{2}=2.5$ and the pairwise weight to $ λ_{3}=1$;

3) The algorithm stops when the percentage of pixels whose annotation changed from the previous iteration is below 5%.

2.1 Experiments on the iCoseg dataset

The iCoseg dataset was built specifically for testing co-segmentation algorithms. It contains many object classes with varying viewpoints, illumination, and deformations as well as complex backgrounds and occlusions. To enable comparison with other methods, the same 16 subclasses as in the experiments of [12] are selected. Our method is compared with three classic image co-segmentation algorithms: the method of [8] (A1), the supervised method of [12] (A2), and the unsupervised method of [15] (A3). Figs. 3 and 4 and Table 1 give qualitative and quantitative comparisons between our method and these three methods.

Fig. 3 Image co-segmentation results of some object classes in group 1 (iCoseg dataset)
Fig. 4 Image co-segmentation results of some object classes in group 2 (iCoseg dataset)

Table 1 SA and MSA comparisons between different image co-segmentation algorithms (iCoseg dataset)
(unit: %)

Object class      A1     A2     A3     Ours  |  Object class     A1     A2     A3     Ours
Alaskan bear      74.8   90     73.5   91.6  |  Kite             87     90.3   93.8   96.8
Red Sox Players   73     90.9   81.4   93    |  Kite panda       73.2   90.2   87.4   70.3
Stonehenge1       56.6   63.3   77.1   84    |  Gymnastics       90.9   91.7   78.2   85.1
Stonehenge2       86     88.8   81.9   92.2  |  Skating          82.1   77.5   72.9   76.8
Hot Balloons      85.2   90.1   83     93.6  |  Liberty Statue   90.6   93.8   82.5   95.2
Ferrari           85     89.9   88.7   78.7  |  Liverpool FC     76.4   87.5   85     83.4
Taj Mahal         73.7   91.1   84.9   92.5  |  Brown Bear       74     95.3   78.1   96
Elephants         70.1   43.1   83.3   79.3  |  MSA              78.9   85.3   82.7   86.5
Pandas            84     92.7   76.9   65.8  |

Note: the best result for each class is the highest value in its row (shown in bold in the original table).

图 3图 4表 1不难发现,相比于其他3种已有方法,本文方法在诸如“Alaskan bear”,“Stonehenge”和“Liberty Statue”等多数物体类中均取得了最高的SA,并且在所有方法中具有最高的MSA。上述结果表明,当图像前景发生剧烈变化或者存在多个前景目标时,本文方法在大多数情况下依然能够较为准确地预测前景和背景分布。不过,本文方法在物体类“Panda”和“Kite panda”中表现不佳。对于“Panda”类,其原因是前景分布的预测陷入熊猫的局部白色毛皮区域之中,从而导致物体识别算法的初始化不够准确,这在同一类别物体内部存在较大颜色变化时显得尤为明显。对于“Skating”类,由于滑雪者的身体和衣服部分存在显著颜色差异,从而前景分布预测陷入仅覆盖外套的局部最小,进而导致滑雪者的鞋子和头发等区域被错误地标注为背景。

2.2 Experiments on the MSRC dataset

The MSRC dataset was built for testing semantic object segmentation algorithms; in this dataset, each object class contains images of many different instances. For comparison, the same eight subclasses as in the experiments of [12] are selected. Fig. 5 and Table 2 give qualitative and quantitative comparisons between our method and the three methods above. As shown in Fig. 5 and Table 2, our algorithm still obtains good results when the foreground changes dramatically and the foreground/background difference is not significant. Compared with the three existing methods, our method achieves the highest SA in many object classes and the highest MSA. However, our method underperforms the existing methods on some object classes. The sixth row of Fig. 5 gives a typical example in which our method misjudges part of the headlight regions as background, mainly because of the large intraclass color variation of the foreground objects. The result in the eighth row of Fig. 5 has a similar cause: the estimated foreground distribution becomes trapped in the local brown fur regions of the cat, so some white fur regions belonging to the cat's body are not segmented correctly.

Fig. 5 Image co-segmentation results of some object classes (MSRC dataset)

Table 2 SA and MSA comparisons between different image co-segmentation algorithms (MSRC dataset)
(unit: %)

Object class    A1     A2     A3     Ours
Cow             81.6   94.2   76.3   94.8
Horse           80.1   74.4   75.8   83.9
Plane           73.8   83     79.7   81.2
Face            84.3   82     86     87.3
Cars (back)     85.1   78.3   85.2   80
Cars (front)    87.7   79.6   84.4   85.5
Bike            63.3   65.9   73     76.4
Cat             74.4   92.3   80.1   71.7
MSA             76.2   78.8   77.3   80.3

Note: the best result for each class is the highest value in its row (shown in bold in the original table).

3 Conclusion

This study proposes a new unsupervised co-segmentation algorithm. It distinguishes the foreground from the background by recursively estimating the appearance distributions of the superpixels obtained from hierarchical image segmentation, and it fuses the regional correlations within each image and across different images to increase the consistency of the foreground and background distribution estimates. Our algorithm removes the prior constraint of previous methods that the foreground and background must differ markedly, and the experiments prove that it performs more robustly than existing classic methods, especially unsupervised ones, when the foreground changes significantly. In future work, we will design image co-segmentation algorithms on top of the currently popular convolutional neural networks and incorporate the concept of semantic matching [28] to obtain more robust results.

References

  • [1] Rother C, Minka T, Blake A, et al. Cosegmentation of image pairs by histogram matching-incorporating a global constraint into MRFs[C]//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, NY, USA: IEEE, 2006: 993-1000.[DOI:10.1109/CVPR.2006.91]
  • [2] Batra D, Kowdle A, Parikh D, et al. iCoseg: interactive co-segmentation with intelligent scribble guidance[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 3169-3176.[DOI:10.1109/CVPR.2010.5540080]
  • [3] Vitaladevuni S N, Basri R. Co-clustering of image segments using convex optimization applied to EM neuronal reconstruction[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 2203-2210.[DOI:10.1109/CVPR.2010.5539901]
  • [4] Kowdle A, Batra D, Chen W C, et al. iModel: interactive co-segmentation for object of interest 3D modeling[C]//Proceedings of the 11th European Conference on Trends and Topics in Computer Vision. Heraklion, Crete, Greece: Springer, 2010: 211-224.[DOI:10.1007/978-3-642-35740-4_17]
  • [5] Gallagher A C, Chen T. Clothing cosegmentation for recognizing people[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-8.[DOI:10.1109/CVPR.2008.4587481]
  • [6] Mukherjee L, Singh V, Peng J M. Scale invariant cosegmentation for image groups[C]//Proceedings of CVPR 2011. Colorado Springs, CO, USA: IEEE, 2011: 1881-1888.[DOI:10.1109/CVPR.2011.5995420]
  • [7] Hochbaum D S, Singh V. An efficient algorithm for co-segmentation[C]//Proceedings of the 2009 IEEE 12th International Conference on Computer Vision. Kyoto, Japan: IEEE, 2009: 269-276.[DOI:10.1109/ICCV.2009.5459261]
  • [8] Joulin A, Bach F, Ponce J. Discriminative clustering for image co-segmentation[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 1943-1950.[DOI:10.1109/CVPR.2010.5539868]
  • [9] Collins M D, Xu J, Grady L, et al. Random walks based multi-image segmentation: quasiconvexity results and GPU-based solutions[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 1656-1663.[DOI:10.1109/CVPR.2012.6247859]
  • [10] Lee C, Jang W D, Sim J Y, et al. Multiple random walkers and their application to image cosegmentation[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3837-3845.[DOI:10.1109/CVPR.2015.7299008]
  • [11] Chang K Y, Liu T L, Lai S H. From co-saliency to co-segmentation: an efficient and fully unsupervised energy minimization model[C]//Proceedings of CVPR 2011. Colorado Springs, CO, USA: IEEE, 2011: 2129-2136.[DOI:10.1109/CVPR.2011.5995415]
  • [12] Vicente S, Rother C, Kolmogorov V. Object cosegmentation[C]//Proceedings of CVPR 2011. Colorado Springs, CO, USA: IEEE, 2011: 2217-2224.[DOI:10.1109/CVPR.2011.5995530]
  • [13] Wang F, Huang Q X, Guibas L J. Image co-segmentation via consistent functional maps[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 849-856.[DOI:10.1109/ICCV.2013.110]
  • [14] Kim G, Xing E P, Li F F, et al. Distributed cosegmentation via submodular optimization on anisotropic diffusion[C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 169-176.[DOI:10.1109/ICCV.2011.6126239]
  • [15] Kim G, Xing E P. On multiple foreground cosegmentation[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 837-844.[DOI:10.1109/CVPR.2012.6247756]
  • [16] Ma T Y, Latecki L J. Graph transduction learning with connectivity constraints with application to multiple foreground cosegmentation[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 1955-1962.[DOI:10.1109/CVPR.2013.255]
  • [17] Wang Z X, Liu R J. Semi-supervised learning for large scale image cosegmentation[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 393-400.[DOI:10.1109/ICCV.2013.56]
  • [18] Dai J F, Wu Y N, Zhou J, et al. Cosegmentation and cosketch by unsupervised learning[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 1305-1312.[DOI:10.1109/ICCV.2013.165]
  • [19] Taniai T, Sinha S N, Sato Y. Joint recovery of dense correspondence and cosegmentation in two images[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 4246-4255.[DOI:10.1109/CVPR.2016.460]
  • [20] Joulin A, Bach F, Ponce J. Multi-class cosegmentation[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 542-549.[DOI:10.1109/CVPR.2012.6247719]
  • [21] Carreira J, Sminchisescu C. Constrained parametric min-cuts for automatic object segmentation[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 3241-3248.[DOI:10.1109/CVPR.2010.5540063]
  • [22] Van de Sande K, Gevers T, Snoek C. Evaluating color descriptors for object and scene recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1582–1596. [DOI:10.1109/TPAMI.2009.154]
  • [23] Kim E, Li H S, Huang X L. A hierarchical image clustering cosegmentation framework[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 686-693.[DOI:10.1109/CVPR.2012.6247737]
  • [24] Deselaers T, Ferrari V. Global and efficient self-similarity for object classification and detection[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 1633-1640.[DOI:10.1109/CVPR.2010.5539775]
  • [25] Bullard J W, Garboczi E J, Carter W C, et al. Numerical methods for computing interfacial mean curvature[J]. Computational Materials Science, 1995, 4(2): 103–116. [DOI:10.1016/0927-0256(95)00014-H]
  • [26] Yu S X, Shi J B. Segmentation given partial grouping constraints[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(2): 173–183. [DOI:10.1109/TPAMI.2004.1262179]
  • [27] Shotton J, Winn J, Rother C, et al. TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation[C]//Proceedings of the 9th European Conference on Computer Vision. Graz, Austria: Springer, 2006: 1-15.[DOI:10.1007/11744023_1]
  • [28] Yang F, Li X, Cheng H, et al. Object-aware dense semantic correspondence[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 4151-4159.[DOI:10.1109/CVPR.2017.442]