Published: 2019-03-16 DOI: 10.11834/jig.180476 2019 | Volume 24 | Number 3 Image Analysis and Recognition

1. School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016, China;
2. College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Received: 2018-08-09; revised: 2018-09-10. Supported by: Young Scientists Fund of the National Natural Science Foundation of China (61502256); Key Research and Development Program of Zhejiang Province (2018C01086); Natural Science Foundation of Ningbo (2018A610160). First author: Yao Tuozhong (b. 1983), male, lecturer; research interests: computer vision and machine learning. E-mail: thomasyao@zju.edu.cn. An Peng, male, professor; research interests: robotics and embedded systems. E-mail: anp04@126.com. Song Jiatao, male, professor; research interests: pattern recognition and image processing. E-mail: sjt6612@163.com. CLC number: TP391.4; Document code: A; Article ID: 1006-8961(2019)03-0366-10

Image co-segmentation with progressive foreground updating and hierarchical region correlation
Yao Tuozhong1, Zuo Wenhui2, An Peng1, Song Jiatao1
1. School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016, China;
2. College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Supported by: Young Scientists Fund of National Natural Science Foundation of China (61502256)

# Abstract

Objective As a hotspot in computer vision, image co-segmentation is a research branch of the classic image segmentation problem that uses multiple images to separate foreground objects from background regions. It has been widely used in many fields, such as image classification, object recognition, and 3D object reconstruction. Image co-segmentation is an ill-posed and challenging problem due to many factors, such as viewpoint change and intraclass diversity of the foreground objects. Most current image co-segmentation algorithms have limited performance and work efficiently only on images with dramatic background and minimal foreground changes. Method This study proposes a new unsupervised algorithm that optimizes foreground/background estimation progressively. Our algorithm has three advantages: 1) it is unsupervised and does not need sample learning; 2) it can co-segment multiple images simultaneously, or an image with multiple foreground objects; and 3) it is more adaptable to dramatic intraclass variations than previous algorithms. The main steps of our algorithm are as follows. A classic hierarchical segmentation is first utilized to generate a multiscale superpixel set. Different Gaussian mixture models (GMMs) are then used to estimate the foreground and background distributions on the basis of classic color and texture descriptors at the superpixel level. A Markov random field (MRF) model estimates the annotation of each superpixel by solving a traditional energy minimization problem. In our MRF model, each node represents a superpixel or pixel. The first two unary potentials denote the probabilities of a superpixel or pixel belonging to the foreground or background, and the last pairwise potential penalizes annotation inconsistency among superpixels in different images. This energy minimization can be solved by a classic graph cut.
Unlike in most image co-segmentation algorithms, the foreground and background models are estimated progressively, starting from the initial superpixel annotation given by a pre-learned object detector. The annotation in the current step is used to update the superpixel annotation, and thus the foreground and background distributions, in the next step, until these distributions are no longer optimized significantly. Intra- and inter-image similarity correlations at different superpixel levels are integrated into our iterative framework to increase the robustness of foreground and background model estimation. Each image is divided into a series of segmentation levels by hierarchical segmentation, and three matrices are used to model the semantic correlations among different regions. An affinity matrix $\mathit{\boldsymbol{A}}$ defines the relationship among neighboring superpixels inside one image. A constraint matrix $\mathit{\boldsymbol{C}}$ describes the hierarchical relation among different segmentation levels. Another affinity matrix $\mathit{\boldsymbol{M}}$ defines the relationship among superpixels in different images. A normalized affinity matrix $\mathit{\boldsymbol{P}}$ is then defined, and a new matrix $\mathit{\boldsymbol{Q}}$ created based on $\mathit{\boldsymbol{C}}$ projects $\mathit{\boldsymbol{P}}$ into the solution space. The optimal annotation of superpixel pairs inside one image and in different images can then be achieved by classic normalized cuts. Thus, a new pairwise potential is added to our MRF model to penalize corresponding superpixel pairs with different annotations in different images. Result In our experiment, the iCoseg and MSRC datasets are utilized to compare the performance of our algorithm with those of several state-of-the-art algorithms.
Experimental results demonstrate that our proposed algorithm achieves the highest segmentation accuracy (SA) and mean segmentation accuracy (MSA) in most object classes, which implies that our algorithm does not need large foreground/background differences and can be used on generalized images with dramatic foreground changes and different foreground objects. In some object classes, such as "Skating" and "Panda", however, our algorithm is inefficient because of the inaccurate initial distribution estimation from the out-of-date object detector, and our iterative framework still cannot help the distribution estimation escape a local minimum. Nonetheless, our algorithm can be significantly improved by using state-of-the-art deep learning-based object detectors, such as Mask R-CNN. Conclusion This study proposes a novel unsupervised image co-segmentation algorithm, which iteratively estimates the appearance distribution of each superpixel by hierarchical image segmentation to distinguish the foreground from the background. Regional semantic correlations inside one image and across different images are considered as a new pairwise potential in the MRF model to increase the consistency of the foreground and background distributions. Our detailed experiments show that our proposed algorithm achieves more robust performance than state-of-the-art algorithms and can be used to co-segment multiple images with dramatic foreground changes and multiple foreground objects.

# Key words

image co-segmentation; hierarchical image segmentation; progressive foreground estimation; hierarchical region correlation; normalized cut

# 0 Introduction

Rother et al. [1] first proposed the concept of image co-segmentation in 2006, incorporating an inter-image consistency constraint into the Markov random field energy function. Unlike the method of [1], which uses L1 regularization, Mukherjee et al. [6] used L2 regularization to measure the similarity between foreground histograms. Hochbaum et al. [7] proposed a reward model that satisfies the submodularity condition, so that the co-segmentation problem can be solved efficiently with the classic graph cut. However, these early methods can only extract nearly identical foreground objects from different backgrounds.

# 1 Algorithm description

 $E\left( \mathit{\boldsymbol{X}} \right) = {\lambda _1}{E_{\rm{p}}} + {\lambda _2}{E_{\rm{r}}} + {\lambda _3}{E_{\rm{m}}}$ (1)
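The weighted sum in Eq. (1) defines a binary labeling energy over superpixels. The following toy sketch evaluates such an energy on a hypothetical 4-node chain, with made-up unary costs and a Potts-style pairwise term standing in for the paper's potentials; at realistic sizes the minimization is done by graph cut rather than enumeration:

```python
import itertools
import numpy as np

# Toy 4-superpixel example: unary costs (row 0: background, row 1: foreground)
# plus one Potts-style pairwise consistency term, mirroring the structure of Eq. (1)
unary = np.array([[0.2, 0.9, 0.8, 0.1],   # cost of labeling each node background
                  [0.8, 0.1, 0.3, 0.9]])  # cost of labeling each node foreground
edges = [(0, 1), (1, 2), (2, 3)]          # neighboring superpixel pairs
lam = 0.4                                 # pairwise weight (hypothetical)

def energy(x):
    e = sum(unary[x[i], i] for i in range(4))
    e += lam * sum(x[i] != x[j] for i, j in edges)
    return e

# Brute-force minimization over all 2^4 labelings
best = min(itertools.product([0, 1], repeat=4), key=energy)
print(best)  # the labeling that follows the unary preferences: (0, 1, 1, 0)
```

The pairwise weight trades off agreement with the unary costs against label smoothness along edges, which is exactly the role of the weights in Eq. (1).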

# 1.1 Progressive foreground/background estimation

$X_{i}, X_{j} \leftarrow$ initial annotations, $\forall i \in {\mathit{\boldsymbol{\nu }}_{\rm{r}}}, \forall j \in {\mathit{\boldsymbol{\nu }}_{\rm{p}}}$

For each iteration: re-estimate the foreground/background GMMs from the current annotations, minimize the MRF energy of Eq. (1) by graph cut to obtain new annotations, and stop when the annotations of less than 5% of pixels change between iterations.
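The progressive estimation loop can be sketched as follows. This is a deliberately simplified stand-in: 1-D toy features replace color/texture descriptors, single per-class means replace the GMMs, and nearest-model assignment replaces the MRF/graph-cut solve; only the alternating update structure and the <5% stopping rule (Sec. 2) are taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D "superpixel" features: 50 background samples near 0.2, 30 foreground near 0.8
feats = np.concatenate([rng.normal(0.2, 0.05, 50), rng.normal(0.8, 0.05, 30)])
labels = (np.arange(feats.size) >= 60).astype(int)  # deliberately poor initial annotation

for _ in range(20):
    # Re-estimate the two appearance models from the current annotation
    # (single means here; the paper fits full GMMs over color and texture)
    mu = np.array([feats[labels == k].mean() for k in (0, 1)])
    # Re-annotate each region by its nearest model, a stand-in for the MRF solve
    new = np.abs(feats[:, None] - mu[None, :]).argmin(axis=1)
    changed = np.mean(new != labels)
    labels = new
    if changed < 0.05:  # stopping rule from Sec. 2: <5% of annotations changed
        break

print(labels[:50].sum(), labels[50:].sum())  # background recovered as 0, foreground as 1
```

Even from a poor initialization, alternating model re-estimation and re-annotation pulls the two distributions apart until the labeling stabilizes, which is the intuition behind the progressive scheme.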

# 1.2 Hierarchical region correlation

 ${A_l}\left( {i, j} \right) = \frac{{w\left( {i, j} \right)}}{{\mathop \sum \limits_{k \in {N_i}} w\left( {i, k} \right)}} \cdot {{\rm{e}}^{ - {{\left\| {D\left( {{H_i}, {H_j}} \right)} \right\|}^2}/{\alpha _H}}}$ (5)
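Equation (5) modulates row-normalized spatial weights $w(i,j)$ by a kernel on the histogram distance $D(H_i, H_j)$. A small numpy sketch of this construction, with a plain Euclidean distance standing in for $D$ and hypothetical toy weights:

```python
import numpy as np

def affinity(W, H, alpha_H=0.5):
    """Intra-image affinity in the spirit of Eq. (5): spatial weights w(i, j),
    row-normalized, damped by a squared histogram-distance kernel."""
    Wn = W / W.sum(axis=1, keepdims=True)                # w(i,j) / sum_k w(i,k)
    D2 = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)  # ||D(H_i, H_j)||^2 (Euclidean stand-in)
    return Wn * np.exp(-D2 / alpha_H)

W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])   # 3 mutually adjacent regions
H = np.array([[1., 0.],
              [1., 0.],
              [0., 1.]])       # regions 0 and 1 share an identical histogram
A = affinity(W, H)
print(A.round(3))  # A[0,1] stays at 0.5, while A[0,2] is strongly damped
```

Regions with similar histograms keep their full normalized spatial weight, while dissimilar neighbors are suppressed exponentially.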

 $\begin{array}{l} {\mathit{\boldsymbol{C}}^i} = \left( {\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{C}}_{1, 2}}}&{ - {\mathit{\boldsymbol{I}}_2}}&\mathit{\boldsymbol{0}}\\ \vdots&\vdots&\vdots \\ \mathit{\boldsymbol{0}}&{{\mathit{\boldsymbol{C}}_{L - 1, L}}}&{ - {\mathit{\boldsymbol{I}}_L}} \end{array}} \right)\\ {\mathit{\boldsymbol{C}}_{l, l + 1}}\left( {f, c} \right) = \left\{ {\begin{array}{*{20}{l}} {{S_{\rm{f}}}/{S_{\rm{c}}}}&{f \in {\mathit{\boldsymbol{D}}_{\rm{c}}}}\\ 0&{{\rm{otherwise}}} \end{array}} \right. \end{array}$ (7)
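Each block $\boldsymbol{C}_{l,l+1}$ in Eq. (7) relates a fine region $f$ to its parent coarse region $c$ through the area ratio $S_{\rm f}/S_{\rm c}$. A toy construction for one hypothetical two-level hierarchy:

```python
import numpy as np

# Hypothetical hierarchy: a coarse level with 2 regions over a fine level with 4.
# children[c] lists the fine regions D_c inside coarse region c; S_c sums child areas.
children = {0: [0, 1], 1: [2, 3]}
S_f = np.array([10., 30., 25., 15.])  # fine-region areas

C = np.zeros((4, 2))                  # C_{l,l+1}(f, c) = S_f / S_c for f in D_c, else 0
for c, kids in children.items():
    S_c = S_f[kids].sum()
    for f in kids:
        C[f, c] = S_f[f] / S_c

print(C)
# Each column sums to 1: a coarse region's appearance is the
# area-weighted mixture of its children's appearances.
```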

 $\mathit{\boldsymbol{Q}} = \mathit{\boldsymbol{I}} - {\mathit{\boldsymbol{D}}^{ - 1/2}}{\mathit{\boldsymbol{C}}^{\rm{T}}}{(\mathit{\boldsymbol{C}}{\mathit{\boldsymbol{D}}^{ - 1}}{\mathit{\boldsymbol{C}}^{\rm{T}}})^{ - 1}}\mathit{\boldsymbol{C}}{\mathit{\boldsymbol{D}}^{ - 1/2}}$ (9)
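The matrix $\boldsymbol{Q}$ of the form $\boldsymbol{I} - \boldsymbol{D}^{-1/2}\boldsymbol{C}^{\rm T}(\boldsymbol{C}\boldsymbol{D}^{-1}\boldsymbol{C}^{\rm T})^{-1}\boldsymbol{C}\boldsymbol{D}^{-1/2}$, as in the constrained normalized cut of Yu and Shi [26], is the orthogonal projector that folds the hierarchy constraints into the solution space. A quick numerical check of the two properties that make it work, using a random full-row-rank constraint matrix and a random diagonal degree matrix (both hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 6, 2
C = rng.normal(size=(m, N))   # random full-row-rank constraint matrix
d = rng.uniform(1.0, 2.0, N)  # diagonal of the degree matrix D
Dm12 = np.diag(d ** -0.5)     # D^{-1/2}
Dm1 = np.diag(1.0 / d)        # D^{-1}

Q = np.eye(N) - Dm12 @ C.T @ np.linalg.inv(C @ Dm1 @ C.T) @ C @ Dm12

# Q is idempotent and annihilates the constraint directions, so any
# vector passed through Q automatically satisfies C D^{-1/2} x = 0.
print(np.allclose(Q @ Q, Q), np.allclose(C @ Dm12 @ Q, 0))
```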

 $\begin{array}{l} \;\;\max \varepsilon \left( \mathit{\boldsymbol{X}} \right) = \frac{1}{2}\mathop \sum \limits_{c = 1}^2 \frac{{\mathit{\boldsymbol{X}}_c^{\rm{T}}\mathit{\boldsymbol{A}}{\mathit{\boldsymbol{X}}_c}}}{{\mathit{\boldsymbol{X}}_c^{\rm{T}}\mathit{\boldsymbol{D}}{\mathit{\boldsymbol{X}}_c}}}\\ {\rm{s}}{\rm{.t}}.\;\;\;\mathit{\boldsymbol{X}} \in {\left\{ {0, 1} \right\}^{N \times 2}}, \mathit{\boldsymbol{X}}{\mathit{\boldsymbol{1}}_{2 \times 1}} = {\mathit{\boldsymbol{1}}_N} \end{array}$ (10)
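The discrete two-way problem of Eq. (10) is intractable exactly, and the standard spectral relaxation solves it through the eigenvectors of the normalized affinity matrix $\boldsymbol{D}^{-1/2}\boldsymbol{A}\boldsymbol{D}^{-1/2}$. A toy sketch with a hypothetical 4-node affinity, where thresholding the second-largest eigenvector at zero serves as the discretization step:

```python
import numpy as np

# Toy affinity: nodes 0-2 strongly connected, node 3 only weakly attached
A = np.array([[0., 1., 1., .1],
              [1., 0., 1., .1],
              [1., 1., 0., .1],
              [.1, .1, .1, 0.]])
d = A.sum(axis=1)
Dm12 = np.diag(d ** -0.5)

# Spectral relaxation of Eq. (10): eigendecomposition of D^{-1/2} A D^{-1/2}
vals, vecs = np.linalg.eigh(Dm12 @ A @ Dm12)
labels = (vecs[:, -2] > 0).astype(int)  # discretize the second-largest eigenvector
print(labels)  # nodes 0-2 land in one group, node 3 in the other (up to a sign flip)
```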


# 2 Experimental results and analysis

1) The number of GMM components is determined automatically by hierarchical clustering;

2) The unary weights in Eq. (1) are set by cross-validation to $\lambda_{1}=\lambda_{2}=2.5$, and the pairwise weight to $\lambda_{3}=1$;

3) The algorithm stops when the percentage of pixels whose annotations change relative to the previous iteration is below 5%.

# 2.1 Experiments on the iCoseg dataset

The iCoseg dataset was built specifically for evaluating co-segmentation algorithms. It contains many object classes with varied viewpoints, illumination, and deformation, as well as complex backgrounds and occlusions. For comparison with other methods, the same 16 subclasses as in the experiments of [12] are used. Our method is compared with three classic image co-segmentation algorithms: the method of [8] (A1), a supervised method (A2) [12], and an unsupervised method (A3) [15]. Figures 3 and 4 and Table 1 present qualitative and quantitative comparisons, respectively, between our method and these three methods.

Table 1 SA and MSA comparisons between different image cosegmentation algorithms (iCoseg dataset)

| Object class | A1 | A2 | A3 | Ours |
| --- | --- | --- | --- | --- |
| Alaskan bear | 74.8 | 90 | 73.5 | **91.6** |
| Red Sox Players | 73 | 90.9 | 81.4 | **93** |
| Stonehenge1 | 56.6 | 63.3 | 77.1 | **84** |
| Stonehenge2 | 86 | 88.8 | 81.9 | **92.2** |
| Hot Balloons | 85.2 | 90.1 | 83 | **93.6** |
| Ferrari | 85 | **89.9** | 88.7 | 78.7 |
| Taj Mahal | 73.7 | 91.1 | 84.9 | **92.5** |
| Elephants | 70.1 | 43.1 | **83.3** | 79.3 |
| Pandas | 84 | **92.7** | 76.9 | 65.8 |
| Kite | 87 | 90.3 | 93.8 | **96.8** |
| Kite panda | 73.2 | **90.2** | 87.4 | 70.3 |
| Gymnastics | 90.9 | **91.7** | 78.2 | 85.1 |
| Skating | **82.1** | 77.5 | 72.9 | 76.8 |
| Liberty Statue | 90.6 | 93.8 | 82.5 | **95.2** |
| Liverpool FC | 76.4 | **87.5** | 85 | 83.4 |
| Brown Bear | 74 | 95.3 | 78.1 | **96** |
| MSA | 78.9 | 85.3 | 82.7 | **86.5** |

Note: values are in %; bold indicates the best result.

# 2.2 Experiments on the MSRC dataset

The MSRC dataset was built for evaluating semantic object segmentation algorithms; each object class in it contains images of many different instances. For comparison, the same eight subclasses as in the experiments of [12] are used. Figure 5 and Table 2 present qualitative and quantitative comparisons, respectively, between our method and the three methods above. As Fig. 5 and Table 2 show, our algorithm still achieves good results when the foreground changes dramatically and the foreground/background difference is small. Compared with the three existing methods, our method has the highest SA in many object classes and achieves the highest MSA. However, our method performs worse than the existing methods on some object classes. The sixth row of Fig. 5 shows a typical example, where part of the headlight region is misclassified as background, mainly because of large intraclass color variation in the foreground object. The result in the eighth row of Fig. 5 has a similar cause: the foreground distribution estimate is trapped in a local region of the cat's brown fur, so part of the white fur on the cat's body is not segmented correctly.

Table 2 SA and MSA comparisons between different image cosegmentation algorithms (MSRC dataset)

| Object class | A1 | A2 | A3 | Ours |
| --- | --- | --- | --- | --- |
| Cow | 81.6 | 94.2 | 76.3 | **94.8** |
| Horse | 80.1 | 74.4 | 75.8 | **83.9** |
| Plane | 73.8 | **83** | 79.7 | 81.2 |
| Face | 84.3 | 82 | 86 | **87.3** |
| Cars (back) | 85.1 | 78.3 | **85.2** | 80 |
| Cars (front) | **87.7** | 79.6 | 84.4 | 85.5 |
| Bike | 63.3 | 65.9 | 73 | **76.4** |
| Cat | 74.4 | **92.3** | 80.1 | 71.7 |
| MSA | 76.2 | 78.8 | 77.3 | **80.3** |

Note: values are in %; bold indicates the best result.

# References

• [1] Rother C, Minka T, Blake A, et al. Cosegmentation of image pairs by histogram matching-incorporating a global constraint into MRFs[C]//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, NY, USA: IEEE, 2006: 993-1000.[DOI:10.1109/CVPR.2006.91]
• [2] Batra D, Kowdle A, Parikh D, et al. iCoseg: interactive co-segmentation with intelligent scribble guidance[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 3169-3176.[DOI:10.1109/CVPR.2010.5540080]
• [3] Vitaladevuni S N, Basri R. Co-clustering of image segments using convex optimization applied to EM neuronal reconstruction[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 2203-2210.[DOI:10.1109/CVPR.2010.5539901]
• [4] Kowdle A, Batra D, Chen W C, et al. iModel: interactive co-segmentation for object of interest 3d modeling[C]//Proceedings of the 11th European Conference on Trends and Topics in Computer Vision. Heraklion, Crete, Greece: Springer, 2010: 211-224.[DOI:10.1007/978-3-642-35740-4_17]
• [5] Gallagher A C, Chen T. Clothing cosegmentation for recognizing people[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-8.[DOI:10.1109/CVPR.2008.4587481]
• [6] Mukherjee L, Singh V, Peng J M. Scale invariant cosegmentation for image groups[C]//Proceedings of CVPR 2011. Colorado Springs, CO, USA: IEEE, 2011: 1881-1888.[DOI:10.1109/CVPR.2011.5995420]
• [7] Hochbaum D S, Singh V. An efficient algorithm for co-segmentation[C]//Proceedings of the 2009 IEEE 12th International Conference on Computer Vision. Kyoto, Japan: IEEE, 2009: 269-276.[DOI:10.1109/ICCV.2009.5459261]
• [8] Joulin A, Bach F, Ponce J. Discriminative clustering for image co-segmentation[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 1943-1950.[DOI:10.1109/CVPR.2010.5539868]
• [9] Collins M D, Xu J, Grady L, et al. Random walks based multi-image segmentation: quasiconvexity results and GPU-based solutions[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 1656-1663.[DOI:10.1109/CVPR.2012.6247859]
• [10] Lee C, Jang W D, Sim J Y, et al. Multiple random walkers and their application to image cosegmentation[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3837-3845.[DOI:10.1109/CVPR.2015.7299008]
• [11] Chang K Y, Liu T L, Lai S H. From co-saliency to co-segmentation: an efficient and fully unsupervised energy minimization model[C]//Proceedings of CVPR 2011. Colorado Springs, CO, USA: IEEE, 2011: 2129-2136.[DOI:10.1109/CVPR.2011.5995415]
• [12] Vicente S, Rother C, Kolmogorov V. Object cosegmentation[C]//Proceedings of CVPR 2011. Colorado Springs, CO, USA: IEEE, 2011: 2217-2224.[DOI:10.1109/CVPR.2011.5995530]
• [13] Wang F, Huang Q X, Guibas L J. Image co-segmentation via consistent functional maps[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 849-856.[DOI:10.1109/ICCV.2013.110]
• [14] Kim G, Xing E P, Li F F, et al. Distributed cosegmentation via submodular optimization on anisotropic diffusion[C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 169-176.[DOI:10.1109/ICCV.2011.6126239]
• [15] Kim G, Xing E P. On multiple foreground cosegmentation[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 837-844.[DOI:10.1109/CVPR.2012.6247756]
• [16] Ma T Y, Latecki L J. Graph transduction learning with connectivity constraints with application to multiple foreground cosegmentation[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 1955-1962.[DOI:10.1109/CVPR.2013.255]
• [17] Wang Z X, Liu R J. Semi-supervised learning for large scale image cosegmentation[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 393-400.[DOI:10.1109/ICCV.2013.56]
• [18] Dai J F, Wu Y N, Zhou J, et al. Cosegmentation and cosketch by unsupervised learning[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 1305-1312.[DOI:10.1109/ICCV.2013.165]
• [19] Taniai T, Sinha S N, Sato Y. Joint recovery of dense correspondence and cosegmentation in two images[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 4246-4255.[DOI:10.1109/CVPR.2016.460]
• [20] Joulin A, Bach F, Ponce J. Multi-class cosegmentation[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 542-549.[DOI:10.1109/CVPR.2012.6247719]
• [21] Carreira J, Sminchisescu C. Constrained parametric min-cuts for automatic object segmentation[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 3241-3248.[DOI:10.1109/CVPR.2010.5540063]
• [22] Van de Sande K, Gevers T, Snoek C. Evaluating color descriptors for object and scene recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1582–1596. [DOI:10.1109/TPAMI.2009.154]
• [23] Kim E, Li H S, Huang X L. A hierarchical image clustering cosegmentation framework[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 686-693.[DOI:10.1109/CVPR.2012.6247737]
• [24] Deselaers T, Ferrari V. Global and efficient self-similarity for object classification and detection[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 1633-1640.[DOI:10.1109/CVPR.2010.5539775]
• [25] Bullard J W, Garboczi E J, Carter W C, et al. Numerical methods for computing interfacial mean curvature[J]. Computational Materials Science, 1995, 4(2): 103–116. [DOI:10.1016/0927-0256(95)00014-H]
• [26] Yu S X, Shi J B. Segmentation given partial grouping constraints[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(2): 173–183. [DOI:10.1109/TPAMI.2004.1262179]
• [27] Shotton J, Winn J, Rother C, et al. TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation[C]//Proceedings of the 9th European Conference on Computer Vision. Graz, Austria: Springer, 2006: 1-15.[DOI:10.1007/11744023_1]
• [28] Yang F, Li X, Cheng H, et al. Object-aware dense semantic correspondence[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 4151-4159.[DOI:10.1109/CVPR.2017.442]