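Objective Image cosegmentation uses multiple reference images to separate foreground objects from background regions, and has been widely applied in fields such as image classification and object recognition. However, most existing image cosegmentation algorithms only work in settings where the background varies considerably while the foreground remains almost unchanged. Method The new method proposed in this paper is unsupervised. Building on hierarchical image segmentation, it updates the estimates of the foreground and background models separately within a progressive optimization framework, and further improves the robustness of these estimates by incorporating hierarchical region similarity correlations both within an image and across different images. This unsupervised method requires no prior sample learning, can process two or more images simultaneously, handles cases in which multiple foreground objects are present at once, and adapts well to variations within the foreground object class. Result Experiments on the iCoseg and MSRC image sets show that the algorithm does not require significant foreground and background differences between images, and that, compared with existing classic methods, it is better suited to more general image scenes, such as those with drastic foreground changes or multiple simultaneous foreground objects. Conclusion This paper proposes a new unsupervised cosegmentation algorithm. The method distinguishes foreground from background by recursively estimating the appearance distributions of the superpixels obtained from hierarchical image segmentation, and fuses region correlations within an image and across different images to increase the consistency of the foreground and background distribution estimates. Experiments show that when the foreground changes significantly, the proposed method is more robust than existing methods.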
Objective As one of the active topics in computer vision, image cosegmentation is a branch of the classic image segmentation problem that uses multiple images to separate foreground objects from background regions, and it has been widely used in fields such as image classification, object recognition, and 3D object reconstruction. Because of factors such as viewpoint change and intra-class diversity of the foreground objects, image cosegmentation is an ill-conditioned and challenging problem. Most current image cosegmentation algorithms are limited in that they work well only on images with dramatic background changes and little foreground change. Method In this paper, we propose a new unsupervised algorithm that optimizes the foreground/background estimation progressively. The proposed algorithm has three advantages: 1) it is unsupervised and needs no sample learning; 2) it can cosegment multiple images simultaneously, or images with multiple foreground objects; 3) it adapts better to dramatic intra-class variations than previous algorithms. The main steps are as follows. First, a classic hierarchical segmentation is used to generate a multi-scale superpixel set. Then, separate Gaussian mixture models (GMMs) are used to estimate the foreground and background distributions from classic color and texture descriptors at the superpixel level. We use a Markov random field (MRF) model to estimate the annotation of each superpixel by solving a traditional energy minimization problem. In our MRF model, each node represents a superpixel or pixel. The first two terms, the unary potentials, denote the likelihood that a superpixel or pixel belongs to the foreground or the background, and the last term, the pairwise potential, penalizes annotation inconsistency between superpixels in different images. This energy minimization can be solved by classic graph cut.
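As a rough illustration of the kind of energy described above, a generic binary MRF with two unary terms and one pairwise term can be written as follows. The notation is assumed here for illustration only and is not taken from the paper: $x_i \in \{0, 1\}$ is the label of node $i$ (1 for foreground), $f_i$ is its descriptor, $\theta_{\mathrm{fg}}$ and $\theta_{\mathrm{bg}}$ are the GMM parameters, $\mathcal{N}$ is the set of linked superpixel pairs, $w_{ij}$ is a similarity weight, and $\lambda$ is a balancing constant.

```latex
E(\mathbf{x}) = \sum_{i} x_i\, U^{\mathrm{fg}}_i
              + \sum_{i} (1 - x_i)\, U^{\mathrm{bg}}_i
              + \lambda \sum_{(i,j) \in \mathcal{N}} w_{ij}\, [\, x_i \neq x_j \,],
\qquad
U^{\mathrm{fg}}_i = -\log p(f_i \mid \theta_{\mathrm{fg}}), \quad
U^{\mathrm{bg}}_i = -\log p(f_i \mid \theta_{\mathrm{bg}})
```

Under this assumed form, a node pays its foreground cost when labeled foreground and its background cost otherwise, and linked nodes pay $\lambda w_{ij}$ whenever their labels disagree. Energies of this shape with non-negative $w_{ij}$ are submodular, which is why graph cut can minimize them exactly.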
Unlike most image cosegmentation algorithms, we estimate the foreground and background models progressively. Starting from an initial superpixel annotation given by a pre-learned object detector, the annotation at the current step is used to update the foreground and background distributions, which in turn update the superpixel annotation at the next step, until the distributions no longer improve significantly. Furthermore, both intra-image and inter-image similarity correlations at different superpixel levels are integrated into our iterative framework to increase the robustness of foreground and background model estimation. Each image is divided into a series of segmentation levels by hierarchical segmentation, and three matrices are used to model the semantic correlations between different regions. An affinity matrix defines the relationship between neighboring superpixels inside one image. A constraint matrix describes the hierarchical relation between different segmentation levels. Another affinity matrix defines the relationship between superpixels in different images. We then define a normalized affinity matrix and construct a new matrix for projection into the solution space. The optimal annotation of superpixel pairs inside one image and across different images can be obtained by classic normalized cuts. Thus, a new pairwise potential is added to our MRF model to penalize corresponding superpixel pairs that receive different annotations in different images. Result In our experiments, the iCoseg and MSRC datasets are used to compare the performance of our algorithm with several state-of-the-art algorithms.
The experimental results demonstrate that the proposed algorithm achieves the highest segmentation accuracy and mean segmentation accuracy in most object classes, which shows that it does not require large foreground and background differences and can be applied to more general images with dramatic foreground changes and different foreground objects. However, in some object classes such as “Skating” and “Panda”, our algorithm does not work well because the initial distribution estimate from the out-of-date object detector is inaccurate, and our iterative framework cannot help the distribution estimation escape the local minimum. This can be significantly improved by using a state-of-the-art deep-learning-based object detector such as Mask R-CNN. Conclusion This paper proposes a novel unsupervised image cosegmentation algorithm that iteratively estimates the appearance distribution of each superpixel obtained by hierarchical image segmentation to distinguish the foreground from the background. Meanwhile, regional semantic correlations inside one image and across different images are both incorporated as a new pairwise potential in the MRF model to increase the consistency of the foreground and background distributions. Our experiments show that the proposed algorithm is more robust than state-of-the-art algorithms and can be used to cosegment multiple images with dramatic foreground changes and multiple foreground objects.
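The progressive estimation described in the Method part can be sketched with a toy example. This is a heavily simplified illustration under stated assumptions, not the paper's implementation: each superpixel is reduced to one scalar feature, each class is modeled by a single Gaussian instead of a GMM, the MRF pairwise terms and graph cut are omitted, and the initial labels stand in for the output of an object detector. All function names are hypothetical.

```python
import math


def fit_gaussian(values):
    """Return (mean, variance) of a list of scalars, with a small variance floor."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)


def neg_log_likelihood(value, model):
    """Negative log-likelihood of a scalar under a 1-D Gaussian (mean, variance)."""
    mean, var = model
    return 0.5 * math.log(2 * math.pi * var) + (value - mean) ** 2 / (2 * var)


def progressive_labels(features, init_labels, max_iters=20):
    """Alternate between refitting the two models and relabeling each feature.

    Labels are 1 for foreground and 0 for background. The loop stops when the
    labels no longer change, when one class becomes empty, or after max_iters.
    """
    labels = list(init_labels)
    for _ in range(max_iters):
        fg = [f for f, lab in zip(features, labels) if lab == 1]
        bg = [f for f, lab in zip(features, labels) if lab == 0]
        if not fg or not bg:
            break  # cannot fit a model to an empty class
        fg_model, bg_model = fit_gaussian(fg), fit_gaussian(bg)
        new_labels = [
            1 if neg_log_likelihood(f, fg_model) < neg_log_likelihood(f, bg_model) else 0
            for f in features
        ]
        if new_labels == labels:
            break  # converged: the annotation is stable
        labels = new_labels
    return labels


# Two clusters of toy features with a noisy initial annotation.
features = [0.1, 0.2, 0.15, 0.9, 0.8, 0.85]
noisy_init = [0, 0, 1, 1, 1, 0]
print(progressive_labels(features, noisy_init))  # -> [0, 0, 0, 1, 1, 1]
```

In this toy run the two mislabeled features are corrected after one update. Like the framework described above, such an alternating scheme only reaches a local optimum, so a poor initial annotation can still lead to a poor final labeling.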