Video object segmentation with a feature-consistency constraint

Zheng Yu1, Chen Yadang1, Hao Chuanyan2 (1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. School of Education Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)

Abstract
Objective Video object segmentation is an important direction in computer vision. Existing methods struggle when the object shape is irregular, inter-frame motion contains distracting information, or the motion is too fast. To address these shortcomings, a segmentation algorithm based on feature consistency is proposed. Method The segmentation framework is a graph-based method built on a Markov random field (MRF). A Gaussian mixture model is used to model the color features of the pre-specified labeled regions, yielding the data term of the segmentation. A spatiotemporal smoothness term is built by combining multiple features such as color and optical-flow direction. On this basis, an energy constraint term based on feature consistency is added to enhance the appearance consistency of the segmentation results. This added energy is itself a higher-order constraint and would significantly increase the computational complexity of energy optimization; auxiliary nodes are therefore introduced to solve the optimization problem and speed up the algorithm. Result The algorithm is evaluated on the DAVIS_2016 (densely annotated video segmentation) dataset and compared with recent graph-based methods: HVS (efficient hierarchical graph-based video segmentation), NLC (video segmentation by non-local consensus voting), BVS (bilateral space video segmentation), and OFL (video segmentation via object flow). The proposed algorithm ranks second in segmentation accuracy, only 1.6% below OFL, while leading all compared methods in running speed, nearly 6 times faster than OFL in particular. Conclusion The proposed algorithm integrates a feature-consistency constraint into the MRF framework and improves both segmentation accuracy and running speed without adding extra computational complexity.
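For concreteness, the overall energy described above can be written in a generic MRF form. This is a sketch, not notation taken verbatim from the paper: the symbols $D_p$, $V_{pq}$, $H_c$, the neighborhood $\mathcal{N}$, and the clique set $\mathcal{C}$ are assumed names.

```latex
E(\mathbf{x}) \;=\;
\underbrace{\sum_{p} D_p(x_p)}_{\text{GMM color data term}}
\;+\; \alpha \underbrace{\sum_{(p,q)\in\mathcal{N}} V_{pq}(x_p, x_q)}_{\text{spatiotemporal smoothness}}
\;+\; \beta \underbrace{\sum_{c\in\mathcal{C}} H_c(\mathbf{x}_c)}_{\text{feature-consistency term}}
```

Here $x_p \in \{\text{foreground}, \text{background}\}$ and each $H_c$ ranges over a superpixel clique. Because $H_c$ couples many variables at once, it is a higher-order potential; the paper's auxiliary-node construction reduces it to pairwise form so that graph-cut optimization remains tractable.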
Video object segmentation algorithm based on consistent features

Zheng Yu1, Chen Yadang1, Hao Chuanyan2(1.School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China;2.School of Education Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)

Abstract
Objective Video object segmentation is an important topic in the field of computer vision. However, existing segmentation methods fail in some situations, such as irregularly shaped objects, noisy optical flow, and fast motion. To this end, this paper proposes an effective and efficient algorithm based on feature consistency that addresses these issues. Method The proposed segmentation framework is a graph-based method built on a Markov random field (MRF). First, a Gaussian mixture model (GMM) is applied to model the color features of the pre-specified labeled regions, yielding the data term of the segmentation. Second, a spatiotemporal smoothness term is established by combining multiple features, such as color and optical-flow direction. The algorithm then adds an energy constraint based on feature consistency to enhance the appearance consistency of the segmentation results. This added energy is a higher-order constraint, which would significantly increase the computational complexity of energy optimization; the optimization problem is therefore solved by adding auxiliary nodes, which improves the speed of the algorithm. The higher-order constraint term borrows the bag-of-words idea from text classification to model the higher-order term of the segmentation energy: each superpixel corresponds to a document, and the scale-invariant feature transform (SIFT) feature points inside that superpixel serve as its words. The higher-order term is then modeled by extracting and clustering these features. To preserve running speed, auxiliary nodes are added to optimize the higher-order term: it is approximated by additional data and smoothness terms, after which the graph-cut algorithm completes the segmentation.
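As an illustration of the GMM-based data term, the unary cost of assigning a pixel to foreground or background can be taken as the negative log-likelihood under the corresponding mixture. This is a minimal sketch using hypothetical 1-D intensity mixtures with hand-picked parameters; the paper models full color features and fits the mixtures from the labeled regions.

```python
import math

def gmm_nll(x, weights, means, sigmas):
    """Negative log-likelihood of a scalar intensity x under a 1-D GMM."""
    likelihood = 0.0
    for w, mu, s in zip(weights, means, sigmas):
        likelihood += w * math.exp(-(x - mu) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
    return -math.log(max(likelihood, 1e-300))  # guard against log(0)

# Hypothetical 5-component foreground and background models
# (the paper also uses 5 components per model).
fg = ([0.2] * 5, [0.1, 0.3, 0.5, 0.7, 0.9], [0.1] * 5)
bg = ([0.2] * 5, [0.05, 0.2, 0.4, 0.6, 0.8], [0.1] * 5)

def data_term(x):
    """Per-pixel unary costs D_p(foreground), D_p(background) for the MRF."""
    return gmm_nll(x, *fg), gmm_nll(x, *bg)
```

A pixel whose intensity fits the foreground mixture better receives a lower foreground cost, so the graph cut prefers labeling it foreground.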
Result The test data were taken from the DAVIS_2016 (densely annotated video segmentation) dataset, which contains 50 sequences, of which 30 are used for training and 20 for validation; the resolution is 854×480 pixels. Given that many methods are based on MRF expansion, α=0.3 and β=0.2 are set empirically in the proposed algorithm to balance the data, smoothness, and feature-consistency terms. As in existing methods, the number of components in the Gaussian mixture models of the foreground and background is set to 5, and σh=0.1. This paper focuses on verifying and evaluating the proposed feature-consistency constraint term and sets β=0 and β=0.2 to segment the videos without and with the constraint, respectively. The experimental results show that the IoU score with the higher-order constraint is 10.2% higher than without it. To demonstrate its effectiveness, the proposed method is compared with several classical graph-based video segmentation algorithms. The experimental results highlight the competitive segmentation quality of the proposed algorithm. Meanwhile, the average IoU score reported in this paper is slightly lower than that of the video segmentation via object flow (OFL) algorithm because the latter iteratively optimizes the optical-flow estimates to achieve relatively high segmentation accuracy. The proposed algorithm takes nearly 10 seconds on average to segment each frame, shorter than the running time of the other algorithms. For instance, although the OFL algorithm reports slightly higher accuracy, its average processing time per frame is approximately 1 minute, about 6 times that of the proposed algorithm. In sum, the proposed algorithm achieves a comparable segmentation effect with much lower computational cost than the OFL algorithm.
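The IoU (Jaccard) score used throughout these comparisons is simply the intersection of the predicted and ground-truth masks divided by their union. A minimal reference implementation, assuming masks are given as flat 0/1 sequences of equal length:

```python
def iou(pred, gt):
    """Jaccard index J = |P ∩ G| / |P ∪ G| between two binary masks,
    given as flat 0/1 sequences of equal length."""
    inter = sum(p and g for p, g in zip(pred, gt))
    union = sum(p or g for p, g in zip(pred, gt))
    return inter / union if union else 1.0  # both masks empty: perfect match
```

Per-frame scores are averaged over each sequence (and over sequences) to obtain the mean IoU figures reported above.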
However, the accuracy of its segmentation results is 1.6% lower than that of the OFL algorithm. Nevertheless, in terms of running speed, the proposed algorithm leads the other methods and is approximately 6 times faster than OFL. Conclusion Experimental results show that when the foreground and background colors are not sufficiently distinct, the foreground object and the background are often confused, resulting in incorrect segmentation. With the global feature-consistency constraint added, the proposed algorithm can optimize the segmentation result of each frame using the feature statistics of the entire video. By using global information to refine local decisions, the proposed segmentation method shows strong robustness to random noise, irregular motion, blurry backgrounds, and other problems in the video. According to the experimental results, the proposed algorithm spends most of its time computing optical flow, which could be replaced by a more efficient motion estimation algorithm in the future. Compared with other segmentation algorithms, the proposed method nevertheless shows clear performance advantages. Built on the MRF framework, the proposed segmentation algorithm integrates a feature-consistency constraint and improves both segmentation accuracy and running speed without increasing computational complexity. However, the method has several shortcomings. First, because the proposed algorithm segments a video based on superpixels, the segmentation results depend on the accuracy of the superpixel segmentation.
Second, the proposed higher-order feature-energy constraint has no obvious effect on feature-poor regions: far fewer SIFT feature points are detected in homogeneous regions, leaving superpixel blocks without enough detected feature points, which in turn distorts the global statistics of foreground/background features and prevents the proposed method from optimizing the segmentation results in those regions. As in traditional methods, the optical flow is a bottleneck in the performance of the proposed method, so additional effort should be devoted to finding a highly efficient replacement. As mentioned before, graph-based methods (including the proposed one) still lag behind current end-to-end video segmentation methods based on convolutional neural networks (CNNs) in segmentation accuracy. Future work should therefore attempt to combine the two approaches to benefit from their respective advantages.
