Video object segmentation algorithm based on consistent features
2020, Vol. 25, No. 8, Pages: 1558-1566
Received: 2019-11-19; Revised: 2020-03-10; Accepted: 2020-03-17; Published in print: 2020-08-16
DOI: 10.11834/jig.190571
Objective
Video object segmentation is an important research direction in computer vision. Existing methods fall short when the target has an irregular shape, when inter-frame motion contains interfering information, or when the motion is too fast. To address these shortcomings, a segmentation algorithm based on feature consistency is proposed.
Method
The proposed segmentation framework is a graph-theoretic method based on the Markov random field (MRF). Gaussian mixture models are used to model the color features of the pre-specified labeled regions, yielding the data term of the segmentation energy. A spatiotemporal smoothness term is established by combining multiple features such as color and optical flow direction. On this basis, an energy constraint based on feature consistency is added to enhance the appearance consistency of the segmentation results. This added energy is itself a higher-order constraint and would significantly increase the computational complexity of the energy optimization; auxiliary nodes are therefore introduced to solve the optimization problem and speed up the algorithm.
Result
The algorithm is evaluated on the DAVIS_2016 (densely annotated video segmentation) dataset and compared with recent graph-based methods, namely HVS (efficient hierarchical graph-based video segmentation), NLC (video segmentation by non-local consensus voting), BVS (bilateral space video segmentation), and OFL (video segmentation via object flow). The proposed algorithm ranks second in segmentation accuracy, only 1.6% below OFL, while leading all compared methods in running speed; in particular, it is nearly six times faster than OFL.
Conclusion
The proposed algorithm integrates a feature consistency constraint into the MRF framework and improves both segmentation accuracy and running speed without adding extra computational complexity.
Objective
Video object segmentation is an important topic in the field of computer vision. However, existing segmentation methods fail to address issues such as irregular objects, noisy optical flow, and fast motion. To this end, this paper proposes an effective and efficient algorithm based on feature consistency that addresses these issues.
Method
The proposed segmentation framework is based on the graph-theoretic formulation of the Markov random field (MRF). First, a Gaussian mixture model (GMM) is applied to model the color features of the pre-specified labeled regions, from which the data term of the segmentation energy is obtained.
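As a rough illustration of this step, the following sketch fits per-region color GMMs and converts them into per-pixel unary (data) costs as negative log-likelihoods. It uses scikit-learn for concreteness; the helper names are hypothetical and this is an assumption about the implementation, not the authors' code.

```python
# Sketch: GMM color models for labeled foreground/background regions,
# producing per-pixel unary (data) costs. Illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_color_model(pixels_rgb, n_components=5):
    """Fit a GMM to an (N, 3) array of RGB samples from a labeled region."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(pixels_rgb)
    return gmm

def data_term(frame_rgb, fg_gmm, bg_gmm):
    """Return (H, W, 2) unary costs: channel 0 = cost of labeling a pixel
    foreground, channel 1 = cost of labeling it background."""
    h, w, _ = frame_rgb.shape
    flat = frame_rgb.reshape(-1, 3).astype(np.float64)
    # score_samples gives log-likelihoods; cost = -log p(color | model)
    cost_fg = -fg_gmm.score_samples(flat).reshape(h, w)
    cost_bg = -bg_gmm.score_samples(flat).reshape(h, w)
    return np.stack([cost_fg, cost_bg], axis=-1)
```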
Second, a spatiotemporal smoothness term is established by combining multiple features, such as color and optical flow direction. The algorithm then adds an energy constraint based on feature consistency to enhance the appearance consistency of the segmentation results; this added term is a higher-order energy constraint that significantly increases the computational complexity of energy optimization.
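A minimal sketch of what such a pairwise smoothness weight could look like, assuming a contrast-sensitive exponential form over the color difference and the optical-flow direction difference. The functional form and the bandwidths sigma_c and sigma_h are illustrative assumptions, not the paper's exact definition.

```python
# Sketch: pairwise smoothness weight between two neighboring nodes,
# combining color contrast with optical-flow direction similarity.
import numpy as np

def smoothness_weight(color_i, color_j, flow_i, flow_j,
                      sigma_c=0.1, sigma_h=0.1):
    """Higher weight = stronger penalty for assigning different labels."""
    color_dist = np.linalg.norm(color_i - color_j) ** 2
    # Angle between the two flow vectors, normalized to [0, 1]
    cos = np.dot(flow_i, flow_j) / (
        np.linalg.norm(flow_i) * np.linalg.norm(flow_j) + 1e-8)
    angle = np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi
    return np.exp(-color_dist / (2 * sigma_c ** 2)) * \
           np.exp(-angle ** 2 / (2 * sigma_h ** 2))
```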
The higher-order constraint borrows an idea from text classification, which is used here to model the higher-order term of the segmentation energy: each superpixel corresponds to a document, and the scale-invariant feature transform (SIFT) feature points inside the superpixel serve as the words of that document. The higher-order term is then modeled by extracting and clustering these features.
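The following sketch illustrates this document/word analogy: SIFT descriptors are detected, quantized into visual words by k-means, and accumulated into one word histogram per superpixel. It assumes OpenCV (cv2.SIFT_create) and scikit-learn; the vocabulary size and clustering details are assumptions rather than the paper's settings.

```python
# Sketch: each superpixel is a "document" whose "words" are the
# quantized SIFT descriptors that fall inside it (bag-of-words).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def superpixel_histograms(gray, superpixel_labels, n_words=64):
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:
        return None  # no feature points detected in this frame
    # Build the visual vocabulary by clustering all descriptors.
    words = KMeans(n_clusters=n_words, n_init=10).fit_predict(descriptors)
    n_sp = superpixel_labels.max() + 1
    hist = np.zeros((n_sp, n_words))
    for kp, w in zip(keypoints, words):
        x = min(int(round(kp.pt[0])), superpixel_labels.shape[1] - 1)
        y = min(int(round(kp.pt[1])), superpixel_labels.shape[0] - 1)
        hist[superpixel_labels[y, x], w] += 1
    return hist  # one word-frequency "document" per superpixel
```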
To preserve running speed, auxiliary nodes are added to optimize the higher-order term: the term is approximated by additional data and smoothness terms, and the graph cut algorithm then completes the segmentation.
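To make the auxiliary-node idea concrete, here is a minimal sketch using PyMaxflow that encodes a group-consistency penalty with one auxiliary node per group, reducing the higher-order term to pairwise edges that a standard graph cut can handle. This sketch penalizes only groups that are not uniformly foreground; the paper's actual construction may be more elaborate, and gamma is an illustrative weight.

```python
# Sketch: one auxiliary node per group keeps the higher-order
# consistency term tractable for a standard s-t graph cut.
import maxflow
import numpy as np

INF = 1e9

def segment(unary, groups, gamma=0.5):
    """unary: (N, 2) costs for labels (0=bg, 1=fg); groups: lists of node indices."""
    n = unary.shape[0]
    g = maxflow.Graph[float]()
    nodes = g.add_nodes(n)
    aux = g.add_nodes(len(groups))
    for i in range(n):
        # cap_source is cut when node i lands on the sink (bg) side,
        # so it carries the cost of labeling i as background.
        g.add_tedge(nodes[i], unary[i, 0], unary[i, 1])
    for k, members in enumerate(groups):
        # The auxiliary node pays gamma when it falls on the bg side ...
        g.add_tedge(aux[k], gamma, 0)
        for i in members:
            # ... and is forced to the bg side unless every member is fg,
            # so the group pays gamma exactly when it is not uniformly fg.
            g.add_edge(aux[k], nodes[i], INF, 0)
    g.maxflow()
    # get_segment: 0 = source (foreground) side, 1 = sink (background) side
    return np.array([1 - g.get_segment(nodes[i]) for i in range(n)])
```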
Result
The test data are taken from the DAVIS_2016 (densely annotated video segmentation) dataset, which contains 50 sequences, of which 30 are training sequences and 20 are validation sequences, at a resolution of 854×480 pixels. Because the proposed energy extends the standard MRF formulation, the weights α = 0.3 and β = 0.2 are set empirically to balance the data, smoothness, and feature consistency terms. As in existing methods, the number of components in the Gaussian mixture models for the foreground/background is set to 5, and the bandwidth is set to σ_h = 0.1. This paper focuses on verifying and evaluating the proposed feature consistency constraint and therefore segments the videos with β = 0 (constraint disabled) and β = 0.2 (constraint enabled). The experimental results show that the IoU score with the higher-order constraint is 10.2% higher than that without it.
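For context, given these weights the overall energy presumably takes a weighted three-part form along the lines of

$$E(\mathbf{x}) = \sum_{i} D_i(x_i) + \alpha \sum_{(i,j) \in \mathcal{N}} V_{ij}(x_i, x_j) + \beta \sum_{c \in \mathcal{C}} H_c(\mathbf{x}_c),$$

where $D_i$ is the GMM data term, $V_{ij}$ the spatiotemporal smoothness term over neighboring nodes $\mathcal{N}$, and $H_c$ the feature consistency term over superpixel cliques $\mathcal{C}$. This notation is an assumption reconstructed from the description above, not the paper's own equation.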
To demonstrate its effectiveness, the proposed method is compared with several classical graph-based video segmentation algorithms. The experimental results highlight the competitive segmentation quality of the proposed algorithm. Its average IoU score is 1.6% lower than that of the video segmentation via object flow (OFL) algorithm, because OFL iteratively refines the optical flow estimates to reach a relatively high segmentation accuracy. In terms of running speed, however, the proposed algorithm leads all compared methods: it takes nearly 10 seconds on average to segment each frame, whereas OFL needs approximately 1 minute per frame, about six times longer. In sum, the proposed algorithm achieves nearly the same segmentation quality as OFL at a much lower computational cost.
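For reference, the per-frame IoU (Jaccard) score used throughout these comparisons can be computed as in this small sketch, assuming boolean masks of the same shape:

```python
# Sketch: intersection-over-union (Jaccard) score for one frame.
import numpy as np

def iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0
```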
Conclusion
Experimental results show that when the foreground/background colors are not sufficiently distinct, the foreground object and the background are often confused, resulting in incorrect segmentation. With the global feature consistency constraint added, the proposed algorithm can optimize the segmentation of each frame using feature statistics gathered over the entire video. By using global information to refine local decisions, the method shows strong robustness to random noise, irregular motion, blurry backgrounds, and other difficulties in video. According to the experimental results, the algorithm spends most of its time computing optical flow, which could be replaced by a more efficient motion estimation algorithm in the future. Compared with other segmentation algorithms, however, the proposed method shows clear performance advantages. Built on the MRF framework, it integrates feature consistency constraints and improves both segmentation accuracy and running speed without increasing computational complexity. The method nevertheless has several shortcomings. First, because the algorithm segments a video based on superpixels, the results depend on the accuracy of the superpixel segmentation. Second, the proposed higher-order feature energy constraint has little effect on feature-poor regions: few SIFT feature points are detected in such regions, leaving superpixel blocks without enough feature points, which distorts the global statistics of foreground/background features and prevents the method from optimizing the segmentation of these regions. As in traditional methods, optical flow remains the performance bottleneck, so additional effort should be devoted to finding an efficient replacement. Finally, graph-based methods, including the proposed one, still lag behind current end-to-end video segmentation methods based on convolutional neural networks (CNNs) in segmentation accuracy; future work should combine the two approaches to benefit from their respective advantages.
Bao L C, Wu B Y and Liu W. 2018. CNN in MRF: video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 5977-5986 [DOI: 10.1109/CVPR.2018.00626]
Boykov Y and Kolmogorov V. 2004. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9): 1124-1137 [DOI: 10.1109/TPAMI.2004.60]
Chen Y D, Hao C Y, Liu A X and Wu E H. 2019a. Appearance-consistent video object segmentation based on a multinomial event model. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(2): 1934-1945 [DOI: 10.1145/3321507]
Chen Y D, Hao C Y, Liu A X and Wu E H. 2019b. Multilevel model for video object segmentation based on supervision optimization. IEEE Transactions on Multimedia, 21(8): 1934-1945 [DOI: 10.1109/TMM.2018.2890361]
Chen Y D, Hao C Y, Wu W and Wu E H. 2018. Efficient frame-sequential label propagation for video object segmentation. Multimedia Tools and Applications, 77(5): 6117-6133 [DOI: 10.1007/s11042-017-4520-5]
Faktor A and Irani M. 2014. Video segmentation by non-local consensus voting//Proceedings of the British Machine Vision Conference. Nottingham, UK: BMVA Press: #21 [DOI: 10.5244/C.28.21]
Grundmann M, Kwatra V, Han M and Essa I. 2010. Efficient hierarchical graph-based video segmentation//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE: 2141-2148 [DOI: 10.1109/CVPR.2010.5539893]
Jang W D, Lee C and Kim C S. 2016. Primary object segmentation in videos via alternate convex optimization of foreground and background distributions//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 696-704 [DOI: 10.1109/CVPR.2016.82]
Keuper M, Andres B and Brox T. 2015. Motion trajectory segmentation via minimum cost multicuts//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3271-3279 [DOI: 10.1109/ICCV.2015.374]
Koh Y J and Kim C S. 2017. Primary object segmentation in videos based on region augmentation and reduction//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii: IEEE: 7417-7425 [DOI: 10.1109/CVPR.2017.784]
Maninis K K, Caelles S, Chen Y, Pont-Tuset J, Leal-Taixé L, Cremers D and Van Gool L. 2019. Video object segmentation without temporal information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(6): 1515-1530 [DOI: 10.1109/TPAMI.2018.2838670]
Märki N, Perazzi F, Wang O and Sorkine-Hornung A. 2016. Bilateral space video segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 743-751 [DOI: 10.1109/CVPR.2016.87]
Ochs P, Malik J and Brox T. 2014. Segmentation of moving objects by long term video analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6): 1187-1200 [DOI: 10.1109/TPAMI.2013.242]
Papazoglou A and Ferrari V. 2013. Fast object segmentation in unconstrained video//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1777-1784 [DOI: 10.1109/ICCV.2013.223]
Perazzi F, Pont-Tuset J, McWilliams B, Van Gool L, Gross M and Sorkine-Hornung A. 2016. A benchmark dataset and evaluation methodology for video object segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 724-732 [DOI: 10.1109/CVPR.2016.85]
Perazzi F, Wang O, Gross M and Sorkine-Hornung A. 2015. Fully connected object proposals for video segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3227-3234 [DOI: 10.1109/ICCV.2015.369]
Tsai Y H, Yang M H and Black M J. 2016. Video segmentation via object flow//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 3899-3908 [DOI: 10.1109/CVPR.2016.423]
Wang B T, Fu Z H, Xiong H K and Zheng Y F. 2017. Transductive video segmentation on tree-structured model. IEEE Transactions on Circuits and Systems for Video Technology, 27(5): 992-1005 [DOI: 10.1109/TCSVT.2016.2527378]
Wang W G, Shen J B and Porikli F. 2015. Saliency-aware geodesic video object segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 3395-3402 [DOI: 10.1109/CVPR.2015.7298961]
Wen L Y, Du D W, Lei Z, Li S Z and Yang M H. 2015. JOTS: joint online tracking and segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 2226-2234 [DOI: 10.1109/CVPR.2015.7298835]
Xiao F Y and Lee Y J. 2016. Track and segment: an iterative unsupervised approach for video object proposals//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 933-942 [DOI: 10.1109/CVPR.2016.107]
Xu C L and Corso J J. 2016. LIBSVX: a supervoxel library and benchmark for early video processing. International Journal of Computer Vision, 119(3): 272-290 [DOI: 10.1007/s11263-016-0906-5]
Yang J, Price B, Shen X H, Lin Z and Yuan J S. 2016. Fast appearance modeling for automatic primary video object segmentation. IEEE Transactions on Image Processing, 25(2): 503-515 [DOI: 10.1109/TIP.2015.2500820]