Video object segmentation algorithm based on consistent features
2020, Vol. 25, No. 8, Pages: 1558-1566
Received: 2019-11-19; Revised: 2020-03-10; Accepted: 2020-03-17; Published in print: 2020-08-16
DOI: 10.11834/jig.190571
Objective
Video object segmentation is an important research direction in computer vision. Existing methods fall short when the target has an irregular shape, when inter-frame motion contains interfering information, or when the motion is too fast. To address these shortcomings, a segmentation algorithm based on feature consistency is proposed.
Method
The proposed segmentation framework is a graph-theoretic method based on the Markov random field (MRF). Gaussian mixture models are used to model the color features of the pre-specified labeled regions, yielding the data term of the segmentation energy. A spatiotemporal smoothness term is established by combining multiple features such as color and optical flow direction. On this basis, an energy constraint based on feature consistency is added to enhance the appearance consistency of the segmentation results. This added energy is itself a higher-order constraint and would significantly increase the computational complexity of the energy optimization; auxiliary nodes are therefore introduced to solve the optimization problem and speed up the algorithm.
Result
The algorithm is evaluated on the DAVIS_2016 (densely annotated video segmentation) dataset and compared with recent graph-based methods, namely HVS (efficient hierarchical graph-based video segmentation), NLC (video segmentation by non-local consensus voting), BVS (bilateral space video segmentation), and OFL (video segmentation via object flow). The proposed algorithm ranks second in segmentation accuracy, only 1.6% below OFL, while leading all compared methods in running speed; in particular, it is nearly six times faster than OFL.
Conclusion
The proposed algorithm integrates a feature consistency constraint into the MRF framework and improves both segmentation accuracy and running speed without adding extra computational complexity.
Objective
Video object segmentation is an important topic in the field of computer vision. However, existing segmentation methods fail to address issues such as irregular objects, noisy optical flow, and fast motion. To this end, this paper proposes an effective and efficient algorithm based on feature consistency that addresses these issues.
Method
The proposed segmentation framework is based on the graph-theoretic formulation of the Markov random field (MRF). First, a Gaussian mixture model (GMM) is applied to model the color features of the pre-specified labeled regions, from which the data term of the segmentation energy is obtained.
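As a rough illustration of this step, the following sketch fits per-region color GMMs and converts them into per-pixel unary (data) costs as negative log-likelihoods. It uses scikit-learn for concreteness; the helper names are hypothetical and this is an assumption about the implementation, not the authors' code.

```python
# Sketch: GMM color models for labeled foreground/background regions,
# producing per-pixel unary (data) costs. Illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_color_model(pixels_rgb, n_components=5):
    """Fit a GMM to an (N, 3) array of RGB samples from a labeled region."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(pixels_rgb)
    return gmm

def data_term(frame_rgb, fg_gmm, bg_gmm):
    """Return (H, W, 2) unary costs: channel 0 = cost of labeling a pixel
    foreground, channel 1 = cost of labeling it background."""
    h, w, _ = frame_rgb.shape
    flat = frame_rgb.reshape(-1, 3).astype(np.float64)
    # score_samples gives log-likelihoods; cost = -log p(color | model)
    cost_fg = -fg_gmm.score_samples(flat).reshape(h, w)
    cost_bg = -bg_gmm.score_samples(flat).reshape(h, w)
    return np.stack([cost_fg, cost_bg], axis=-1)
```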
Second, a spatiotemporal smoothness term is established by combining multiple features, such as color and optical flow direction. The algorithm then adds an energy constraint based on feature consistency to enhance the appearance consistency of the segmentation results; this added term is a higher-order energy constraint that significantly increases the computational complexity of energy optimization.
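A minimal sketch of what such a pairwise smoothness weight could look like, assuming a contrast-sensitive exponential form over the color difference and the optical-flow direction difference. The functional form and the bandwidths sigma_c and sigma_h are illustrative assumptions, not the paper's exact definition.

```python
# Sketch: pairwise smoothness weight between two neighboring nodes,
# combining color contrast with optical-flow direction similarity.
import numpy as np

def smoothness_weight(color_i, color_j, flow_i, flow_j,
                      sigma_c=0.1, sigma_h=0.1):
    """Higher weight = stronger penalty for assigning different labels."""
    color_dist = np.linalg.norm(color_i - color_j) ** 2
    # Angle between the two flow vectors, normalized to [0, 1]
    cos = np.dot(flow_i, flow_j) / (
        np.linalg.norm(flow_i) * np.linalg.norm(flow_j) + 1e-8)
    angle = np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi
    return np.exp(-color_dist / (2 * sigma_c ** 2)) * \
           np.exp(-angle ** 2 / (2 * sigma_h ** 2))
```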
The higher-order constraint borrows an idea from text classification, which is used here to model the higher-order term of the segmentation energy: each superpixel corresponds to a document, and the scale-invariant feature transform (SIFT) feature points inside the superpixel serve as the words of that document. The higher-order term is then modeled by extracting and clustering these features.
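The following sketch illustrates this document/word analogy: SIFT descriptors are detected, quantized into visual words by k-means, and accumulated into one word histogram per superpixel. It assumes OpenCV (cv2.SIFT_create) and scikit-learn; the vocabulary size and clustering details are assumptions rather than the paper's settings.

```python
# Sketch: each superpixel is a "document" whose "words" are the
# quantized SIFT descriptors that fall inside it (bag-of-words).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def superpixel_histograms(gray, superpixel_labels, n_words=64):
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:
        return None  # no feature points detected in this frame
    # Build the visual vocabulary by clustering all descriptors.
    words = KMeans(n_clusters=n_words, n_init=10).fit_predict(descriptors)
    n_sp = superpixel_labels.max() + 1
    hist = np.zeros((n_sp, n_words))
    for kp, w in zip(keypoints, words):
        x = min(int(round(kp.pt[0])), superpixel_labels.shape[1] - 1)
        y = min(int(round(kp.pt[1])), superpixel_labels.shape[0] - 1)
        hist[superpixel_labels[y, x], w] += 1
    return hist  # one word-frequency "document" per superpixel
```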
To preserve running speed, auxiliary nodes are added to optimize the higher-order term: the term is approximated by additional data and smoothness terms, and the graph cut algorithm then completes the segmentation.
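To make the auxiliary-node idea concrete, here is a minimal sketch using PyMaxflow that encodes a group-consistency penalty with one auxiliary node per group, reducing the higher-order term to pairwise edges that a standard graph cut can handle. This sketch penalizes only groups that are not uniformly foreground; the paper's actual construction may be more elaborate, and gamma is an illustrative weight.

```python
# Sketch: one auxiliary node per group keeps the higher-order
# consistency term tractable for a standard s-t graph cut.
import maxflow
import numpy as np

INF = 1e9

def segment(unary, groups, gamma=0.5):
    """unary: (N, 2) costs for labels (0=bg, 1=fg); groups: lists of node indices."""
    n = unary.shape[0]
    g = maxflow.Graph[float]()
    nodes = g.add_nodes(n)
    aux = g.add_nodes(len(groups))
    for i in range(n):
        # cap_source is cut when node i lands on the sink (bg) side,
        # so it carries the cost of labeling i as background.
        g.add_tedge(nodes[i], unary[i, 0], unary[i, 1])
    for k, members in enumerate(groups):
        # The auxiliary node pays gamma when it falls on the bg side ...
        g.add_tedge(aux[k], gamma, 0)
        for i in members:
            # ... and is forced to the bg side unless every member is fg,
            # so the group pays gamma exactly when it is not uniformly fg.
            g.add_edge(aux[k], nodes[i], INF, 0)
    g.maxflow()
    # get_segment: 0 = source (foreground) side, 1 = sink (background) side
    return np.array([1 - g.get_segment(nodes[i]) for i in range(n)])
```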
Result
The test data are taken from the DAVIS_2016 (densely annotated video segmentation) dataset, which contains 50 sequences, of which 30 are training sequences and 20 are validation sequences, at a resolution of 854×480 pixels. Because the proposed energy extends the standard MRF formulation, the weights α = 0.3 and β = 0.2 are set empirically to balance the data, smoothness, and feature consistency terms. As in existing methods, the number of components in the Gaussian mixture models for the foreground/background is set to 5, and the bandwidth is set to σ_h = 0.1. This paper focuses on verifying and evaluating the proposed feature consistency constraint and therefore segments the videos with β = 0 (constraint disabled) and β = 0.2 (constraint enabled). The experimental results show that the IoU score with the higher-order constraint is 10.2% higher than that without it.
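For context, given these weights the overall energy presumably takes a weighted three-part form along the lines of

$$E(\mathbf{x}) = \sum_{i} D_i(x_i) + \alpha \sum_{(i,j) \in \mathcal{N}} V_{ij}(x_i, x_j) + \beta \sum_{c \in \mathcal{C}} H_c(\mathbf{x}_c),$$

where $D_i$ is the GMM data term, $V_{ij}$ the spatiotemporal smoothness term over neighboring nodes $\mathcal{N}$, and $H_c$ the feature consistency term over superpixel cliques $\mathcal{C}$. This notation is an assumption reconstructed from the description above, not the paper's own equation.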
To demonstrate its effectiveness, the proposed method is compared with several classical graph-based video segmentation algorithms. The experimental results highlight the competitive segmentation quality of the proposed algorithm. Its average IoU score is 1.6% lower than that of the video segmentation via object flow (OFL) algorithm, because OFL iteratively refines the optical flow estimates to reach a relatively high segmentation accuracy. In terms of running speed, however, the proposed algorithm leads all compared methods: it takes nearly 10 seconds on average to segment each frame, whereas OFL needs approximately 1 minute per frame, about six times longer. In sum, the proposed algorithm achieves nearly the same segmentation quality as OFL at a much lower computational cost.
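For reference, the per-frame IoU (Jaccard) score used throughout these comparisons can be computed as in this small sketch, assuming boolean masks of the same shape:

```python
# Sketch: intersection-over-union (Jaccard) score for one frame.
import numpy as np

def iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0
```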
Conclusion
Experimental results show that when the foreground/background colors are not sufficiently distinct, the foreground object and the background are often confused, resulting in incorrect segmentation. With the global feature consistency constraint added, the proposed algorithm can optimize the segmentation of each frame using feature statistics gathered over the entire video. By using global information to refine local decisions, the method shows strong robustness to random noise, irregular motion, blurry backgrounds, and other difficulties in video. According to the experimental results, the algorithm spends most of its time computing optical flow, which could be replaced by a more efficient motion estimation algorithm in the future. Compared with other segmentation algorithms, however, the proposed method shows clear performance advantages. Built on the MRF framework, it integrates feature consistency constraints and improves both segmentation accuracy and running speed without increasing computational complexity. The method nevertheless has several shortcomings. First, because the algorithm segments a video based on superpixels, the results depend on the accuracy of the superpixel segmentation. Second, the proposed higher-order feature energy constraint has little effect on feature-poor regions: few SIFT feature points are detected in such regions, leaving superpixel blocks without enough feature points, which distorts the global statistics of foreground/background features and prevents the method from optimizing the segmentation of these regions. As in traditional methods, optical flow remains the performance bottleneck, so additional effort should be devoted to finding an efficient replacement. Finally, graph-based methods, including the proposed one, still lag behind current end-to-end video segmentation methods based on convolutional neural networks (CNNs) in segmentation accuracy; future work should combine the two approaches to benefit from their respective advantages.
Bao L C, Wu B Y and Liu W. 2018. CNN in MRF: video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 5977-5986 [DOI: 10.1109/CVPR.2018.00626]
Boykov Y and Kolmogorov V. 2004. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9): 1124-1137 [DOI: 10.1109/TPAMI.2004.60]
Chen Y D, Hao C Y, Liu A X and Wu E H. 2019a. Appearance-consistent video object segmentation based on a multinomial event model. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(2): 1934-1945 [DOI: 10.1145/3321507]
Chen Y D, Hao C Y, Liu A X and Wu E H. 2019b. Multilevel model for video object segmentation based on supervision optimization. IEEE Transactions on Multimedia, 21(8): 1934-1945 [DOI: 10.1109/TMM.2018.2890361]
Chen Y D, Hao C Y, Wu W and Wu E H. 2018. Efficient frame-sequential label propagation for video object segmentation. Multimedia Tools and Applications, 77(5): 6117-6133 [DOI: 10.1007/s11042-017-4520-5]
Faktor A and Irani M. 2014. Video segmentation by non-local consensus voting//Proceedings of the British Machine Vision Conference. Nottingham, UK: BMVA Press: #21 [DOI: 10.5244/C.28.21]
Grundmann M, Kwatra V, Han M and Essa I. 2010. Efficient hierarchical graph-based video segmentation//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE: 2141-2148 [DOI: 10.1109/CVPR.2010.5539893]
Jang W D, Lee C and Kim C S. 2016. Primary object segmentation in videos via alternate convex optimization of foreground and background distributions//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 696-704 [DOI: 10.1109/CVPR.2016.82]
Keuper M, Andres B and Brox T. 2015. Motion trajectory segmentation via minimum cost multicuts//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3271-3279 [DOI: 10.1109/ICCV.2015.374]
Koh Y J and Kim C S. 2017. Primary object segmentation in videos based on region augmentation and reduction//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii: IEEE: 7417-7425 [DOI: 10.1109/CVPR.2017.784]
Maninis K K, Caelles S, Chen Y, Pont-Tuset J, Leal-Taixé L, Cremers D and Van Gool L. 2019. Video object segmentation without temporal information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(6): 1515-1530 [DOI: 10.1109/TPAMI.2018.2838670]
Märki N, Perazzi F, Wang O and Sorkine-Hornung A. 2016. Bilateral space video segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 743-751 [DOI: 10.1109/CVPR.2016.87]
Ochs P, Malik J and Brox T. 2014. Segmentation of moving objects by long term video analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6): 1187-1200 [DOI: 10.1109/TPAMI.2013.242]
Papazoglou A and Ferrari V. 2013. Fast object segmentation in unconstrained video//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1777-1784 [DOI: 10.1109/ICCV.2013.223]
Perazzi F, Pont-Tuset J, McWilliams B, Van Gool L, Gross M and Sorkine-Hornung A. 2016. A benchmark dataset and evaluation methodology for video object segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 724-732 [DOI: 10.1109/CVPR.2016.85]
Perazzi F, Wang O, Gross M and Sorkine-Hornung A. 2015. Fully connected object proposals for video segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3227-3234 [DOI: 10.1109/ICCV.2015.369]
Tsai Y H, Yang M H and Black M J. 2016. Video segmentation via object flow//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 3899-3908 [DOI: 10.1109/CVPR.2016.423]
Wang B T, Fu Z H, Xiong H K and Zheng Y F. 2017. Transductive video segmentation on tree-structured model. IEEE Transactions on Circuits and Systems for Video Technology, 27(5): 992-1005 [DOI: 10.1109/TCSVT.2016.2527378]
Wang W G, Shen J B and Porikli F. 2015. Saliency-aware geodesic video object segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 3395-3402 [DOI: 10.1109/CVPR.2015.7298961]
Wen L Y, Du D W, Lei Z, Li S Z and Yang M H. 2015. JOTS: joint online tracking and segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 2226-2234 [DOI: 10.1109/CVPR.2015.7298835]
Xiao F Y and Lee Y J. 2016. Track and segment: an iterative unsupervised approach for video object proposals//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 933-942 [DOI: 10.1109/CVPR.2016.107]
Xu C L and Corso J J. 2016. LIBSVX: a supervoxel library and benchmark for early video processing. International Journal of Computer Vision, 119(3): 272-290 [DOI: 10.1007/s11263-016-0906-5]
Yang J, Price B, Shen X H, Lin Z and Yuan J S. 2016. Fast appearance modeling for automatic primary video object segmentation. IEEE Transactions on Image Processing, 25(2): 503-515 [DOI: 10.1109/TIP.2015.2500820]