Incorporation of multi-dimensional binocular perceptual characteristics to detect stereoscopic video saliency
2017, Vol. 22, No. 3, pp. 305-314
Published online: 2017-03-01
Published in print: 2017
DOI: 10.11834/jig.20170304

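Stereoscopic video is increasingly popular because it provides an immersive sense of realism, and visual saliency detection can automatically predict, locate, and mine important visual information, helping machines filter massive multimedia data effectively. To improve the detection of salient regions in stereoscopic video, a stereoscopic video saliency detection model that fuses multi-dimensional binocular perceptual characteristics is proposed. Saliency is computed along three dimensions of stereoscopic video: the spatial, depth, and temporal domains. First, a 2D image saliency map is computed from the spatial features of the image with a Bayesian model; next, a depth saliency map of the stereoscopic video frame is obtained from binocular perceptual features; then, the Lucas-Kanade optical flow method is used to compute the motion features of local regions between frames, yielding a temporal saliency map; finally, the three saliency maps are fused with a method based on the magnitude of the global-regional difference to obtain the final distribution of salient regions in the stereoscopic video. Experiments on different types of stereoscopic video sequences show that the model achieves 80% precision and 72% recall while keeping computational complexity relatively low, outperforming existing saliency detection models. The proposed saliency detection model effectively captures the salient regions in stereoscopic video and can be applied to stereoscopic video/image coding, stereoscopic video/image quality assessment, and related fields.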
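The abstract does not give the fusion formula, so the following is only a minimal sketch of one way a fusion driven by global-regional difference could look. It assumes each conspicuity map is weighted by the squared gap between its global maximum and the mean of its block-wise maxima, so that a map with one dominant peak counts more than a map with many similar peaks. The function names, the block size, and the weighting rule itself are hypothetical and are not taken from the paper.

```python
def normalize(m):
    """Rescale a 2D map (list of rows) to the range [0, 1]."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no contrast; treat it as all zeros.
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def global_local_weight(m, block=2):
    """Hypothetical weight: squared gap between the global maximum and
    the mean of all block-wise maxima (the block holding the global
    maximum is included in that mean)."""
    h, w = len(m), len(m[0])
    block_maxima = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            block_maxima.append(max(
                m[i][j]
                for i in range(r, min(r + block, h))
                for j in range(c, min(c + block, w))
            ))
    global_max = max(block_maxima)
    mean_local = sum(block_maxima) / len(block_maxima)
    return (global_max - mean_local) ** 2


def fuse(maps, block=2):
    """Fuse equally sized conspicuity maps with weights proportional to
    their global-local difference. Returns (fused_map, weights)."""
    norm = [normalize(m) for m in maps]
    raw = [global_local_weight(m, block) for m in norm]
    total = sum(raw)
    if total == 0:
        # No map stands out; fall back to a plain average.
        weights = [1.0 / len(norm)] * len(norm)
    else:
        weights = [w / total for w in raw]
    h, w_ = len(norm[0]), len(norm[0][0])
    fused = [
        [sum(wt * m[i][j] for wt, m in zip(weights, norm)) for j in range(w_)]
        for i in range(h)
    ]
    return fused, weights


# Toy stand-ins for the spatial, depth, and temporal maps.
peaked = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
spread = [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
flat = [[0.5] * 4 for _ in range(4)]
fused_map, fusion_weights = fuse([peaked, spread, flat])
```

In this toy case the map with a single peak receives all of the weight, whereas a plain average would dilute it with the two less informative maps.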
Stereoscopic three-dimensional (3D) video services, which aim to provide realistic and immersive experiences, have gained considerable acceptance and interest. Visual saliency detection can automatically predict, locate, and identify important visual information, and it can help machines filter valuable information from high-volume multimedia data. Saliency detection models have been widely studied for static and dynamic 2D scenes. However, the saliency problem for stereoscopic 3D video has received less attention, and few studies address dynamic 3D scenes. Because 3D characteristics such as depth and visual fatigue affect human visual attention, saliency models for static or dynamic 2D scenes are not directly applicable to 3D scenes. To address this gap in the literature, we propose a novel model for salient region detection in stereoscopic videos that uses multi-dimensional binocular perceptual characteristics. The proposed model computes the visually salient regions of stereoscopic videos in the spatial, depth, and temporal domains. The algorithm is partitioned into four blocks: the measurement of spatial saliency, of depth saliency, and of temporal (motion) saliency, and the fusion of the three conspicuity maps.

In the spatial saliency module, the algorithm treats the spatial saliency of each video frame as one dimension of visual attention. A Bayesian probabilistic framework is adopted to calculate the 2D static conspicuity map. Within this framework, spatial saliency emerges naturally as the self-information of visual features. These visual features are obtained from the spatial natural statistics of each stereoscopic 3D video frame rather than from a single test frame. In the depth saliency module, the algorithm treats depth as an additional dimension of visual attention. Depth signals have specific characteristics that differ from those of natural signals. Therefore, the measure of depth saliency is derived from depth-perception characteristics. The model extracts the foreground saliency from a disparity map and combines it with depth contrast to generate a depth conspicuity map. In the motion (temporal) saliency module, the algorithm treats motion as another dimension of visual attention. An optical flow algorithm is applied to acquire the motion information between adjacent frames. To reduce the computational complexity of the optical flow computation, the model first extracts the salient region of the current frame from the previously obtained spatial and depth conspicuity maps. The Lucas-Kanade optical flow algorithm is then used to calculate the motion characteristics between the local salient regions of adjacent frames, and the motion conspicuity map is produced from the regional motion vector map. In the fusion step, a new pooling approach combines the three conspicuity maps into the final saliency map for stereoscopic 3D videos. This fusion approach is based on the principle that human visual systems simultaneously focus on a unique salient region and divert attention to several salient regions in a saliency map. To generate the final saliency maps of stereoscopic videos, the proposed approach replaces the conventional weighted-average fusion of different features with a fusion method based on global-local difference.

We evaluated the proposed scheme on stereoscopic video sequences covering various scenarios and compared it with five other state-of-the-art saliency detection models. The experimental results indicate that the proposed model is efficient and effective for saliency detection in stereoscopic videos, achieving 80% precision and 72% recall. The model can be applied to stereoscopic video/image coding, stereoscopic video/image quality assessment, and object detection and recognition.
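For readers unfamiliar with the reported metrics, the sketch below shows how precision and recall are commonly computed for a saliency map against a binary ground-truth mask. The abstract does not say how the saliency maps were binarized, so the fixed threshold of 0.5 and the function name are assumptions made only for illustration.

```python
def precision_recall(saliency, ground_truth, threshold=0.5):
    """Compare a thresholded saliency map with a binary ground-truth mask.

    saliency:     2D list of values in [0, 1]
    ground_truth: 2D list of 0/1 labels with the same shape
    threshold:    assumed binarization level (not specified in the source)
    """
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            if predicted and g:
                tp += 1      # salient pixel correctly detected
            elif predicted and not g:
                fp += 1      # background pixel marked as salient
            elif g:
                fn += 1      # salient pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: 3 true positives, 1 false positive, 1 false negative.
toy_saliency = [[0.9, 0.8, 0.1], [0.7, 0.2, 0.6]]
toy_truth = [[1, 1, 0], [0, 1, 1]]
toy_precision, toy_recall = precision_recall(toy_saliency, toy_truth)
```

Precision is the fraction of detected pixels that are truly salient, and recall is the fraction of truly salient pixels that were detected, so the reported 80% and 72% describe these two ratios respectively.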