稀疏深度特征对传统显著性检测的优化
Optimization of traditional saliency detection by sparse deep features
2019, Vol. 24, No. 9, pp. 1493-1503
Received: 2018-11-19; Revised: 2019-03-13; Published in print: 2019-09-16
DOI: 10.11834/jig.180626
Objective
Salient object detection algorithms fall into two broad categories: traditional methods based on low-level features and newer methods based on deep learning. Traditional methods have difficulty capturing the high-level semantic information of objects, while deep learning methods capture high-level semantics but tend to neglect edge features. To exploit the strengths of both, and taking advantage of the way sparsity concentrates responses on the salient object, this paper combines the two and proposes a method based on sparse autoencoding and saliency-result optimization.
Method
The feature maps from the fourth pooling layer of a VGG (visual geometry group) network are processed by a sparse autoencoder to obtain five sparse saliency feature maps, which are then fed, together with the saliency map produced by a traditional method, into a convolutional neural network that optimizes the final saliency result.
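As a concrete illustration of the feature-extraction step, the sketch below (assuming PyTorch and torchvision, which the paper does not specify, and the VGG16 variant of the network) extracts the feature maps up to the fourth pooling layer:

```python
import torch
from torchvision import models

# Pretrained VGG16 backbone; the exact VGG variant used in the paper is an assumption.
vgg = models.vgg16(pretrained=True).features.eval()

# In torchvision's VGG16, modules 0..23 of `features` end at the 4th max-pooling layer.
pool4_extractor = torch.nn.Sequential(*list(vgg.children())[:24])

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)   # placeholder for a preprocessed input image
    pool4_maps = pool4_extractor(image)   # shape: (1, 512, 14, 14)
```

The 512 channels of `pool4_maps` are what the sparse autoencoding stage would then compress into the five sparse saliency feature maps.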
Result
Experiments were conducted with four traditional methods, DRFI (discriminative regional feature integration), HDCT (high dimensional color transform), RRWR (regularized random walks ranking), and CGVS (contour-guided visual search), on the public datasets DUT-OMRON, ECSSD, HKU-IS, and MSRA. The results show that the proposed algorithm effectively improves both the F-measure and the MAE (mean absolute error) of the detected salient objects. The largest F-measure gain was obtained with the optimized DRFI method, which improved by 24.53% on the HKU-IS dataset. For MAE, the CGVS method showed the smallest reduction, 12.78% on the ECSSD dataset, while the largest reduction was close to 50%. Moreover, the model has a simple structure, few parameters, and high computational efficiency: training takes about 5 h and the average per-image test time is about 3 s, giving it strong practical applicability.
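For reference, the two evaluation measures are the standard ones in salient object detection: MAE is the mean absolute difference between the predicted saliency map S and the binary ground truth G, and the F-measure combines precision P and recall R as F_β = (1 + β²)PR / (β²P + R), conventionally with β² = 0.3. A minimal NumPy sketch (the fixed 0.5 binarization threshold is an illustrative assumption, not taken from the paper):

```python
import numpy as np

def mae(saliency, gt):
    """Mean absolute error between a saliency map and a binary mask, both in [0, 1]."""
    return np.abs(saliency.astype(float) - gt.astype(float)).mean()

def f_measure(saliency, gt, beta_sq=0.3, threshold=0.5):
    """F-measure after thresholding the saliency map; beta_sq = 0.3 is the common convention."""
    pred = saliency >= threshold
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta_sq * precision + recall
    return (1 + beta_sq) * precision * recall / denom if denom > 0 else 0.0
```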
Conclusion
This paper proposes a saliency-result optimization algorithm. Experiments show that it effectively improves the F-measure and MAE of salient object detection, suggesting good adaptability and application prospects for tasks such as object recognition, where ever more accurate salient object detection is required.
Objective
Saliency detection, as a preprocessing component of computer vision, has received increasing attention in areas such as object relocation, scene classification, semantic segmentation, and visual tracking. Although salient object detection has developed considerably, it remains challenging because of realistic factors such as background complexity and the attention mechanism itself. Many salient object detection methods have been proposed, and they fall mainly into traditional methods and newer methods based on deep learning. The traditional approach finds salient objects through low-level handcrafted features such as contrast, color, and texture. These general techniques have proven effective at preserving image structure and reducing computational effort. However, low-level features make it difficult to capture high-level semantic knowledge about objects and their surroundings, so methods based on them do not achieve good results when salient objects must be separated from cluttered backgrounds. Deep learning-based saliency detection methods instead seek salient objects by automatically extracting high-level features. Most of these models, however, focus on nonlinear combinations of high-level features extracted from the final convolutional layer, so the boundaries of salient objects are often extremely blurry owing to the lack of low-level visual information such as edges. In these works, convolutional neural network (CNN) features are fed into the model without any processing; because features extracted from a CNN are generally high-dimensional and contain a large amount of noise, this reduces the efficiency with which the features are used and can even have the opposite effect. Sparse methods can effectively concentrate the salient objects in a feature map and eliminate some of the noise interference, and the sparse autoencoder is one such method. To solve these problems, we propose optimizing traditional saliency detection through sparse autoencoding and image fusion, combining background priors and contrast analysis with VGG (visual geometry group) saliency computation.
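To make the sparsity idea concrete, the following is a minimal sparse-autoencoder sketch (assuming PyTorch; the layer sizes, target activation rho, and penalty weight are illustrative assumptions rather than the paper's configuration). The KL-divergence penalty drives most hidden units toward inactivity, which is what concentrates the representation on salient structure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Single-hidden-layer autoencoder with a KL-divergence sparsity penalty."""
    def __init__(self, n_in=512, n_hidden=64):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))   # sparse hidden code
        return self.decoder(h), h

def sparsity_penalty(h, rho=0.05):
    """KL divergence between target activation rho and mean hidden activation."""
    rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

# One training step on feature vectors x (e.g., per-location 512-d pool4 features).
model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(256, 512)                     # placeholder batch of feature vectors
opt.zero_grad()
recon, h = model(x)
loss = F.mse_loss(recon, x) + 0.1 * sparsity_penalty(h)  # 0.1: illustrative weight
loss.backward()
opt.step()
```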
Method
The proposed algorithm consists of four stages: traditional saliency map extraction, VGG feature extraction, sparse autoencoding, and saliency result optimization. First, the traditional method to be improved is selected and its saliency map is computed. In this experiment we select four traditional methods with strong results, namely discriminative regional feature integration (DRFI), high-dimensional color transform (HDCT), regularized random walks ranking (RRWR), and contour-guided visual search (CGVS). The VGG network is then used to extract feature maps, and the feature maps produced by each pooling layer are sparsely autoencoded, yielding 25 sparse saliency feature maps in total. In choosing among them, the first three pooling layers are discarded: their features are mainly low-level and retain excessive edge and texture information, duplicating the feature maps already obtained by the traditional method. A comparison of the fourth and fifth layers shows that the fifth pooling layer loses too much feature information, and experimental verification confirms that its feature maps have an interfering effect. We therefore use the feature maps extracted from the fourth pooling layer. These are passed through the sparse autoencoder, producing five sparse feature maps, each of which is combined with the corresponding saliency map obtained by the traditional method. Finally, a neural network performs the fusion and computes the final saliency map, as sketched below.
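The exact fusion network is not specified in this abstract, so the following is only a plausible sketch (PyTorch; channel counts and depth are assumptions): the five sparse feature maps and the traditional saliency map are stacked into a six-channel input, and a small CNN regresses the optimized saliency map.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Small CNN fusing 5 sparse feature maps with 1 traditional saliency map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, sparse_maps, trad_map):
        # sparse_maps: (N, 5, H, W); trad_map: (N, 1, H, W)
        x = torch.cat([sparse_maps, trad_map], dim=1)
        return self.net(x)  # optimized saliency map in [0, 1]

# Usage with placeholder inputs:
fused = FusionNet()(torch.rand(1, 5, 224, 224), torch.rand(1, 1, 224, 224))
```

A shallow network of this kind keeps the parameter count low, which is consistent with the short training and test times the paper reports.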
Result
Our experiments involve four open datasets: DUT-OMRON, ECSSD, HKU-IS, and MSRA. Half of the images from each of the four datasets form the training set, and the remaining halves serve as four test sets, so that training and test images are disjoint and the results are credible. The following conclusions are drawn from the experiments. 1) The proposed model greatly improves the F-measure of all four methods on all four datasets, with the DRFI method improving by 24.53% on the HKU-IS dataset. 2) The MAE (mean absolute error) is also greatly reduced; the smallest reduction is 12.78%, for the CGVS method on the ECSSD dataset, and the largest is nearly 50%. 3) The proposed network has few layers, few parameters, and a short computation time: training takes approximately 2 h, and the average per-image test time is approximately 0.2 s. By contrast, the adaptive-fusion saliency optimization scheme of Liu et al. requires approximately 47 h of training and an average of 56.95 s per test image, so the proposed model greatly improves computational efficiency. 4) The proposed model achieves significant improvement on all four datasets, especially HKU-IS and MSRA, which contain difficult images, confirming the effectiveness of the proposed method.
Conclusion
We propose optimizing saliency results by combining the low-level feature maps of traditional models, such as texture, with the high-level feature maps of a sparsely autoencoded VGG network, which greatly improves salient object recognition. The traditional methods DRFI, HDCT, RRWR, and CGVS are tested on the public salient object detection datasets DUT-OMRON, ECSSD, HKU-IS, and MSRA. The resulting F-measure and MAE values improve significantly, confirming the effectiveness of the proposed method. Moreover, the method's steps and network structure are simple and easy to understand, training takes little time, and the approach can be readily adopted. A limitation of the study is that some of the extracted feature maps go unused: in practice only the fourth-layer VGG feature maps are selected, so not all useful information is fully exploited.
Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11):1254-1259.[DOI:10.1109/34.730558]
Harel J, Koch C, Perona P. Graph-based visual saliency[C]//Proceedings of Neural Information Processing Systems. Canada: MIT Press, 2007: 545-552.
Perazzi F, Krähenbühl P, Pritch Y, et al. Saliency filters: contrast based filtering for salient region detection[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2012: 733-740.[DOI:10.1109/CVPR.2012.6247743]
Li Y, Hou X D, Koch C, et al. The secrets of salient object segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014: 280-287.[DOI:10.1109/CVPR.2014.43]
Borji A. What is a salient object? A dataset and a baseline model for salient object detection[J]. IEEE Transactions on Image Processing, 2015, 24(2):742-756.[DOI:10.1109/TIP.2014.2383320]
Liu L, Kuang G Y. Overview of image textural feature extraction methods[J]. Journal of Image and Graphics, 2009, 14(4):622-635.[DOI:10.11834/jig.20090409]
Li G B, Yu Y Z. Visual saliency detection based on multiscale deep CNN features[J]. IEEE Transactions on Image Processing, 2016, 25(11):5012-5024.[DOI:10.1109/TIP.2016.2602079]
Zhao R, Ouyang W L, Li H S, et al. Saliency detection by multi-context deep learning[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 1265-1274.[DOI:10.1109/CVPR.2015.7298731]
Fang Z, Cao T Y, Hong S Z, et al. Saliency detection via fusion of deep model and traditional model[J]. Journal of Image and Graphics, 2018, 23(12):1864-1873.[DOI:10.11834/jig.180073]
Lee G, Tai Y W, Kim J. Deep saliency with encoded low level distance map and high level features[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 660-668.[DOI:10.1109/CVPR.2016.78]
Liu N, Han J W. DHSNet: deep hierarchical saliency network for salient object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 678-686.[DOI:10.1109/CVPR.2016.80]
Zhang W, Jiang G Y, Wang Z F, et al. Research on image multiple description coding[J]. Journal of Image and Graphics, 2004, 9(3):257-264.[DOI:10.11834/jig.20040347]
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL].[2018-11-18]. https://arxiv.org/pdf/1409.1556.pdf
Yang K F, Li H, Li C Y, et al. A unified framework for salient structure detection by contour-guided visual search[J]. IEEE Transactions on Image Processing, 2016, 25(8):3475-3488.[DOI:10.1109/TIP.2016.2572600]
Jiang H Z, Wang J D, Yuan Z J, et al. Salient object detection: a discriminative regional feature integration approach[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE, 2013: 2083-2090.[DOI:10.1109/CVPR.2013.271]
Kim J, Han D, Tai Y W, et al. Salient region detection via high-dimensional color transform[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014: 883-890.[DOI:10.1109/CVPR.2014.118]
Li C Y, Yuan Y C, Cai W D, et al. Robust saliency detection via regularized random walks ranking[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 2710-2717.[DOI:10.1109/CVPR.2015.7298887]
Liu T, Yuan Z J, Sun J, et al. Learning to detect a salient object[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2):353-367.[DOI:10.1109/TPAMI.2010.70]
Khuwuthyakorn P, Robles-Kelly A, Zhou J. Object of interest detection by saliency learning[C]//Proceedings of the 11th European Conference on Computer Vision. Berlin, Germany: Springer-Verlag, 2010: 636-649.[DOI:10.1007/978-3-642-15552-9_46]
Yang J M, Yang M H. Top-down visual saliency via joint CRF and dictionary learning[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2012: 2296-2303.[DOI:10.1109/CVPR.2012.6247940]
Tong N, Lu H C, Ruan X, et al. Salient object detection via bootstrap learning[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 1884-1892.[DOI:10.1109/CVPR.2015.7298798]
Yan Q, Xu L, Shi J P, et al. Hierarchical saliency detection[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE, 2013: 1155-1162.[DOI:10.1109/CVPR.2013.153]
Zhou X F, Liu Z, Sun G L, et al. Improving saliency detection via multiple kernel boosting and adaptive fusion[J]. IEEE Signal Processing Letters, 2016, 23(4):517-521.[DOI:10.1109/LSP.2016.2536743]