Saliency detection via fusion of deep model and traditional model
2018, Vol. 23, No. 12, pp. 1864-1873
Received: 2018-03-16; Revised: 2018-06-14; Published in print: 2018-12-16
DOI: 10.11834/jig.180073
Objective
Saliency detection is a fundamental problem in the fields of image processing and computer vision. Traditional models preserve the boundaries of salient objects well, but their confidence in the salient objects is not high enough, which leads to low recall; deep learning models assign high confidence to salient objects, but their results have coarse boundaries and lower precision. In view of the respective strengths and weaknesses of these two kinds of models, we propose a saliency model that exploits the advantages of both approaches while suppressing their shortcomings.
Method
We first improve the recent densely connected convolutional network (DenseNet) and train a fully convolutional network (FCN) saliency model based on it; we also select an existing superpixel-based saliency regression model. After the saliency maps of the two models are obtained, a fusion algorithm is proposed to merge the two results into the final, refined saliency map. The algorithm fuses the FCN result with the traditional model's result through the Hadamard product of the saliency maps and a one-to-one nonlinear mapping of pixel-wise saliency values, which can be summarized as follows.
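As a compact restatement of the description above (a sketch only: the exact one-to-one nonlinear mapping $$g$$ and adaptive threshold $$T$$ are described in the Method section below and are not fully specified here):
$$\boldsymbol{S}_{\text{init}} = \frac{1}{N}\sum\limits_{i=1}^{N} \left( \boldsymbol{S}_{\text{FCN}} \odot \boldsymbol{S}_{i} \right), \qquad \boldsymbol{S}_{\text{final}}(p) = g\left( \boldsymbol{S}_{\text{init}}(p), \boldsymbol{S}_{\text{FCN}}(p); T \right)$$
where $$\odot$$ denotes the Hadamard product, $$\boldsymbol{S}_{i}$$ is the traditional model's saliency map for the $$i$$-th segmentation, $$N$$ is the number of segmentations, and $$g$$ is applied pixel by pixel.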
Result
We compared our model with 10 state-of-the-art methods on four datasets. On HKU-IS, the F-measure of our model is 2.6% higher than that of the second-best model; on MSRA, the F-measure is 2.2% higher and the MAE is 5.6% lower than those of the second-best model; on DUT-OMRON, the F-measure is 5.6% higher and the MAE is 17.4% lower than those of the second-best model. Comparative experiments on MSRA were also conducted to verify the effectiveness of the fusion algorithm, and their results show that the proposed fusion algorithm improves saliency detection.
Conclusion
The proposed saliency model combines the advantages of traditional models and deep learning models and produces more accurate saliency detection results.
Objective
Saliency detection is a fundamental problem in computer vision and image processing that aims to identify the most conspicuous objects or regions in an image. It has been widely used in visual applications, including object retargeting, scene classification, visual tracking, image retrieval, and semantic segmentation. In most traditional approaches, salient objects are derived from features extracted from pixels or regions, and the final saliency map consists of these regions together with their saliency scores. The performance of such models relies on the segmentation method and the selection of features, and they cannot produce satisfactory results for images with multiple salient objects or low-contrast content. Traditional approaches preserve object boundaries well but assign insufficient confidence to salient objects, which yields low recall. Convolutional neural networks (CNNs) have been introduced into pixel-wise prediction problems, such as saliency detection, because of their outstanding performance in image classification. CNNs recast saliency detection as a labeling problem in which the features that distinguish salient from non-salient objects are learned automatically through gradient descent. A CNN cannot be used directly to train a saliency model; one workaround is to extract a square patch around each pixel and use the patch to predict the class of the center pixel, with patches frequently taken from different resolutions of the input image to capture global information. Another is to add up-sampling layers to the CNN. Such a modified CNN is called a fully convolutional network (FCN) and was first proposed for semantic segmentation. Most CNN-based saliency models use an FCN to capture both global and local information. The FCN adapts the CNN to dense prediction problems by replacing the softmax and fully connected layers with convolution and deconvolution layers. Compared with traditional methods, FCNs locate salient objects accurately and with high confidence. However, because of the down-sampling structure of FCNs, the boundaries of salient objects are coarse and the precision is lower than that of traditional approaches. To overcome the limitations of these two kinds of saliency models, we propose a composite saliency model that combines their advantages and restrains their drawbacks.
Method
In this study, a new FCN based on the densely connected convolutional network (DenseNet) is built. For saliency detection, we replace the fully connected layer and the final pooling layer with a 1×1 convolution layer and a deconvolution layer, and a sigmoid layer is applied to produce the saliency map (see the sketch below).
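A minimal sketch of this modified network, written in PyTorch with the torchvision DenseNet-161 backbone rather than the Caffe implementation used in the paper; the deconvolution kernel size, the final bilinear resize, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class DenseSaliencyFCN(nn.Module):
    """Illustrative FCN-style saliency network on a DenseNet-161 backbone."""

    def __init__(self):
        super().__init__()
        backbone = models.densenet161(weights=None)       # load pre-trained weights in practice
        # Keep only the convolutional feature extractor; the final pooling and the
        # fully connected classifier of DenseNet are discarded.
        self.features = backbone.features                  # 2208 channels at 1/32 resolution
        self.score = nn.Conv2d(2208, 1, kernel_size=1)     # 1x1 convolution -> saliency score
        # Deconvolution (transposed convolution) up-samples the score map by 32x.
        self.upsample = nn.ConvTranspose2d(1, 1, kernel_size=64, stride=32,
                                           padding=16, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        f = torch.relu(self.features(x))
        s = self.upsample(self.score(f))
        # Resize to exactly the input size (rounding in the down-sampling path can
        # leave the up-sampled map a few pixels short of the input resolution).
        s = F.interpolate(s, size=(h, w), mode="bilinear", align_corners=False)
        return torch.sigmoid(s)                            # per-pixel saliency in [0, 1]
```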
In the training process, the saliency network ends with a squared Euclidean loss layer for saliency regression, and we fine-tune the pre-trained DenseNet-161 to train our saliency model. Our training set consists of 3 900 images randomly selected from five public saliency datasets, namely ECSSD, SOD, HKU-IS, MSRA, and ICOSEG. The saliency network is implemented in the Caffe toolbox. The input images and ground-truth maps are resized to 500×500 for training, the momentum is set to 0.99, the learning rate to $$10^{-10}$$, and the weight decay to 0.000 5. The SGD learning procedure is accelerated with an NVIDIA GTX TITAN X GPU and takes approximately one day for 200 000 iterations; an outline of this training setup is given below.
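For illustration only: the paper uses Caffe, but an equivalent SGD setup can be sketched in PyTorch as follows, reusing the DenseSaliencyFCN module defined above; the batch handling and the summed (un-averaged) Euclidean loss are assumptions.

```python
import torch
import torch.nn as nn

model = DenseSaliencyFCN().cuda()
# Squared Euclidean loss for saliency regression; summing (rather than averaging)
# over pixels is one reason such a small learning rate is workable.
criterion = nn.MSELoss(reduction="sum")
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-10, momentum=0.99, weight_decay=5e-4)

def train_step(image, gt):
    """One SGD step; `image` is Nx3x500x500, `gt` is the Nx1x500x500 ground-truth map."""
    optimizer.zero_grad()
    loss = criterion(model(image.cuda()), gt.cuda())
    loss.backward()
    optimizer.step()
    return loss.item()

# for it in range(200_000):
#     image, gt = next(loader)     # `loader` yields resized image / ground-truth pairs
#     train_step(image, gt)
```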
We then use a traditional saliency model. The selected model adopts multi-level segmentation to produce several segmentations of an image, where each superpixel is represented by a feature vector that contains different kinds of image features, and a random forest is trained on these feature vectors to regress saliency maps (a simplified sketch follows).
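A heavily simplified sketch of this kind of regional regressor, assuming SLIC superpixels from scikit-image, mean-color features, and a scikit-learn random forest; the actual model uses multi-level segmentation and a much richer set of regional features.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestRegressor

def region_features(image, labels):
    """Mean RGB color per superpixel (a stand-in for the real multi-feature descriptor)."""
    return np.asarray([image[labels == r].mean(axis=0) for r in np.unique(labels)])

def region_targets(gt, labels):
    """Mean ground-truth saliency per superpixel, used as the regression target."""
    return np.asarray([gt[labels == r].mean() for r in np.unique(labels)])

def fit_forest(train_images, train_gts, n_segments=300):
    """Train the regional saliency regressor on (feature, saliency) pairs."""
    X, y = [], []
    for img, gt in zip(train_images, train_gts):
        labels = slic(img, n_segments=n_segments)
        X.append(region_features(img, labels))
        y.append(region_targets(gt, labels))
    forest = RandomForestRegressor(n_estimators=200)
    forest.fit(np.vstack(X), np.concatenate(y))
    return forest

def predict_saliency(forest, image, n_segments=300):
    """Assign the predicted regional score back to every pixel of its superpixel."""
    labels = slic(image, n_segments=n_segments)
    scores = forest.predict(region_features(image, labels))
    sal = np.zeros(labels.shape, dtype=np.float32)
    for r, s in zip(np.unique(labels), scores):
        sal[labels == r] = s
    return sal
```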
On the basis of the two models, we propose a fusion algorithm that combines the advantages of the traditional approach and the deep learning method. For an image, 15 segmentations are produced, and the saliency maps of all segmentations are derived by the random forest; the FCN is then used to produce another type of saliency map of the image. The fusion algorithm applies the Hadamard product to the two types of saliency maps, and the initial fusion result is obtained by averaging the Hadamard products. Finally, an adaptive threshold and a pixel-to-pixel mapping are used to fuse the initial fusion result with the FCN result and obtain the final saliency map, as sketched below.
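The core of the fusion can be sketched as follows. The abstract only specifies the Hadamard product, the averaging, and the thresholded pixel-to-pixel mapping, so the particular adaptive threshold (twice the mean of the initial fusion) and the per-pixel combination rule below are assumptions for illustration, not the paper's exact formulas.

```python
import numpy as np

def fuse(fcn_map, rf_maps):
    """Fuse the FCN saliency map with the random-forest maps of the multi-level segmentations.

    fcn_map : HxW array in [0, 1] produced by the FCN.
    rf_maps : list of HxW arrays in [0, 1], one per segmentation level (15 in the paper).
    """
    # Hadamard (element-wise) product with each segmentation-level map, then average.
    init = np.mean([fcn_map * m for m in rf_maps], axis=0)

    # Adaptive threshold; twice the mean value is an assumed choice for illustration.
    t = 2.0 * init.mean()

    # Pixel-to-pixel mapping (assumed): keep confident pixels of the initial fusion,
    # and attenuate the rest by the FCN confidence.
    final = np.where(init >= t, np.maximum(init, fcn_map), init * fcn_map)
    return np.clip(final, 0.0, 1.0)
```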
Result
We compared our model with 10 state-of-the-art saliency models, including traditional approaches and deep learning methods, on four public datasets, namely DUT-OMRON, ECSSD, HKU-IS, and MSRA. The quantitative evaluation metrics are the F-measure, the mean absolute error (MAE), and PR curves, and we also provide several saliency maps of each method for visual comparison. The experimental results show that our model outperforms all other methods on the HKU-IS, MSRA, and DUT-OMRON datasets, and the saliency maps show that our model produces refined results. We further compared the random forest, FCN, and final fusion results to verify the effectiveness of the fusion algorithm; these comparative experiments demonstrate that the fusion algorithm improves saliency detection. Compared with the random forest results on ECSSD, HKU-IS, MSRA, and DUT-OMRON, the F-measure (higher is better) increased by 6.2%, 15.6%, 5.7%, and 16.6%, and the MAE (lower is better) decreased by 17.4%, 43.9%, 33.3%, and 24.5%, respectively. Compared with the FCN results on ECSSD, HKU-IS, MSRA, and DUT-OMRON, the F-measure increased by 2.2%, 4.1%, 5.7%, and 11.3%, respectively, and the MAE decreased by 0.6%, 10.7%, and 18.4% on ECSSD, MSRA, and DUT-OMRON, respectively. In addition, we conducted a series of comparative experiments on MSRA to clearly show the effectiveness of the different steps of the fusion algorithm.
Conclusion
In this study, we proposed a composite saliency model that consists of an FCN, a traditional model, and a fusion algorithm that fuses the two kinds of saliency maps. The experimental results show that our model outperforms several state-of-the-art saliency approaches and that the fusion algorithm improves the performance.