Saliency detection via fusion of deep model and traditional model
2018, Vol. 23, No. 12, pp. 1864-1873
Received: 2018-03-16; Revised: 2018-06-14; Published in print: 2018-12-16
DOI: 10.11834/jig.180073
Objective
Saliency detection is a fundamental problem in the fields of image processing and computer vision. Traditional models preserve the boundaries of salient objects well, but their confidence in the salient objects is not high enough, which leads to low recall; deep learning models assign high confidence to salient objects, but their results have coarse boundaries and lower precision. In view of the respective strengths and weaknesses of these two kinds of models, we propose a saliency model that exploits the advantages of both approaches while suppressing their shortcomings.
Method
We first improve the recent densely connected convolutional network (DenseNet) and train a fully convolutional network (FCN) saliency model based on it; we also select an existing superpixel-based saliency regression model. After the saliency maps of the two models are obtained, a fusion algorithm is proposed to merge the two results into the final, refined saliency map. The algorithm fuses the FCN result with the traditional model's result through the Hadamard product of the saliency maps and a one-to-one nonlinear mapping of pixel-wise saliency values, which can be summarized as follows.
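As a compact restatement of the description above (a sketch only: the exact one-to-one nonlinear mapping $$g$$ and adaptive threshold $$T$$ are described in the Method section below and are not fully specified here):
$$\boldsymbol{S}_{\text{init}} = \frac{1}{N}\sum\limits_{i=1}^{N} \left( \boldsymbol{S}_{\text{FCN}} \odot \boldsymbol{S}_{i} \right), \qquad \boldsymbol{S}_{\text{final}}(p) = g\left( \boldsymbol{S}_{\text{init}}(p), \boldsymbol{S}_{\text{FCN}}(p); T \right)$$
where $$\odot$$ denotes the Hadamard product, $$\boldsymbol{S}_{i}$$ is the traditional model's saliency map for the $$i$$-th segmentation, $$N$$ is the number of segmentations, and $$g$$ is applied pixel by pixel.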
Result
We compared our model with 10 state-of-the-art methods on four datasets. On HKU-IS, the F-measure of our model is 2.6% higher than that of the second-best model; on MSRA, the F-measure is 2.2% higher and the MAE is 5.6% lower than those of the second-best model; on DUT-OMRON, the F-measure is 5.6% higher and the MAE is 17.4% lower than those of the second-best model. Comparative experiments on MSRA were also conducted to verify the effectiveness of the fusion algorithm, and their results show that the proposed fusion algorithm improves saliency detection.
Conclusion
The proposed saliency model combines the advantages of traditional models and deep learning models and produces more accurate saliency detection results.
Objective
Saliency detection is a fundamental problem in computer vision and image processing that aims to identify the most conspicuous objects or regions in an image. It has been widely used in visual applications, including object retargeting, scene classification, visual tracking, image retrieval, and semantic segmentation. In most traditional approaches, salient objects are derived from features extracted from pixels or regions, and the final saliency map consists of these regions together with their saliency scores. The performance of such models relies on the segmentation method and the selection of features, and they cannot produce satisfactory results for images with multiple salient objects or low-contrast content. Traditional approaches preserve object boundaries well but assign insufficient confidence to salient objects, which yields low recall. Convolutional neural networks (CNNs) have been introduced into pixel-wise prediction problems, such as saliency detection, because of their outstanding performance in image classification. CNNs recast saliency detection as a labeling problem in which the features that distinguish salient from non-salient objects are learned automatically through gradient descent. A CNN cannot be used directly to train a saliency model; one workaround is to extract a square patch around each pixel and use the patch to predict the class of the center pixel, with patches frequently taken from different resolutions of the input image to capture global information. Another is to add up-sampling layers to the CNN. Such a modified CNN is called a fully convolutional network (FCN) and was first proposed for semantic segmentation. Most CNN-based saliency models use an FCN to capture both global and local information. The FCN adapts the CNN to dense prediction problems by replacing the softmax and fully connected layers with convolution and deconvolution layers. Compared with traditional methods, FCNs locate salient objects accurately and with high confidence. However, because of the down-sampling structure of FCNs, the boundaries of salient objects are coarse and the precision is lower than that of traditional approaches. To overcome the limitations of these two kinds of saliency models, we propose a composite saliency model that combines their advantages and restrains their drawbacks.
Method
In this study, a new FCN based on the densely connected convolutional network (DenseNet) is built. For saliency detection, we replace the fully connected layer and the final pooling layer with a 1×1 convolution layer and a deconvolution layer, and a sigmoid layer is applied to produce the saliency map (see the sketch below).
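A minimal sketch of this modified network, written in PyTorch with the torchvision DenseNet-161 backbone rather than the Caffe implementation used in the paper; the deconvolution kernel size, the final bilinear resize, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class DenseSaliencyFCN(nn.Module):
    """Illustrative FCN-style saliency network on a DenseNet-161 backbone."""

    def __init__(self):
        super().__init__()
        backbone = models.densenet161(weights=None)       # load pre-trained weights in practice
        # Keep only the convolutional feature extractor; the final pooling and the
        # fully connected classifier of DenseNet are discarded.
        self.features = backbone.features                  # 2208 channels at 1/32 resolution
        self.score = nn.Conv2d(2208, 1, kernel_size=1)     # 1x1 convolution -> saliency score
        # Deconvolution (transposed convolution) up-samples the score map by 32x.
        self.upsample = nn.ConvTranspose2d(1, 1, kernel_size=64, stride=32,
                                           padding=16, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        f = torch.relu(self.features(x))
        s = self.upsample(self.score(f))
        # Resize to exactly the input size (rounding in the down-sampling path can
        # leave the up-sampled map a few pixels short of the input resolution).
        s = F.interpolate(s, size=(h, w), mode="bilinear", align_corners=False)
        return torch.sigmoid(s)                            # per-pixel saliency in [0, 1]
```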
In the training process, the saliency network ends with a squared Euclidean loss layer for saliency regression, and we fine-tune the pre-trained DenseNet-161 to train our saliency model. Our training set consists of 3 900 images randomly selected from five public saliency datasets, namely ECSSD, SOD, HKU-IS, MSRA, and ICOSEG. The saliency network is implemented in the Caffe toolbox. The input images and ground-truth maps are resized to 500×500 for training, the momentum is set to 0.99, the learning rate to $$10^{-10}$$, and the weight decay to 0.000 5. The SGD learning procedure is accelerated with an NVIDIA GTX TITAN X GPU and takes approximately one day for 200 000 iterations; an outline of this training setup is given below.
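For illustration only: the paper uses Caffe, but an equivalent SGD setup can be sketched in PyTorch as follows, reusing the DenseSaliencyFCN module defined above; the batch handling and the summed (un-averaged) Euclidean loss are assumptions.

```python
import torch
import torch.nn as nn

model = DenseSaliencyFCN().cuda()
# Squared Euclidean loss for saliency regression; summing (rather than averaging)
# over pixels is one reason such a small learning rate is workable.
criterion = nn.MSELoss(reduction="sum")
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-10, momentum=0.99, weight_decay=5e-4)

def train_step(image, gt):
    """One SGD step; `image` is Nx3x500x500, `gt` is the Nx1x500x500 ground-truth map."""
    optimizer.zero_grad()
    loss = criterion(model(image.cuda()), gt.cuda())
    loss.backward()
    optimizer.step()
    return loss.item()

# for it in range(200_000):
#     image, gt = next(loader)     # `loader` yields resized image / ground-truth pairs
#     train_step(image, gt)
```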
We then use a traditional saliency model. The selected model adopts multi-level segmentation to produce several segmentations of an image, where each superpixel is represented by a feature vector that contains different kinds of image features, and a random forest is trained on these feature vectors to regress saliency maps (a simplified sketch follows).
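A heavily simplified sketch of this kind of regional regressor, assuming SLIC superpixels from scikit-image, mean-color features, and a scikit-learn random forest; the actual model uses multi-level segmentation and a much richer set of regional features.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestRegressor

def region_features(image, labels):
    """Mean RGB color per superpixel (a stand-in for the real multi-feature descriptor)."""
    return np.asarray([image[labels == r].mean(axis=0) for r in np.unique(labels)])

def region_targets(gt, labels):
    """Mean ground-truth saliency per superpixel, used as the regression target."""
    return np.asarray([gt[labels == r].mean() for r in np.unique(labels)])

def fit_forest(train_images, train_gts, n_segments=300):
    """Train the regional saliency regressor on (feature, saliency) pairs."""
    X, y = [], []
    for img, gt in zip(train_images, train_gts):
        labels = slic(img, n_segments=n_segments)
        X.append(region_features(img, labels))
        y.append(region_targets(gt, labels))
    forest = RandomForestRegressor(n_estimators=200)
    forest.fit(np.vstack(X), np.concatenate(y))
    return forest

def predict_saliency(forest, image, n_segments=300):
    """Assign the predicted regional score back to every pixel of its superpixel."""
    labels = slic(image, n_segments=n_segments)
    scores = forest.predict(region_features(image, labels))
    sal = np.zeros(labels.shape, dtype=np.float32)
    for r, s in zip(np.unique(labels), scores):
        sal[labels == r] = s
    return sal
```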
On the basis of the two models, we propose a fusion algorithm that combines the advantages of the traditional approach and the deep learning method. For an image, 15 segmentations are produced, and the saliency maps of all segmentations are derived by the random forest; the FCN is then used to produce another type of saliency map of the image. The fusion algorithm applies the Hadamard product to the two types of saliency maps, and the initial fusion result is obtained by averaging the Hadamard products. Finally, an adaptive threshold and a pixel-to-pixel mapping are used to fuse the initial fusion result with the FCN result and obtain the final saliency map, as sketched below.
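The core of the fusion can be sketched as follows. The abstract only specifies the Hadamard product, the averaging, and the thresholded pixel-to-pixel mapping, so the particular adaptive threshold (twice the mean of the initial fusion) and the per-pixel combination rule below are assumptions for illustration, not the paper's exact formulas.

```python
import numpy as np

def fuse(fcn_map, rf_maps):
    """Fuse the FCN saliency map with the random-forest maps of the multi-level segmentations.

    fcn_map : HxW array in [0, 1] produced by the FCN.
    rf_maps : list of HxW arrays in [0, 1], one per segmentation level (15 in the paper).
    """
    # Hadamard (element-wise) product with each segmentation-level map, then average.
    init = np.mean([fcn_map * m for m in rf_maps], axis=0)

    # Adaptive threshold; twice the mean value is an assumed choice for illustration.
    t = 2.0 * init.mean()

    # Pixel-to-pixel mapping (assumed): keep confident pixels of the initial fusion,
    # and attenuate the rest by the FCN confidence.
    final = np.where(init >= t, np.maximum(init, fcn_map), init * fcn_map)
    return np.clip(final, 0.0, 1.0)
```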
Result
We compared our model with 10 state-of-the-art saliency models, including traditional approaches and deep learning methods, on four public datasets, namely DUT-OMRON, ECSSD, HKU-IS, and MSRA. The quantitative evaluation metrics are the F-measure, the mean absolute error (MAE), and PR curves, and we also provide several saliency maps of each method for visual comparison. The experimental results show that our model outperforms all other methods on the HKU-IS, MSRA, and DUT-OMRON datasets, and the saliency maps show that our model produces refined results. We further compared the random forest, FCN, and final fusion results to verify the effectiveness of the fusion algorithm; these comparative experiments demonstrate that the fusion algorithm improves saliency detection. Compared with the random forest results on ECSSD, HKU-IS, MSRA, and DUT-OMRON, the F-measure (higher is better) increased by 6.2%, 15.6%, 5.7%, and 16.6%, and the MAE (lower is better) decreased by 17.4%, 43.9%, 33.3%, and 24.5%, respectively. Compared with the FCN results on ECSSD, HKU-IS, MSRA, and DUT-OMRON, the F-measure increased by 2.2%, 4.1%, 5.7%, and 11.3%, respectively, and the MAE decreased by 0.6%, 10.7%, and 18.4% on ECSSD, MSRA, and DUT-OMRON, respectively. In addition, we conducted a series of comparative experiments on MSRA to clearly show the effectiveness of the different steps of the fusion algorithm.
Conclusion
In this study, we proposed a composite saliency model that consists of an FCN, a traditional model, and a fusion algorithm that fuses the two kinds of saliency maps. The experimental results show that our model outperforms several state-of-the-art saliency approaches and that the fusion algorithm improves the performance.