A multi-scale convolutional neural network for salient object detection
2020, Vol. 25, No. 6, pp. 1116-1129
Received: 2019-08-19; Revised: 2019-10-23; Accepted: 2019-10-30; Published in print: 2020-06-16
DOI: 10.11834/jig.190395
Objective
Traditional saliency detection models mostly rely on handcrafted low- and mid-level features and prior information for object detection, yielding low precision and recall. With the rise of deep convolutional neural networks, saliency detection has developed rapidly. However, existing saliency methods still share a common drawback: in complex images, they struggle to uniformly highlight an entire object with a clear boundary and interior region, mainly because they lack sufficiently rich features for detection.
Method
We improve on the VGG (visual geometry group) model by removing its final fully connected layers and adopting skip-layer connections for pixel-level saliency prediction, which effectively combines multi-scale information from different convolutional layers of the network. In addition, the model incorporates high-level semantic information and low-level detail within a data-driven framework. To effectively preserve object boundaries and a uniform interior region, a fully connected conditional random field (CRF) model is applied to refine the resulting saliency map.
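The fully connected CRF used for refinement is typically defined by an energy over pixel labels with Gaussian edge potentials. The form below is the standard dense-CRF energy, given here for illustration; the abstract does not spell out the exact formulation used:

```latex
E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\qquad
\psi_p(x_i, x_j) = \mu(x_i, x_j)\left[
w^{(1)} \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\alpha^2}
                    -\frac{\lVert I_i - I_j\rVert^2}{2\theta_\beta^2}\right)
+ w^{(2)} \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\gamma^2}\right)\right]
```

Here the unary term \psi_u comes from the network's predicted saliency probabilities, p_i and I_i are pixel positions and colors, and \mu is a label compatibility function. Minimizing E encourages pixels that are close in position and color to take the same label, which sharpens object boundaries while smoothing the interior.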
Result
We evaluate the method on six widely used public datasets: DUT-OMRON (Dalian University of Technology and OMRON Corporation), ECSSD (extended complex scene saliency dataset), SED2 (segmentation evaluation dataset 2), HKU, PASCAL-S, and SOD (salient objects dataset), and compare it with 14 state-of-the-art and representative methods on metrics including the precision-recall (PR) curve, F-measure, max F-measure, weighted F-measure, and mean absolute error (MAE). On the six datasets, our method attains F-measure scores of 0.696, 0.876, 0.797, 0.868, 0.772, and 0.785; max F-measure scores of 0.747, 0.899, 0.859, 0.889, 0.814, and 0.833; weighted F-measure scores of 0.656, 0.854, 0.772, 0.844, 0.732, and 0.762; and MAE scores of 0.074, 0.061, 0.093, 0.049, 0.099, and 0.124. Whether on image sets whose foreground and background share similar colors or on complex multi-object image sets, our method performs close to the latest research results and outperforms most representative methods.
Conclusion
The proposed method is robust for saliency detection across diverse scenes; it renders the boundary and interior of salient objects more uniformly and produces more accurate detection results.
Objective
Salient object detection aims to localize and segment the most conspicuous and eye-attracting objects or regions in an image. Its results are usually expressed as saliency maps, in which the intensity of each pixel represents the probability that the pixel belongs to a salient region. Visual saliency detection has been used as a pre-processing step to facilitate a wide range of vision applications, including image and video compression, image retargeting, visual tracking, and robot navigation. Traditional saliency detection models rely on handcrafted features and prior information for detection, such as background prior, center prior, and contrast prior. However, these models are less applicable to a wide range of problems in practice. For example, salient objects are difficult to recognize when the background and the salient objects share similar visual attributes. Moreover, failure may occur when multiple salient objects overlap partly or entirely with one another. With the rise of deep convolutional neural networks (CNNs), visual saliency detection has achieved rapid progress in recent years. CNN-based models have largely overcome the disadvantages of handcrafted-feature-based approaches and greatly enhanced the performance of saliency detection: they excel at feature extraction and efficiently capture high-level information about objects and their cluttered surroundings, thus achieving better performance than traditional methods. This is especially true since the emergence of fully convolutional networks (FCNs), on which most mainstream saliency detection algorithms are now based. The FCN model unifies the two stages of feature extraction and saliency computation and optimizes them jointly through supervised learning. As a result, the features extracted by an FCN are more expressive and robust than handcrafted features. However, existing saliency approaches share common drawbacks, such as difficulty in uniformly highlighting entire salient objects with explicit boundaries and uniform interior regions in complex images. This drawback is largely due to the lack of sufficiently rich features for detecting salient objects.
Method
In this study, we propose a simple but efficient CNN for pixel-wise saliency prediction that captures various features simultaneously and exploits multi-scale information from the different convolutional layers of a CNN. To design an FCN-like network capable of pixel-level saliency inference, we develop a multi-scale deep CNN that uncovers richer information for saliency computation. The multi-scale feature extraction network generates feature maps of different resolutions from the side outputs of the convolutional layer groups of a base network. The shallow convolutional layers contain rich, detailed structural information at the expense of global representation; by contrast, the deep convolutional layers contain rich semantic information but lack spatial context. The network is thereby able to incorporate high-level semantic cues and low-level detailed information in a data-driven framework. Finally, to efficiently preserve object boundaries and a uniform interior region, we adopt a fully connected conditional random field (CRF) model to refine the estimated saliency map.
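The skip-layer, multi-scale fusion described above can be sketched as follows. The resolutions, the fixed fusion weights, and the nearest-neighbour resize are illustrative assumptions only; in the actual network, the upsampling and the per-scale weighting would be learned (e.g. via deconvolution and 1x1 convolution):

```python
import numpy as np

def upsample(fmap, size):
    """Nearest-neighbour upsampling of a 2-D feature map to (size, size)."""
    h, w = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[np.ix_(rows, cols)]

def fuse_side_outputs(side_maps, weights, size=224):
    """Resize each side-output map to the input resolution, take a weighted
    sum across scales (the role of a learned 1x1 convolution over the
    stacked maps), and apply a sigmoid to obtain per-pixel saliency
    probabilities."""
    stacked = np.stack([upsample(m, size) for m in side_maps])  # (K, H, W)
    fused = np.tensordot(weights, stacked, axes=1)              # (H, W)
    return 1.0 / (1.0 + np.exp(-fused))                         # sigmoid

# Toy side outputs at the resolutions of VGG's five pooling stages.
maps = [np.random.rand(s, s) for s in (112, 56, 28, 14, 7)]
sal = fuse_side_outputs(maps, weights=np.full(5, 0.2))
```

The point of the sketch is the combination step: shallow side outputs keep fine structure, deep ones keep semantics, and the fusion lets the final per-pixel prediction draw on both.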
Result
2
Extensive experiments are conducted on the six most widely used and challenging benchmark datasets
namely
DUT-OMRON(Dalian University of Technology and OMRON Corporation)
ECSSD(extended complex scene saliency dataset)
SED2(segmentation evalution database 2)
HKU
PASCAL-S
and SOD (salient objects dataset). The F-measure scores of our proposed scheme on these six benchmark datasets are 0.696
0.876
0.797
0.868
0.772
and 0.785
respectively. The max F-measure scores are 0.747
0.899
0.859
0.889
0.814
and 0.833
respectively. The weighted F-measure scores are 0.656
0.854
0.772
0.844
0.732
and 0.762
respectively. The mean absolute error (MAE) scores are 0.074
0.061
0.093
0.049
0.099
and 0.124
respectively. We compare our proposed method with 14 state-of-the-art methods as well. Results demonstrate the efficiency and robustness of the proposed approach against the 14 state-of-the-art methods in terms of popular evaluation metrics.
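The fixed-threshold F-measure and MAE reported above can be computed for any predicted map. A minimal sketch, assuming a real-valued saliency map `pred` in [0, 1], a binary ground-truth mask `gt`, and the conventional beta^2 = 0.3 weighting from the saliency literature (the paper does not restate these definitions):

```python
import numpy as np

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """F-measure at a fixed threshold; beta2 = 0.3 is the usual choice in
    saliency benchmarks, emphasising precision over recall."""
    binary = pred >= thresh
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

def mae(pred, gt):
    """Mean absolute error between a saliency map and the binary mask."""
    return np.abs(pred - gt.astype(float)).mean()

# A perfect prediction scores F-measure 1.0 and MAE 0.0.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
assert f_measure(gt.astype(float), gt) == 1.0
assert mae(gt.astype(float), gt) == 0.0
```

The max F-measure sweeps `thresh` over [0, 1] and reports the best score per dataset; the weighted F-measure (Margolin-style) additionally weights errors by their location and is not reproduced in this sketch.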
Conclusion
We propose an efficient FCN-like salient object detection model that generates rich and effective features. The algorithm is robust for image saliency detection in various scenarios. At the same time, the boundary and interior of the salient object are rendered uniformly, and the detection result is accurate.