A salient object detection model fusing multi-scale convolutional neural networks

Zhang Qing, Zuo Baochuan, Shi Yanjiao, Dai Meng (Shanghai Institute of Technology)

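Abstract
Objective Traditional saliency detection models mostly rely on handcrafted low- and mid-level features and prior information to detect objects, and their precision and recall are relatively low. With the rise of deep convolutional neural networks, saliency detection has progressed rapidly. However, existing saliency methods share a common drawback: they have difficulty uniformly highlighting the clear boundaries and interior regions of whole objects in complex images, mainly because they lack sufficiently rich features for detecting salient objects. Methods We propose a simple and effective convolutional neural network. It builds on the VGG model, removes the final fully connected layers, and uses skip connections for pixel-level saliency prediction, which effectively combines multi-scale information from different convolutional layers of the network. In addition, it combines high-level semantic information with low-level detail within a data-driven framework. Finally, to keep object boundaries and interior regions consistent, we adjust the resulting saliency map with a fully connected CRF model. Results We tested the method on six widely used public datasets, DUT-OMRON, ECSSD, SED2, HKU, PASCAL-S and SOD, and compared it with 14 representative state-of-the-art methods on PR curves, F-measure, maximum F-measure, weighted F-measure and MAE. On the six datasets, the proposed method obtains F-measure values of 0.696, 0.876, 0.797, 0.868, 0.772 and 0.785; maximum F-measure values of 0.747, 0.899, 0.859, 0.889, 0.814 and 0.833; weighted F-measure values of 0.656, 0.854, 0.772, 0.844, 0.732 and 0.762; and MAE values of 0.074, 0.061, 0.093, 0.049, 0.099 and 0.124, respectively. On image sets where foreground and background colors are similar, as well as on complex multi-object image sets, every metric of the proposed algorithm is close to the latest results and better than most representative methods. Conclusion The proposed algorithm is robust for saliency detection across a variety of scenes, makes the boundaries and interior regions of salient objects more uniform, and yields more accurate detection results.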
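The skip-connection idea in the Methods part, where predictions from layers of different resolution are combined into one pixel-level saliency map, can be illustrated with a minimal sketch. It is not the authors' network: there is no VGG backbone, no training and no CRF step. The function names `upsample_nearest` and `fuse_side_outputs`, the nearest-neighbour upsampling and the fixed fusion weights are all assumptions made for illustration (a real model would typically learn the fusion).

```python
import math


def upsample_nearest(fmap, out_h, out_w):
    """Resize a 2-D map (list of rows) to out_h x out_w by nearest-neighbour lookup."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def fuse_side_outputs(side_outputs, weights, out_h, out_w):
    """Bring every side output to a common size, sum them with weights, apply a sigmoid.

    Assumption: fixed scalar weights stand in for a learned fusion layer.
    """
    if len(side_outputs) != len(weights):
        raise ValueError("one weight per side output is required")
    resized = [upsample_nearest(m, out_h, out_w) for m in side_outputs]
    fused = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            score = sum(w * m[r][c] for w, m in zip(weights, resized))
            row.append(1.0 / (1.0 + math.exp(-score)))  # squash to (0, 1)
        fused.append(row)
    return fused


# Hypothetical side outputs: a fine 4x4 map (shallow layer, detailed but noisy)
# and a coarse 2x2 map (deep layer, semantic but low resolution).
shallow = [[2.0, -2.0, -2.0, -2.0],
           [2.0, 2.0, -2.0, -2.0],
           [-2.0, -2.0, -2.0, -2.0],
           [-2.0, -2.0, -2.0, -2.0]]
deep = [[3.0, -3.0],
        [-3.0, -3.0]]

saliency = fuse_side_outputs([shallow, deep], [0.5, 0.5], 4, 4)
```

In this toy input, a pixel scores high only where the fine map and the coarse map agree, which is the intuition behind combining low-level detail with high-level semantics.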
Keywords
A multi-scale convolutional neural network model for salient object detection

Zhang Qing, Zuo Baochuan, Shi Yanjiao, Dai Meng (Shanghai Institute of Technology)

Abstract
Objective Salient object detection aims to localize and segment the most conspicuous, eye-attracting objects or regions in an image. The results are usually expressed as saliency maps, in which the intensity of each pixel represents the probability that the pixel belongs to a salient region. Visual saliency detection has been used as a pre-processing step for a wide range of vision applications, including image and video compression, image retargeting, visual tracking and robot navigation. Traditional saliency detection models mostly rely on handcrafted features and prior information, such as the background prior, center prior and contrast prior. However, these models apply poorly to many practical problems. For example, salient objects are difficult to pop out when they share visual attributes with the background, and detection may fail when multiple salient objects overlap partly or entirely. With the rise of deep convolutional neural networks (CNNs), visual saliency detection has progressed rapidly in the last few years, overcoming the disadvantages of approaches based on handcrafted features and greatly improving detection performance. CNN-based models are superior at feature extraction and better capture high-level information about objects and their cluttered surroundings, so they outperform traditional methods, especially since the emergence of the fully convolutional network (FCN). Most mainstream saliency detection algorithms are now based on FCNs. The FCN model unifies the two stages of feature extraction and saliency calculation and optimizes them through supervised learning. The features extracted by FCNs prove more expressive and more robust than handcrafted features.
However, existing saliency approaches share a common drawback: they have difficulty uniformly highlighting entire salient objects with explicit boundaries and heterogeneous regions in complex images, largely because they lack sufficient and rich features for detecting salient objects. Methods In this paper, we propose a simple but efficient convolutional neural network for pixel-wise saliency prediction that captures various features simultaneously by using multi-scale information from different convolutional layers. To design an FCN-like network capable of pixel-level saliency inference, we develop a multi-scale deep convolutional neural network that discovers more information for saliency computation. The multi-scale feature extraction network generates feature maps of different resolutions from the side outputs of the convolutional layer groups of a base network. The shallow convolutional layers contain rich detailed structure information at the expense of global representation, while the deep convolutional layers contain rich semantic information but lack spatial context. Moreover, the network is capable of incorporating high-level semantic cues and low-level detailed information in a data-driven framework. Finally, to efficiently preserve object boundaries and uniform interior regions, we adopt a fully connected CRF model to refine the estimated saliency map. Results Extensive experiments have been conducted on six widely used and challenging benchmark datasets: DUT-OMRON, ECSSD, SED2, HKU, PASCAL-S and SOD.
The F-measure scores of the proposed scheme on the six benchmark datasets are 0.696, 0.876, 0.797, 0.868, 0.772 and 0.785, respectively; the max F-measure scores are 0.747, 0.899, 0.859, 0.889, 0.814 and 0.833; the weighted F-measure scores are 0.656, 0.854, 0.772, 0.844, 0.732 and 0.762; and the MAE scores are 0.074, 0.061, 0.093, 0.049, 0.099 and 0.124. Moreover, we compare the proposed method with 14 state-of-the-art methods, and the comparison demonstrates its efficiency and robustness in terms of popular evaluation metrics. Conclusion We propose an efficient FCN-like salient object detection model that generates rich and efficient features. The algorithm is robust for image saliency detection in various scenarios. In addition, it makes the boundaries and interior regions of salient objects more uniform and the detection results more accurate.
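For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how MAE, F-measure and max F-measure can be computed for a single image. The abstract does not give the exact protocol, so several choices here are assumptions: the weight `beta_sq = 0.3` (a common setting in the saliency literature), the fixed binarization threshold of 0.5, and taking the maximum per image over 256 evenly spaced thresholds rather than over a dataset-level PR curve. The function names are hypothetical, and the weighted F-measure is not covered.

```python
def mean_absolute_error(saliency, ground_truth):
    """Average per-pixel absolute difference between a saliency map and its mask."""
    total, count = 0.0, 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            total += abs(s - g)
            count += 1
    return total / count


def f_measure(saliency, ground_truth, threshold=0.5, beta_sq=0.3):
    """F-measure of the map binarized at `threshold` (beta_sq = 0.3 is an assumption)."""
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_f_measure(saliency, ground_truth, steps=256, beta_sq=0.3):
    """Best F-measure over `steps` evenly spaced thresholds in [0, 1]."""
    return max(f_measure(saliency, ground_truth, t / (steps - 1), beta_sq)
               for t in range(steps))


# Toy 2x2 prediction and binary ground-truth mask (illustrative values only).
prediction = [[0.9, 0.8],
              [0.6, 0.1]]
mask = [[1, 1],
        [0, 0]]

mae = mean_absolute_error(prediction, mask)
f_fixed = f_measure(prediction, mask)
f_max = max_f_measure(prediction, mask)
```

On this toy input the fixed threshold admits one false positive (the 0.6 pixel), while a higher threshold separates the mask cleanly, which is why the max F-measure is never lower than the fixed-threshold value. Lower MAE is better; higher F-measure is better.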
Keywords