MTMS300: a multiple-targets and multiple-scales benchmark dataset for salient object detection
Vol. 27, Issue 4, Pages 1039-1055 (2022)
Published: 16 April 2022
Accepted: 05 January 2021
DOI: 10.11834/jig.200612
Chuwei Li, Zhilong Zhang, Shuxin Li. MTMS300: a multiple-targets and multiple-scales benchmark dataset for salient object detection[J]. Journal of Image and Graphics, 27(4): 1039-1055 (2022)
Objective
Benchmark datasets have played an important role in the development of salient object detection algorithms. However, existing benchmark datasets generally suffer from dataset bias: they can neither fully reflect the performance differences between algorithms nor capture the technical characteristics of certain typical applications. To address this problem, this paper quantitatively analyzes the bias and statistical properties of benchmark datasets and proposes new benchmark datasets oriented to specific tasks.
Method
First, the metrics commonly used in designing and evaluating datasets are discussed. Then, the statistical differences between benchmark datasets are analyzed quantitatively, and a new benchmark dataset, MTMS300 (multiple targets and multiple scales), is designed. Next, typical visual saliency algorithms are evaluated on the benchmark datasets. Finally, images that are difficult (i.e., yield low metric scores) for most non-deep-learning algorithms are selected from the public benchmark datasets to form another benchmark dataset, DSC (difficult scenes in common).
Result
Eleven benchmark datasets are analyzed quantitatively with six metrics, including the average annotation map and the number of super-pixels. The MTMS300 dataset features a small center bias, a balanced distribution of object-to-image area ratios, diverse image resolutions, and a relatively large number of objects; the DSC dataset features small foreground/background differences, large numbers of super-pixels, and high image entropy. Evaluating 18 visual saliency algorithms on the 11 benchmark datasets reveals a correlation between algorithm scores and dataset complexity and exposes shortcomings of existing algorithms on the MTMS300 dataset.
Conclusion
The two proposed benchmark datasets have complementary characteristics; they support a more comprehensive evaluation of visual saliency algorithms and promote the development of salient object detection toward specific tasks.
Objective
Benchmark datasets are essential to salient object detection: they enable quantitative evaluation of different algorithms. However, most public datasets exhibit several biases, including center bias, selection bias, and category bias. 1) Center bias, also called camera-shooting bias, refers to the photographer's tendency to place the object at the center of the camera's field of view. 2) Selection bias means the dataset designer favors certain images during construction, such as images with simple backgrounds or large objects. 3) Category bias refers to category imbalance in the dataset, which mainly affects the training of deep convolutional neural networks (DCNNs). Because of these biases, current visual saliency algorithms are tuned to everyday scenes: images usually shot at close range against a simple background, in which an object's saliency is tied to its size and position. An algorithm can easily judge saliency when a large object sits at the center of the image, and current saliency detection benchmark datasets are constrained by exactly these biases. We therefore quantitatively analyze the statistical differences among several commonly used benchmark datasets and propose a new high-quality benchmark dataset.
Method
Center bias, clutter and complexity, and label consistency are crucial to the design and evaluation of a benchmark dataset. First, the commonly used evaluation metrics are discussed, including the average annotation map (AAM), normalized object distance (NOD), super-pixel count, image entropy, and intersection over union (IoU).
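The abstract does not reproduce the formulas behind these metrics. As a rough illustration only, a minimal Python sketch of two of them, AAM and NOD, might look like the following; the nearest-neighbour resizing and the half-diagonal normalization in NOD are our assumptions, not definitions taken from the paper.

```python
import numpy as np

def average_annotation_map(masks, size=(256, 256)):
    """Average of binary ground-truth masks resized to a common
    resolution; a bright central blob indicates center bias."""
    acc = np.zeros(size, dtype=np.float64)
    for m in masks:
        # Nearest-neighbour resize via index sampling (assumed scheme).
        ys = np.linspace(0, m.shape[0] - 1, size[0]).round().astype(int)
        xs = np.linspace(0, m.shape[1] - 1, size[1]).round().astype(int)
        acc += (m[np.ix_(ys, xs)] > 0).astype(np.float64)
    return acc / len(masks)

def normalized_object_distance(mask):
    """Distance from the object centroid to the image center, normalized
    by the half-diagonal (assumed) for comparability across resolutions."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    d = np.hypot(ys.mean() - (h - 1) / 2, xs.mean() - (w - 1) / 2)
    return d / (np.hypot(h, w) / 2)
```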
Next, a new benchmark dataset is constructed; we separate the image acquisition and annotation procedures to avoid dataset bias. For image acquisition, six participants collected images from the Internet, and a similarity measure was employed to remove similar or duplicated images. Annotation is divided into two stages to ensure consistency. In the first stage, five participants labeled the salient objects in each image with rough bounding boxes, and the IoU between boxes was used to decide which objects were labeled consistently enough to keep.
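A minimal sketch of how pairwise box IoU could screen annotator agreement is shown below; the 0.5 threshold and the all-pairs rule are illustrative assumptions, not values taken from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def consistently_labeled(boxes, thresh=0.5):
    """An object is kept only if every pair of annotator boxes agrees
    above `thresh` (threshold and rule assumed, not from the paper)."""
    return all(box_iou(p, q) >= thresh
               for i, p in enumerate(boxes) for q in boxes[i + 1:])
```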
In the second stage, two participants generated pixel-level annotation maps, following three rules: 1) only the unoccluded parts of salient objects are labeled; 2) adjacent objects are segmented into independent parts according to the target orientation, while overlapped objects are not deliberately separated into discontinuous parts; 3) objects must be identifiable from their outlines alone. The resulting benchmark, named MTMS300 (multiple targets and multiple scales), contains 300 multi-faceted images drawn from sea, land, and sky scenes. Third, we quantitatively analyzed the current benchmark datasets and our dataset with respect to six factors and ranked them by difficulty. After that, we tested and compared 18 representative visual saliency models quantitatively on the public benchmark datasets and our new dataset, revealing why models may fail on particular benchmarks. Finally, we selected a set of images from the existing benchmark datasets to construct a new benchmark dataset named DSC (difficult scenes in common).
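The exact selection criterion for DSC is not given in the abstract; a hedged sketch of one plausible rule, keeping images whose median score across non-deep-learning models falls below a threshold (both the median rule and the 0.3 value are assumptions), is:

```python
import numpy as np

def select_difficult_images(scores, thresh=0.3):
    """scores maps an image id to the per-image metric scores (e.g.,
    F-measure) of the non-deep-learning models. An image is kept when
    most models score poorly on it; rule and threshold are assumed."""
    return [img for img, s in scores.items() if np.median(s) < thresh]
```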
Result
The demonstration is divided into two parts: statistical analysis of the benchmark datasets and quantitative evaluation of visual saliency algorithms. In the first part, we use the average annotation map and normalized object distance to analyze dataset center bias, and the normalized object size, the Chi-square distance between histograms, the number of super-pixels, and the image entropy to analyze dataset complexity.
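As a minimal sketch of two of these complexity measures (the histogram construction, gray-level inputs, and bin counts are our assumptions):

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance 0.5 * sum((h1 - h2)^2 / (h1 + h2))
    between two histograms, after normalization."""
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def fg_bg_difference(gray, mask, bins=32):
    """Chi-square distance between foreground and background gray-level
    histograms; small values suggest a harder image."""
    fg, _ = np.histogram(gray[mask > 0], bins=bins, range=(0, 256))
    bg, _ = np.histogram(gray[mask == 0], bins=bins, range=(0, 256))
    return chi_square_distance(fg.astype(float), bg.astype(float))

def image_entropy(gray, bins=256):
    """Shannon entropy (in bits) of the gray-level distribution."""
    p, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = p[p > 0] / p.sum()
    return float(-np.sum(p * np.log2(p)))
```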
Compared with other public datasets, the MTMS300 dataset has a smaller center bias and also stands out in object quantity and object size. The value of the DSC dataset derives from its small foreground/background difference, large number of super-pixels, and high image entropy. In the second part, the two most widely adopted metrics are used to evaluate existing visual saliency algorithms. From the experiments on 18 algorithms across 11 datasets, we find a correlation between an algorithm's metric score and the difficulty of the dataset, and we analyze the limitations of the current algorithms on the new dataset.
Conclusion
We present a benchmark dataset for salient object detection characterized by low center bias, a balanced distribution of salient object size ratios, diverse image resolutions, and multi-object scenarios. Moreover, the multi-labeler annotation procedure ensures that most objects in our dataset are clearly and consistently labeled. The MTMS300 dataset comprises 300 color images containing multiple targets at multiple scales. We evaluated 18 representative visual saliency algorithms on this new dataset and reviewed the issues that challenge them. Finally, we selected a set of images from 9 existing benchmark datasets to construct a dataset named DSC. Together, these two datasets enable a more comprehensive evaluation of visual saliency algorithms and can further the development of salient object detection, especially for task-specific applications.
Keywords: visual saliency; salient object detection; benchmark dataset; multiple targets; multiple scales; small target