MTMS300: a multiple-targets and multiple-scales benchmark dataset for salient object detection

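Li Chuwei1, Zhang Zhilong1, Li Shuxin2 (1. National Key Laboratory of Science and Technology on Automatic Target Recognition, College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073; 2. School of Information and Communication, National University of Defense Technology, Xi'an 710106)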

Abstract
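Objective Benchmark datasets have played an important role in the development of salient object detection algorithms. However, existing benchmark datasets commonly suffer from dataset bias. As a result, they cannot fully reveal the performance differences among algorithms, nor can they fully reflect the technical characteristics of certain typical applications. To address this problem, this paper quantitatively analyzes the biases and statistical properties of benchmark datasets and proposes new benchmark datasets for specific tasks. Method First, the metrics commonly used to design and evaluate datasets are discussed. Then, the statistical differences among benchmark datasets are analyzed quantitatively, and a new benchmark dataset, MTMS300 (multiple targets and multiple scales), is designed. Next, typical visual saliency algorithms are evaluated on the benchmark datasets. Finally, images that are difficult for most non-deep-learning algorithms (i.e., images on which they obtain low metric scores) are selected from the public benchmark datasets to form another benchmark dataset, DSC (difficult scenes in common). Result Eleven benchmark datasets are analyzed quantitatively using six metrics, including the average annotation map and the number of superpixels. MTMS300 has a small center bias, a balanced distribution of object area ratios, diverse image resolutions, and a relatively large number of objects per image. DSC has small foreground/background differences, a large number of superpixels, and high image entropy. A quantitative evaluation of 18 visual saliency algorithms on the 11 benchmark datasets reveals a correlation between algorithm scores and dataset complexity, and exposes shortcomings of existing algorithms on MTMS300. Conclusion The two proposed benchmark datasets have different characteristics. They support a more comprehensive evaluation of visual saliency algorithms and help steer these algorithms toward specific tasks.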
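The DSC construction step (keeping images on which most non-deep-learning algorithms score low) can be sketched as a majority vote over per-image scores. This is a minimal sketch under stated assumptions. It assumes each algorithm's per-image score has already been computed with some higher-is-better metric. The function name `select_difficult_images`, the 0.5 score threshold, and the "more than half of the algorithms" rule are hypothetical choices for illustration, not the authors' exact criterion.

```python
from typing import Dict, List


def select_difficult_images(
    scores: Dict[str, Dict[str, float]],
    score_threshold: float = 0.5,  # hypothetical cut-off for a "low" score
    min_fraction: float = 0.5,     # hypothetical: "most" = more than half
) -> List[str]:
    """Return the IDs of images on which more than `min_fraction` of the
    algorithms score below `score_threshold`.

    `scores` maps algorithm name -> {image_id: score}, higher is better.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    image_ids = sorted({img for per_alg in scores.values() for img in per_alg})
    difficult = []
    for img in image_ids:
        # Only count algorithms that were actually evaluated on this image.
        evaluated = [per_alg[img] for per_alg in scores.values() if img in per_alg]
        if not evaluated:
            continue
        low = sum(1 for s in evaluated if s < score_threshold)
        if low / len(evaluated) > min_fraction:
            difficult.append(img)
    return difficult
```

With three algorithms, an image enters the selection only when at least two of them score below the threshold. The returned IDs are sorted so that the selection is deterministic.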
Keywords
MTMS300: a multiple-targets and multiple-scales benchmark dataset for salient object detection

Li Chuwei1, Zhang Zhilong1, Li Shuxin2(1.National Key Laboratory of Science and Technology on Automatic Target Recognition, College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China;2.School of Information and Communication, National University of Defense Technology, Xi'an 710106, China)

Abstract
Objective Benchmark datasets are essential to salient object detection because they enable the quantitative evaluation of different algorithms. Most public datasets, however, suffer from several biases, such as center bias, selection bias, and category bias. 1) Center bias, also called camera-shooting bias, refers to the photographer's tendency to place the object at the center of the camera's field of view. 2) Selection bias means that the dataset designer favors certain kinds of images during construction, for example images with simple backgrounds or large objects. 3) Category bias refers to class imbalance in the dataset, which often affects the training of deep convolutional neural networks (DCNNs). Because of these biases, current visual saliency algorithms are geared toward images of everyday scenes. Such images are usually taken at close range against a simple background, and the saliency of an object is closely tied to its size and position: an algorithm can easily judge an object to be salient when it is large and located at the image center. Current saliency detection benchmark datasets are therefore limited by these biases. This paper quantitatively analyzes the statistical differences among several commonly used benchmark datasets and proposes a new high-quality benchmark dataset. Method Center bias, clutter and complexity, and label consistency are the crucial factors in designing and evaluating a benchmark dataset. First, the commonly used evaluation metrics are discussed, including the average annotation map (AAM), normalized object distance (NOD), number of superpixels, image entropy, and intersection over union (IoU). Next, a new benchmark dataset is constructed. Image acquisition and image annotation are carried out as separate procedures to avoid dataset bias.
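The two center-bias metrics named above can be sketched on binary ground-truth masks given as nested lists of 0/1. This is a minimal sketch, assuming the following definitions:

- AAM is the pixel-wise mean of all masks after nearest-neighbor resizing to a common size.
- NOD is the distance from the foreground centroid to the image center, divided by the center-to-corner distance, so that 0 means centered and 1 means at a corner.

The 64x64 size, the centroid-based NOD, and the function names are hypothetical and may differ from the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask, 1 = salient pixel


def resize_nearest(mask: Mask, out_h: int, out_w: int) -> List[List[int]]:
    """Nearest-neighbor resize so masks of different resolutions can be averaged."""
    h, w = len(mask), len(mask[0])
    return [[mask[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def average_annotation_map(masks: List[Mask],
                           size: Tuple[int, int] = (64, 64)) -> List[List[float]]:
    """Assumed AAM: pixel-wise mean of all resized masks. A bright blob at the
    center of the result indicates a strong center bias."""
    if not masks:
        raise ValueError("need at least one mask")
    out_h, out_w = size
    acc = [[0.0] * out_w for _ in range(out_h)]
    for m in masks:
        r = resize_nearest(m, out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                acc[i][j] += r[i][j]
    n = len(masks)
    return [[v / n for v in row] for row in acc]


def normalized_object_distance(mask: Mask) -> float:
    """Assumed NOD: centroid-to-center distance / center-to-corner distance."""
    h, w = len(mask), len(mask[0])
    pts = [(i, j) for i in range(h) for j in range(w) if mask[i][j]]
    if not pts:
        raise ValueError("mask has no foreground pixels")
    cy = sum(p[0] for p in pts) / len(pts)
    cx = sum(p[1] for p in pts) / len(pts)
    oy, ox = (h - 1) / 2, (w - 1) / 2   # image center in pixel-index coordinates
    half_diag = math.hypot(oy, ox)      # center to corner-pixel distance
    if half_diag == 0:                  # 1x1 image: the object is at the center
        return 0.0
    return math.hypot(cy - oy, cx - ox) / half_diag
```

Averaging NOD over every image in a dataset gives a single center-bias score that can be compared across datasets. A lower mean indicates stronger center bias.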
For image acquisition, six participants collected images from the Internet, and a similarity measure was used to remove similar or duplicate images. To keep the annotation consistent, the annotation process is divided into two stages. In the first stage, five participants roughly marked the salient objects in each image with bounding boxes, and the IoU between their boxes was used to decide which objects should be labeled further. In the second stage, two participants produced pixel-level annotation maps under the following rules: 1) only the unoccluded parts of salient objects are labeled; 2) adjacent objects are segmented into separate parts according to the target orientation, whereas overlapping objects are not deliberately split into discontinuous parts; 3) each object should be identifiable from its outline alone. The resulting benchmark dataset, named MTMS300 (multiple targets and multiple scales), contains 300 diverse images of sea, land, and sky scenes. Third, the existing benchmark datasets and MTMS300 are analyzed quantitatively with respect to six factors and ranked by difficulty. After that, 18 representative visual saliency models are tested and compared quantitatively on the public benchmark datasets and the new dataset, which reveals possible causes of model failure. Finally, a set of images that are difficult for most non-deep-learning algorithms is selected from the benchmark datasets to construct another benchmark dataset named DSC (difficult scenes in common). Result The experiments consist of two parts: a statistical analysis of the benchmark datasets and a quantitative evaluation of visual saliency algorithms. In the first part, the average annotation map and normalized object distance are used to analyze the center bias of each dataset.
Normalized object size, the chi-square distance between histograms, the number of superpixels, and image entropy are used to analyze dataset complexity. Compared with other public datasets, MTMS300 has a smaller center bias and stands out in the number of objects and the distribution of object sizes. The DSC dataset is characterized by small foreground/background differences, a large number of superpixels, and high image entropy. In the second part, the two most widely used metrics are adopted to evaluate existing visual saliency algorithms. Experiments with 18 algorithms on 11 datasets show a correlation between an algorithm's metric score and the difficulty of the dataset. The limitations of current algorithms on the new dataset are also analyzed. Conclusion We present a benchmark dataset for salient object detection characterized by less center bias, a balanced distribution of salient object size ratios, diverse image resolutions, and multiple-object scenes. Moreover, annotation by multiple labelers ensures that most objects in the dataset are labeled clearly and consistently. MTMS300 contains 300 color images with multiple targets at multiple scales. We evaluated 18 representative visual saliency algorithms on this new dataset and discussed the challenges these algorithms face. Finally, we selected a set of images from 9 existing benchmark datasets to construct a dataset named DSC. These two datasets can be used to evaluate the performance of visual saliency algorithms and to further advance salient object detection, especially for task-specific applications.
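Two of the complexity measures above, image entropy and the chi-square distance between foreground and background histograms, can be sketched on 8-bit grayscale images stored as nested lists. This is a minimal sketch, assuming the following choices:

- entropy is the Shannon entropy (in bits) of the 256-bin gray-level histogram;
- the foreground/background difference is the chi-square distance 0.5 * sum((a - b)^2 / (a + b)) between normalized 16-bin histograms of the pixels inside and outside the ground-truth mask.

The bin counts, the grayscale-only input, and the function names are hypothetical. The paper may use color histograms or another chi-square variant.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # 8-bit gray values, 0..255


def gray_histogram(pixels: Sequence[int], bins: int = 256) -> List[float]:
    """Normalized histogram of 8-bit gray values."""
    if not pixels:
        raise ValueError("no pixels to histogram")
    counts = [0] * bins
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"gray value out of range: {p}")
        counts[p * bins // 256] += 1
    n = len(pixels)
    return [c / n for c in counts]


def image_entropy(gray: Image) -> float:
    """Shannon entropy in bits; higher values suggest more cluttered images."""
    hist = gray_histogram([p for row in gray for p in row])
    return -sum(p * math.log2(p) for p in hist if p > 0)


def chi_square_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-square distance between two normalized histograms (range 0..1)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def fg_bg_chi_square(gray: Image, mask: Image, bins: int = 16) -> float:
    """Small values mean the salient object looks like its background."""
    fg = [p for rg, rm in zip(gray, mask) for p, m in zip(rg, rm) if m]
    bg = [p for rg, rm in zip(gray, mask) for p, m in zip(rg, rm) if not m]
    return chi_square_distance(gray_histogram(fg, bins), gray_histogram(bg, bins))
```

Under these assumptions, a DSC-like image would show high entropy together with a low foreground/background chi-square distance.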
Keywords
