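Sun Xuhao1,2,3, Shen Yang1,2,3, Wei Xiushen1,2,3, An Peng4 (1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094; 2. Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing 210094; 3. Jiangsu Key Laboratory of Image and Video Understanding for Social Security, Nanjing 210094; 4. China National Offshore Oil Corporation Information Technology Center, Beijing 100010)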
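Objective Existing object detection is usually performed under a closed-set setting. In real-world problems, however, the images to be detected often contain objects of unknown categories. To improve the model's ability to detect newly appearing categories in real detection tasks while preserving its detection performance on known classes, this paper studies the open-set object detection task.
Method Existing open-set object detection frameworks treat the background class and the unknown classes as a single category during optimization. In contrast, the proposed framework first decides whether a candidate box belongs to the background or contains an object to be recognized, and only then discriminates between known and unknown classes among the boxes that contain objects. We propose a detector based on annular prototype space optimization. By optimizing how sparsely the features of candidate boxes are distributed in the high-dimensional space, the detector discriminates known, unknown, and background classes in an annular sequence, which improves detection performance on open-set categories. After the region proposal network (RPN) layer, we design a scheme that randomly covers candidate boxes to select relevant background training boxes, which avoids the cumbersome background sampling steps of previous open-set detection work.
Result While maintaining detection performance under the closed-set setting, and with the number of unknown classes gradually increased, the proposed method achieves competitive results on four metrics on the Visual Object Classes-Common Objects in Context-20 (VOC-COCO-20), Visual Object Classes-Common Objects in Context-40 (VOC-COCO-40), and Visual Object Classes-Common Objects in Context-60 (VOC-COCO-60) datasets. In addition, when the wilderness ratio (WR), i.e., the ratio of the number of images containing unknown-class objects to the number of images containing known-class objects, is increased, the proposed method leads the compared methods in 10 of the 12 results across three comparison experiments. Ablation experiments also confirm the effectiveness of each module in the method.
Conclusion The proposed open-set object detection framework based on annular prototype space optimization achieves good detection results. Experimental comparisons on practical detection tasks show that the method has stronger open-set category detection ability without changing the model's closed-set recognition performance.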
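As a minimal sketch of the wilderness ratio (WR) defined above, the ratio can be computed directly from two image counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's datasets.

```python
def wilderness_ratio(num_unknown_images: int, num_known_images: int) -> float:
    """Ratio of images containing unknown-class objects to images
    containing known-class objects (the WR described in the abstract)."""
    if num_known_images <= 0:
        raise ValueError("num_known_images must be positive")
    if num_unknown_images < 0:
        raise ValueError("num_unknown_images must be non-negative")
    return num_unknown_images / num_known_images


# Hypothetical counts, for illustration only.
n_known = 1000
for n_unknown in (500, 1000, 4000):
    print(n_unknown, wilderness_ratio(n_unknown, n_known))
```

A larger WR means the detector sees proportionally more images that may contain unknown objects, so the evaluation becomes more "open".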
Open-set object detection based on annular prototype space optimization
Sun Xuhao1,2,3, Shen Yang1,2,3, Wei Xiushen1,2,3, An Peng4 (1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; 2. Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing 210094, China; 3. Jiangsu Key Laboratory of Image and Video Understanding for Social Security, Nanjing 210094, China; 4. China National Offshore Oil Corporation Information Technology Center, Beijing 100010, China)
Objective In the closed-set setup, object detection assumes that the objects in the test images (or in data of other modalities) belong to the same classes that were seen during training. Under this setting, modern object detectors have achieved impressive progress. However, the images to be detected in practical tasks usually contain objects of unknown categories. For example, in offshore fishing it is common to specify that fish meeting a size requirement may be caught, whereas fish that do not meet it are prohibited. Object detectors usually produce two types of errors. The first is classifying an object of interest as another object or as background, i.e., identifying a known class as a background class or an unknown class. The second occurs when a background sample or an unknown object is mistaken for one of the classes of interest, i.e., a background region or an unknown object region is identified as a known class. Most previous closed-set detection methods can identify unknown and background classes in the open-set setup to some extent once unknown-class thresholds are added for screening. However, adjusting these thresholds in real scenarios is challenging. Therefore, this study explores the open-set object detection (OSOD) task to improve the robustness of the model in real-world detection tasks. In the open-set environment, the model needs to distinguish not only the known objects contained in the training data but also other objects not contained in the training set. Moreover, the model must delineate the background classes, which are neither known nor unknown objects.
Method Existing OSOD approaches typically group background classes and unknown classes together as feature-sparse classes and classify them as one class. This leaves the task of separating the background class from the unknown class entirely to the final classifier, which runs contrary to the original purpose of the region proposal network (RPN) layer, namely filtering candidate regions that contain objects. Therefore, we propose a new OSOD framework. On the one hand, we improve the design of the classifier through an annular prototype space, so that the classifier can focus on identifying known and unknown classes. Specifically, the detector separates known, unknown, and background classes into layers: through prototype learning optimization, known classes become dense in the high-dimensional space, whereas background classes become sparse. This layering helps improve the detection performance. On the other hand, we filter out the background classes by randomly masking the existing proposal regions, which improves the robustness of the RPN layer while retaining its advantage of proposing object candidates. This also eliminates the need for an additional background class sampling step. In particular, the feature vectors generated for regions belonging to the unknown category change considerably after a small random mask sampling, whereas the feature vectors generated for regions belonging to the background category do not. The module can therefore correct the regions identified as unknown categories.
Result The proposed method is evaluated on the OSOD benchmark, which consists of PASCAL Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO). The trainval set of VOC is used for closed-set training. The 20 VOC classes and the 60 non-VOC classes in COCO are used to evaluate the proposed method under different open-set conditions. The comparison methods are Faster R-CNN (FR-CNN), placeholders for open-set recognition (PROSER), open-world object detector (ORE), dropout sampling (DS), and open-set detector (OpenDet). OpenDet is currently the state-of-the-art method in the field of OSOD. We adopt two settings to demonstrate the effectiveness of our method. For setting one, we gradually increase the number of unknown classes and build three joint datasets called Visual Object Classes-Common Objects in Context-20 (VOC-COCO-20), Visual Object Classes-Common Objects in Context-40 (VOC-COCO-40), and Visual Object Classes-Common Objects in Context-60 (VOC-COCO-60). The proposed method outperforms the other methods by a large margin on all metrics and achieves new state-of-the-art results in OSOD. For example, our method improves by approximately 26%, 32%, and 15.88 on wilderness impact (WI), absolute open-set error (AOSE), and APU, respectively, without compromising the mAPK (58.85% vs. 58.45%) on the VOC-COCO-20 dataset. Compared with the state-of-the-art method, our method improves by approximately 8%, 5%, and 15% on WI, AOSE, and APU, respectively, on average over the three compared datasets. For setting two, we gradually increase the frequency of frames that may contain unknowns, named the wilderness ratio, to construct three joint datasets: Visual Object Classes-Common Objects in Context-0.5n (VOC-COCO-0.5n), Visual Object Classes-Common Objects in Context-n (VOC-COCO-n), and Visual Object Classes-Common Objects in Context-4n (VOC-COCO-4n). The proposed method achieves new state-of-the-art results in 10 of the 12 results from the three comparison experiments in open-set object detection. The ablation study also demonstrates the effectiveness of each module in the proposed method.
Conclusion The OSOD framework improved with the annular prototype space is well suited to the OSOD problem. The comparison of the baseline methods, the current state-of-the-art method, and our proposed method on the OSOD benchmark settings shows that the proposed method can accurately detect open-set categories and background categories without changing the closed-set object detection performance of the vanilla backbone. In future work, we hope to further investigate the correlation between known-class and unknown-class detection performance and to extend the categories to be detected to research areas such as out-of-distribution and fine-grained image analysis.
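The two-step decision described in the Method part (first separate background from object-containing proposals, then separate known from unknown classes) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the use of Euclidean distance to the nearest class prototype, the two radii `r_known` and `r_object`, the `change_threshold`, and all function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def nearest_prototype(feature: Vector, prototypes: Dict[str, Vector]) -> Tuple[str, float]:
    """Return the label of the closest prototype and the Euclidean distance to it."""
    best_label, best_dist = "", math.inf
    for label, proto in prototypes.items():
        dist = math.dist(feature, proto)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


def annular_decision(feature: Vector, prototypes: Dict[str, Vector],
                     r_known: float, r_object: float) -> str:
    """Hypothetical ring-shaped rule: inner disc = known class,
    ring between the radii = unknown, outside the outer radius = background."""
    if not prototypes:
        raise ValueError("at least one prototype is required")
    if not 0 < r_known < r_object:
        raise ValueError("expected 0 < r_known < r_object")
    label, dist = nearest_prototype(feature, prototypes)
    if dist > r_object:      # step 1: background vs. object-containing
        return "background"
    if dist <= r_known:      # step 2: known vs. unknown
        return label
    return "unknown"


def recheck_unknown(feature: Vector, masked_feature: Vector,
                    change_threshold: float) -> str:
    """Assumed correction step: keep 'unknown' only if a small random mask
    changes the feature noticeably; otherwise relabel as background."""
    change = math.dist(feature, masked_feature)
    return "unknown" if change > change_threshold else "background"


# Toy 2-D prototypes and radii, for illustration only.
protos = {"cat": (0.0, 0.0), "dog": (10.0, 0.0)}
print(annular_decision((0.5, 0.0), protos, r_known=1.0, r_object=3.0))  # cat
print(annular_decision((2.0, 0.0), protos, r_known=1.0, r_object=3.0))  # unknown
print(annular_decision((5.0, 0.0), protos, r_known=1.0, r_object=3.0))  # background
print(recheck_unknown((2.0, 0.0), (2.1, 0.0), change_threshold=0.5))    # background
```

The fixed radii stand in for whatever the learned prototype space provides. In the described method the separation comes from optimizing feature density during training rather than from hand-set thresholds.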