结合环状原型空间优化的开放集目标检测
Open-set object detection based on annular prototype space optimization
- 2023年28卷第9期 页码:2719-2732
纸质出版日期: 2023-09-16
DOI: 10.11834/jig.220992
孙旭豪, 沈阳, 魏秀参, 安鹏. 2023. 结合环状原型空间优化的开放集目标检测. 中国图象图形学报, 28(09):2719-2732
Sun Xuhao, Shen Yang, Wei Xiushen, An Peng. 2023. Open-set object detection based on annular prototype space optimization. Journal of Image and Graphics, 28(09):2719-2732
目的
现有目标检测任务常在封闭集设定中进行。然而在现实问题中,待检测图片中往往包含未知类别目标。因此,在保证模型对已知类检测性能的基础上,为了提升模型在现实检测任务中对新增类别的目标检测能力,本文对开放集目标检测任务进行研究。
方法
区别于现有的开放集目标检测框架在检测任务中将背景类与未知类视为一个类别进行优化,本文框架在进行开放集类别识别的过程中,优先判别候选框属于背景类还是含待识别目标的类别,而后再对含待识别目标的类别进行已知类与未知类的判别。本文提出基于环状原型空间优化的检测器,该检测器可以通过优化待检测框的特征在高维空间中的稀疏程度,对已知类、未知类与背景类进行环状序列判别,从而提升模型对开放集类别的检测性能。此外,在区域候选网络(region proposal networks,RPN)层后设计了随机覆盖候选框的方式筛选相关的背景类训练框,避免了以往开放集检测工作中繁杂的背景类采样步骤。
结果
本文方法在保证模型对封闭集设定下检测性能的情况下,通过逐步增加未知类别的数量,在Visual Object Classes-Common Objects in Context-20(VOC-COCO-20)、Visual Object Classes-Common Objects in Context-40(VOC-COCO-40)以及Visual Object Classes-Common Objects in Context-60(VOC-COCO-60)数据集中的4个指标上均取得了具有竞争力的结果。同时,通过增加包含未知类目标的图片数量与包含已知类目标的图片数量之比(wilderness ratio,WR),所提方法在3个对比实验共12项结果中,有10项领先于对比方法。消融实验也证明了方法中每一个模块的有效性。
结论
本文提出的基于环状原型空间优化的开放集目标检测框架取得了较好的检测效果。通过在实际检测任务中的实验对比,证明了本文方法在不改变模型封闭集识别性能的情况下,有更强的开放集类别检测能力。
Objective
In the close-set setup, object detection assumes that the categories to be detected are the same in both the training and test phases. Under this setting, modern object detectors have achieved impressive progress. However, the images to be detected in practical tasks usually contain objects of unknown categories. For example, in offshore fishing, it is common to specify that fish meeting certain size requirements may be caught, whereas others that do not meet the requirements are prohibited. Object detectors usually produce two types of errors: the first involves misclassifying an object of interest as another object or as background, i.e., identifying a known class as a background class or an unknown class; the second occurs when a background sample or an unknown object is mistaken for one of the classes of interest, i.e., identifying a background region or an unknown-object region as a known class. After unknown-class thresholds are added for screening, most previous closed-set detection methods can identify unknown and background classes in the open-set setup to some extent. However, adjusting these thresholds in real scenarios is challenging. Therefore, this study explores the open-set object detection (OSOD) task to improve the robustness of the model in real-world detection tasks. In the open-set environment, the model needs to distinguish not only the known objects contained in the training data but also other objects not contained in the training set. Moreover, the model must delineate the background regions that belong to neither known nor unknown objects.
Method
Existing approaches in the OSOD domain typically group the background classes and the unknown classes together as feature-sparse classes and optimize them as a single class. This leaves the task of separating the background class from the unknown class entirely to the final classifier, which runs contrary to the original purpose of the region proposal network (RPN) layer, namely filtering for candidate regions that contain objects. Therefore, we propose a new OSOD framework. On the one hand, we improve the design of the classifier through an annular prototype space so that the classifier can focus on distinguishing known from unknown classes. In particular, the detector arranges known classes, unknown classes, and background classes in annular layers: prototype-learning optimization makes known classes dense in the high-dimensional feature space, whereas background classes remain sparse, which helps improve detection performance. On the other hand, we filter out background classes by randomly masking the existing proposal regions, thereby improving the robustness of the RPN layer while retaining its advantage of proposing object candidates. This also eliminates the additional background-class sampling step. In particular, the feature vector generated for a region containing an unknown object changes considerably after a small random mask is applied, whereas the feature vector generated for a background region does not. The module thus corrects regions that were identified as unknown categories.
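The random-mask background filtering described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the feature extractor `feat_fn`, the mask size `mask_frac`, and the decision `threshold` are all placeholder assumptions, and the sketch operates on raw pixel regions rather than on RoI features behind an RPN.

```python
import numpy as np

def cosine_dist(a, b):
    """1 - cosine similarity between two flattened feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return 1.0 - float(np.dot(a, b) / denom)

def masked_feature_shift(region, feat_fn, mask_frac=0.1, n_trials=8, rng=None):
    """Mean feature change of a proposal region under small random masks.

    The paper's intuition: features of a region that contains an object shift
    noticeably when a small patch of it is occluded, while features of a
    uniform background region barely move.
    """
    rng = np.random.default_rng(rng)
    base = feat_fn(region)
    h, w = region.shape[:2]
    mh = max(1, int(h * mask_frac))
    mw = max(1, int(w * mask_frac))
    shifts = []
    for _ in range(n_trials):
        masked = region.copy()
        y = int(rng.integers(0, h - mh + 1))
        x = int(rng.integers(0, w - mw + 1))
        masked[y:y + mh, x:x + mw] = 0.0  # zero out one small random patch
        shifts.append(cosine_dist(base, feat_fn(masked)))
    return float(np.mean(shifts))

def is_background(region, feat_fn, threshold=0.05, **kwargs):
    """Treat proposals whose features barely move under masking as background."""
    return masked_feature_shift(region, feat_fn, **kwargs) < threshold
```

In the actual framework this check would run on proposals produced by the RPN, correcting boxes that the classifier labeled as unknown but whose features are mask-invariant like background texture.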
Result
The proposed method is evaluated on the OSOD benchmark, which consists of PASCAL Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO). The train-val set of VOC is used for close-set training, and the 20 VOC classes and 60 non-VOC classes in COCO are used to evaluate the proposed method under different open-set conditions. The comparison methods comprise Faster R-CNN (FR-CNN), placeholders for open-set recognition (PROSER), the open world object detector (ORE), dropout sampling (DS), and the open-set detector (OpenDet); OpenDet is currently the state-of-the-art method in the field of OSOD. We adopt two settings to prove the effectiveness of our method. In the first setting, we gradually increase the number of unknown classes and build three joint datasets: Visual Object Classes-Common Objects in Context-20 (VOC-COCO-20), Visual Object Classes-Common Objects in Context-40 (VOC-COCO-40), and Visual Object Classes-Common Objects in Context-60 (VOC-COCO-60). The proposed method outperforms the other methods by a large margin on all metrics and achieves new state-of-the-art results in OSOD. For example, on the VOC-COCO-20 dataset, our method gains approximately 26% on wilderness impact (WI), 32% on absolute open-set error (AOSE), and 15.88 on the unknown-class average precision (AP_U), without compromising the known-class mean average precision mAP_K (58.85% vs. 58.45%). Compared with the state-of-the-art method, our method gains approximately 8%, 5%, and 15% on WI, AOSE, and AP_U, respectively, on average over the three compared datasets. In the second setting, we gradually increase the ratio of images that may contain unknown objects, named the wilderness ratio, and construct three joint datasets: VOC-COCO-0.5n, VOC-COCO-n, and VOC-COCO-4n. The proposed method achieves new state-of-the-art results on 10 of the 12 metrics across the three comparison experiments. The ablation study also demonstrates the effectiveness of each module in the proposed method.
Conclusion
In this study, the OSOD framework improved by the annular prototype space adapts well to the OSOD problem. The comparison of the baseline methods, the current state-of-the-art method, and our proposed method on the OSOD benchmark settings shows that the proposed method can accurately detect open-set categories and background categories without changing the close-set detection performance of the vanilla backbone. In future work, we hope to further investigate the correlation between known-class and unknown-class detection performance, and to extend the categories to be detected to related research areas, such as out-of-distribution detection and fine-grained image analysis.
开放集目标检测(OSOD);原型学习;开放集识别(OSR);目标检测;深度神经网络
open-set object detection (OSOD); prototype learning; open-set recognition (OSR); object detection; deep neural network
Arik S Ö and Pfister T. 2020. ProtoAttend: attention-based prototypical learning. Journal of Machine Learning Research, 21(1): #210
Bendale A and Boult T E. 2016. Towards open set deep networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1563-1572 [DOI: 10.1109/CVPR.2016.173]
Chen G Y, Qiao L M, Shi Y M, Peng P C, Li J, Huang T J, Pu S L and Tian Y H. 2020. Learning open set network with discriminative reciprocal points//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 507-522 [DOI: 10.1007/978-3-030-58580-8_30]
Chen Z M, Wei X S, Wang P and Guo Y. 2019. Multi-label image recognition with graph convolutional networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5172-5181 [DOI: 10.1109/CVPR.2019.00532]
DeVries T and Taylor G W. 2018. Learning confidence for out-of-distribution detection in neural networks [EB/OL]. [2018-02-13]. https://arxiv.org/pdf/1802.04865.pdf
Dhamija A R, Günther M and Boult T E. 2018. Reducing network agnostophobia//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 9157-9166
Dhamija A R, Günther M, Ventura J and Boult T E. 2020. The overlooked elephant of object detection: open set//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass, USA: IEEE: 1010-1019 [DOI: 10.1109/WACV45572.2020.9093355]
Everingham M, Van Gool L, Williams C K I, Winn J and Zisserman A. 2010. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2): 303-338 [DOI: 10.1007/s11263-009-0275-4]
Gal Y and Ghahramani Z. 2016. Dropout as a Bayesian approximation: representing model uncertainty in deep learning//Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York, USA: JMLR.org: 1050-1059
Gao T Y, Han X, Liu Z Y and Sun M S. 2019. Hybrid attention-based prototypical networks for noisy few-shot relation classification//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI: 6407-6414 [DOI: 10.1609/aaai.v33i01.33016407]
Ge Z Y, Demyanov S, Chen Z and Garnavi R. 2017. Generative openMax for multi-class open set classification [EB/OL]. [2017-07-24]. https://arxiv.org/pdf/1707.07418.pdf
Geng C X, Huang S J and Chen S C. 2021. Recent advances in open set recognition: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10): 3614-3631 [DOI: 10.1109/TPAMI.2020.2981604]
Grandvalet Y and Bengio Y. 2004. Semi-supervised learning by entropy minimization//Proceedings of the 17th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 529-536
Han J M, Ren Y Q, Ding J, Pan X J, Yan K and Xia G S. 2022. Expanding low-density latent regions for open-set object detection//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 9581-9590 [DOI: 10.1109/CVPR52688.2022.00937]
He X J and Lin J F. 2022. Weakly-supervised object localization based fine-grained few-shot learning. Journal of Image and Graphics, 27(7): 2226-2239
贺小箭, 林金福. 2022. 融合弱监督目标定位的细粒度小样本学习. 中国图象图形学报, 27(7): 2226-2239 [DOI: 10.11834/jig.200849]
Hu H, Gu J Y, Zhang Z, Dai J F and Wei Y C. 2018. Relation networks for object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3588-3597 [DOI: 10.1109/CVPR.2018.00378]
Jain L P, Scheirer W J and Boult T E. 2014. Multi-class open set recognition using probability of inclusion//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 393-409 [DOI: 10.1007/978-3-319-10578-9_26]
Jia K X, Ma Z H, Zhu R and Li Y G. 2022. Attention-mechanism-based light single shot multiBox detector modelling improvement for small object detection on the sea surface. Journal of Image and Graphics, 27(4): 1161-1175
贾可心, 马正华, 朱蓉, 李永刚. 2022. 注意力机制改进轻量SSD模型的海面小目标检测. 中国图象图形学报, 27(4): 1161-1175 [DOI: 10.11834/jig.200517]
Joseph K J, Khan S, Khan F S and Balasubramanian V N. 2021. Towards open world object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 5826-5836 [DOI: 10.1109/CVPR46437.2021.00577]
Kong T, Sun F C, Liu H P, Jiang Y N, Li L and Shi J B. 2020. FoveaBox: beyound anchor-based object detection. IEEE Transactions on Image Processing, 29: 7389-7398 [DOI: 10.1109/TIP.2020.3002345]
Lakshminarayanan B, Pritzel A and Blundell C. 2017. Simple and scalable predictive uncertainty estimation using deep ensembles//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6405-6416
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]
Miller D, Nicholson L, Dayoub F and Sünderhauf N. 2018. Dropout sampling for robust object detection in open-set conditions//Proceedings of 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE: 3243-3249 [DOI: 10.1109/ICRA.2018.8460700]
Neal L, Olson M, Fern X, Wong W K and Li F. 2018. Open set learning with counterfactual images//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 620-635 [DOI: 10.1007/978-3-030-01231-1_38]
Ren M Y, Triantafillou E, Ravi S, Snell J, Swersky K, Tenenbaum J B, Larochelle H and Zemel R S. 2018. Meta-learning for semi-supervised few-shot classification [EB/OL]. [2022-09-29]. https://arxiv.org/pdf/1803.00676.pdf
Ren S Q, He K M, Girshick R and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montréal, Canada: MIT Press: 91-99
Scheirer W J, Jain L P and Boult T E. 2014. Probability models for open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11): 2317-2324 [DOI: 10.1109/TPAMI.2014.2321392]
Shang T, Zhao Z, Ren X J and Liu J W. 2021. Differential identifiability clustering algorithms for big data analysis. Science China Information Sciences, 64(5): #152101 [DOI: 10.1007/s11432-020-2910-1]
Shen Y, Sun X H, Wei X S, Hu H X and Chen Z P. 2022. A channel mix method for fine-grained cross-modal retrieval//Proceedings of 2022 IEEE International Conference on Multimedia and Expo. Taipei, China: IEEE: 1-6 [DOI: 10.1109/ICME52920.2022.9859609]
Shu Y, Shi Y M, Wang Y W, Huang T J and Tian Y H. 2020. P-ODN: prototype-based open deep network for open set recognition. Scientific Reports, 10(1): #7146 [DOI: 10.1038/s41598-020-63649-6]
Snell J, Swersky K and Zemel R. 2017. Prototypical networks for few-shot learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 4080-4090
Sun X, Yang Z N, Zhang C, Ling K V and Peng G H. 2020. Conditional Gaussian distribution learning for open set recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13477-13486 [DOI: 10.1109/CVPR42600.2020.01349]
Tan M X, Pang R M and Le Q V. 2020. EfficientDet: scalable and efficient object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10778-10787 [DOI: 10.1109/CVPR42600.2020.01079]
Wang C Q, Min S B, Chen X J, Sun X Y and Li H Q. 2021. Dual progressive prototype network for generalized zero-shot learning//Proceedings of the 35th International Conference on Neural Information Processing Systems. Virtual: OpenReview.net: 2936-2948
Wang K X, Liew J H, Zou Y T, Zhou D Q and Feng J S. 2019. PANet: few-shot image semantic segmentation with prototype alignment//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9196-9205 [DOI: 10.1109/ICCV.2019.00929]
Wei X S, Shen Y, Sun X H, Ye H J and Yang J. 2021. A2-Net: learning attribute-aware hash codes for large-scale fine-grained image retrieval//Proceedings of the 35th International Conference on Neural Information Processing Systems. Virtual: OpenReview.net: 5720-5730
Wei X S, Xu Y Y and Yang J. 2022. Review of webly-supervised fine-grained image recognition. Journal of Image and Graphics, 27(7): 2057-2077
魏秀参, 许玉燕, 杨健. 2022. 网络监督数据下的细粒度图像识别综述. 中国图象图形学报, 27(7): 2057-2077 [DOI: 10.11834/jig.210188]
Weston J, Collobert R, Sinz F, Bottou L and Vapnik V. 2006. Inference with the universum//Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, USA: ACM: 1009-1016 [DOI: 10.1145/1143844.1143971]
Xiong Y Y, Yang P P and Liu C L. 2021. One-stage open set object detection with prototype learning//Proceedings of the 28th International Conference on Neural Information Processing. Sanur, Indonesia: Springer: 279-291 [DOI: 10.1007/978-3-030-92185-9_23]
Xu Y Y, Shen Y, Wei X S and Yang J. 2022. Webly-supervised fine-grained recognition with partial label learning//Proceedings of the 31st International Joint Conference on Artificial Intelligence. Vienna, Austria: ijcai.org: 1502-1508 [DOI: 10.24963/ijcai.2022/209]
Yan Z X, Hou Z Q, Xiong L, Liu X Y, Yu W S and Ma S G. 2021. Fine-grained classification based on bilinear feature fusion and YOLOv3. Journal of Image and Graphics, 26(4): 847-856
闫子旭, 侯志强, 熊磊, 刘晓义, 余旺盛, 马素刚. 2021. YOLOv3和双线性特征融合的细粒度图像分类. 中国图象图形学报, 26(4): 847-856 [DOI: 10.11834/jig.200031]
Yang H M, Zhang X Y, Yin F and Liu C L. 2018. Robust classification with convolutional prototype learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3474-3482 [DOI: 10.1109/CVPR.2018.00366]
Yoshihashi R, Shao W, Kawakami R, You S, Iida M and Naemura T. 2019. Classification-reconstruction learning for open-set recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4011-4020 [DOI: 10.1109/CVPR.2019.00414]
Zhou B Y, Cui Q, Wei X S and Chen Z M. 2020. BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9716-9725 [DOI: 10.1109/CVPR42600.2020.00974]
Zhou D, Ye H J and Zhan D C. 2021. Learning placeholders for open-set recognition//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 4399-4408 [DOI: 10.1109/CVPR46437.2021.00438]