ProMIS: probability-based multi-object image synthesis-relevant weakly supervised object detection method
Vol. 28, No. 7, 2023, pp. 2037-2053
Print publication date: 2023-07-16
DOI: 10.11834/jig.220141
Li Xiaoyan, Kan Meina, Liang Hao, Shan Shiguang. 2023. ProMIS: probability-based multi-object image synthesis-relevant weakly supervised object detection method. Journal of Image and Graphics, 28(07):2037-2053
Objective
Weakly supervised object detection trains an object detector using only image-level category labels. Although the accuracy of weakly supervised detectors has improved steadily in recent years, two major challenges remain: detecting objects in their entirety, and separating a single instance from a cluster of same-category objects. To address these problems, this paper proposes ProMIS (probability-based multi-object image synthesis), a weakly supervised object detection method driven by multi-object image augmentation sampled from object-layout posterior probability maps.
Method
Detected objects are stored in an object-pending pool, and objects from this pool are inserted into input images to construct augmented images with pseudo bounding box annotations; the augmented images are then used to train the weakly supervised detector. The method consists of two interacting modules: image augmentation and weakly supervised object detection. The image augmentation module inserts objects from the pending pool into an input image, constraining the category, position, and scale of each inserted object through posterior probability estimation and sampling to keep the augmented image plausible. The weakly supervised detection module trains the detector with the augmented multi-object images, their category labels, and the pseudo bounding box labels, and stores high-confidence objects detected in the original input image back into the pending pool. To avoid over-fitting during training, a parallel detection branch is added on top of the baseline detector, i.e., a detection branch based on the augmented bounding boxes; this branch is trained with the pseudo bounding box annotations obtained through augmentation, while the baseline branch is still trained with image-level labels. At test time, only the augmented-box branch produces detection results. The proposed augmentation strategy and branch structure are applicable to different weakly supervised object detectors.
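The paste-and-annotate step described above can be sketched as follows. This is a minimal illustration with assumed names; the naive copy here stands in for the paper's insertion step, which additionally blends the inserted object into the scene:

```python
# Illustrative paste step: an object crop from the pending pool is
# inserted at a sampled location, and a pseudo bounding box is recorded
# alongside the original image-level labels.
import numpy as np

def paste_object(image, crop, top_left):
    """image: (H, W, 3) array; crop: (h, w, 3) array; top_left: (y, x).
    Returns the augmented image and the pseudo box (x1, y1, x2, y2)."""
    h, w = crop.shape[:2]
    y, x = top_left
    aug = image.copy()
    aug[y:y + h, x:x + w] = crop   # naive paste; the real method blends edges
    return aug, (x, y, x + w, y + h)
```

The returned pseudo box supervises only the added detection branch, while the untouched original image continues to train the baseline branch with image labels.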
Result
On the Pascal VOC (pattern analysis, statistical modeling and computational learning visual object classes) 2007 and Pascal VOC 2012 datasets, embedding the proposed method into multiple existing weakly supervised object detectors improves the mean average precision (mAP) by 2.9% and 4.2% on average, respectively.
Conclusion
This work demonstrates that augmented images generated with pseudo bounding box labels from weakly supervised detection carry rich information and help the detector learn to distinguish object parts, whole objects, and clusters of multiple same-category objects.
Objective
Fully supervised object detectors based on neural networks achieve strong performance and are reliable enough for many real-world applications. However, they require huge amounts of annotated data, and the labor-intensive bounding box labeling must be repeated for every new category and application scenario, which makes large-scale detection training sets expensive to collect. A weakly supervised object detector is therefore designed to be optimized with image category annotations only. Recent weakly supervised object detectors mostly build on the multiple instance learning (MIL) technique: object proposals are classified and aggregated into an image classification result, and objects are detected by selecting, among all proposals, the bounding box that contributes most to the aggregated image classification. However, because weakly supervised object detection lacks instance-level annotations, it remains challenging to distinguish an instance from a part of the instance or from a cluster of multiple instances of the same category. The proposed method focuses on learning this ability by inserting high-confidence detected objects into an input image and generating augmented images along with pseudo bounding box annotations. A naive random augmentation cannot immediately improve detection performance, for two reasons: 1) over-fitting: the generated data are used to train the detection head itself; 2) infeasible augmentation: the spatial distribution of the generated objects is often quite heterogeneous from the real data, since the insertion hyper-parameters are all sampled from uniform distributions.
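The MIL-style scoring mentioned above can be sketched briefly. This is a hedged illustration in the spirit of two-stream MIL detectors (e.g., WSDDN), not the paper's actual code; all function and variable names are assumptions:

```python
# Per-proposal class scores from two streams are combined and aggregated
# into an image-level classification, so only image labels are needed.
import numpy as np

def mil_image_scores(proposal_feats, w_cls, w_det):
    """proposal_feats: (R, D) features for R proposals.
    w_cls, w_det: (D, C) weights for the two streams."""
    cls_logits = proposal_feats @ w_cls            # (R, C)
    det_logits = proposal_feats @ w_det            # (R, C)
    # softmax over classes for the classification stream
    cls = np.exp(cls_logits - cls_logits.max(axis=1, keepdims=True))
    cls /= cls.sum(axis=1, keepdims=True)
    # softmax over proposals for the detection (attention) stream
    det = np.exp(det_logits - det_logits.max(axis=0, keepdims=True))
    det /= det.sum(axis=0, keepdims=True)
    scores = cls * det                             # (R, C) per-box scores
    return scores.sum(axis=0), scores              # image scores, box scores
```

Detection then amounts to picking, per class, the proposal whose score contributes most to the aggregated image-level score.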
Method
To resolve the issues mentioned above, a probability-based multi-object image synthesis (ProMIS) relevant weakly supervised object detection method is developed in terms of two iterative and interactive modules, namely the image augmentation module and the weakly supervised object detection module. In each training iteration, objects are detected in the original input image with the weakly supervised object detector (to ensure accuracy during the initial training, the detector is pre-trained according to its baseline method), and the highly confident detected objects are stored in an object-pending pool for later image augmentation. The image augmentation module inserts one or more objects sampled from the object-pending pool into the input image, producing an augmented training image with pseudo bounding box annotations. To make the augmented image more feasible, the category, position, and scale of each inserted object are sampled from posterior probability maps estimated from the objects detected in this image. ProMIS uses three kinds of posterior probabilities, describing the category, spatial, and scale relations between an object and a referenced object, respectively. These posterior probabilities are estimated online from the objects detected in previous training iterations, and the hyper-parameters of the newly inserted objects are assumed to obey them. The detection training module then exploits the augmented image and its pseudo annotations to train the weakly supervised object detector. In the training process, to avoid over-fitting to the detected false positives, a new parallel detection branch is added to the baseline weakly supervised object detection head.
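The online estimation and sampling of the three posteriors could look like the following sketch. The discretization into grid cells and scale bins, and all names, are illustrative assumptions rather than the paper's implementation:

```python
# Hedged sketch: discrete posterior histograms over category co-occurrence,
# relative position, and scale are updated online from detected objects,
# then sampled to pick the insertion hyper-parameters.
import numpy as np

rng = np.random.default_rng(0)

class PosteriorSampler:
    def __init__(self, n_classes, grid=16, n_scales=8):
        # initialised uniform (Laplace smoothing), updated online
        self.cat = np.ones((n_classes, n_classes))
        self.pos = np.ones((n_classes, grid, grid))
        self.scale = np.ones((n_classes, n_scales))

    def update(self, ref_cls, ins_cls, gx, gy, s_bin):
        # accumulate statistics from a detected (reference, inserted) pair
        self.cat[ref_cls, ins_cls] += 1
        self.pos[ins_cls, gy, gx] += 1
        self.scale[ins_cls, s_bin] += 1

    def sample(self, ref_cls):
        # category conditioned on the reference object's class
        p = self.cat[ref_cls] / self.cat[ref_cls].sum()
        c = rng.choice(len(p), p=p)
        # position sampled from that category's spatial map
        pos = self.pos[c] / self.pos[c].sum()
        idx = rng.choice(pos.size, p=pos.ravel())
        gy, gx = divmod(idx, pos.shape[1])
        # scale bin sampled from that category's scale histogram
        ps = self.scale[c] / self.scale[c].sum()
        s = rng.choice(len(ps), p=ps)
        return c, (gx, gy), s
```

Sampling from estimated posteriors, rather than uniform distributions, is what keeps the spatial layout of the inserted objects close to that of real scenes.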
The augmented bounding box annotations are only used to guide the newly added branch, while the original weakly supervised detection head is employed during the generation of the augmented data and is trained with image-level labels only. At inference, only the added branch trained with the augmented annotations is kept to generate the testing results, which preserves the inference efficiency of the weakly supervised object detector. The image augmentation module and the weakly supervised object detection module operate iteratively and interactively, steadily improving the detector's ability to distinguish instances. ProMIS is an online augmentation method and requires no additional images or annotations beyond the original weakly supervised detection training data. In addition, since the approach is independent of the choice of weakly supervised object detector, the proposed augmentation paradigm generalizes to all detector architectures.
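The two-branch arrangement can be summarized in a small sketch with hypothetical head objects: both branches share features, each receives its own supervision, and only the pseudo-box branch is used at test time:

```python
# Minimal sketch of the parallel-head design; `mil_head` and `aug_head`
# are stand-ins for the baseline MIL head and the added detection branch.
class TwoHeadDetector:
    def __init__(self, mil_head, aug_head):
        self.mil_head = mil_head   # baseline head: image-label training only
        self.aug_head = aug_head   # new head: trained on pseudo boxes

    def train_losses(self, feats_orig, feats_aug, img_labels, pseudo_boxes):
        # baseline branch sees the ORIGINAL image with image-level labels;
        # added branch sees the AUGMENTED image with pseudo box annotations
        return (self.mil_head.loss(feats_orig, img_labels)
                + self.aug_head.loss(feats_aug, pseudo_boxes))

    def predict(self, feats):
        # inference uses ONLY the augmented-box branch, so test-time cost
        # matches the baseline detector
        return self.aug_head.predict(feats)
```

Keeping the baseline head out of the pseudo-box supervision is what prevents the detector from over-fitting to its own false positives.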
Result
In the experiments, the effectiveness of the proposed parallel detection branch and the posterior probability maps is verified: they improve over the naive random augmentation method by 5.2% and 2.2%, respectively. The proposed ProMIS approach is applied to multiple previous weakly supervised object detectors, including online instance classifier refinement (OICR), the segmentation-detection collaborative network (SDCN), and online instance classifier refinement with a deep residual network (OICR-DRN). Compared to these baselines, it achieves average improvements of 2.9% and 4.2% on the Pascal VOC (pattern analysis, statistical modeling and computational learning visual object classes) 2007 and Pascal VOC 2012 datasets, respectively. Furthermore, ablation analysis shows that ProMIS reduces both the ground-truth-in-hypothesis and hypothesis-in-ground-truth error modes.
Conclusion
It is demonstrated that ProMIS makes fewer mistakes when distinguishing an instance from its parts or from multiple instances of the same category.
weakly supervised object detection; multi-object data augmentation; image synthesis; probability map sampling; posterior probability estimation
Bilen H and Vedaldi A. 2016. Weakly supervised deep detection networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2846-2854 [DOI: 10.1109/CVPR.2016.311]
Bottou L. 1998. On-line learning and stochastic approximations//On-line Learning in Neural Networks. Cambridge, USA: Cambridge University Press: 9-42 [DOI: 10.1017/CBO9780511569920.003]
Chen K, Wang J Q, Pang J M, Cao Y H, Xiong Y, Li X X, Sun S Y, Feng W S, Liu Z W, Xu J R, Zhang Z, Cheng D Z, Zhu C C, Cheng T H, Zhao Q J, Li B Y, Lu X, Zhu R, Wu Y, Dai J J, Wang J D, Shi J P, Ouyang W L, Loy C C and Lin D H. 2019. MMDetection: open mmlab detection toolbox and benchmark [EB/OL]. [2023-05-02]. https://arxiv.org/pdf/1906.07155v1.pdf
Chen Z, Fu Z H, Jiang R X, Chen Y W and Hua X S. 2020. SLV: spatial likelihood voting for weakly supervised object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 12992-13001 [DOI: 10.1109/CVPR42600.2020.01301]
Cinbis R G, Verbeek J and Schmid C. 2014. Multi-fold MIL training for weakly supervised object localization//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 2409-2416 [DOI: 10.1109/CVPR.2014.309]
Deselaers T, Alexe B and Ferrari V. 2012. Weakly supervised localization and learning with generic knowledge. International Journal of Computer Vision, 100(3): 275-293 [DOI: 10.1007/s11263-012-0538-3]
Dwibedi D, Misra I and Hebert M. 2017. Cut, paste and learn: surprisingly easy synthesis for instance detection//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 1310-1319 [DOI: 10.1109/ICCV.2017.146]
Everingham M, Eslami S M A, Van Gool L, Williams C K I, Winn J and Zisserman A. 2015. The pascal visual object classes challenge: a retrospective. International Journal of Computer Vision, 111(1): 98-136 [DOI: 10.1007/s11263-014-0733-5]
Everingham M, Van Gool L, Williams C K I, Winn J and Zisserman A. 2010. The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2): 303-338 [DOI: 10.1007/s11263-009-0275-4]
Fang H S, Sun J H, Wang R Z, Gou M H, Li Y L and Lu C W. 2019. InstaBoost: boosting instance segmentation via probability map guided copy-pasting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 682-691 [DOI: 10.1109/ICCV.2019.00077]
Ghiasi G, Cui Y, Srinivas A, Qian R, Lin T Y, Cubuk E D, Le Q V and Zoph B. 2021. Simple copy-paste is a strong data augmentation method for instance segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 2917-2927 [DOI: 10.1109/CVPR46437.2021.00294]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Kisantal M, Wojna Z, Murawski J, Naruniec J and Cho K. 2019. Augmentation for small object detection//Proceedings of the 9th International Conference on Advances in Computing and Information Technology (ACITY 2019). Chennai, India: Aircc Publishing Corporation: 119-133 [DOI: 10.5121/csit.2019.91713]
Kosugi S, Yamasaki T and Aizawa K. 2019. Object-aware instance labeling for weakly supervised object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 6063-6071 [DOI: 10.1109/ICCV.2019.00616]
Li X Y, Kan M N, Shan S G and Chen X L. 2019. Weakly supervised object detection with segmentation collaboration//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 9734-9743 [DOI: 10.1109/ICCV.2019.00983]
Pérez P, Gangnet M and Blake A. 2003. Poisson image editing. ACM Transactions on Graphics, 22(3): 313-318 [DOI: 10.1145/882262.882269]
Ren Z Z, Yu Z D, Yang X D, Liu M Y, Lee Y J, Schwing A G and Kautz J. 2020. Instance-aware, context-focused, and memory-efficient weakly supervised object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 10595-10604 [DOI: 10.1109/CVPR42600.2020.01061]
Shen Y H, Ji R R, Wang Y, Chen Z W, Zheng F, Huang F Y and Wu Y S. 2020a. Enabling deep residual networks for weakly supervised object detection//Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 118-136 [DOI: 10.1007/978-3-030-58598-3_8]
Shen Y H, Ji R R, Wang Y, Wu Y J and Cao L J. 2019. Cyclic guidance for weakly supervised joint detection and segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 697-707 [DOI: 10.1109/CVPR.2019.00079]
Shen Y H, Ji R R, Yang K Y, Deng C and Wang C H. 2020b. Category-aware spatial constraint for weakly supervised detection. IEEE Transactions on Image Processing, 29: 843-858 [DOI: 10.1109/TIP.2019.2933735]
Shen Y H, Ji R R, Zhang S C, Zuo W M and Wang Y. 2018. Generative adversarial learning towards fast weakly supervised detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5764-5773 [DOI: 10.1109/CVPR.2018.00604]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2023-05-02]. https://arxiv.org/pdf/1409.1556.pdf
Singh K K and Lee Y J. 2019. You reap what you sow: using videos to generate high precision object proposals for weakly-supervised object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 9406-9414 [DOI: 10.1109/CVPR.2019.00964]
Tang P, Wang X G, Bai X and Liu W Y. 2017. Multiple instance detection network with online instance classifier refinement//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 3059-3067 [DOI: 10.1109/CVPR.2017.326]
Tang P, Wang X G, Wang A T, Yan Y L, Liu W Y, Huang J Z and Yuille A. 2018. Weakly supervised region proposal network and object detection//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 370-386 [DOI: 10.1007/978-3-030-01252-6_22]
Wan F, Liu C, Ke W, Ji X Y, Jiao J B and Ye Q X. 2019a. C-MIL: continuation multiple instance learning for weakly supervised object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 2194-2203 [DOI: 10.1109/CVPR.2019.00230]
Wan F, Wei P X, Han Z J, Jiao J B and Ye Q X. 2019b. Min-entropy latent model for weakly supervised object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(10): 2395-2409 [DOI: 10.1109/TPAMI.2019.2898858]
Wei Y C, Shen Z Q, Cheng B W, Shi H H, Xiong J J, Feng J S and Huang T. 2018. TS2C: tight box mining with surrounding segmentation context for weakly supervised object detection//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 454-470 [DOI: 10.1007/978-3-030-01252-6_27]
Zhang X P, Feng J S, Xiong H K and Tian Q. 2018a. Zigzag learning for weakly supervised object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4262-4270 [DOI: 10.1109/CVPR.2018.00448]
Zhang X P, Yang Y and Feng J S. 2018b. ML-LocNet: improving object localization with multi-view learning network//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 248-263 [DOI: 10.1007/978-3-030-01219-9_15]