Progress in weakly supervised learning for visual understanding
2022, Vol. 27, No. 6, pp. 1768-1798
Print publication date: 2022-06-16
Accepted: 2022-03-15
DOI: 10.11834/jig.220178
Dongwei Ren, Qilong Wang, Yunchao Wei, Deyu Meng, Wangmeng Zuo. Progress in weakly supervised learning for visual understanding[J]. Journal of Image and Graphics, 2022, 27(6): 1768-1798.
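Visual understanding tasks such as object detection, semantic and instance segmentation, and action recognition are widely used and play a crucial role in fields such as human-computer interaction and autonomous driving. In recent years, deep visual understanding networks based on fully supervised learning have achieved significant performance gains. However, annotating data for object detection, semantic and instance segmentation, and video action recognition typically requires a great deal of labor and time, which has become a key factor limiting their wide application. As an effective way to reduce annotation cost, weakly supervised learning promises a feasible solution to this problem and has therefore attracted considerable attention. Focusing on weakly supervised visual learning, this article surveys research progress in China and abroad, taking object detection, semantic and instance segmentation, and action recognition as examples, and discusses development directions and application prospects. After briefly reviewing general weakly supervised learning models such as multiple instance learning (MIL) and the expectation-maximization (EM) algorithm, we summarize object detection and localization methods from the perspectives of multiple instance learning and the class attention map mechanism, with a focused review of self-training and supervision-form conversion methods. For semantic segmentation, we summarize and analyze progress according to weak supervision of different granularity, such as bounding box annotations, image-level class annotations, and scribble or point annotations, and we mainly review weakly supervised instance segmentation methods based on image-level class annotations and bounding box annotations. For video action recognition, we review models and algorithms under weak supervision forms including film scripts, action sequences, video-level class labels, and single-frame labels, and discuss the feasibility of each form in practice. On this basis, we further discuss the challenges and development trends facing weakly supervised visual learning, aiming to provide a reference for related research.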
Visual understanding tasks, e.g., object detection, semantic/instance segmentation, and action recognition, play a crucial role in many real-world applications, including human-machine interaction and autonomous driving. Deep networks have recently made great progress on these tasks under the fully supervised regime. Built on convolutional neural networks (CNNs), a series of representative deep models have been developed, e.g., you only look once (YOLO) and Fast/Faster R-CNN (region CNN) for object detection, fully convolutional networks (FCN) and DeepLab for semantic segmentation, and Mask R-CNN and you only look at coefficients (YOLACT) for instance segmentation. More recently, novel network backbones such as the Transformer have further boosted fully supervised performance on these tasks. However, supervised learning relies on massive, accurate annotations, which are laborious and costly to collect. Taking semantic segmentation as an example, collecting dense annotations, i.e., pixel-wise segmentation masks, is very expensive, whereas weak annotations, e.g., bounding boxes or points, are much easier to collect. Moreover, the scenes in videos are very complicated, so for video action recognition it is often practically impossible to annotate every action with accurate time intervals. Weakly supervised learning is an effective way to reduce annotation cost, and is thus very important to the development and application of visual understanding. Taking object detection, semantic/instance segmentation, and action recognition as examples, this article surveys recent progress in weakly supervised visual understanding and points out several challenges and opportunities.

We first introduce two representative weakly supervised learning methods: multiple instance learning (MIL) and the expectation-maximization (EM) algorithm. Despite their different network architectures, most recent weakly supervised learning methods can be categorized into the MIL or EM family. For object localization and detection, we review methods based on MIL and on the class attention map (CAM), with particular attention to self-training and switching between supervision settings. When weakly supervised object detection (WSOD) is formulated as MIL-based proposal classification, WSOD methods tend to focus on the discriminative parts of an object, e.g., the head of a human or an animal may be detected as a stand-in for the entire object, which yields a significant performance drop relative to fully supervised object detection. To address this issue, self-training and switching between supervision settings have each been developed, and transfer learning has been introduced to exploit auxiliary information from other tasks, e.g., semantic segmentation. For weakly supervised object localization, CAM is a popular solution: it predicts the object position as the region where a given class has the highest activation values. CAM-based localization methods similarly suffer from the discriminative-part issue, and several solutions have been proposed, e.g., suppressing the most discriminative parts and attention-based self-produced guidance. Several representative weakly supervised object localization and detection methods have been evaluated on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) and Microsoft common objects in context (MS COCO) datasets, showing a remaining performance gap relative to fully supervised methods.

For semantic segmentation, we consider several representative weak supervision settings, including bounding box annotations, image-level class annotations, and point or scribble annotations. Unlike segmentation mask annotations, these weak annotations cannot provide accurate pixel-wise supervision. Image-level class annotations are the easiest and most convenient to collect, and the key issue for image-level weakly supervised semantic segmentation is to exploit the correlation between class labels and segmentation masks. CAM yields coarse segmentation results, but the masks are inaccurate and concentrate on discriminative parts. To refine the segmentation masks, several strategies have been introduced, including iterative erasing, learning similarity between pixels, and joint learning of saliency detection and weakly supervised semantic segmentation. Point, scribble, and bounding box annotations provide more accurate localization information than image-level class annotations. Among them, bounding box annotations are likely to offer a good balance between annotation cost and weakly supervised semantic segmentation performance under the EM framework. Weakly supervised instance segmentation is more challenging than weakly supervised semantic segmentation, since each pixel must be assigned not only to an object class but also to one specific object. In this article, we consider bounding box annotations and image-level annotations for weakly supervised instance segmentation. With image-level class annotations, the peak response map in CAM is highly correlated with object instances and can be adopted for weakly supervised instance segmentation. With bounding box annotations, weakly supervised instance segmentation can be formulated as MIL, and the resulting instance masks are usually more accurate than those obtained from image-level class annotations. In addition, these weakly supervised segmentation methods usually adopt post-processing techniques, e.g., the dense conditional random field, to further refine the segmentation masks. Representative weakly supervised semantic and instance segmentation methods with different levels of annotation are evaluated on the PASCAL VOC and MS COCO datasets.

For video action recognition, complicated scenes make it much more difficult to collect accurate annotations of all the actions, so weakly supervised action recognition has attracted research attention in recent years. In this article, we introduce models and algorithms for different weak supervision settings, including film scripts, action sequences, video-level class labels, and single-frame labels. Finally, the challenges and opportunities are analyzed and discussed. On these visual understanding tasks, weakly supervised methods still have room for improvement compared with fully supervised methods. For applications in the wild, exploiting large amounts of unlabeled and noisy data is also a valuable and challenging topic. In the future, weakly supervised visual understanding methods will also benefit from multi-task learning and large-scale pre-trained models. For example, vision-and-language pre-trained models, e.g., contrastive language-image pre-training (CLIP), have the potential to provide knowledge that significantly improves the performance of weakly supervised visual understanding tasks.
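As a rough illustration of the two building blocks named in the abstract, the sketch below shows (1) a MIL-style image-level loss over region proposals and (2) a class attention map (CAM, often written "class activation map" in the literature) computed as a weighted sum of feature maps. It is a minimal pure-Python sketch under stated assumptions, not the formulation of any specific method surveyed here: the function names, the use of max pooling as the MIL aggregator, and the toy inputs are all hypothetical choices made for illustration.

```python
import math


def mil_image_scores(proposal_scores):
    """Max-pool per-proposal class probabilities into one score per class.

    proposal_scores: list of proposals, each a list of per-class probabilities.
    Assumption: max pooling encodes the MIL rule that an image (bag) contains
    a class if at least one proposal (instance) does.
    """
    num_classes = len(proposal_scores[0])
    return [max(p[c] for p in proposal_scores) for c in range(num_classes)]


def mil_loss(proposal_scores, image_labels, eps=1e-7):
    """Mean binary cross-entropy between pooled scores and 0/1 image labels."""
    pooled = mil_image_scores(proposal_scores)
    loss = 0.0
    for score, label in zip(pooled, image_labels):
        score = min(max(score, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= label * math.log(score) + (1 - label) * math.log(1.0 - score)
    return loss / len(image_labels)


def class_attention_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using one class's K weights."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def peak_location(cam):
    """Return the (row, col) index of the highest activation in the map."""
    cells = ((i, j) for i in range(len(cam)) for j in range(len(cam[0])))
    return max(cells, key=lambda ij: cam[ij[0]][ij[1]])


if __name__ == "__main__":
    # Three hypothetical proposals scored for two classes; only image-level
    # labels are available (class 0 present, class 1 absent).
    scores = [[0.9, 0.1], [0.2, 0.3], [0.4, 0.05]]
    print(mil_image_scores(scores))  # [0.9, 0.3]
    print(mil_loss(scores, [1, 0]))

    # Two hypothetical 2 x 2 feature maps and the classifier weights of one class.
    cam = class_attention_map(
        [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]], [0.5, 1.0]
    )
    print(peak_location(cam))  # (1, 1)
```

In this sketch only the single highest-scoring proposal per class contributes to the loss, which gives one intuition for why MIL-based detectors can settle on the most discriminative part of an object rather than its full extent.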
Keywords: weakly supervised learning; object localization; object detection; semantic segmentation; instance segmentation; action recognition
Ahn J, Cho S and Kwak S. 2019. Weakly supervised learning of instance segmentation with inter-pixel relations//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2204-2213 [DOI: 10.1109/CVPR.2019.00231http://dx.doi.org/10.1109/CVPR.2019.00231]
Ahn J and Kwak S. 2018. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4981-4990 [DOI: 10.1109/CVPR.2018.00523http://dx.doi.org/10.1109/CVPR.2018.00523]
Arbeláez P, Pont-Tuset J, Barron J, Marques F and Malik J. 2014. Multiscale combinatorial grouping//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 328-335 [DOI: 10.1109/CVPR.2014.49http://dx.doi.org/10.1109/CVPR.2014.49]
Arun A, Jawahar C V and Kumar M P. 2020. Weakly supervised instance segmentation by learning annotation consistent instances//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 254-270 [DOI: 10.1007/978-3-030-58604-1_16http://dx.doi.org/10.1007/978-3-030-58604-1_16]
Bearman A, Russakovsky O, Ferrari V and Fei-Fei L. 2016. What's the point: semantic segmentation with point supervision//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 549-565 [DOI: 10.1007/978-3-319-46478-7_34http://dx.doi.org/10.1007/978-3-319-46478-7_34]
Bilen H and Vedaldi A. 2016. Weakly supervised deep detection networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2846-2854 [DOI: 10.1109/CVPR.2016.311http://dx.doi.org/10.1109/CVPR.2016.311]
Blömer J and Bujna K. 2013. Simple methods for initializing the EM algorithm for Gaussian mixture models [EB/OL]. [2022-03-05].https://arxiv.org/pdf/1312.5946.pdfhttps://arxiv.org/pdf/1312.5946.pdf
Bojanowski P, Lajugie R, Bach F, Laptev I, Ponce J, Schmid C and Sivic J. 2014. Weakly supervised action labeling in videos under ordering constraints//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 628-643 [DOI: 10.1007/978-3-319-10602-1_41http://dx.doi.org/10.1007/978-3-319-10602-1_41]
Bolya D, Zhou C, Xiao F Y and Lee Y J. 2022. YOLACT++ better real-time instance segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(2): 1108-1121 [DOI: 10.1109/TPAMI.2020.3014297]
Cai L, Dong F, Chen K, Yu K H, Qu W and Jiang J F. 2020. An FPGA based heterogeneous accelerator for single shot multibox detector (SSD)//Proceedings of the 15th IEEE International Conference on Solid-State and Integrated Circuit Technology. Kunming, China: IEEE: 1-3 [DOI: 10.1109/ICSICT49897.2020.9278177http://dx.doi.org/10.1109/ICSICT49897.2020.9278177]
Cao T Y, Du L Y, Zhang X Y, Chen S H, Zhang Y and Wang Y F. 2021. CaT: weakly supervised object detection with category transfer//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3050-3059 [DOI: 10.1109/ICCV48922.2021.00306http://dx.doi.org/10.1109/ICCV48922.2021.00306]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13http://dx.doi.org/10.1007/978-3-030-58452-8_13]
Chen F Q. 2013. An improved EM algorithm [EB/OL]. [2022-03-05].https://arxiv.org/pdf/1305.0626.pdfhttps://arxiv.org/pdf/1305.0626.pdf
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2015. Semantic image segmentation with deep convolutional nets and fully connected CRFs//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI: 10.1109/TPAMI.2017.2699184]
Chen Z, Fu Z H, Jiang R X, Chen Y W and Hua X S. 2020. SLV: spatial likelihood voting for weakly supervised object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12992-13001 [DOI: 10.1109/CVPR42600.2020.01301http://dx.doi.org/10.1109/CVPR42600.2020.01301]
Cheng B W, Parkhi O and Kirillov A. 2021. Pointly-supervised instance segmentation [EB/OL]. [2022-03-05].https://arxiv.org/pdf/2104.06404.pdfhttps://arxiv.org/pdf/2104.06404.pdf
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S and Schiele B. 2016. The cityscapes dataset for semantic urban scene understanding//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3213-3223 [DOI: 10.1109/CVPR.2016.350http://dx.doi.org/10.1109/CVPR.2016.350]
Dai J F, He K M and Sun J. 2015. BoxSup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1635-1643 [DOI: 10.1109/ICCV.2015.191http://dx.doi.org/10.1109/ICCV.2015.191]
Dempster A P, Laird N M and Rubin D B. 1977. Maximum likelihood from incomplete data via theEMalgorithm. Journal of the Royal Statistical Society, 39(1): 1-22 [DOI: 10.1111/j.2517-6161.1977.tb01600.x]
Diba A, Sharma V, Pazandeh A, Pirsiavash H and Van Gool L. 2017. Weakly supervised cascaded convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5131-5139 [DOI: 10.1109/CVPR.2017.545http://dx.doi.org/10.1109/CVPR.2017.545]
Dietterich T G, Lathrop R H and Lozano-Pérez T. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1/2): 31-71 [DOI: 10.1016/S0004-3702(96)00034-3]
Dong B W, Huang Z T, Guo Y L, Wang Q L, Niu Z X and Zuo W M. 2021. Boosting weakly supervised object detection via learning bounding box adjusters//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 2856-2865 [DOI: 10.1109/ICCV48922.2021.00287http://dx.doi.org/10.1109/ICCV48922.2021.00287]
Duchenne O, Laptev I, Sivic J, Bach F and Ponce J. 2009. Automatic annotation of human actions in video//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE: 1491-1498 [DOI: 10.1109/ICCV.2009.5459279http://dx.doi.org/10.1109/ICCV.2009.5459279]
Durand T, Thome N and Cord M. 2016. WELDON: weakly supervised learning of deep convolutional neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4743-4752 [DOI: 10.1109/CVPR.2016.51http://dx.doi.org/10.1109/CVPR.2016.51]
Erez O and Maron T. 1998. A framework for multiple-instance learning. Advances in Neural Information Processing Systems[EB/OL]. [2022-03-05].https://papers.nips.cc/paper/1997/file/82965d4ed8150294d4330ace00821d77-Paper.pdfhttps://papers.nips.cc/paper/1997/file/82965d4ed8150294d4330ace00821d77-Paper.pdf
Everingham M, Eslami S M A, Van Gool L, Williams C K I, Winn J and Zisserman A. 2015. The PASCAL visual object classes challenge: a retrospective. International Journal of Computer Vision, 111(1): 98-136 [DOI: 10.1007/s11263-014-0733-5]
Fan J S, Zhang Z X, Tan T N, Song C F and Xiao J. 2020. CIAN: Cross-image affinity net for weakly supervised semantic segmentation//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 10762-10769
Ge W F, Huang W L, Guo S and Scott M. 2019. Label-PEnet: sequential label propagation and enhancement networks for weakly supervised instance segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 3344-3353 [DOI: 10.1109/ICCV.2019.00344http://dx.doi.org/10.1109/ICCV.2019.00344]
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169http://dx.doi.org/10.1109/ICCV.2015.169]
Gong G Q, Wang X H, Mu Y D and Tian Q. 2020. Learning temporal co-attention models for unsupervised videoaction localization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 9816-9825 [DOI: 10.1109/CVPR42600.2020.00984http://dx.doi.org/10.1109/CVPR42600.2020.00984]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322http://dx.doi.org/10.1109/ICCV.2017.322]
He K M, Gkioxari G, Dollár P and Girshick R. 2020. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2): 386-397 [DOI: 10.1109/TPAMI.2018.2844175]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1904-1916 [DOI: 10.1109/TPAMI.2015.2389824]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90http://dx.doi.org/10.1109/CVPR.2016.90]
Heilbron F C, Escorcia V, Ghanem B and Niebles J C. 2015. ActivityNet: a large-scale video benchmark for human activity understanding//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 961-970 [DOI: 10.1109/CVPR.2015.7298698http://dx.doi.org/10.1109/CVPR.2015.7298698]
Hou Q B, Jiang P T, Wei Y C and Cheng M M. 2018. Self-erasing network for integral object attention//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 547-557
Hsu C C, Hsu K J, Tsai C C, Lin Y Y and Chuang Y Y. 2019. Weakly supervised instance segmentation using the bounding box tightness prior//Proceedings of Advances in Neural Information Processing Systems. Vancouver, Canada: Neural Information Processing Systems Foundation: 6582-6593
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745http://dx.doi.org/10.1109/CVPR.2018.00745]
Huang D A, Li F F and Niebles J C. 2016. Connectionist temporal modeling for weakly supervised action labeling//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 137-153 [DOI: 10.1007/978-3-319-46493-0_9http://dx.doi.org/10.1007/978-3-319-46493-0_9]
Huang Z Y, Zou Y, Bhagavatula V and Huang D. 2020. Comprehensive attention self-distillation for weakly-supervised object detection [EB/OL]. [2022-03-05].https://arxiv.org/pdf/2010.12023.pdfhttps://arxiv.org/pdf/2010.12023.pdf
Hwang J J, Yu S, Shi J B, Collins M, Yang T J, Zhang X and Chen L C. 2019. SegSort: segmentation by discriminative sorting of segments//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 7333-7343 [DOI: 10.1109/ICCV.2019.00743http://dx.doi.org/10.1109/ICCV.2019.00743]
Idrees H, Zamir A R, Jiang Y G, Gorban A, Laptev I, Sukthankar R and Shah M. 2017. The THUMOS challenge on action recognition for videos "in the wild". Computer Vision and Image Understanding, 155: 1-23 [DOI: 10.1016/j.cviu.2016.10.018]
Jia Q F, Wei S K, Ruan T, Zhao Y F and Zhao Y. 2021. GradingNet: towards providing reliable supervisions for weakly supervised object detection by grading the box candidates//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s. l.]: AAAI
Ju C, Zhao P S, Zhang Y, Wang Y F and Tian Q. 2020. Point-level temporal action localization: bridging fully-supervised proposals to weakly-supervised losses [EB/OL]. [2022-03-05].https://arxiv.org/pdf/2012.08236.pdfhttps://arxiv.org/pdf/2012.08236.pdf
Kantorov V, Oquab M, Cho M and Laptev I. 2016. ContextLocNet: context-aware deep network models for weakly supervised localization//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 350-365 [DOI: 10.1007/978-3-319-46454-1_22http://dx.doi.org/10.1007/978-3-319-46454-1_22]
Ke T W, Hwang J J and Yu S X. 2021. Universal weakly supervised segmentation by pixel-to-segment contrastive learning//Proceedings of the 9th International Conference on Learning Representations. [s. l.]: ICLR
Khoreva A, Benenson R, Hosang J, Hein M and Schiele B. 2017. Simple does it: weakly supervised instance and semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1665-1674 [DOI: 10.1109/CVPR.2017.181http://dx.doi.org/10.1109/CVPR.2017.181]
Kim D, Cho D, Yoo D and Kweon I S. 2017. Two-phase learning for weakly supervised object localization//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 3554-3563 [DOI: 10.1109/ICCV.2017.382http://dx.doi.org/10.1109/ICCV.2017.382]
Kosugi S, Yamasaki T and Aizawa K. 2019. Object-aware instance labeling for weakly supervised object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea(South): IEEE: 6063-6071 [DOI: 10.1109/ICCV.2019.00616http://dx.doi.org/10.1109/ICCV.2019.00616]
Krähenbühl P and Koltun V. 2011. Efficient inference in fully connected CRFs with Gaussian edge potentials//Proceedings of the 25th Advances in Neural Information Processing Systems. Granada, Spain: NIPS: 109-117
Krizhevsky A, Sutskever I and Hinton G E. 2012. Imagenet classification with deep convolutional neural networks//Proceedings of the 26th Neural Information Processing Systems. Lake Tahoe, USA: NIPS: 1106-1114
Lan S Y, Yu Z D, Choy C, Radhakrishnan S, Liu G L, Zhu Y K and Anandkumar A. 2021. DiscoBox: weakly supervised instance segmentation and semantic correspondence from box supervision//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3386-3396
Laptev I, Marszalek M, Schmid C and Rozenfeld B. 2008. Learning realistic human actions from movies//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8 [DOI: 10.1109/CVPR.2008.4587756http://dx.doi.org/10.1109/CVPR.2008.4587756]
Laradji I H, Vázquez D and Schmidt M. 2019. Where are the masks: instance segmentation with image-level supervision//Proceedings of the 30th British Machine Vision Conference. Cardiff, UK: BMVA: #255
Law H and Deng J. 2018. Cornernet: detecting objects as paired keypoints//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 765-781 [DOI: 10.1007/978-3-030-01264-9_45http://dx.doi.org/10.1007/978-3-030-01264-9_45]
Lee J, Yi J H, Shin C and Yoon S. 2021a. BBAM: bounding box attribution map for weakly supervised semantic and instance segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2643-2651 [DOI: 10.1109/CVPR46437.2021.00267http://dx.doi.org/10.1109/CVPR46437.2021.00267]
Lee P and Byun H. 2021. Learning action completeness from points for weakly-supervised temporal action localization//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 13628-13637 [DOI: 10.1109/ICCV48922.2021.01339http://dx.doi.org/10.1109/ICCV48922.2021.01339]
Lee P, Uh Y and Byun H. 2020. Background suppression network for weakly-supervised temporal action localization//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 11320-11327
Lee P, Wang J L, Lu Y and Byun H. 2021b. Weakly-supervised temporal action localization by uncertainty modeling//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s. l.]: AAAI: 1854-1862
Lee S, Kwak S and Cho M. 2018. Universal bounding box regression and its applications//Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer: 373-387 [DOI: 10.1007/978-3-030-20876-9_24http://dx.doi.org/10.1007/978-3-030-20876-9_24]
Lee S, Lee M, Lee J and Shim H. 2021c. Railroad is not a train: saliency as pseudo-pixel supervision for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference onComputer Vision and Pattern Recognition. Nashville, USA: IEEE: 5491-5501 [DOI: 10.1109/CVPR46437.2021.00545http://dx.doi.org/10.1109/CVPR46437.2021.00545]
Li X Y, Kan M N, Shan S G and Chen X L. 2019a. Weakly supervised object detection with segmentation collaboration//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea(South): IEEE: 9734-9743 [DOI: 10.1109/ICCV.2019.00983http://dx.doi.org/10.1109/ICCV.2019.00983]
Li X Y, Zhou T F, Li J W, Zhou Y and Zhang Z X. 2021a. Group-wise semantic mining for weakly supervised semantic segmentation//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s. l.]: AAAI: 1984-1992
Li Y, Zhang J G, Huang K Q and Zhang J G. 2019b. Mixed supervised object detection with robust objectness transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3): 639-653 [DOI: 10.1109/TPAMI.2018.2810288]
Li Y W, Zhao H S, Qi X J, Chen Y K, Qi L, Wang L W, Li Z M, Sun J and Jia J Y. 2021b. Fully convolutional networks for panoptic segmentation with point-based supervision [EB/OL]. [2022-03-05].https://arxiv.org/pdf/2108.07682.pdfhttps://arxiv.org/pdf/2108.07682.pdf
Lin D, Dai J F, Jia J Y, He K M and Sun J. 2016. ScribbleSup: scribble-supervised convolutional networks for semantic segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3159-3167 [DOI: 10.1109/CVPR.2016.344http://dx.doi.org/10.1109/CVPR.2016.344]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48http://dx.doi.org/10.1007/978-3-319-10602-1_48]
Liu D C, Jiang T T and Wang Y Z. 2019. Completeness modeling and context separation for weakly supervised temporal action localization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 1298-1307 [DOI: 10.1109/CVPR.2019.00139http://dx.doi.org/10.1109/CVPR.2019.00139]
Liu Y, Zhang Z J, Niu L, Chen J J and Zhang L Q. 2021a. Mixed supervised object detection by transferring mask prior and semantic similarity [EB/OL]. [2022-03-05].https://arxiv.org/pdf/2110.14191.pdfhttps://arxiv.org/pdf/2110.14191.pdf
Liu Y C, Ma C Y, He Z J, Kuo C W, Chen K, Zhang P Z, Wu B C, Kira Z and Vajda P. 2021b. Unbiased teacher for semi-supervised object detection//Proceedings of the 9th International Conference on Learning Representations. [s. l.]: ICLR
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B. 2021c. Swin transformer: hierarchical vision transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9992-10002 [DOI: 10.1109/ICCV48922.2021.00986http://dx.doi.org/10.1109/ICCV48922.2021.00986]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965http://dx.doi.org/10.1109/CVPR.2015.7298965]
Luo W, Zhang T Z, Yang W F, Liu J E, Mei T, Wu F and Zhang Y D. 2021. Action unit memory network for weakly supervised temporal action localization//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 9964-9974 [DOI: 10.1109/CVPR46437.2021.00984http://dx.doi.org/10.1109/CVPR46437.2021.00984]
Ma F, Zhu L C, Yang Y, Zha S X, Kundu G, Feiszli M and Shou Z. 2020. SF-Net: single-frame supervision for temporal action localization//Proceedings of Computer Vision-ECCV 2020-16th European Conference. Glasgow, UK: ECCV: 420-437
Ma J W, Gorti S K, Volkovs M and Yu G W. 2021. Weakly supervised action selection learning in video//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 7583-7592 [DOI: 10.1109/CVPR46437.2021.00750http://dx.doi.org/10.1109/CVPR46437.2021.00750]
Mai J J, Yang M and Luo W F. 2020. Erasing integrated learning: a simple yet effective approach for weakly supervised object localization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8763-8772 [DOI: 10.1109/CVPR42600.2020.00879http://dx.doi.org/10.1109/CVPR42600.2020.00879]
Masita K L, Hasan A N and Shongwe T. 2020. Deep learning in object detection: a review//Proceedings of 2020 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems. Durban, South Africa: IEEE: 1-11 [DOI: 10.1109/icABCD49160.2020.9183866http://dx.doi.org/10.1109/icABCD49160.2020.9183866]
Moltisanti D, Fidler S and Damen D. 2019. Action recognition from single timestamp supervision in untrimmed videos//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 9907-9916 [DOI: 10.1109/CVPR.2019.01015http://dx.doi.org/10.1109/CVPR.2019.01015]
Narayan S, Cholakkal H, Khan F S and Shao L. 2019. 3C-Net: category count and center loss for weakly-supervised action localization//Proceedigns of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea(South): IEEE: 8678-8686 [DOI: 10.1109/ICCV.2019.00877http://dx.doi.org/10.1109/ICCV.2019.00877]
Nguyen P, Han B, Liu T and Prasad G. 2018. Weakly supervised action localization by sparse temporal pooling network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6752-6761 [DOI: 10.1109/CVPR.2018.00706http://dx.doi.org/10.1109/CVPR.2018.00706]
Nguyen P, Ramanan D and Fowlkes C. 2019. Weakly-supervised action localization with background modeling//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 5501-5510 [DOI: 10.1109/ICCV.2019.00560http://dx.doi.org/10.1109/ICCV.2019.00560]
Oh Y, Kim B and Ham B. 2021. Background-aware pooling and noise-aware loss for weakly-supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 6909-6918 [DOI: 10.1109/CVPR46437.2021.00684http://dx.doi.org/10.1109/CVPR46437.2021.00684]
Papadopoulos D P, Uijlings J R R, Keller F and Ferrari V. 2017. Extreme clicking for efficient object annotation//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4940-4949 [DOI: 10.1109/ICCV.2017.528http://dx.doi.org/10.1109/ICCV.2017.528]
Papandreou G, Chen L C, Murphy K P and Yuille A L. 2015. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1742-1750 [DOI: 10.1109/ICCV.2015.203http://dx.doi.org/10.1109/ICCV.2015.203]
Pardo A, Alwassel H, Heilbron F C, Thabet A and Ghanem B. 2021. RefineLoc: iterative refinement for weakly-supervised action localization//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE: 3318-3327 [DOI: 10.1109/WACV48630.2021.00336http://dx.doi.org/10.1109/WACV48630.2021.00336]
Pathak D, Shelhamer E, Long J and Darrell T. 2015. Fully convolutional multi-class multiple instance learning//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Paul S, Roy S and Roy-Chowdhury A K. 2018. W-TALC: weakly-supervised temporal activity localization and classification//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 588-607 [DOI: 10.1007/978-3-030-01225-0_35]
Pinheiro P O and Collobert R. 2015. From image-level to pixel-level labeling with convolutional networks//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1713-1721 [DOI: 10.1109/CVPR.2015.7298780]
Qian R, Wei Y C, Shi H H, Li J C, Liu J Y and Huang T S. 2019. Weakly supervised scene parsing with point-based distance metric learning//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI: 8843-8850
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Ren Z Z, Yu Z D, Yang X D, Liu M Y, Lee Y J, Schwing A G and Kautz J. 2020. Instance-aware, context-focused, and memory-efficient weakly supervised object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10595-10604 [DOI: 10.1109/CVPR42600.2020.01061]
Rother C, Kolmogorov V and Blake A. 2004. "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3): 309-314 [DOI: 10.1145/1015706.1015720]
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C and Li F F. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3): 211-252 [DOI: 10.1007/s11263-015-0816-y]
Russell B C, Torralba A, Murphy K P and Freeman W T. 2008. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1/3): 157-173 [DOI: 10.1007/s11263-007-0090-8]
Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D and Batra D. 2020. Grad-CAM: visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2): 336-359 [DOI: 10.1007/s11263-019-01228-7]
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R and LeCun Y. 2014. OverFeat: integrated recognition, localization and detection using convolutional networks//Proceedings of the 2nd International Conference on Learning Representations. Banff, Canada: ICLR
Shao F F, Chen L, Shao J, Ji W, Xiao S N, Ye L, Zhuang Y T and Xiao J. 2021. Deep learning for weakly-supervised object detection and object localization: a survey [EB/OL]. [2022-03-05]. https://arxiv.org/pdf/2105.12694.pdf
Shen Y H, Ji R R, Wang Y, Wu Y J and Cao L J. 2019. Cyclic guidance for weakly supervised joint detection and segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 697-707 [DOI: 10.1109/CVPR.2019.00079]
Shi B F, Dai Q, Mu Y D and Wang J D. 2020. Weakly-supervised action localization by generative attention modeling//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1006-1016 [DOI: 10.1109/CVPR42600.2020.0010]
Shou Z, Gao H, Zhang L, Miyazawa K and Chang S F. 2018. AutoLoc: weakly-supervised temporal action localization in untrimmed videos//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 162-179 [DOI: 10.1007/978-3-030-01270-0_10]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Singh K K and Lee Y J. 2017. Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 3544-3553 [DOI: 10.1109/ICCV.2017.381]
Song C F, Huang Y, Ouyang W L and Wang L. 2019. Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3131-3140 [DOI: 10.1109/CVPR.2019.00325]
Sui L, Zhang C L and Wu J X. 2021. Salvage of supervision in weakly supervised detection [EB/OL]. [2022-03-05]. https://arxiv.org/pdf/2106.04073.pdf
Tang P, Wang X G, Bai S, Shen W, Bai X, Liu W Y and Yuille A. 2020. PCL: proposal cluster learning for weakly supervised object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1): 176-191 [DOI: 10.1109/TPAMI.2018.2876304]
Tang P, Wang X G, Bai X and Liu W Y. 2017. Multiple instance detection network with online instance classifier refinement//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3059-3067 [DOI: 10.1109/CVPR.2017.326]
Tian Z, Shen C H and Chen H. 2020. Conditional convolutions for instance segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 282-298 [DOI: 10.1007/978-3-030-58452-8_17]
Tian Z, Shen C H, Chen H and He T. 2019. FCOS: fully convolutional one-stage object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 9626-9635 [DOI: 10.1109/ICCV.2019.00972]
Tian Z, Shen C H, Wang X L and Chen H. 2021. BoxInst: high-performance instance segmentation with box annotations//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 5439-5448 [DOI: 10.1109/CVPR46437.2021.00540]
Uijlings J R R, Popov S and Ferrari V. 2018. Revisiting knowledge transfer for training object class detectors//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1101-1110 [DOI: 10.1109/CVPR.2018.00121]
Uijlings J R R, Van De Sande K E A, Gevers T and Smeulders A W M. 2013. Selective search for object recognition. International Journal of Computer Vision, 104(2): 154-171 [DOI: 10.1007/s11263-013-0620-5]
Vernaza P and Chandraker M. 2017. Learning random-walk label propagation for weakly-supervised semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2953-2961 [DOI: 10.1109/CVPR.2017.315]
Wang J D, Sun K, Cheng T H, Jiang B R, Deng C R, Zhao Y, Liu D, Mu Y D, Tan M K, Wang X G, Liu W Y and Xiao B. 2021a. Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10): 3349-3364 [DOI: 10.1109/TPAMI.2020.2983686]
Wang L M, Xiong Y J, Lin D H and Van Gool L. 2017. UntrimmedNets for weakly supervised action recognition and detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 6402-6411 [DOI: 10.1109/CVPR.2017.678]
Wang X G, Feng J P, Hu B, Ding Q, Ran L J, Chen X X and Liu W Y. 2021c. Weakly-supervised instance segmentation via class-agnostic learning with salient images//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 10220-10230 [DOI: 10.1109/CVPR46437.2021.01009]
Wang Y D, Zhang J, Kan M N, Shan S G and Chen X L. 2020. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12272-12281 [DOI: 10.1109/CVPR42600.2020.01229]
Wei Y C, Feng J S, Liang X D, Cheng M M, Zhao Y and Yan S C. 2017a. Object region mining with adversarial erasing: a simple classification to semantic segmentation approach//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6488-6496 [DOI: 10.1109/CVPR.2017.687]
Wei Y C, Liang X D, Chen Y P, Shen X H, Cheng M M, Feng J S, Zhao Y and Yan S C. 2017b. STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11): 2314-2320 [DOI: 10.1109/TPAMI.2016.2636150]
Welinder P, Branson S, Mita T, Wah C, Schroff F, Belongie S and Perona P. 2010. Caltech-UCSD Birds 200. California Institute of Technology. CNS-TR-2010-001
Xie C H, Ren D W, Wang L, Hu Q H, Lin L and Zuo W M. 2021. Learning class-agnostic pseudo mask generation for box-supervised semantic segmentation [EB/OL]. [2022-03-05]. https://arxiv.org/pdf/2103.05463.pdf
Xie E Z, Sun P Z, Song X G, Wang W H, Liu X B, Liang D, Shen C H and Luo P. 2020. PolarMask: single shot instance segmentation with polar representation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12190-12199 [DOI: 10.1109/CVPR42600.2020.01221]
Xie S N and Tu Z W. 2015. Holistically-nested edge detection//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1395-1403 [DOI: 10.1109/ICCV.2015.164]
Xu C L and Ding L. 2018. Weakly-supervised action segmentation with iterative soft boundary assignment//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6508-6516 [DOI: 10.1109/CVPR.2018.00681]
Xu Y L, Zhang C W, Cheng Z Z, Xie J W, Niu Y, Pu S L and Wu F. 2019. Segregated temporal assembly recurrent networks for weakly supervised multiple action detection//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI: 9070-9078
Yan G, Liu B X, Guo N, Ye X C, Wang F, You H H and Fan D R. 2019. C-MIDN: coupled multiple instance detection network with segmentation guidance for weakly supervised object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea(South): IEEE: 9833-9842 [DOI: 10.1109/ICCV.2019.00993]
Yang K, Li D S and Dou Y. 2019. Towards precise end-to-end weakly supervised object detection network//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea(South): IEEE: 8371-8380 [DOI: 10.1109/ICCV.2019.00846]
Yang L, Han J W, Zhao T, Lin T W, Zhang D W and Chen J X. 2021a. Background-click supervision for temporal action localization [J/OL]. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://ieeexplore.ieee.org/document/9633199 [DOI: 10.1109/TPAMI.2021.3132058]
Yang W F, Zhang T Z, Yu X Y, Qi T, Zhang Y D and Wu F. 2021b. Uncertainty guided collaborative training for weakly supervised temporal action detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 53-63 [DOI: 10.1109/CVPR46437.2021.00012]
Yin Y F, Deng J J, Zhou W G and Li H Q. 2021. Instance mining with class feature banks for weakly supervised object detection//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s. l.]: AAAI: 3190-3198
Yu T, Ren Z, Li Y C, Yan E X, Xu N and Yuan J S. 2019a. Temporal structure mining for weakly supervised action detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 5521-5530 [DOI: 10.1109/ICCV.2019.00562]
Yu Z, Zhuge Y, Lu H C and Zhang L H. 2019b. Joint learning of saliency detection and weakly supervised semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 7222-7232 [DOI: 10.1109/ICCV.2019.00732]
Zeng Z Y, Liu B, Fu J L, Chao H Y and Zhang L. 2019. WSOD2: learning bottom-up and top-down objectness distillation for weakly-supervised object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 8291-8299 [DOI: 10.1109/ICCV.2019.00838]
Zhai Y H, Wang L, Tang W, Zhang Q L, Yuan J S and Hua G. 2020. Two-stream consensus network for weakly-supervised temporal action localization//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 37-54
Zhang B F, Xiao J M and Zhao Y. 2021a. Dynamic feature regularized loss for weakly supervised semantic segmentation [EB/OL]. [2022-03-05]. https://arxiv.org/pdf/2108.01296.pdf
Zhang C, Cao M, Yang D M, Chen J and Zou Y X. 2021b. CoLA: weakly-supervised temporal action localization with snippet contrastive learning//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16005-16014 [DOI: 10.1109/CVPR46437.2021.01575]
Zhang D, Zhang H W, Tang J H, Hua X S and Sun Q R. 2020. Causal intervention for weakly-supervised semantic segmentation//Advances in Neural Information Processing Systems. [s. l.]: NeurIPS
Zhang D W, Han J W, Cheng G and Yang M H. 2021c. Weakly supervised object localization and detection: a survey [J/OL]. IEEE Transactions on Pattern Analysis and Machine Intelligence [DOI: 10.1109/TPAMI.2021.3074313]
Zhang X L, Wei Y C, Feng J S, Yang Y and Huang T. 2018a. Adversarial complementary learning for weakly supervised object localization//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1325-1334 [DOI: 10.1109/CVPR.2018.00144]
Zhang X L, Wei Y C, Kang G L, Yang Y and Huang T. 2018b. Self-produced guidance for weakly-supervised object localization//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 610-625 [DOI: 10.1007/978-3-030-01258-8_37]
Zhang Y Q, Bai Y C, Ding M L, Li Y Q and Ghanem B. 2018c. W2F: a weakly-supervised to fully-supervised framework for object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 928-936 [DOI: 10.1109/CVPR.2018.00103]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239 [DOI: 10.1109/CVPR.2017.660]
Zhong J X, Li N N, Kong W J, Zhang T, Li T H and Li G. 2018. Step-by-step erasion, one-by-one collection: a weakly supervised temporal action detector//Proceedings of the 26th ACM International Conference on Multimedia. Seoul, Korea(South): Association for Computing Machinery: 35-44 [DOI: 10.1145/3240508.3240511]
Zhong Y Y, Wang J F, Peng J and Zhang L. 2020. Boosting weakly supervised object detection with progressive knowledge transfer//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 615-631 [DOI: 10.1007/978-3-030-58574-7_37]
Zhou B L, Khosla A, Lapedriza À, Oliva A and Torralba A. 2016. Learning deep features for discriminative localization//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2921-2929 [DOI: 10.1109/CVPR.2016.319]
Zhou Y Z, Zhu Y, Ye Q X, Qiu Q and Jiao J B. 2018. Weakly supervised instance segmentation using class peak response//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3791-3800 [DOI: 10.1109/CVPR.2018.00399]
Zhou Z H. 2004. Multi-instance learning: a survey. AI Lab, Department of Computer Science and Technology: 1-31
Zhou Z H, Sun Y Y and Li Y F. 2009. Multi-instance learning by treating instances as non-I.I.D. samples//Proceedings of the 26th Annual International Conference on Machine Learning. Montreal, Canada: ACM: 1249-1256 [DOI: 10.1145/1553374.1553534]
Zhou Z H and Zhang M L. 2007. Solving multi-instance problems with classifier ensemble based on constructive clustering [J/OL]. Knowledge and Information Systems. https://ieeexplore.ieee.org/document/9409690
Zitnick C L and Dollár P. 2014. Edge boxes: locating object proposals from edges//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 391-405 [DOI: 10.1007/978-3-319-10602-1_26]