Self-attention fusion and modulation for weakly supervised semantic segmentation
2023, Vol. 28, No. 12, Pages 3758-3771
Print publication date: 2023-12-16
DOI: 10.11834/jig.221121
Shi Deshuo, Li Junxia, Liu Qingshan. 2023. Self-attention fusion and modulation for weakly supervised semantic segmentation. Journal of Image and Graphics, 28(12):3758-3771
Objective
Most existing weakly supervised segmentation methods with image-level annotations rely on convolutional neural networks to obtain pseudo labels, and the target regions those labels cover are often too small. Transformer-based methods typically expand the class activation maps with self-attention; however, because the attention in the deep layers is inaccurate, the refined pseudo labels contain considerable background noise. To exploit the advantages of these two types of feature extraction networks while combining the attention characteristics of different Transformer layers, this paper constructs a self-attention fusion and modulation network that combines convolutional features with Transformer features for weakly supervised semantic segmentation.
Method
The convolution-enhanced Transformer (Conformer) is adopted as the feature extraction network; it encodes the image more comprehensively and yields the initial class activation maps. A layer-adaptive self-attention fusion module is designed, which generates fusion weights from the self-attention values and the importance of each layer; the fused self-attention suppresses background noise well. A self-attention modulation module is further proposed: exploiting the attention relations between pixel pairs, a modulation function is designed to enlarge the activation responses of foreground pixels. The modulated attention is then used to refine the initial class activation maps so that they cover more of the target regions while background noise is effectively suppressed.
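As an illustration of the layer-adaptive fusion step, the following is a minimal PyTorch sketch; the depth-based decay (`layer_decay`) and the dispersion-based score are assumptions chosen for illustration, not the paper's exact weighting scheme.

```python
import torch

def fuse_attention(attn_layers, layer_decay=0.9):
    """Fuse per-layer self-attention maps with adaptive weights.

    attn_layers: list of (N, N) tensors averaged over heads, ordered
    from the shallowest to the deepest Transformer layer.
    """
    fused = torch.zeros_like(attn_layers[0])
    total = 0.0
    for idx, attn in enumerate(attn_layers):
        depth_w = layer_decay ** idx    # trust shallow layers more
        value_w = attn.std().item()     # dispersion of activation values
        w = depth_w * value_w
        fused = fused + w * attn
        total += w
    return fused / total                # normalized fused attention
```

A call such as `fuse_attention([a1, ..., a12])` over all Transformer layers would return a single (N, N) affinity matrix in which noisy deep-layer attention contributes less.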
Result
Segmentation networks are trained with the obtained pseudo labels on the two most widely used benchmarks, the PASCAL VOC 2012 (pattern analysis, statistical modeling and computational learning visual object classes 2012) dataset and the COCO 2014 (common objects in context 2014) dataset, and the proposed method achieves the best results in all comparison experiments. On the PASCAL VOC validation set, the mean intersection over union (mIoU) reaches 70.2%, and on the test set it reaches 70.5%; compared with the best Transformer-based model among the competitors, performance improves by 0.9% on both the validation and test sets, and compared with the best convolutional neural network method, mIoU improves by 0.7% on the validation set and 0.8% on the test set. On the COCO 2014 validation set, the result is 40.1%, a 0.5% improvement in segmentation accuracy over the best competing method.
Conclusion
The proposed weakly supervised semantic segmentation model combines the advantages of convolutional neural networks and Transformers. By adaptively fusing and modulating the Transformer self-attention, it achieves the current best semantic segmentation results under image-level labels, and it can be applied in fields such as 3D reconstruction and robot scene understanding. In addition, both the adaptive self-attention fusion module and the self-attention modulation module can be embedded into Transformer architectures to extract more robust and discriminative features for specific vision tasks.
Objective
Semantic segmentation is a fundamental task in computer vision and image processing whose aim is to assign a class label to each pixel. However, training a segmentation model usually relies on dense pixel-wise annotations, which are time consuming and labor intensive to collect. To remove the dependence on pixel-level labels, weakly supervised semantic segmentation (WSSS) has been widely studied because it needs only weak and cheap supervision such as points, scribbles, image-level labels, and bounding boxes. Among these, image-level labels are the weakest and easiest to obtain, and they make high-quality pseudo labels the most difficult to generate. The main challenge of image-level WSSS based on a convolutional neural network lies in the inherent gap between classification and segmentation: the classifier activates only small discriminative parts of the target regions, which fails to satisfy the requirements of segmentation. Conversely, although a Transformer-based classifier activates most of the foreground objects, it also introduces much background noise, which decreases the quality of the pseudo masks. To make full use of the advantages of these two types of feature extraction networks and to combine the attention characteristics of different Transformer layers, this paper constructs a self-attention fusion and modulation network for weakly supervised semantic segmentation.
Method
To make full use of the local features extracted by the convolutional neural network and the global features extracted by the Transformer, this paper adopts the convolution-enhanced Transformer (Conformer) as the feature extraction network, which can encode the image comprehensively and produce the initial class activation maps. The attention maps learned by the Transformer branch differ between the shallow and deep layers. Influenced by the convolutional information, the attention maps in shallow layers tend to capture detailed information about the target regions, while the maps in deeper layers prefer mining global information. Meanwhile, the attention maps in deeper layers introduce noise into the background regions because of incorrect relations between background and foreground pixels. Therefore, directly adding the different attention maps is a suboptimal choice. We propose a self-attention adaptive fusion module that assigns a weight to each layer to balance its importance. On the one hand, we argue that the attention maps in shallow layers are more accurate than those in deeper layers, so larger weights are assigned to the shallow-layer maps and smaller weights to the deep-layer maps to reduce the influence of the noise introduced by deep layers. On the other hand, we consider the dispersion of the activation values within each attention map: a map with more dispersed activations is given greater importance. The fused self-attention can effectively suppress background noise and describes the similarity between pixel pairs. To further increase the activation responses of foreground pixels, we design a self-attention modulation module. We first normalize the attention map and then map it through an exponential function to measure the importance of each pixel pair. Pixels on the same target object are relatively similar, so the attention value of such a pixel pair tends to be larger than that of others, and we strengthen this connection with a large modulation parameter. When a pixel pair has a small attention value, the two pixels are probably not closely related and may introduce noise, so we weaken this connection with a small modulation parameter. After modulation, the gap between foreground and background pixels becomes larger, and the attention maps focus more on the foreground regions than on the background ones.
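As a concrete sketch of the modulation and CAM-refinement steps just described, the snippet below is a minimal PyTorch illustration; the min-max normalization, the exponential mapping, the threshold `tau`, and the two modulation parameters `gamma_high`/`gamma_low` are hypothetical choices, not the paper's exact modulation function.

```python
import torch

def modulate_attention(attn, tau=0.5, gamma_high=1.5, gamma_low=0.5):
    """Strengthen confident pixel-pair relations and damp weak ones.

    attn: fused self-attention of shape (N, N) over N image patches.
    """
    # Normalize to [0, 1], then apply an exponential mapping so that
    # differences among the large attention values are emphasized.
    a = (attn - attn.min()) / (attn.max() - attn.min() + 1e-6)
    a = torch.exp(a - 1.0)              # values now lie in [1/e, 1]
    # Pairs with high mutual attention (likely the same object) are
    # amplified; weakly related, likely noisy pairs are suppressed.
    scale = torch.where(a > tau,
                        torch.full_like(a, gamma_high),
                        torch.full_like(a, gamma_low))
    return a * scale

def refine_cam(cam, attn_mod):
    """Refine initial CAMs by propagating scores across related pixels.

    cam: (C, N) class activation maps flattened over N patches.
    """
    refined = cam @ attn_mod.t()        # aggregate over similar pixels
    return refined / (refined.max() + 1e-6)
```

Because the modulation widens the gap between strong and weak affinities, the propagation in `refine_cam` spreads activations across whole objects while passing little energy to background patches.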
Result
Our experimental results demonstrate that our model achieves state-of-the-art performance. It obtains 70.2% mean intersection over union (mIoU) on the validation set and 70.5% mIoU on the test set of the most popular PASCAL VOC 2012 dataset, and 40.1% mIoU on the validation set of COCO 2014. We do not use saliency maps to provide background cues, yet our results are comparable to those of works that do. Our model outperforms the state-of-the-art multi-class token Transformer (MCTformer), which uses a Transformer structure for feature extraction, by 2% and 2.1% mIoU on the validation and test sets, respectively. Compared with TransCAM, which directly uses attention to adjust the class activation maps, our model obtains a 0.9% performance boost on both the validation and test sets, demonstrating that it can effectively reduce noise in background regions. Our model also outperforms IRNet, SEAM, AMR, SIPE, and URN, which use convolutional neural networks as their backbones, by 6.7%, 5.7%, 1.4%, 1.4%, and 0.7% on the validation set, respectively, confirming that our dual-branch feature extraction structure is effective and feasible. Given that we extract features from both local and global perspectives, we also conduct an ablation experiment to show the importance of this complementary information. Using only the information of the convolution branch, the class activation maps (CAMs) reach 27.7% mIoU; when fused with the global features generated by the Transformer branch, they reach 35.1% mIoU, indicating that both local and global information are helpful for generating CAMs.
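For reference, the mIoU values reported above follow the standard per-class intersection-over-union definition; the numpy sketch below is a generic illustration (benchmark evaluation actually accumulates intersections and unions over the entire dataset before dividing), not the authors' evaluation code.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class IoU averaged over the classes that appear.

    pred, gt: integer label maps of identical shape.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:            # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```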
Conclusion
The self-attention adaptive fusion and modulation network proposed in this paper is effective for image-level weakly supervised semantic segmentation tasks.
Keywords: semantic segmentation, weakly supervised learning, Transformer, convolutional neural network (CNN), self-attention modulation, self-attention fusion, class activation map
Ahn J, Cho S and Kwak S. 2019. Weakly supervised learning of instance segmentation with inter-pixel relations//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 2204-2213 [DOI: 10.1109/CVPR.2019.00231]
Bearman A, Russakovsky O, Ferrari V and Li F F. 2016. What's the point: semantic segmentation with point supervision//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 549-565 [DOI: 10.1007/978-3-319-46478-7_34]
Chang Y T, Wang Q S, Hung W C and Piramuthu R. 2020. Weakly-supervised semantic segmentation via sub-category exploration//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8991-9000 [DOI: 10.1109/cvpr42600.2020.00901]
Chaudhry A, Dokania P K and Torr P H. 2017. Discovering class-specific pixels for weakly-supervised semantic segmentation//Proceedings of the 28th British Machine Vision Conference. London, UK: BMVC
Chen C, Tang S and Li J T. 2020. Weakly supervised semantic segmentation based on dynamic mask generation. Journal of Image and Graphics, 25(6): 1190-1200 [DOI: 10.11834/jig.190458]
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI: 10.1109/tpami.2017.2699184]
Chen Q, Yang L X, Lai J H and Xie X H. 2022a. Self-supervised image-specific prototype exploration for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4278-4288 [DOI: 10.1109/cvpr52688.2022.00425]
Chen Z Z, Wang T, Wu X W, Hua X S, Zhang H W and Sun Q R. 2022b. Class re-activation maps for weakly-supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4278-4288 [DOI: 10.1109/cvpr52688.2022.00104]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D and Zhai X. 2021. An image is worth 16 × 16 words: Transformers for image recognition at scale [EB/OL]. [2022-12-08]. https://arxiv.org/pdf/2010.11929.pdf
Everingham M, Van Gool L, Williams C K I, Winn J and Zisserman A. 2010. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2): 303-338 [DOI: 10.1007/s11263-009-0275-4]
Fan J S, Zhang Z X, Song C F and Tan T N. 2020a. Learning integral objects with intra-class discriminator for weakly-supervised semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4283-4292 [DOI: 10.1109/cvpr42600.2020.00434]
Fan J S, Zhang Z X, Tan T N, Song C F and Xiao J. 2020b. CIAN: cross-image affinity net for weakly supervised semantic segmentation//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 10762-10769 [DOI: 10.1609/aaai.v34i07.6705]
Gao W, Wan F, Pan X J, Peng Z L, Tian Q, Han Z J, Zhou B L and Ye Q X. 2021. TS-CAM: token semantic coupled attention map for weakly supervised object localization//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 2886-2895 [DOI: 10.1109/iccv48922.2021.00288]
Hariharan B, Arbeláez P, Bourdev L, Maji S and Malik J. 2011. Semantic contours from inverse detectors//Proceedings of 2011 IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE: 991-998 [DOI: 10.1109/ICCV.2011.6126343]
Hou Q B, Jiang P T, Wei Y C and Cheng M M. 2018. Self-erasing network for integral object attention//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates Inc.: 547-557
Huang Z L, Wang X G, Wang J S, Liu W Y and Wang J D. 2018. Weakly-supervised semantic segmentation network with deep seeded region growing//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7014-7023 [DOI: 10.1109/cvpr.2018.00733]
Jiang P T, Hou Q B, Cao Y, Cheng M M, Wei Y C and Xiong H K. 2019. Integral object mining via online attention accumulation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2070-2079 [DOI: 10.1109/iccv.2019.00216]
Jiang P T, Yang Y Q, Hou Q B and Wei Y C. 2022. L2G: a simple local-to-global knowledge transfer framework for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16865-16875 [DOI: 10.1109/cvpr52688.2022.01638]
Khoreva A, Benenson R, Hosang J, Hein M and Schiele B. 2017. Simple does it: weakly supervised instance and semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 876-885 [DOI: 10.1109/cvpr.2017.181]
Kim B, Han S and Kim J. 2021. Discriminative region suppression for weakly-supervised semantic segmentation//Proceedings of the 35th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 1754-1761 [DOI: 10.1609/aaai.v35i2.16269]
Kolesnikov A and Lampert C H. 2016. Seed, expand and constrain: three principles for weakly-supervised image segmentation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 695-711 [DOI: 10.1007/978-3-319-46493-0_42]
Lee J, Kim E and Yoon S. 2021a. Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 4071-4080 [DOI: 10.1109/cvpr46437.2021.00406]
Lee J, Kim E, Lee S, Lee J and Yoon S. 2019. FickleNet: weakly and semi-supervised semantic image segmentation using stochastic inference//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5267-5276 [DOI: 10.1109/cvpr.2019.00541]
Lee S, Lee M, Lee J and Shim H. 2021b. Railroad is not a train: saliency as pseudo-pixel supervision for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 5495-5505 [DOI: 10.1109/cvpr46437.2021.00545]
Li R W, Mai Z D, Trabelsi C, Zhang Z B, Jang J S and Sanner S. 2022a. TransCAM: Transformer attention-based CAM refinement for weakly supervised semantic segmentation. Journal of Visual Communication and Image Representation, 92: #103800 [DOI: 10.1016/j.jvcir.2023.103800]
Li Y, Duan Y Q, Kuang Z H, Chen Y M, Zhang W and Li X M. 2022b. Uncertainty estimation via response scaling for pseudo-mask noise mitigation in weakly-supervised semantic segmentation//Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI: 1447-1455 [DOI: 10.1609/aaai.v36i2.20034]
Li Y, Kuang Z H, Liu L Y, Chen Y M and Zhang W. 2021. Pseudo-mask matters in weakly-supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 6964-6973 [DOI: 10.1109/iccv48922.2021.00688]
Lin D, Dai J F, Jia J Y, He K M and Sun J. 2016. ScribbleSup: scribble-supervised convolutional networks for semantic segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3159-3167 [DOI: 10.1109/cvpr.2016.344]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]
Liu W, Wang H R and Zhou B J. 2021. DeepLabv3plus-IRCNet: an image semantic segmentation method for small target feature extraction. Journal of Image and Graphics, 26(2): 391-401 [DOI: 10.11834/jig.190576]
Liu Y, Wu Y H, Wen P S, Shi Y J, Qiu Y and Cheng M M. 2022. Leveraging instance-, image- and dataset-level information for weakly supervised instance segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3): 1415-1428 [DOI: 10.1109/tpami.2020.3023152]
Peng Z L, Huang W, Gu S Z, Xie L X, Wang Y W, Jiao J B and Ye Q X. 2021. Conformer: local features coupling global representations for visual recognition//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 367-376 [DOI: 10.1109/iccv48922.2021.00042]
Pont-Tuset J, Arbeláez P, Barron J T, Marques F and Malik J. 2017. Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1): 128-140 [DOI: 10.1109/tpami.2016.2537320]
Qin J, Wu J, Xiao X F, Li L J and Wang X G. 2022. Activation modulation and recalibration scheme for weakly supervised semantic segmentation//Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI: 2117-2125 [DOI: 10.1609/aaai.v36i2.20108]
Rother C, Kolmogorov V and Blake A. 2004. "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3): 309-314 [DOI: 10.1145/1015706.1015720]
Ru L X, Zhan Y B, Yu B S and Du B. 2022. Learning affinity from attention: end-to-end weakly-supervised semantic segmentation with Transformers//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16846-16855 [DOI: 10.1109/cvpr52688.2022.01634]
Shimoda W and Yanai K. 2019. Self-supervised difference detection for weakly-supervised semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 5208-5217 [DOI: 10.1109/iccv.2019.00531]
Sun W X, Zhang J and Barnes N. 2022. Inferring the class conditional response map for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 2878-2887 [DOI: 10.1109/wacv51458.2022.00271]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 6000-6010
Wang X, You S D, Li X and Ma H M. 2018. Weakly-supervised semantic segmentation by iteratively mining common object features//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1354-1362 [DOI: 10.1109/cvpr.2018.00147]
Wang Y D, Zhang J, Kan M N, Shan S G and Chen X L. 2020. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12275-12284 [DOI: 10.1109/cvpr42600.2020.01229]
Wei Y C, Feng J S, Liang X D, Cheng M M, Zhao Y and Yan S C. 2017. Object region mining with adversarial erasing: a simple classification to semantic segmentation approach//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1568-1576 [DOI: 10.1109/cvpr.2017.687]
Wu T, Huang J S, Gao G Y, Wei X M, Wei X L, Luo X and Liu C H. 2021. Embedded discriminative attention mechanism for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16765-16774 [DOI: 10.1109/cvpr46437.2021.01649]
Xu L, Ouyang W L, Bennamoun M, Boussaid F and Xu D. 2022. Multi-class token Transformer for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4310-4319 [DOI: 10.1109/cvpr52688.2022.00427]
Xu L, Ouyang W L, Bennamoun M, Boussaid F, Sohel F and Xu D. 2021. Leveraging auxiliary tasks with affinity learning for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 6964-6973 [DOI: 10.1109/iccv48922.2021.00690]
Yao Y Z, Chen T, Xie G S, Zhang C Y, Shen F M, Wu Q, Tang Z M and Zhang J. 2021. Non-salient region object mining for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2623-2632 [DOI: 10.1109/cvpr46437.2021.00265]
Zhang B F, Xiao J M, Wei Y C, Sun M J and Huang K Z. 2020a. Reliability does matter: an end-to-end weakly supervised semantic segmentation approach//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12765-12772 [DOI: 10.1609/aaai.v34i07.6971]
Zhang D, Zhang H W, Tang J H, Hua X S and Sun Q R. 2020b. Causal intervention for weakly-supervised semantic segmentation//Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates Inc.: 655-666
Zhang F, Gu C C, Zhang C Y and Dai Y C. 2021. Complementary patch for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 7242-7251 [DOI: 10.1109/ICCV48922.2021.00715]
Zhou T F, Zhang M J, Zhao F and Li J W. 2022. Regional semantic contrast and aggregation for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4289-4299 [DOI: 10.1109/cvpr52688.2022.00426]