Survey of image semantic segmentation methods in the deep learning era

Yan Yi; Deng Chao; Li Lin; Zhu Lingkun; Ye Biao

doi:10.11834/jig.220292



2023, Vol. 28, No. 11, pp. 3342-3362
Print publication date: 2023-11-16

Yan Yi, Deng Chao, Li Lin, Zhu Lingkun, Ye Biao. 2023. Survey of image semantic segmentation methods in the deep learning era. Journal of Image and Graphics, 28(11): 3342-3362. DOI: 10.11834/jig.220292.

Summary

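Semantic segmentation is the premise and foundation of many computer vision tasks and has important application value in fields such as virtual reality and autonomous driving. With the rapid development of deep learning, and especially the emergence of convolutional neural networks (CNNs), image semantic segmentation has made great progress. This paper first introduces the concept of semantic segmentation, its background, and its basic processing pipeline. It then summarizes open-source 2D, 2.5D, and 3D datasets and the segmentation methods suited to each, describes in detail the segmentation characteristics, advantages, disadvantages, and accuracy of different networks, and concludes that supervised learning is an effective training approach. It also introduces authoritative metrics for evaluating algorithm performance, compares and analyzes the experiments of each segmentation method according to its emphasis, and points out problems common to current experiments; among the methods compared, the DeepLab-V3+ network performs well in both segmentation accuracy and speed and has high application value. On this basis, and in light of research in China and abroad, the paper presents several current challenges and possible future research directions. This summary and analysis can serve as a reference for researchers working on image semantic segmentation.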

Abstract

Introduced by Ohta in 1980, image semantic segmentation assigns each pixel in an image a predefined label that represents its semantic category. Because it aims to understand the content of different scenes, image semantic segmentation has received considerable research attention in computer vision. In recent years, many research laboratories around the world have studied deep-learning-based image semantic segmentation, and academic conferences in automation, artificial intelligence, and pattern recognition have reported results on the topic. Semantic segmentation is also the premise and basis of many computer vision tasks and has important applications in fields such as virtual reality, autonomous driving, and human-computer interaction. With the rapid development of deep learning, especially the emergence of convolutional neural networks, image semantic segmentation has made great progress and now far outperforms traditional methods in accuracy and efficiency. First, this paper introduces the concept of semantic segmentation, its background, and its basic process. In general, deep-learning-based image semantic segmentation passes through three processing modules: feature extraction, semantic segmentation, and refinement. Second, this paper summarizes the open-source 2D, RGB-D, and 3D datasets used in recent years and their corresponding segmentation methods. Semantic segmentation methods for 2D data are divided into methods based on candidate regions, methods based on fully supervised learning, and methods based on weakly supervised learning. Because only a few semantic segmentation methods exist for RGB-D and 3D data, those methods are not classified further.
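The three-module flow described above (feature extraction, per-pixel semantic labeling, refinement) can be illustrated with a minimal, dependency-free Python sketch. All names and numbers are hypothetical: `extract_features` merely appends a bias term where a real system would run a CNN backbone, the class weights are hand-picked rather than learned, and `refine` uses a 3×3 majority vote as a simple stand-in for heavier refinement such as conditional random fields. It is not an implementation of any network surveyed here.

```python
from collections import Counter
from typing import List

Image = List[List[List[float]]]  # H x W x C: one feature vector per pixel
LabelMap = List[List[int]]       # H x W: one class index per pixel


def extract_features(image: Image) -> Image:
    """Module 1 (hypothetical stand-in for a CNN backbone): append a bias feature."""
    return [[pixel + [1.0] for pixel in row] for row in image]


def classify_pixels(features: Image, weights: List[List[float]]) -> LabelMap:
    """Module 2: score every class at every pixel and keep the best-scoring label."""
    labels = []
    for row in features:
        out_row = []
        for f in row:
            scores = [sum(w_i * f_i for w_i, f_i in zip(w, f)) for w in weights]
            out_row.append(max(range(len(scores)), key=scores.__getitem__))
        labels.append(out_row)
    return labels


def refine(labels: LabelMap) -> LabelMap:
    """Module 3 (hypothetical): 3x3 majority vote that removes isolated label noise."""
    h, w = len(labels), len(labels[0])
    refined = []
    for y in range(h):
        out_row = []
        for x in range(w):
            window = [labels[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out_row.append(Counter(window).most_common(1)[0][0])
        refined.append(out_row)
    return refined


def segment(image: Image, weights: List[List[float]]) -> LabelMap:
    """Run the three modules in order."""
    return refine(classify_pixels(extract_features(image), weights))


if __name__ == "__main__":
    # Hypothetical 3x3 single-channel image: a bright object with one dark pixel.
    img = [[[0.9], [0.8], [0.9]],
           [[0.7], [0.2], [0.9]],
           [[0.9], [0.9], [0.8]]]
    # Class 0 (background) scores -v + 0.5; class 1 (object) scores v - 0.5.
    weights = [[-1.0, 0.5], [1.0, -0.5]]
    print(classify_pixels(extract_features(img), weights))  # [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    print(segment(img, weights))                            # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

With these hand-picked weights a pixel is labeled 1 when its intensity exceeds 0.5, so the dark center pixel is first labeled 0 and then corrected to 1 by the majority-vote refinement.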
This paper then describes in detail the network structures of several classical algorithms, together with the segmentation characteristics, advantages, disadvantages, and segmentation accuracy of different networks. This summary shows that most segmentation methods are based on fully supervised learning, which is an effective training method. Third, this paper introduces several authoritative performance evaluation metrics, such as mean average precision (mAP) and mean intersection over union (mIoU), and compares the segmentation accuracy and computational performance of semantic segmentation methods in experiments on 2D data. The comparison shows that the DeepLab-V3+ network offers good segmentation accuracy and speed, which gives it high application value. Segmentation performance on 2.5D and 3D data is also compared. The comparison highlights several key problems: some algorithms are not tested on authoritative datasets, some algorithms are not open source, and some experiments do not describe their experimental parameters in detail.

Based on this, and in view of the state of research in China and abroad, this paper identifies several challenges and proposes directions for future research. First, segmentation algorithms tend to prioritize either accuracy or real-time performance at the expense of the other. Second, a segmentation network usually needs large amounts of memory for inference and training, which makes it unsuitable for some devices. Third, segmentation algorithms for 3D data are a current research focus, but high-quality 3D datasets are generally lacking and the existing 3D datasets are patchwork datasets. Fourth, only a few segmentation algorithms are available for RGB-D and 3D data (particularly 3D data), and the open-source ones generally have low accuracy. Fifth, sequence data have temporal consistency. Sixth, some methods address video or sequence segmentation, whereas others do not use temporal information to improve accuracy or segmentation efficiency. Seventh, some papers have proposed that face detection can be achieved without training a deep neural network, which raises the question of whether semantic segmentation can likewise be achieved without training a network. Through this summary and analysis, this paper aims to provide a useful reference for future research on image semantic segmentation.
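The mIoU metric named above is usually computed from a confusion matrix: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and mIoU is the mean of IoU_c over classes. The dependency-free Python sketch below assumes label maps have already been flattened to lists of integer class indices and, following one common convention (an assumption here, not necessarily the protocol used in this survey), skips classes that appear in neither the ground truth nor the prediction. Overall pixel accuracy is included as a by-product.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[i][j] counts pixels whose ground-truth class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU_c = TP / (TP + FP + FN) over classes present in gt or prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                              # assumption: skip classes absent from both maps
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix is empty")
    return sum(ious) / len(ious)


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class matches the ground truth."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 2]    # hypothetical flattened ground-truth map
    pred = [0, 1, 1, 1, 2, 0]  # hypothetical flattened prediction
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(cm)                  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
    print(mean_iou(cm))        # 0.5 (per-class IoU: 1/3, 2/3, 1/2)
    print(pixel_accuracy(cm))  # 4/6 ≈ 0.667
```

Accumulating a confusion matrix first, rather than computing IoU image by image, lets counts be summed over a whole test set before the per-class ratios are taken.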

Keywords

deep learning; image semantic segmentation (ISS); convolutional neural network (CNN); supervised learning; DeepLab-V3+ network


