Research progress of image semantic segmentation with deep convolutional neural networks
Deep convolutional neural network for semantic image segmentation
2020, Vol. 25, No. 6, pp. 1069-1090
Received: 2019-07-29; Revised: 2019-11-08; Accepted: 2019-11-15; Published in print: 2020-06-16
DOI: 10.11834/jig.190355

In the field of computer vision, semantic segmentation is a key task for scene parsing and behavior recognition, and image semantic segmentation methods based on deep convolutional neural networks have achieved breakthrough progress. The task of semantic segmentation is to assign a category label to every pixel in an image, making it a form of pixel-level image understanding. Object detection only locates the bounding box of a target, whereas semantic segmentation must segment the targets out of the image. This paper first analyzes and describes the difficulties and challenges in semantic segmentation, and introduces the common datasets and objective evaluation metrics used to assess segmentation algorithms. It then reviews and summarizes the current mainstream image semantic segmentation methods based on deep convolutional neural networks. According to whether pixel-level annotated images are required for network training, existing methods are divided into two categories, supervised and weakly supervised semantic segmentation, and the respective strengths and weaknesses of the two categories are analyzed in detail. We compare a number of supervised and weakly supervised segmentation models on the PASCAL VOC (pattern analysis, statistical modelling and computational learning visual object classes) 2012 dataset, and report the best method in each category together with its MIoU (mean intersection-over-union). Finally, possible future research directions in image semantic segmentation are pointed out.
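As a concrete illustration of the MIoU metric used in the comparison, the sketch below is a minimal pure-Python version (the function name and toy labels are ours, not from any benchmark toolkit): per-class intersection-over-union is computed on flattened prediction and ground-truth label lists and then averaged over the classes present.

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over the classes that appear
    in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# 4-pixel toy image with 2 classes:
# class 0: intersection 1, union 2 -> IoU 1/2
# class 1: intersection 2, union 3 -> IoU 2/3
# MIoU = (1/2 + 2/3) / 2 = 7/12
print(mean_iou([0, 1, 1, 1], [0, 0, 1, 1], 2))
```

Benchmark toolkits compute the same quantity from a confusion matrix accumulated over the whole dataset, but the per-class ratio is identical.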
Semantic segmentation is a fundamental task in computer vision applications such as scene analysis and behavior recognition. Recent years have witnessed significant progress in semantic image segmentation based on deep convolutional neural networks (DCNNs). Semantic segmentation is a type of pixel-level image understanding whose objective is to assign a semantic label to each pixel of a given image. Object detection only locates the bounding box of an object, whereas semantic segmentation segments an image into several meaningful objects and assigns a specific semantic label to each of them. The difficulty of image semantic segmentation mostly originates from three aspects: object, category, and background. From the perspective of objects, images of the same object taken under different lighting, viewing angles, and distances, or when the object is still or moving, differ significantly, and occlusion may also occur between adjacent objects. In terms of categories, objects from the same category can look dissimilar, and objects from different categories can look similar. From the background perspective, a simple background helps produce accurate segmentation results, but the backgrounds of real scenes are complex.

In this study, we provide a systematic review of recent advances in DCNN methods for semantic segmentation. We first discuss the difficulties and challenges in semantic segmentation and introduce the datasets and quantitative metrics used to evaluate the performance of these methods. Then, we detail how recent CNN-based semantic segmentation methods work and analyze their strengths and limitations. According to whether pixel-level labeled images are used to train the network, these methods are grouped into two categories: supervised and weakly supervised learning-based semantic segmentation. Supervised semantic segmentation requires pixel-level annotations. By contrast, weakly supervised semantic segmentation aims to segment images using class labels, bounding boxes, and scribbles.

We divide supervised semantic segmentation models into four groups: encoder-decoder methods, feature map-based methods, probability map-based methods, and combined strategies. In an encoder-decoder network, an encoder module gradually reduces the feature maps and captures high-level semantic information, while a decoder module gradually recovers spatial information. At present, most state-of-the-art deep CNNs for semantic segmentation originate from a common forerunner, the fully convolutional network (FCN), which is an encoder-decoder network. FCN transforms existing and well-known classification models, such as AlexNet, the visual geometry group 16-layer net (VGG16), GoogLeNet, and ResNet, into fully convolutional models by replacing fully connected layers with convolutional ones so as to output spatial maps instead of classification scores. These maps are upsampled using deconvolutions to produce dense per-pixel labeled outputs.

A feature map-based method aims to take full advantage of the context information of a feature map, including its spatial context (position) and scale context (size), facilitating the segmentation and parsing of an image. These methods obtain the spatial and scale contexts by enlarging the receptive field and fusing multiscale information, effectively improving network performance. Some models, such as the pyramid scene parsing network and DeepLab v3, perform spatial pyramid pooling at several different scales (including image-level pooling) or apply several parallel atrous convolutions with different rates, and have produced promising results by exploiting the spatial and scale contexts.

A probability map-based method combines the semantic context (probability) and the spatial context (location) by postprocessing probability score maps and semantic label predictions, primarily through a probabilistic graphical model. A probabilistic graphical model uses a graph to represent the conditional dependences between random variables; it is a combination of probability theory and graph theory. Probabilistic graphical models come in several types, such as conditional random fields (CRFs), Markov random fields, and Bayesian networks. By establishing semantic relationships between pixels, these methods refine object boundaries and improve network performance. This family of approaches typically includes CRF-recurrent neural networks, deep parsing networks, and EncNet. Some methods combine two or more of the aforementioned strategies to further improve the segmentation performance of a network, such as the global convolutional network, DeepLab v1, DeepLab v2, DeepLab v3+, and the discriminative feature network.

According to the type of weak supervision used to train the network, weakly supervised semantic segmentation methods are divided into four groups: class label-based, bounding box-based, scribble-based, and methods using various forms of annotations. Class-label annotations only indicate the presence of an object; thus, the substantial problem in class label-based methods is accurately assigning image-level labels to their corresponding pixels. In general, this problem is addressed by using a multiple-instance-learning strategy to train segmentation models, or by adopting an alternating training procedure based on the expectation-maximization algorithm to dynamically predict semantic foreground and background pixels. A recent work improved the quality of object localization maps by integrating a seeded region growing technique into the segmentation network, significantly increasing pixel accuracy. Bounding box-based methods use bounding boxes and class labels as supervision: region proposal methods and traditional image segmentation theory are used to generate candidate segmentation masks, and a convolutional network is then trained under the supervision of these approximate masks. BoxSup proposes a recursive training procedure in which a convolutional network is trained under the supervision of segment object proposals; in turn, the updated network improves the segmentation masks used for training. Scribble-supervised training methods apply a graphical model to propagate information from scribbles to unmarked pixels on the basis of spatial constraints, appearance, and semantic content, accounting for two tasks: the first is to propagate class labels from scribbles to other pixels so as to fully annotate an image, and the second is to learn a convolutional network for semantic segmentation.

We compare several supervised and weakly supervised semantic segmentation methods on the PASCAL VOC (pattern analysis, statistical modelling and computational learning visual object classes) 2012 dataset, and report the best-performing supervised and weakly supervised methods together with their MIoU (mean intersection-over-union). Lastly, we present related research areas, including video semantic segmentation, 3D semantic segmentation, real-time semantic segmentation, and instance segmentation. Image semantic segmentation is a popular topic in the fields of computer vision and artificial intelligence, and many applications, e.g., autonomous driving, indoor navigation, and smart medicine, require accurate and efficient segmentation models. Thus, further work should be conducted on semantic segmentation to improve the accuracy of object boundaries and the overall performance of segmentation networks.
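To make the FCN idea above concrete, the pure-Python sketch below (illustrative names of our own, not FCN's actual implementation) shows why replacing a fully connected classifier head with a 1×1 convolution turns a single image-level prediction into a dense score map: the same class-weight vectors are applied independently at every spatial position.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution: at each position, take the dot product
    of the C-dim feature vector with each of the K class weight vectors
    (weights is C x K)."""
    H, W = len(feature_map), len(feature_map[0])
    C, K = len(weights), len(weights[0])
    return [[[sum(feature_map[i][j][c] * weights[c][k] for c in range(C))
              for k in range(K)]
             for j in range(W)]
            for i in range(H)]  # H x W x K score map

# 2x2 feature map with C=2 channels and K=2 classes
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
w = [[1.0, 0.0],   # channel 0 votes for class 0
     [0.0, 1.0]]   # channel 1 votes for class 1
scores = conv1x1(fmap, w)
# Per-pixel argmax now yields a dense label map rather than one image label.
labels = [[max(range(2), key=lambda k: scores[i][j][k]) for j in range(2)]
          for i in range(2)]
```

In FCN the resulting coarse score map is then upsampled by deconvolution to the input resolution, as described above.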
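Similarly, the atrous (dilated) convolutions used by the multiscale models above enlarge the receptive field without adding parameters. The 1-D sketch below (our own minimal version, valid padding only) samples the input at a stride given by the dilation rate, so the same 3-tap kernel covers a wider span as the rate grows.

```python
def atrous_conv1d(x, kernel, rate):
    """1-D atrous convolution with valid padding: kernel taps are spaced
    `rate` samples apart, so an n-tap kernel covers (n-1)*rate + 1 inputs."""
    span = (len(kernel) - 1) * rate
    return [sum(kernel[t] * x[i + t * rate] for t in range(len(kernel)))
            for i in range(len(x) - span)]

x = [1, 2, 3, 4, 5, 6, 7]
k = [1, 0, -1]                      # simple difference kernel, 3 taps
print(atrous_conv1d(x, k, 1))       # [-2, -2, -2, -2, -2], receptive field 3
print(atrous_conv1d(x, k, 2))       # [-4, -4, -4], same 3 weights, field 5
```

Atrous spatial pyramid pooling runs several such convolutions with different rates in parallel and fuses their outputs, which is how DeepLab v3 captures multiscale context.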
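The seeded region growing step mentioned for class label-based weak supervision can be illustrated on a toy foreground-score grid. The sketch below is our own simplified version (the grid, threshold, and names are not from the cited model): starting from high-confidence seed pixels, the label is propagated breadth-first to 4-neighbours whose score stays above a threshold.

```python
from collections import deque

def grow_region(score, seeds, thresh):
    """Breadth-first seeded region growing on a 2-D score grid: expand
    from seed pixels to 4-neighbours whose score is >= thresh."""
    H, W = len(score), len(score[0])
    mask = [[False] * W for _ in range(H)]
    q = deque(seeds)
    for i, j in seeds:
        mask[i][j] = True
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < H and 0 <= nj < W
                    and not mask[ni][nj] and score[ni][nj] >= thresh):
                mask[ni][nj] = True
                q.append((ni, nj))
    return mask

score = [[0.9, 0.8, 0.1],
         [0.7, 0.2, 0.1],
         [0.1, 0.1, 0.1]]
# One seed in the top-left corner grows over the connected high-score block.
mask = grow_region(score, [(0, 0)], 0.5)
```

In the actual weakly supervised pipeline the seeds come from classifier localization maps and the grown regions then supervise the segmentation network.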
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]
Bilinski P and Prisacariu V. 2018. Dense decoder shortcut connections for single-pass semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 6596-6605 [DOI: 10.1109/CVPR.2018.00690]
Chaurasia A and Culurciello E. 2017. LinkNet: exploiting encoder representations for efficient semantic segmentation//Proceedings of 2017 IEEE Visual Communications and Image Processing. St. Petersburg, FL, USA: IEEE: 1-4 [DOI: 10.1109/VCIP.2017.8305148]
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2016. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. 2016-06-02 [2019-06-06]. https://arxiv.org/pdf/1412.7062.pdf
Chen L C, Papandreou G, Schroff F and Adam H. 2017. Rethinking atrous convolution for semantic image segmentation [EB/OL]. 2017-12-05 [2019-06-06]. https://arxiv.org/pdf/1706.05587.pdf
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018a. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI: 10.1109/TPAMI.2017.2699184]
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018b. Encoder-decoder with atrous separable convolution for semantic image segmentation [EB/OL]. 2018-08-22 [2019-06-06]. https://arxiv.org/pdf/1802.02611v1.pdf
Dai J F, He K M and Sun J. 2015. BoxSup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE Computer Society: 1635-1643 [DOI: 10.1109/ICCV.2015.191]
Durand T, Mordan T, Thome N and Cord M. 2017. WILDCAT: weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE Computer Society: 5957-5966 [DOI: 10.1109/CVPR.2017.631]
Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P and Garcia-Rodriguez J. 2018. A survey on deep learning techniques for image and video semantic segmentation. Applied Soft Computing, 70: 41-65 [DOI: 10.1016/j.asoc.2018.05.018]
Geng Q C, Zhou Z and Cao X C. 2018. Survey of recent progress in semantic image segmentation with CNNs. Science China Information Sciences, 61(5): 051101 [DOI: 10.1007/s11432-017-9189-6]
Hinton G E and Salakhutdinov R R. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786): 504-507 [DOI: 10.1126/science.1127647]
Hong S, Yeo D, Kwak S, Lee H and Han B. 2017. Weakly supervised semantic segmentation using web-crawled videos//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE Computer Society: 2224-2232 [DOI: 10.1109/CVPR.2017.239]
Huang G, Liu Z, van der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243]
Huang K Q, Ren W Q and Tan T N. 2014. A review on image object classification and detection. Chinese Journal of Computers, 37(6): 1225-1240 (in Chinese) [DOI: 10.3724/SP.J.1016.2014.01225]
Huang Z L, Wang X G, Wang J S, Liu W Y and Wang J D. 2018. Weakly-supervised semantic segmentation network with deep seeded region growing//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 7014-7023 [DOI: 10.1109/CVPR.2018.00733]
Khoreva A, Benenson R, Hosang J, Hein M and Schiele B. 2017. Simple does it: weakly supervised instance and semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 1665-1674 [DOI: 10.1109/CVPR.2017.181]
Kolesnikov A and Lampert C H. 2016. Seed, expand and constrain: three principles for weakly-supervised image segmentation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer International Publishing: 695-711 [DOI: 10.1007/978-3-319-46493-0_42]
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: ACM: 1097-1105
Lateef F and Ruichek Y. 2019. Survey on semantic segmentation using deep learning techniques. Neurocomputing, 338: 321-348 [DOI: 10.1016/j.neucom.2019.02.003]
Li X, Jie Z Q, Wang W, Liu C S, Yang J M, Shen X H, Lin Z, Chen Q, Yan S C and Feng J S. 2017. FoveaNet: perspective-aware urban scene parsing//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE Computer Society: 784-792 [DOI: 10.1109/ICCV.2017.91]
Lin D, Dai J F, Jia J Y, He K M and Sun J. 2016a. ScribbleSup: scribble-supervised convolutional networks for semantic segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE Computer Society: 3159-3167 [DOI: 10.1109/CVPR.2016.344]
Lin G S, Shen C H, van den Hengel A and Reid I. 2016b. Efficient piecewise training of deep structured models for semantic segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE Computer Society: 3194-3203 [DOI: 10.1109/CVPR.2016.348]
Lin G S, Milan A, Shen C H and Reid I. 2017. RefineNet: multi-path refinement networks for high-resolution semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 5168-5177 [DOI: 10.1109/CVPR.2017.549]
Liu Z W, Li X X, Luo P, Loy C C and Tang X O. 2015. Semantic image segmentation via deep parsing network//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1377-1385 [DOI: 10.1109/ICCV.2015.162]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE Computer Society: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965]
Mostajabi M, Yadollahpour P and Shakhnarovich G. 2014. Feedforward semantic segmentation with zoom-out features//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE Computer Society: 3376-3385 [DOI: 10.1109/CVPR.2015.7298959]
Noh H, Hong S and Han B. 2015. Learning deconvolution network for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE Computer Society: 1520-1528 [DOI: 10.1109/ICCV.2015.178]
Oh S J, Benenson R, Khoreva A, Akata Z, Fritz M and Schiele B. 2017. Exploiting saliency for object segmentation from image level labels//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE Computer Society: 5038-5047 [DOI: 10.1109/CVPR.2017.535]
Papandreou G, Chen L C, Murphy K P and Yuille A L. 2015. Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE Computer Society: 1742-1750 [DOI: 10.1109/ICCV.2015.203]
Pathak D, Krähenbühl P and Darrell T. 2015. Constrained convolutional neural networks for weakly supervised segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1796-1804 [DOI: 10.1109/ICCV.2015.209]
Peng C, Zhang X Y, Yu G, Luo G M and Sun J. 2017. Large kernel matters: improve semantic segmentation by global convolutional network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 1743-1751 [DOI: 10.1109/CVPR.2017.189]
Pinheiro P O and Collobert R. 2015. From image-level to pixel-level labeling with convolutional networks//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE: 1713-1721 [DOI: 10.1109/CVPR.2015.7298780]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer International Publishing: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Rother C, Kolmogorov V and Blake A. 2004. "GrabCut": interactive foreground extraction using iterated graph cuts//ACM SIGGRAPH 2004. Los Angeles, California: ACM: 309-314 [DOI: 10.1145/1186562.1015720]
Roy A and Todorovic S. 2017. Combining bottom-up, top-down, and smoothness cues for weakly supervised image segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE Computer Society: 7282-7291 [DOI: 10.1109/CVPR.2017.770]
Shen T, Lin G S, Shen C H and Reid I. 2018. Bootstrapping the performance of webly supervised semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 1363-1371 [DOI: 10.1109/CVPR.2018.00148]
Shi J B and Malik J. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 888-905 [DOI: 10.1109/34.868688]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. 2015-04-10 [2019-06-06]. https://arxiv.org/pdf/1409.1556.pdf
Tang M, Djelouah A, Perazzi F, Boykov Y and Schroers C. 2018. Normalized cut loss for weakly-supervised CNN segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 1818-1827 [DOI: 10.1109/CVPR.2018.00195]
Tian X, Wang L and Ding Q. 2019. Review of image semantic segmentation based on deep learning. Journal of Software, 30(2): 440-468 (in Chinese) [DOI: 10.13328/j.cnki.jos.005659]
Vemulapalli R, Tuzel O, Liu M Y and Chellappa R. 2016. Gaussian conditional random field network for semantic segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE Computer Society: 3224-3233 [DOI: 10.1109/CVPR.2016.351]
Wang X, You S D, Li X and Ma H M. 2018. Weakly-supervised semantic segmentation by iteratively mining common object features//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 1354-1362 [DOI: 10.1109/CVPR.2018.00147]
Wei Y C, Feng J S, Liang X D, Cheng M M, Zhao Y and Yan S C. 2017a. Object region mining with adversarial erasing: a simple classification to semantic segmentation approach//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 6488-6496 [DOI: 10.1109/CVPR.2017.687]
Wei Y C, Liang X D, Chen Y P, Shen X H, Cheng M M, Feng J S, Zhao Y and Yan S C. 2017b. STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11): 2314-2320 [DOI: 10.1109/TPAMI.2016.2636150]
Wei Y C, Liang X D, Chen Y P, Jie Z Q, Xiao Y H, Zhao Y and Yan S C. 2016a. Learning to segment with image-level annotations. Pattern Recognition, 59: 234-244 [DOI: 10.1016/j.patcog.2016.01.015]
Wei Y C, Xia W, Lin M, Huang J S, Ni B B, Dong J, Zhao Y and Yan S C. 2016b. HCP: a flexible CNN framework for multi-label image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9): 1901-1907 [DOI: 10.1109/TPAMI.2015.2491929]
Xu J, Schwing A G and Urtasun R. 2015. Learning to segment under various forms of weak supervision//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE: 3781-3790 [DOI: 10.1109/CVPR.2015.7299002]
Yang M K, Yu K, Zhang C, Li Z W and Yang K Y. 2018. DenseASPP for semantic segmentation in street scenes//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 3684-3692 [DOI: 10.1109/CVPR.2018.00388]
Yu C Q, Wang J B, Peng C, Gao C X, Yu G and Sang N. 2018. Learning a discriminative feature network for semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 1857-1866 [DOI: 10.1109/CVPR.2018.00199]
Yu F and Koltun V. 2016. Multi-scale context aggregation by dilated convolutions [EB/OL]. 2016-04-30 [2019-06-06]. https://arxiv.org/pdf/1511.07122v2.pdf
Zhang H, Dana K, Shi J P, Zhang Z Y, Wang X G, Tyagi A and Agrawal A. 2018. Context encoding for semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 7151-7160 [DOI: 10.1109/CVPR.2018.00747]
Zhang H, Xue J and Dana K. 2017. Deep TEN: texture encoding network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE Computer Society: 2896-2905 [DOI: 10.1109/CVPR.2017.309]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE Computer Society: 6230-6239 [DOI: 10.1109/CVPR.2017.660]
Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Su Z Z, Du D L, Huang C and Torr P H S. 2015. Conditional random fields as recurrent neural networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE Computer Society: 1529-1537 [DOI: 10.1109/ICCV.2015.179]
Zhou B L, Khosla A, Lapedriza A, Oliva A and Torralba A. 2016. Learning deep features for discriminative localization//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 2921-2929 [DOI: 10.1109/CVPR.2016.319]