Feature map slice for semantic segmentation
2019, Vol. 24, No. 3, Pages 464-473
Received: 2018-07-04; Revised: 2018-09-04; Published in print: 2019-03-16
DOI: 10.11834/jig.180402
Objective
Deep convolutional neural networks have recently shown outstanding performance in object recognition and have become the first choice for dense prediction problems such as semantic segmentation. Fully convolutional network (FCN) based methods have become the main research direction in the field of image semantic segmentation. However, the repeated downsampling operations in these methods, such as pooling or convolution striding, lead to a significant decrease in the initial image resolution; for example, five successive 2x downsampling stages shrink the feature maps by a factor of 32 in each dimension. This loss of resolution results in poor object delineation, the loss of small targets, and weak segmentation output. Although some studies have addressed this problem in recent years, how to handle it effectively remains an open question and deserves further attention. This study proposes a feature map slice module for semantic segmentation to alleviate this problem.
Method
The proposed method comprises two parts: a middle-layer feature map slice module and a corresponding feature extraction network. The slice module operates on a middle-layer feature map, cutting it into several equal sub-regions and then upsampling each sub-region to the resolution of the original feature map, which enlarges the objects in that local area. Each slice therefore corresponds to one sub-region of the original feature map, and after upsampling, the small objects it contains behave like relatively large objects, whereas they are difficult to detect when the entire feature map is processed at once. Feature extraction can thus concentrate on the small targets within each sub-region. To this end, a weight-shared feature extraction network is designed for the sliced feature maps; it applies multiple convolution operations with different kernel sizes to extract feature information at different scales. A minimal sketch of the slicing step is given below.
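To make the slicing step concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation (which is Caffe-based, per the reference list); the function name, the 2x2 default grid, and bilinear interpolation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def slice_and_upsample(x: torch.Tensor, grid: int = 2):
    """Cut a feature map (N, C, H, W) into grid x grid equal sub-regions
    and upsample each one back to (H, W), enlarging its contents."""
    n, c, h, w = x.shape
    gh, gw = h // grid, w // grid                  # size of one sub-region
    patches = []
    for i in range(grid):
        for j in range(grid):
            patch = x[:, :, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            # After upsampling, a small object occupies grid*grid times
            # more feature-map area than in the original map.
            patches.append(F.interpolate(patch, size=(h, w),
                                         mode='bilinear', align_corners=False))
    return patches  # each patch is fed to the weight-shared extractor
```

Each returned patch is then processed by the shared extraction network described next.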
For each input of this network, the channel dimension is reduced by half to save memory, and dilated convolution is adopted to enlarge the receptive field. The feature maps obtained from the different convolution operations are then concatenated, and a channel-attention operation is applied. By combining multi-scale convolution with an attention mechanism, the feature extraction network extracts the semantic category information of each sub-region and effectively provides the contextual, global, and discriminative information of every slice. The network can therefore focus on small objects in local areas and improve their discriminability. After every slice has passed through the shared network, the extracted features are assembled at their corresponding positions to form a mosaic feature map. The original network output is upsampled and fused with this mosaic by an element-wise max operation, so that middle-layer features are reused efficiently. To exploit middle-layer information further, the module is introduced at multiple scales, which enhances the extraction of small-target characteristics and of spatial information in local areas, utilizes the semantic information available at different scales, and yields clear improvements in extracting small-target features, refining segmentation edges, and enhancing the network's discriminative power. A sketch of the shared extraction block and the fusion step follows.
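Continuing the sketch above, the block below shows one plausible form of the weight-shared extraction network and of the mosaic assembly and fusion. The paper specifies channel halving, multi-scale convolutions, dilated convolution, channel attention, and element-wise max fusion, but not exact hyperparameters; the kernel sizes (3 and 5), the dilation rate of 2, and the squeeze-and-excitation style attention used here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExtractor(nn.Module):
    """Hypothetical weight-shared block: channel reduction, parallel
    multi-scale (plain and dilated) convolutions, concatenation, and an
    SE-style channel attention."""
    def __init__(self, channels: int):
        super().__init__()
        mid = channels // 2                        # halve channels to save memory
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1)
        self.branch3 = nn.Conv2d(mid, mid, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(mid, mid, kernel_size=5, padding=2)
        self.dilated = nn.Conv2d(mid, mid, kernel_size=3, padding=2, dilation=2)
        self.project = nn.Conv2d(3 * mid, channels, kernel_size=1)
        self.attn = nn.Sequential(                 # channel attention (SE-style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid())

    def forward(self, x):
        x = F.relu(self.reduce(x))
        y = torch.cat([F.relu(self.branch3(x)),    # multi-scale features
                       F.relu(self.branch5(x)),
                       F.relu(self.dilated(x))], dim=1)
        y = self.project(y)
        return y * self.attn(y)                    # re-weight channels

def slice_module(x, extractor, grid=2):
    """Slice -> shared extraction -> mosaic -> element-wise max fusion.
    Reuses slice_and_upsample from the previous sketch."""
    n, c, h, w = x.shape
    feats = [extractor(p) for p in slice_and_upsample(x, grid)]
    rows = [torch.cat(feats[i * grid:(i + 1) * grid], dim=3)
            for i in range(grid)]
    mosaic = torch.cat(rows, dim=2)                # (N, C, grid*H, grid*W)
    up = F.interpolate(x, size=mosaic.shape[2:],
                       mode='bilinear', align_corners=False)
    return torch.max(up, mosaic)                   # element-wise max fusion
```

For instance, a (1, 256, 64, 64) middle-layer map with a 2x2 grid yields a (1, 256, 128, 128) fused output; introducing the module at several network resolutions simply repeats this with feature maps of different sizes.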
Result
The proposed method is verified on two urban scene-understanding datasets, CamVid and GATECH. Both datasets contain many common urban scene objects, such as buildings, cars, and cyclists. Several ablation experiments are conducted on the two datasets, and strong performance is achieved: mean intersection-over-union scores of 66.3% and 52.6% are obtained on CamVid and GATECH, respectively.
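For reference, the metric reported above is the standard mean intersection-over-union (mIoU), whose definition is not restated in the abstract; in the usual confusion-matrix notation,

$$\mathrm{mIoU}=\frac{1}{K}\sum_{k=1}^{K}\frac{n_{kk}}{\sum_{j=1}^{K} n_{kj}+\sum_{j=1}^{K} n_{jk}-n_{kk}}$$

where $n_{ij}$ is the number of pixels of ground-truth class $i$ predicted as class $j$ and $K$ is the number of classes.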
Conclusion
The proposed method makes better use of the spatial distribution information of images, enhances the network's ability to determine the semantic categories of different spatial locations, pays greater attention to small target objects, and provides effective context and global information. Because different resolutions supply rich scale information, the module is extended to several resolutions of the network. In this way, middle-layer feature information is fully utilized, the network's ability to discriminate small target objects is improved, and its overall segmentation performance is enhanced.
References
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3431-3440. [DOI: 10.1109/CVPR.2015.7298965]
Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 1-9. [DOI: 10.1109/CVPR.2015.7298594]
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL]. 2015-04-10[2018-05-01]. https://arxiv.org/pdf/1409.1556.pdf
Chen L C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[EB/OL]. 2016-06-07[2018-05-01]. https://arxiv.org/pdf/1412.7062.pdf
Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions[EB/OL]. 2016-03-04[2018-05-01]. https://arxiv.org/pdf/1511.07122v2.pdf
Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1520-1528. [DOI: 10.1109/ICCV.2015.178]
Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. [DOI: 10.1109/TPAMI.2016.2644615]
Wang Y, Liu J, Yan J, et al. Objectness-aware semantic segmentation[C]//Proceedings of the 24th ACM International Conference on Multimedia. Amsterdam, The Netherlands: ACM, 2016: 307-311. [DOI: 10.1145/2964284.2967232]
Lin G S, Milan A, Shen C H, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 1925-1934. [DOI: 10.1109/CVPR.2017.549]
Jégou S, Drozdzal M, Vazquez D, et al. The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, HI, USA: IEEE, 2017: 1175-1183. [DOI: 10.1109/CVPRW.2017.156]
Fu J, Liu J, Wang Y H, et al. Densely connected deconvolutional network for semantic segmentation[C]//Proceedings of 2017 IEEE International Conference on Image Processing. Beijing, China: IEEE, 2017: 3085-3089. [DOI: 10.1109/ICIP.2017.8296850]
Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2261-2269. [DOI: 10.1109/CVPR.2017.243]
Everingham M, Eslami S, van Gool L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136. [DOI: 10.1007/s11263-014-0733-5]
Cordts M, Omran M, Ramos S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 3213-3223. [DOI: 10.1109/CVPR.2016.350]
Brostow G J, Fauqueur J, Cipolla R. Semantic object classes in video: a high-definition ground truth database[J]. Pattern Recognition Letters, 2009, 30(2): 88-97. [DOI: 10.1016/j.patrec.2008.04.005]
Raza S H, Grundmann M, Essa I. Geometric context from videos[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 3081-3088. [DOI: 10.1109/CVPR.2013.396]
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer, 2015: 234-241. [DOI: 10.1007/978-3-319-24574-4_28]
Ghiasi G, Fowlkes C C. Laplacian pyramid reconstruction and refinement for semantic segmentation[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 519-534. [DOI: 10.1007/978-3-319-46487-9_32]
Peng C, Zhang X Y, Yu G, et al. Large kernel matters: improve semantic segmentation by global convolutional network[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 4353-4361. [DOI: 10.1109/CVPR.2017.189]
Chen L C, Zhu Y K, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018: 833-851. [DOI: 10.1007/978-3-030-01234-2_49]
He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. [DOI: 10.1109/CVPR.2016.90]
Jia Y Q, Shelhamer E, Donahue J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM International Conference on Multimedia. Orlando, FL, USA: ACM, 2014: 675-678. [DOI: 10.1145/2647868.2654889]
Kendall A, Badrinarayanan V, Cipolla R. Bayesian SegNet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding[C]//Proceedings of the 2017 British Machine Vision Conference. London, UK: BMVA Press, 2017.
Kundu A, Vineet V, Koltun V. Feature space optimization for semantic video segmentation[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 3168-3175. [DOI: 10.1109/CVPR.2016.345]
Wang Y H, Liu J, Li Y, et al. Hierarchically supervised deconvolutional network for semantic video segmentation[J]. Pattern Recognition, 2017, 64: 437-445. [DOI: 10.1016/j.patcog.2016.09.046]
Tran D, Bourdev L, Fergus R, et al. Deep end2end voxel2voxel prediction[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Las Vegas, NV, USA: IEEE, 2016: 402-409. [DOI: 10.1109/CVPRW.2016.57]