Residual dense spatial pyramid network for urban remote sensing image segmentation
2020, Vol. 25, No. 12, Pages: 2656-2664
Received: 2019-10-30; Revised: 2020-03-13; Accepted: 2020-03-20; Published in print: 2020-12-16
DOI: 10.11834/jig.190557
Objective
Remote sensing image semantic segmentation classifies each pixel in an image according to its land cover type and is an important research direction in remote sensing image processing. Because the objects in remote sensing images differ greatly in scale and have complex boundaries, accurately extracting features from these images is difficult, which makes precise segmentation challenging. Convolutional neural networks, which learn hierarchical image features automatically, have gradually become the mainstream approach in image processing. In this paper, a convolutional neural network based on a residual dense spatial pyramid is applied to the segmentation of urban remote sensing images to improve the semantic segmentation accuracy of high-resolution urban imagery.
Method
The model introduces atrous convolution into the residual network to replace its downsampling operations, enlarging the receptive field of the feature maps while keeping their size unchanged. It cascades the branches of a spatial pyramid structure through dense connections, so that the output of each branch carries denser receptive field information. It also uses skip connections to fuse features across layers, combining the network's high-level semantic features with its low-level texture features to recover spatial information.
Result
Extensive experiments are conducted on the ISPRS (International Society for Photogrammetry and Remote Sensing) Vaihingen remote sensing dataset. The results show that the proposed model achieves a mean intersection over union of 69.88% and a mean F1 of 81.39% over six land cover classes, outperforming SegNet, pix2pix, Res-shuffling-Net, and SDFCN (symmetrical dense-shortcut fully convolutional network) both quantitatively and visually.
Conclusion
Applying a spatial pyramid pooling network improved with dense connections to the semantic segmentation of high-resolution remote sensing images, the model exploits multi-scale features together with high-level semantic information and low-level texture information, effectively improving the segmentation accuracy of urban remote sensing images.
Objective
Remote sensing image semantic segmentation, in which each pixel in an image is classified according to the land cover type, is an important research direction in the field of remote sensing image processing. However, accurately segmenting remote sensing images and extracting their features is difficult because these images cover wide areas and contain objects with large scale differences and complex boundaries. Meanwhile, traditional remote sensing image processing methods are inefficient, inaccurate, and require considerable expertise. Convolutional neural networks are deep learning networks suited to data with grid structures, such as 1D data with time-series features (e.g., speech) and image data arranged as 2D pixel grids. Given its multi-layer structure, a convolutional neural network can automatically learn features at different levels. The network also has two properties that facilitate image processing. First, it exploits the 2D structure of an image during feature extraction: because adjacent pixels in an image are highly correlated, each neuron needs only a local connection, rather than a connection to all pixels, to extract features. Second, convolution kernel parameters are shared during convolution operations, so features at different positions of an image are computed with the same kernel, thereby greatly reducing the number of model parameters. In this paper, a fully convolutional neural network based on a residual dense spatial pyramid is applied to urban remote sensing image segmentation to achieve accurate semantic segmentation of high-resolution remote sensing images.
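To make the parameter-sharing argument concrete, the following minimal sketch (our illustration, not from the paper; the layer widths and input size are assumptions) compares the parameter count of a weight-sharing convolutional layer against a fully connected layer over the same input:

import tensorflow as tf

# Illustrative size only: a 256x256 RGB patch.
inputs = tf.keras.Input(shape=(256, 256, 3))

# Local connection + weight sharing: one 3x3 kernel slides over the
# whole image, so the layer has 3*3*3*64 + 64 = 1792 parameters.
conv = tf.keras.layers.Conv2D(64, kernel_size=3, padding="same")(inputs)

# A dense layer connecting every pixel to 64 units would instead need
# (256*256*3) * 64 + 64, roughly 12.6 million parameters.
dense = tf.keras.layers.Dense(64)(tf.keras.layers.Flatten()(inputs))

model = tf.keras.Model(inputs, [conv, dense])
model.summary()  # prints both parameter counts for comparison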
Method
To improve the semantic segmentation precision of high-resolution urban remote sensing images, we first take a 101-layer residual convolutional network as the backbone for extracting remote sensing image feature maps. When features are extracted with classic convolutional neural networks, the repeated application of max-pooling and strided convolution at consecutive layers significantly reduces the spatial resolution of the feature maps, typically by a factor of 32 in each direction in general deep convolutional neural networks (DCNNs). Semantic segmentation is a pixel-to-pixel mapping task whose labeling granularity reaches the pixel level, so this loss of spatial resolution discards spatial information that the segmentation needs. To avoid such loss, the proposed model introduces atrous convolution into the residual convolutional neural network. Compared with ordinary convolution, atrous convolution uses a rate parameter r to control the receptive field of the convolution kernel during the computation. With atrous convolution, the network can expand the receptive field of the feature map while keeping the feature map size unchanged, which markedly improves the model's semantic segmentation performance on remote sensing images.
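As a minimal sketch of this mechanism (our illustration rather than the paper's code; the rate value and feature map size are assumptions), a 3x3 atrous convolution with rate r = 2 covers a 5x5 neighborhood yet produces an output the same size as its input:

import tensorflow as tf

x = tf.random.normal([1, 64, 64, 256])  # an assumed backbone feature map

# Ordinary 3x3 convolution: receptive field 3x3.
conv = tf.keras.layers.Conv2D(256, 3, padding="same")

# Atrous (dilated) 3x3 convolution with rate r=2: the kernel taps are
# spread apart, giving an effective receptive field of 5x5, while the
# 'same' padding keeps the output at 64x64, so no downsampling occurs.
atrous = tf.keras.layers.Conv2D(256, 3, padding="same", dilation_rate=2)

print(conv(x).shape, atrous(x).shape)  # both (1, 64, 64, 256)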
Objects in remote sensing images often exhibit large scale variations and complex texture features, both of which make it challenging to accurately encode multi-scale high-level features. To accurately extract multi-scale features from these images, the proposed model cascades the branches of a spatial pyramid structure through a dense connection mechanism, which allows each branch to output denser receptive field information, as sketched below.
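A minimal sketch of such a densely connected pyramid follows (our simplification in the spirit of the DenseASPP design cited in the references; the rates and channel widths are assumptions, not the paper's exact configuration):

import tensorflow as tf

def dense_atrous_pyramid(x, rates=(3, 6, 12, 18), channels=64):
    # Cascade atrous branches with dense connections: each branch sees
    # the backbone features plus the outputs of all earlier branches,
    # so later branches combine progressively denser receptive fields.
    features = [x]
    for r in rates:
        inp = tf.keras.layers.Concatenate()(features) if len(features) > 1 else x
        y = tf.keras.layers.Conv2D(channels, 3, padding="same",
                                   dilation_rate=r, activation="relu")(inp)
        features.append(y)
    return tf.keras.layers.Concatenate()(features)

inputs = tf.keras.Input(shape=(64, 64, 256))
outputs = dense_atrous_pyramid(inputs)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 64, 64, 512): 256 + 4*64 channels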
In the semantic segmentation of remote sensing images, the high-level semantic features extracted by the convolutional neural network are needed to correctly determine the category of each pixel, and low-level texture features are also needed to determine the edges of the target, since low-level texture features benefit the reconstruction of object boundaries. Our proposed model therefore uses a simple encoder-decoder design to exploit both the high-level semantic features and the low-level texture features in the network: the decoder uses skip connections to fuse cross-layer information, combining the high-level semantic features with the underlying texture features. After fusing the high- and low-level information, we apply two 3×3 convolutions to integrate the information among channels and recover spatial detail. The resulting feature map is finally fed to a softmax classifier for pixel-level classification, which yields the remote sensing image semantic segmentation result.
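The decoder step can be sketched as follows (our reconstruction from the description above, in the style of the encoder-decoder work cited in the references; the upsampling factor and channel counts are assumptions):

import tensorflow as tf

def decoder(high_level, low_level, num_classes=6):
    # Fuse high-level semantic features with low-level texture features
    # via a skip connection, then refine with two 3x3 convolutions and
    # classify each pixel with softmax, as described above.
    h = tf.keras.layers.UpSampling2D(size=4, interpolation="bilinear")(high_level)
    l = tf.keras.layers.Conv2D(48, 1, padding="same", activation="relu")(low_level)
    x = tf.keras.layers.Concatenate()([h, l])
    # Two 3x3 convolutions integrate the fused channels and recover
    # spatial detail.
    x = tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    # Per-pixel classification over the land-cover classes.
    return tf.keras.layers.Conv2D(num_classes, 1, activation="softmax")(x)

high = tf.keras.Input(shape=(64, 64, 512))    # pyramid output
low = tf.keras.Input(shape=(256, 256, 256))   # early backbone features
seg = tf.keras.Model([high, low], decoder(high, low))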
Result
Full experiments are performed on the ISPRS (International Society for Photogrammetry and Remote Sensing) remote sensing dataset of the Vaihingen area. We use intersection over union (IoU) and the F1 score as indicators for evaluating the segmentation performance of the proposed model. The models are built and trained on an NVIDIA Tesla P100 platform with the TensorFlow deep learning framework, and the complexity of the experimental tasks increases at each stage. Experimental results show that the proposed model obtains a mean IoU (MIoU) of 69.88% and a mean F1 of 81.39% over six types of surface features, a substantial improvement over a residual convolutional network without atrous convolution. Our proposed method also outperforms SegNet, Res-shuffling-Net, and SDFCN (symmetrical dense-shortcut fully convolutional network) on the quantitative indicators and outperforms pix2pix in visual quality, confirming its validity. We then apply the model to the remote sensing image data of the Potsdam area and obtain MIoU and F1 values of 74.02% and 83.86%, respectively, demonstrating the robustness of our model.
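For reference, the two indicators can be computed per class as sketched below (a standard formulation of IoU and F1 over label maps, not code from the paper; the map size is illustrative):

import numpy as np

def class_metrics(pred, target, cls):
    # Per-class IoU and F1 from predicted and ground-truth label maps.
    tp = np.sum((pred == cls) & (target == cls))
    fp = np.sum((pred == cls) & (target != cls))
    fn = np.sum((pred != cls) & (target == cls))
    iou = tp / (tp + fp + fn)          # intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    return iou, f1

# MIoU / mean F1: average the per-class scores over all six classes.
pred = np.random.randint(0, 6, (256, 256))
target = np.random.randint(0, 6, (256, 256))
scores = [class_metrics(pred, target, c) for c in range(6)]
miou, mf1 = np.mean(scores, axis=0)
print(f"MIoU={miou:.4f}, mean F1={mf1:.4f}")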
Conclusion
We build an end-to-end deep learning model for the semantic segmentation of high-resolution urban remote sensing images. By applying a spatial pyramid pooling network improved with atrous convolution and dense connections, the proposed model effectively extracts multi-scale features from remote sensing images and fuses the network's high-level semantic information with its low-level texture information, which in turn improves the accuracy of remote sensing image segmentation in urban areas. Experimental results show that the proposed model performs well both quantitatively and visually and has high application value for the semantic segmentation of high-resolution remote sensing images.
References
Audebert N, Le Saux B L and Lefèvre S. 2016. Semantic segmentation of earth observation data using multimodal and multi-scale deep networks//Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer: 180-196 [DOI:10.1007/978-3-319-54181-5_12]
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI:10.1109/TPAMI.2016.2644615]
Blomley R and Weinmann M. 2017. Using multi-scale features for the 3D semantic labeling of airborne laser scanning data//ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Wuhan, China: ISPRS: 43-50 [DOI:10.5194/isprs-annals-IV-2-W4-43-2017]
Chen G Z, Zhang X D, Wang Q, Dai F, Gong Y F and Zhu K. 2018a. Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(5): 1633-1644 [DOI:10.1109/JSTARS.2018.2810320]
Chen K, Weinmann M, Gao X, Yan M, Hinz S, Jutzi B and Weinmann M. 2018b. Residual shuffling convolutional neural networks for deep semantic image segmentation using multi-modal data//ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Riva del Garda, Italy: ISPRS: 65-72 [DOI:10.5194/isprs-annals-IV-2-65-2018]
Chen L C, Papandreou G, Schroff F and Adam H. 2017. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1706.05587.pdf
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018c. Encoder-decoder with atrous separable convolution for semantic image segmentation [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1802.02611v1.pdf
Chen T H, Zheng S Q and Yu J C. 2018. Remote sensing image segmentation based on improved DeepLab network. Measurement and Control Technology, 37(11): 34-39 (in Chinese) [DOI:10.19708/j.ckjs.2018.11.008]
Feng L Y. 2017. Research on Construction Land Information Extraction from High Resolution Images with Deep Learning Technology. Hangzhou: Zhejiang University (in Chinese)
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI:10.1109/CVPR.2016.90]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976 [DOI:10.1109/CVPR.2017.632]
Li X, Tang W L and Yang B. 2019. Semantic segmentation of high-resolution remote sensing image based on deep residual network. Journal of Applied Sciences-Electronics and Information Engineering, 37(2): 282-290 (in Chinese) [DOI:10.3969/j.issn.0255-8297.2019.02.013]
Liu Y S, Piramanayagam S, Monteiro S T and Saber E. 2017. Dense semantic labeling of very-high-resolution aerial imagery and LiDAR with fully-convolutional neural networks and higher-order CRFs//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1561-1570 [DOI:10.1109/CVPRW.2017.200]
Marmanis D, Wegner J D, Galliani S, Schindler K, Datcu M and Stilla U. 2016. Semantic segmentation of aerial images with an ensemble of CNNs//ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Prague, Czech Republic: ISPRS: 473-480 [DOI:10.5194/isprs-annals-III-3-473-2016]
Rottensteiner F, Sohn G, Jung J, Gerke M, Baillard C, Bénitez S and Breitkopf U. 2012. The ISPRS benchmark on urban object classification and 3D building reconstruction//ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Melbourne, Australia: ISPRS: 293-298 [DOI:10.5194/isprsannals-I-3-293-2012]
Sherrah J. 2016. Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1606.02585.pdf
Yang M K, Yu K, Zhang C, Li Z W and Yang K Y. 2018. DenseASPP for semantic segmentation in street scenes//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3684-3692 [DOI:10.1109/CVPR.2018.00388]
Yu F and Koltun V. 2015. Multi-scale context aggregation by dilated convolutions [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1511.07122.pdf