高分辨率遥感影像的边缘损失增强地物分割
Segmentation of high-resolution remote sensing image by collaborating with edge loss enhancement
2021年26卷第3期,页码:674-685
收稿:2019-11-15;修回:2020-06-09;录用:2020-06-15;纸质出版:2021-03-16
DOI: 10.11834/jig.190601
目的
针对高分辨率遥感影像语义分割中普遍存在的分割精度不高、目标边界模糊等问题,提出一种综合利用边界信息和网络多尺度特征的边缘损失增强语义分割方法。
方法
对单幅高分辨率遥感影像,首先通过对VGG-16(visual geometry group 16-layer net)网络引入侧边输出结构,提取图像中丰富的特征细节;然后使用深度监督的短连接结构将从深层到浅层的侧边输出组合起来,实现多层次、多尺度的特征融合;最后添加边缘损失增强结构,用以获得较为清晰的目标边界,提高分割结果的准确性和完整性。
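下面给出边缘监督标签生成方式的一个最小示意(基于PyTorch的假设性实现,非论文原始代码),对应上述边缘损失增强结构的监督来源:边缘真值可由已有的语义标注直接求梯度(此处以拉普拉斯算子实现)得到,因而不需要额外的人工边缘标注;其中函数名与核的具体取值均为示意性假设。

```python
import torch
import torch.nn.functional as F

# 3×3拉普拉斯核:在二值语义标注上做卷积,非零响应处即为目标边界
LAPLACE_KERNEL = torch.tensor([[0., 1., 0.],
                               [1., -4., 1.],
                               [0., 1., 0.]]).view(1, 1, 3, 3)

def edge_ground_truth(mask: torch.Tensor) -> torch.Tensor:
    """由语义标注直接生成边缘监督标签。

    mask: 形状为(N, 1, H, W)的二值语义标注(取值0/1)。
    返回同尺寸的二值边缘标签,用于监督边缘损失增强分支。
    """
    grad = F.conv2d(mask.float(), LAPLACE_KERNEL, padding=1)
    return (grad.abs() > 1e-6).float()
```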
结果
为了验证所提方法的有效性,选取中国北方种植大棚遥感影像和Google Earth上的光伏板组件遥感影像进行人工标注,并制作实验数据集。在这两个数据集上,将所提方法与几种常用的语义分割方法进行对比实验。实验结果表明,所提方法的精度在召回率为0~0.9之间时均保持在0.8以上,在两个数据集上的平均绝对误差分别为0.079 1和0.036 2。同时,通过消融实验分析了各个功能模块对最终结果的贡献。
结论
与当前先进方法相比,本文提出的边缘损失增强地物分割方法能够更加精确地从遥感影像的复杂背景中提取目标区域,使分割时提取到的目标拥有更加清晰的边缘。
Objective
Semantic analysis of remote sensing (RS) images has long been an important research topic in the computer vision community, with wide applications in related fields such as military surveillance, mapping, navigation, and urban planning. By exploring and analyzing the semantic information of RS images, researchers can easily obtain informative features for subsequent decision making. However, the richer and finer visual information in high-resolution RS images also places higher demands on image segmentation techniques. Traditional segmentation methods usually employ low-level visual features such as grayscale, color, spatial texture, and geometric shape to divide an image into several disjoint regions. Such features are generally called hand-crafted features; they are empirically defined and may carry little semantic meaning. Compared with traditional methods, semantic segmentation approaches based on deep convolutional neural networks (CNNs) can learn hierarchical visual features that represent images at different semantic levels. Typical CNN-based semantic segmentation approaches mainly focus on mitigating semantic ambiguity by providing rich information. However, RS images have higher background complexity than natural scene images: they usually contain many types of geometric objects and cover massive redundant background areas, so simply employing a single type of feature, even a CNN-based one, may not be sufficient. Taking the single-category object extraction task in RS images as an example, on the one hand, negative objects may look similar to the expected target, and such redundant, noisy semantic information may confuse the network and ultimately degrade segmentation performance. On the other hand, CNN-based features are better at encoding the context of an image than its fine details, which makes it difficult for CNN-based models to predict object boundaries precisely. Therefore, aiming at these problems in high-resolution RS image segmentation, this paper proposes an edge loss enhanced network for semantic segmentation that comprehensively utilizes boundary information and hierarchical deep features.
Method
The backbone of the proposed model is a fully convolutional network derived from the visual geometry group 16-layer net (VGG-16) by removing all fully connected layers and the fifth pooling layer. A side output structure is introduced for each convolutional layer of the backbone to extract rich, informative features from the input image. The side output structure starts with a (1×1, 1) convolutional layer (a convolutional layer is denoted as (n×n, c), where n and c are the size and number of kernels, respectively), followed by an element-wise summation layer that accumulates the features at each scale; a (1×1, 1) convolutional layer is then used to condense the hybrid features. The side output structure makes full use of the features of each convolutional layer of the backbone and helps the network capture the fine details of the image. The side-output features are further aggregated gradually from deep layers to shallow layers by a deeply supervised short connection structure, which strengthens the connections between features across scales. To this end, each side-output feature is first encoded by a residual convolution unit and then introduced into the side output of a nearby shallower stage, with upsampling where necessary. The short connection structure enables multilevel, multiscale fusion during feature encoding and proves effective in the experiments. Finally, for each fused side-output feature, a (3×3, 128) convolutional layer first unifies the number of feature channels, and the feature is then sent to two parallel branches, namely, an edge loss enhancement branch and an ordinary segmentation branch. In each edge loss enhancement branch, a Laplace operator coupled with a residual convolution unit is adopted to obtain the target boundary. The detected boundary is supervised by a ground truth generated by directly computing the gradient of the existing semantic annotations of the training samples, so no additional manual edge labeling is required. Experimental results show that the edge loss enhancement branch helps refine the target boundary and maintain the integrity of the target region. A minimal code sketch of this architecture is given below.
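The following PyTorch sketch is our own reconstruction under stated assumptions, not the authors' released code: the VGG-16 stage split points, the use of one (1×1) side convolution per stage (the paper attaches one to every convolutional layer), the shared segmentation head, and the exact point where the Laplace operator is applied (here, to the sigmoid of each per-scale segmentation logit) are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class ResidualConvUnit(nn.Module):
    """Residual convolution unit used to re-encode side-output features."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(F.relu(x))))

class EdgeLossEnhancedNet(nn.Module):
    """Sketch: truncated VGG-16 backbone + side outputs + deep-to-shallow
    short connections + parallel edge/segmentation branches per scale."""
    def __init__(self, ch=128):
        super().__init__()
        feats = vgg16().features  # pretrained weights would be loaded in practice
        # Five conv stages; the fully connected layers and the fifth
        # pooling layer (index 30) are discarded.
        cuts = [(0, 4), (4, 9), (9, 16), (16, 23), (23, 30)]
        self.stages = nn.ModuleList(nn.Sequential(*feats[a:b]) for a, b in cuts)
        stage_ch = [64, 128, 256, 512, 512]
        self.side = nn.ModuleList(nn.Conv2d(c, ch, 1) for c in stage_ch)
        self.rcu = nn.ModuleList(ResidualConvUnit(ch) for _ in stage_ch)
        self.unify = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=1) for _ in stage_ch)  # (3x3,128)
        self.seg_head = nn.Conv2d(ch, 1, 1)  # ordinary segmentation branch
        self.edge_rcu = ResidualConvUnit(1)  # edge loss enhancement branch
        laplace = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer('laplace', laplace.view(1, 1, 3, 3))

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        sides = [conv(f) for conv, f in zip(self.side, feats)]
        seg_maps, edge_maps, fused = [], [], None
        # Short connections: aggregate side outputs from deep to shallow.
        for i in reversed(range(len(sides))):
            cur = self.rcu[i](sides[i])
            if fused is not None:
                cur = cur + F.interpolate(fused, size=cur.shape[2:],
                                          mode='bilinear', align_corners=False)
            fused = cur
            h = F.relu(self.unify[i](fused))
            seg = self.seg_head(h)
            # Laplace operator + residual conv unit yields the boundary map.
            edge = self.edge_rcu(
                F.conv2d(torch.sigmoid(seg), self.laplace, padding=1))
            seg_maps.append(seg)
            edge_maps.append(edge)
        return seg_maps, edge_maps  # both lists are supervised at every scale
```

During training, each per-scale segmentation map would be compared against the (resized) mask, and each edge map against the gradient-derived boundary label, which is what couples the two branches.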
Result
First, two datasets with human annotations, comprising RS images of planted greenhouses in northern China and of photovoltaic panels collected from Google Earth, are organized to evaluate the effectiveness of the proposed method. Then, visual and numerical comparisons are conducted between the proposed method and several popular semantic segmentation methods. In addition, an ablation study illustrates the contribution of the essential components of the proposed architecture. The experimental results show that our method outperforms the competing approaches on both datasets in terms of precision-recall curves and mean absolute error (MAE). The precision achieved by our method stays above 0.8 for recall rates between 0 and 0.9, and its MAE of 0.079 1 on the greenhouse dataset and 0.036 2 on the photovoltaic panel dataset is the best among all evaluated methods. The ablation study further shows the effectiveness of each functional block. The baseline of the proposed architecture obtains a poor result, with an MAE of 0.204 4 on the northern greenhouse dataset. The residual convolution units then reduce the MAE by 31%, and the value drops further to 0.084 8 when the short connection structure is added to fuse the multiscale features of the network. Finally, the edge loss enhancement structure lowers the MAE to 0.079 1, a 61% decrease from the baseline model. These results indicate that all components are necessary for a good segmentation result.
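For reference, the two reported metrics can be computed in a few lines of NumPy. This is a generic sketch of MAE and of thresholded precision/recall (the basis of a precision-recall curve), not the authors' evaluation script; the function names are ours.

```python
import numpy as np

def mean_absolute_error(pred, gt):
    """MAE between a predicted map and a binary mask, both in [0, 1],
    averaged over all pixels (lower is better)."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def precision_recall(pred, gt, threshold):
    """Precision and recall of the prediction binarized at one threshold;
    sweeping the threshold over [0, 1] traces the precision-recall curve."""
    p = pred >= threshold
    g = gt > 0.5
    tp = np.logical_and(p, g).sum()
    precision = tp / max(p.sum(), 1)
    recall = tp / max(g.sum(), 1)
    return precision, recall
```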
Conclusion
In summary, compared with the competing methods, the proposed method extracts target regions from the complex backgrounds of RS images more accurately and with clearer target boundaries.