Remote sensing image segmentation model based on an adaptive receptive field mechanism
2021, Vol. 26, No. 2, Pages: 464-474
Received: 2020-03-16; Revised: 2020-05-27; Accepted: 2020-06-03; Published in print: 2021-02-16
DOI: 10.11834/jig.200092
Objective
Remote sensing images contain targets of varying sizes and shapes, which makes target segmentation difficult. The receptive field is the region of the input image to which each pixel of a feature map corresponds; when the receptive field fits the target shape well, the feature map captures the target features more completely, which benefits segmentation. Existing segmentation methods typically use square receptive fields, but target shapes in remote sensing images vary widely, so a square receptive field cannot fit the targets well and introduces too many useless features during feature extraction, degrading segmentation accuracy. To address this, this paper proposes a remote sensing image segmentation model based on an adaptive receptive field mechanism.
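As background for the receptive field notion used above (this calculation is not part of the paper), the receptive field of a feature-map pixel can be computed layer by layer with the standard recurrence r ← r + (k − 1)·j and j ← j·s, where k and s are a layer's kernel size and stride and j is the cumulative stride. A minimal sketch:

```python
# Minimal sketch (not from the paper): receptive field size of a stack of
# conv/pool layers, using the standard recurrence
#   r_out = r_in + (k - 1) * j_in,   j_out = j_in * s
# where j is the cumulative stride ("jump") between adjacent feature-map pixels.

def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, input-to-output order."""
    r, j = 1, 1  # a raw input pixel sees exactly itself
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Example: three 3x3 convs, the second with stride 2, give a 9-pixel-wide field.
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # -> 9
```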
Method
An adaptive receptive field mechanism is introduced on top of an encoder-decoder network. Features of receptive fields with different sizes and aspect ratios are first extracted on the encoder. During feature fusion, a channel attention module then adaptively derives channel weights, which strengthen the features of receptive fields that fit the target shape well and weaken those that do not. This preserves target features while reducing interference from background features, thereby improving the model's segmentation accuracy.
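A minimal PyTorch sketch of this mechanism; the branch kernel shapes, channel widths, and reduction ratio below are illustrative assumptions rather than the paper's settings:

```python
import torch
import torch.nn as nn

class AdaptiveReceptiveField(nn.Module):
    """Sketch of the adaptive receptive field idea (details are assumptions):
    parallel convs with different kernel aspect ratios approximate receptive
    fields of different shapes, and an SE-style channel attention weights the
    concatenated branch features."""

    def __init__(self, in_ch, branch_ch, reduction=4):
        super().__init__()
        # Branches with square, wide, and tall receptive fields (assumed shapes).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=(3, 3), padding=(1, 1)),
            nn.Conv2d(in_ch, branch_ch, kernel_size=(1, 5), padding=(0, 2)),
            nn.Conv2d(in_ch, branch_ch, kernel_size=(5, 1), padding=(2, 0)),
        ])
        fused = branch_ch * len(self.branches)
        # Squeeze-and-excitation: global pooling -> bottleneck MLP -> sigmoid gates.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        # Channels from a branch that fits the target shape get larger weights.
        return feats * self.attention(feats)

x = torch.randn(1, 64, 32, 32)
y = AdaptiveReceptiveField(64, 32)(x)  # -> (1, 96, 32, 32)
```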
Result
Experiments on the Inria Aerial Image Labeling and DeepGlobe Road Extraction datasets, compared against related methods, yield mean intersection over union of 76.1% and 61.9% and mean F1 scores of 86.5% and 76.5%, respectively.
Conclusion
The proposed model extracts features from receptive fields of different shapes and adaptively derives channel weights, enabling it to extract more complete target features and thus improve segmentation performance.
Objective
Remote sensing image segmentation is a technique for segmenting targets of interest. In the field of deep learning, convolutional neural networks (CNNs) are typically used to extract image features and then classify each pixel of the image. Remote sensing image segmentation has a wide range of applications, including environmental monitoring, urban construction, and crop classification, and it is highly significant in the extraction and analysis of image information. However, high-resolution remote sensing images contain a large number of targets with different shapes and sizes, which makes image segmentation difficult. The receptive field is an important attribute of CNNs, and how well it matches the target determines the completeness and robustness of the extracted target features. If the receptive field matches the target shape well, then the target features contained in the feature map will be complete; otherwise, the feature map will contain many useless features that interfere with the segmentation task. Existing methods use a square receptive field to extract features. However, the shapes of targets in remote sensing images vary, so a square receptive field cannot fit the target shape well; if a mismatched receptive field is used to extract target features, useless features will interfere with segmentation. To solve this problem, this study proposes a remote sensing image segmentation model (RSISM) based on an adaptive receptive field mechanism (ARFM), referred to as RSISM-ARFM hereafter.
Method
RSISM-ARFM can extract receptive fields of different sizes and aspect ratios and channel-weight the features of the different receptive fields during feature fusion. In this manner, receptive field features that match the target shape are strengthened while the others are weakened, reducing the interference of useless features while retaining target features. RSISM-ARFM uses an encoder-decoder network as its backbone. The encoder extracts basic convolution features while reducing the size of the feature map to obtain deep semantic information. Features extracted in the shallow layers of the encoder contain rich detail, such as target location and edges, whereas features extracted in the deep layers contain semantic information that helps the model identify targets. To fuse the two kinds of information, the decoder concatenates feature maps from different layers, improving the feature extraction capability of the model.
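A toy PyTorch illustration of such an encoder-decoder with a skip connection; the layer sizes here are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Toy encoder-decoder with one skip connection; sizes are illustrative."""

    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.MaxPool2d(2)                        # halve spatial size
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)  # restore spatial size
        # Decoder sees shallow detail (enc1) concatenated with deep semantics (up).
        self.dec = nn.Conv2d(32 + 32, num_classes, 3, padding=1)

    def forward(self, x):
        s1 = self.enc1(x)              # shallow features: location, edges
        d2 = self.enc2(self.down(s1))  # deep features: semantics
        u1 = self.up(d2)
        return self.dec(torch.cat([u1, s1], dim=1))

logits = TinyEncoderDecoder()(torch.randn(1, 3, 256, 256))  # -> (1, 2, 256, 256)
```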
On the basis of this backbone, this study introduces the ARFM. First, features of different receptive fields are extracted from the encoder. Then, the channel attention module calculates the dependency relationships among the channels of the feature map to generate channel weights. Finally, the feature maps of the different receptive fields are weighted. Through these operations, the model can adaptively adjust the relationships among different receptive fields and select appropriate receptive fields to extract target features.
Result
We conducted ablation and comparative experiments on the Inria Aerial Image Labeling and DeepGlobe Road Extraction datasets. Because the original images in these datasets are too large to use directly, the training and test sets were cropped into 256 × 256 pixel images for the experiments. The model was trained on the training set and then evaluated on the test set.
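A hedged sketch of such non-overlapping cropping; the tile size comes from the paper, while the border handling (dropping ragged edges) is an assumption:

```python
import numpy as np

def crop_tiles(image, tile=256):
    """Sketch (assumed details): split an H x W x C array into
    non-overlapping tile x tile patches, dropping ragged borders."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

# e.g., a 5000 x 5000 image yields 19 x 19 = 361 tiles of 256 x 256.
tiles = crop_tiles(np.zeros((5000, 5000, 3), dtype=np.uint8))
print(len(tiles))  # -> 361
```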
To verify the effectiveness of RSISM-ARFM, we also used several evaluation indexes to assess the segmentation performance of the model from multiple perspectives. Experimental results show that the proposed method effectively improves the segmentation accuracy of targets with different shapes: the segmentation results of RSISM-ARFM are the closest to the labeled images, and target details are the clearest. The mean intersection over union on the two datasets reaches 76.1% and 61.9%, and the mean F1 score reaches 86.5% and 76.5%, respectively, outperforming the comparison models.
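For reference, both reported metrics can be computed per image from pixel counts. The sketch below assumes binary masks and does not reproduce the paper's exact averaging conventions:

```python
import numpy as np

def iou_and_f1(pred, label):
    """Binary-mask IoU and F1 from pixel counts (a hedged sketch;
    the paper's exact averaging conventions are not reproduced here)."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.logical_and(pred, label).sum()   # true positives
    fp = np.logical_and(pred, ~label).sum()  # false positives
    fn = np.logical_and(~pred, label).sum()  # false negatives
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # equals Dice; F1 = 2*IoU / (1 + IoU)
    return iou, f1

pred = np.array([[1, 1], [0, 0]])
label = np.array([[1, 0], [0, 0]])
print(iou_and_f1(pred, label))  # -> (0.5, 0.666...)
```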
Conclusion
The proposed model adds an ARFM to an encoder-decoder network. It extracts features from receptive fields matched to different target shapes and sizes and then uses the channel attention module to weight these features adaptively during feature fusion. Accordingly, the model extracts complete target features and reduces the introduction of useless features, improving segmentation accuracy.