Remote sensing image segmentation model based on an adaptive receptive field mechanism
2021, Vol. 26, No. 2, Pages: 464-474
Received: 2020-03-16; Revised: 2020-05-27; Accepted: 2020-06-03; Published in print: 2021-02-16
DOI: 10.11834/jig.200092
Objective
Remote sensing images contain targets of varying sizes and shapes, which makes target segmentation difficult. The receptive field is the region of the input image to which each pixel of a feature map corresponds; when the receptive field fits the target shape well, the feature map captures the target features more completely, which benefits segmentation. Existing segmentation methods typically use square receptive fields, but target shapes in remote sensing images vary widely, so a square receptive field cannot fit the targets well and introduces too many useless features during feature extraction, degrading segmentation accuracy. To address this, this paper proposes a remote sensing image segmentation model based on an adaptive receptive field mechanism.
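As background for the receptive field notion used above (this calculation is not part of the paper), the receptive field of a feature-map pixel can be computed layer by layer with the standard recurrence r ← r + (k − 1)·j and j ← j·s, where k and s are a layer's kernel size and stride and j is the cumulative stride. A minimal sketch:

```python
# Minimal sketch (not from the paper): receptive field size of a stack of
# conv/pool layers, using the standard recurrence
#   r_out = r_in + (k - 1) * j_in,   j_out = j_in * s
# where j is the cumulative stride ("jump") between adjacent feature-map pixels.

def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, input-to-output order."""
    r, j = 1, 1  # a raw input pixel sees exactly itself
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Example: three 3x3 convs, the second with stride 2, give a 9-pixel-wide field.
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # -> 9
```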
Method
An adaptive receptive field mechanism is introduced on top of an encoder-decoder network. Features of receptive fields with different sizes and aspect ratios are first extracted on the encoder. During feature fusion, a channel attention module then adaptively derives channel weights, which strengthen the features of receptive fields that fit the target shape well and weaken those that do not. This preserves target features while reducing interference from background features, thereby improving the model's segmentation accuracy.
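A minimal PyTorch sketch of this mechanism; the branch kernel shapes, channel widths, and reduction ratio below are illustrative assumptions rather than the paper's settings:

```python
import torch
import torch.nn as nn

class AdaptiveReceptiveField(nn.Module):
    """Sketch of the adaptive receptive field idea (details are assumptions):
    parallel convs with different kernel aspect ratios approximate receptive
    fields of different shapes, and an SE-style channel attention weights the
    concatenated branch features."""

    def __init__(self, in_ch, branch_ch, reduction=4):
        super().__init__()
        # Branches with square, wide, and tall receptive fields (assumed shapes).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=(3, 3), padding=(1, 1)),
            nn.Conv2d(in_ch, branch_ch, kernel_size=(1, 5), padding=(0, 2)),
            nn.Conv2d(in_ch, branch_ch, kernel_size=(5, 1), padding=(2, 0)),
        ])
        fused = branch_ch * len(self.branches)
        # Squeeze-and-excitation: global pooling -> bottleneck MLP -> sigmoid gates.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        # Channels from a branch that fits the target shape get larger weights.
        return feats * self.attention(feats)

x = torch.randn(1, 64, 32, 32)
y = AdaptiveReceptiveField(64, 32)(x)  # -> (1, 96, 32, 32)
```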
Result
Experiments on the Inria Aerial Image Labeling and DeepGlobe Road Extraction datasets, compared against related methods, yield mean intersection over union of 76.1% and 61.9% and mean F1 scores of 86.5% and 76.5%, respectively.
Conclusion
The proposed model extracts features from receptive fields of different shapes and adaptively derives channel weights, enabling it to extract more complete target features and thus improve segmentation performance.
Objective
Remote sensing image segmentation is a technique for segmenting targets of interest. In the field of deep learning, convolutional neural networks (CNNs) are typically used to extract image features and then classify each pixel of the image. Remote sensing image segmentation has a wide range of applications, including environmental monitoring, urban construction, and crop classification, and it is highly significant in the extraction and analysis of image information. However, high-resolution remote sensing images contain a large number of targets with different shapes and sizes, which makes image segmentation difficult. The receptive field is an important attribute of CNNs, and how well it matches the target determines the completeness and robustness of the extracted target features. If the receptive field matches the target shape well, then the target features contained in the feature map will be complete; otherwise, the feature map will contain many useless features that interfere with the segmentation task. Existing methods use a square receptive field to extract features. However, the shapes of targets in remote sensing images vary, so a square receptive field cannot fit the target shape well; if a mismatched receptive field is used to extract target features, useless features will interfere with segmentation. To solve this problem, this study proposes a remote sensing image segmentation model (RSISM) based on an adaptive receptive field mechanism (ARFM), referred to as RSISM-ARFM hereafter.
Method
RSISM-ARFM can extract receptive fields of different sizes and aspect ratios and channel-weight the features of the different receptive fields during feature fusion. In this manner, receptive field features that match the target shape are strengthened while the others are weakened, reducing the interference of useless features while retaining target features. RSISM-ARFM uses an encoder-decoder network as its backbone. The encoder extracts basic convolution features while reducing the size of the feature map to obtain deep semantic information. Features extracted in the shallow layers of the encoder contain rich detail, such as target location and edges, whereas features extracted in the deep layers contain semantic information that helps the model identify targets. To fuse the two kinds of information, the decoder concatenates feature maps from different layers, improving the feature extraction capability of the model.
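A toy PyTorch illustration of such an encoder-decoder with a skip connection; the layer sizes here are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Toy encoder-decoder with one skip connection; sizes are illustrative."""

    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.MaxPool2d(2)                        # halve spatial size
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)  # restore spatial size
        # Decoder sees shallow detail (enc1) concatenated with deep semantics (up).
        self.dec = nn.Conv2d(32 + 32, num_classes, 3, padding=1)

    def forward(self, x):
        s1 = self.enc1(x)              # shallow features: location, edges
        d2 = self.enc2(self.down(s1))  # deep features: semantics
        u1 = self.up(d2)
        return self.dec(torch.cat([u1, s1], dim=1))

logits = TinyEncoderDecoder()(torch.randn(1, 3, 256, 256))  # -> (1, 2, 256, 256)
```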
On the basis of this backbone, this study introduces the ARFM. First, features of different receptive fields are extracted from the encoder. Then, the channel attention module calculates the dependency relationships among the channels of the feature map to generate channel weights. Finally, the feature maps of the different receptive fields are weighted. Through these operations, the model can adaptively adjust the relationships among different receptive fields and select appropriate receptive fields to extract target features.
Result
We conducted ablation and comparative experiments on the Inria Aerial Image Labeling and DeepGlobe Road Extraction datasets. Because the original images in these datasets are too large to use directly, the training and test sets were cropped into 256 × 256 pixel images for the experiments. The model was trained on the training set and then evaluated on the test set.
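A hedged sketch of such non-overlapping cropping; the tile size comes from the paper, while the border handling (dropping ragged edges) is an assumption:

```python
import numpy as np

def crop_tiles(image, tile=256):
    """Sketch (assumed details): split an H x W x C array into
    non-overlapping tile x tile patches, dropping ragged borders."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

# e.g., a 5000 x 5000 image yields 19 x 19 = 361 tiles of 256 x 256.
tiles = crop_tiles(np.zeros((5000, 5000, 3), dtype=np.uint8))
print(len(tiles))  # -> 361
```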
To verify the effectiveness of RSISM-ARFM, we also used several evaluation indexes to assess the segmentation performance of the model from multiple perspectives. Experimental results show that the proposed method effectively improves the segmentation accuracy of targets with different shapes: the segmentation results of RSISM-ARFM are the closest to the labeled images, and target details are the clearest. The mean intersection over union on the two datasets reaches 76.1% and 61.9%, and the mean F1 score reaches 86.5% and 76.5%, respectively, outperforming the comparison models.
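For reference, both reported metrics can be computed per image from pixel counts. The sketch below assumes binary masks and does not reproduce the paper's exact averaging conventions:

```python
import numpy as np

def iou_and_f1(pred, label):
    """Binary-mask IoU and F1 from pixel counts (a hedged sketch;
    the paper's exact averaging conventions are not reproduced here)."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.logical_and(pred, label).sum()   # true positives
    fp = np.logical_and(pred, ~label).sum()  # false positives
    fn = np.logical_and(~pred, label).sum()  # false negatives
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # equals Dice; F1 = 2*IoU / (1 + IoU)
    return iou, f1

pred = np.array([[1, 1], [0, 0]])
label = np.array([[1, 0], [0, 0]])
print(iou_and_f1(pred, label))  # -> (0.5, 0.666...)
```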
Conclusion
The proposed model adds an ARFM to an encoder-decoder network. It extracts features from receptive fields matched to different target shapes and sizes and then uses the channel attention module to weight these features adaptively during feature fusion. Accordingly, the model extracts complete target features and reduces the introduction of useless features, improving segmentation accuracy.