Liu Hang, Wang Xili (School of Computer Science, Shaanxi Normal University, Xi'an 710119)
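Objective Targets in remote sensing images vary in size and shape, which makes target segmentation more difficult. The receptive field is the region of the input image that corresponds to each pixel of a feature map; when the receptive field fits the target shape closely, the feature map contains more complete target features, which benefits segmentation. Existing segmentation methods typically use square receptive fields, but target shapes in remote sensing images vary widely, so the receptive field fits the target shape poorly and introduces too many useless features when target features are extracted, which lowers segmentation accuracy. To address this, this paper proposes a remote sensing image segmentation model based on an adaptive receptive field mechanism. Method An adaptive receptive field mechanism is introduced on top of an encoder-decoder network. First, receptive field features of different sizes and aspect ratios are extracted in the encoder. Then, during feature fusion, a channel attention module adaptively obtains channel weights; the weighting strengthens the features of receptive fields that fit the target shape well and weakens those that fit it poorly, retaining target features while reducing interference from background features and thereby improving segmentation accuracy. Result Experiments and comparisons with related methods were carried out on the Inria Aerial Image Labeling dataset and the DeepGlobe Road Extraction dataset. The mean intersection over union on the two datasets is 76.1% and 61.9%, and the mean F1 score is 86.5% and 76.5%, respectively. Conclusion The proposed model extracts features from receptive fields of different shapes and adaptively obtains channel weights, so that it extracts more complete target features and improves target segmentation.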
Remote sensing image segmentation model based on an adaptive receptive field mechanism
Liu Hang, Wang Xili (School of Computer Science, Shaanxi Normal University, Xi'an 710119, China)
Objective Remote sensing image segmentation separates targets of interest from the rest of a remote sensing image. In deep learning, convolutional neural networks (CNNs) are typically used to extract image features and then classify each pixel of the image. Remote sensing image segmentation has a wide range of applications, including environmental monitoring, urban construction, and crop classification, and it plays an important role in extracting and analyzing image information. However, high-resolution remote sensing images contain many targets of different shapes and sizes, which makes segmentation difficult. The receptive field is an important attribute of CNNs, and how well it matches the target shape affects the completeness and robustness of the extracted target features. If the receptive field matches the target shape well, the feature map contains complete target features; otherwise, it contains many useless features that interfere with segmentation. Existing methods use square receptive fields to extract features. However, target shapes in remote sensing images vary, so a square receptive field cannot fit them well, and features extracted with a mismatched receptive field include useless features that interfere with segmentation. To solve this problem, this study proposes a remote sensing image segmentation model (RSISM) based on an adaptive receptive field mechanism (ARFM), referred to as RSISM-ARFM hereafter. Method RSISM-ARFM extracts features from receptive fields of different sizes and aspect ratios and applies channel weighting to the features of the different receptive fields during feature fusion.
In this manner, the features of receptive fields that match the target shape are strengthened and those of mismatched receptive fields are weakened, which reduces the interference of useless features while retaining target features. RSISM-ARFM uses an encoder-decoder network as its backbone. The encoder extracts basic convolutional features while reducing the size of the feature map to obtain deep semantic information. Features extracted in the shallow layers of the encoder contain rich detail, such as target location and edges, whereas features extracted in the deep layers contain semantic information that helps the model identify the target. To fuse these two kinds of information, the decoder concatenates feature maps from different layers, which improves the feature extraction capability of the model. On the basis of this backbone, this study introduces an ARFM. First, the features of different receptive fields are extracted in the encoder. Then, a channel attention module computes the dependencies among the channels of the feature map to generate channel weights. Finally, the feature maps of the different receptive fields are weighted by these channel weights. With these operations, the model adaptively adjusts the relative contribution of the different receptive fields and selects appropriate receptive fields for extracting target features. Result We conducted ablation and comparative experiments on the Inria Aerial Image Labeling and DeepGlobe Road Extraction datasets. Because the original images in the datasets are too large to be used directly, the training and test sets were cropped into 256×256 pixel images. The model was trained on the training set and then evaluated on the test set.
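The three ARFM steps in the Method part (multi-shape receptive field features, channel attention, channel weighting) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the rectangular averaging windows standing in for learned convolutions, the kernel shapes `(3, 3)`, `(1, 5)`, `(5, 1)`, the squeeze-and-excitation style of channel attention, the reduction ratio of 4, and the random weights `w1` and `w2`.

```python
import numpy as np


def box_filter(x, kh, kw):
    """Average each channel of x (C, H, W) over a kh x kw window.

    Stand-in for a learned convolution with a kh x kw receptive field
    (assumption: odd kh and kw, zero padding, output size equals input size).
    """
    _, h, w = x.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (kh * kw)


def channel_attention(feats, w1, w2):
    """SE-style channel weights (assumed form of the channel attention module):
    global average pooling -> linear -> ReLU -> linear -> sigmoid."""
    squeezed = feats.mean(axis=(1, 2))            # one value per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)       # bottleneck with ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # weights in (0, 1)


def adaptive_receptive_field(x, kernel_shapes, w1, w2):
    """Extract one branch per receptive field shape, then reweight channels."""
    branches = [box_filter(x, kh, kw) for kh, kw in kernel_shapes]
    stacked = np.concatenate(branches, axis=0)    # (C * n_branches, H, W)
    weights = channel_attention(stacked, w1, w2)
    return stacked * weights[:, None, None], weights


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))              # toy encoder feature map
kernel_shapes = [(3, 3), (1, 5), (5, 1)]          # square, wide, tall (hypothetical)
c_total = x.shape[0] * len(kernel_shapes)
w1 = rng.standard_normal((c_total // 4, c_total)) # untrained, illustrative weights
w2 = rng.standard_normal((c_total, c_total // 4))

fused, weights = adaptive_receptive_field(x, kernel_shapes, w1, w2)
print(fused.shape, weights.shape)                 # (12, 16, 16) (12,)
```

In a trained model the weights would be learned, so channels from branches whose receptive field fits the target shape would receive weights closer to 1 and the others weights closer to 0.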
To verify the effectiveness of RSISM-ARFM, we used several evaluation metrics to assess segmentation performance from multiple perspectives. Experimental results show that the proposed method effectively improves the segmentation accuracy of targets with different shapes. Among the compared methods, the segmentation results of RSISM-ARFM are the closest to the labeled images and show the clearest target details. The mean intersection over union on the two datasets reaches 76.1% and 61.9%, and the mean F1 score reaches 86.5% and 76.5%, respectively, outperforming the comparison models. Conclusion The proposed model adds an ARFM to an encoder-decoder network. It extracts features from receptive fields of different shapes and sizes and then uses the channel attention module to weight the channels of these features adaptively during feature fusion. As a result, the model extracts more complete target features and introduces fewer useless features, which improves segmentation accuracy.
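For readers unfamiliar with the two reported metrics, the following sketch computes intersection over union and F1 for binary masks. The abstract does not state how the averages are taken, so the per-image averaging in `mean_metric` is an assumption, and the function names and toy masks are hypothetical.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    for two flattened binary masks (sequences of 0/1)."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0   # both masks empty: treat as perfect


def f1(pred, target):
    """F1 score: 2 * TP / (2 * TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_metric(metric, pairs):
    """Average a metric over (pred, target) pairs (assumed per-image averaging)."""
    scores = [metric(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


pred = [1, 1, 0, 0, 1, 0]      # toy flattened prediction mask
target = [1, 0, 0, 1, 1, 0]    # toy flattened label mask
print(f"{iou(pred, target):.3f} {f1(pred, target):.3f}")   # 0.500 0.667
```

For a single binary mask the two metrics are linked by F1 = 2·IoU / (1 + IoU), which is roughly consistent with the reported pairs (76.1% with 86.5%, and 61.9% with 76.5%); small differences are expected when the values are averaged.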