Remote sensing building segmentation by CGAN with multilevel channel attention mechanism

Yu Shuai; Wang Xili

Published: 2021-03-19
Abstract views: 2236
Full-text downloads: 967
DOI: 10.11834/jig.200059
2021 | Volume 26 | Number 3

Yu Shuai, Wang Xili (School of Computer Science, Shaanxi Normal University, Xi'an 710119)

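Abstract

Objective Building segmentation in remote sensing images is an important application of image processing. Convolutional neural networks perform well on this task, but buildings are still missed or misclassified, small buildings in particular are missed, and building edges are not smooth. To address these problems, this paper proposes Ra-CGAN, a conditional generative adversarial network (CGAN) model with a multilevel channel attention mechanism, for segmenting buildings in remote sensing images.
Method First, a generative model G with a multilevel channel attention mechanism is constructed. By fusing attention-weighted deep semantic information with shallow detail information, the network extracts rich contextual information, copes better with scale variation among buildings, and reduces the number of missed small buildings. Second, a discriminative network D is constructed, which improves the segmentation result by correcting the differences between the real label map and the segmentation map produced by the generative model. Finally, adversarial training between G and D under a conditional constraint learns higher-order features of the data distribution, which strengthens the spatial continuity of buildings and improves the accuracy and smoothness of the segmented boundaries.
Result Experiments are conducted on the WHU Building Dataset and Satellite Dataset II, and the model is compared with strong existing methods. On the WHU dataset, segmentation performance is clearly higher than that of models without the channel attention mechanism and adversarial training, and the model performs better on the spatial continuity of complex buildings, the completeness of small buildings, and the accuracy and smoothness of building edges. Compared with the second-best model, the intersection over union (IOU) is 1.1% higher and the F₁-score is 1.1% higher. On the Satellite dataset, accuracy is higher than that of the other models; in particular, when data samples are insufficient, generative adversarial training greatly improves the segmentation result. Compared with the second-best model, the IOU is 1.7% higher and the F₁-score is 1.6% higher.
Conclusion The proposed CGAN model with a multilevel channel attention mechanism for building segmentation in remote sensing images combines the advantages of a generative model with multilevel channel attention and of conditional generative adversarial networks, and obtains more accurate building segmentation results on different datasets.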
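The IOU and F₁-score reported above are pixel-level measures for the building class. As a minimal sketch of how such values are typically computed from a predicted mask and a label mask (the function name `segmentation_metrics`, the 0/1 mask encoding, and the choice to return 0.0 for an empty denominator are assumptions made for illustration, not details taken from the paper):

```python
def segmentation_metrics(pred, truth):
    """Compute IOU, precision, recall and F1 for the building class.

    pred and truth are equally sized 2-D lists of 0/1 values,
    where 1 marks a building pixel (assumed encoding).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1      # building pixel correctly predicted
            elif p == 1 and t == 0:
                fp += 1      # background predicted as building
            elif p == 0 and t == 1:
                fn += 1      # building pixel that was missed

    def ratio(num, den):
        # Assumed convention: an empty denominator yields 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
scores = segmentation_metrics(pred, truth)
```

Because IOU penalizes both false positives and false negatives in a single ratio, it is usually lower than the F₁-score for the same prediction, which is consistent with both being reported side by side.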

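Keywords

deep convolutional neural network; remote sensing image segmentation; conditional generative adversarial network (CGAN); attention mechanism; multi-scale feature fusion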

Remote sensing building segmentation by CGAN with multilevel channel attention mechanism

Yu Shuai, Wang Xili(School of Computer Science, Shaanxi Normal University, Xi'an 710119, China)

Abstract

Objective Building segmentation in remote sensing images is an important application of image processing and plays a vital role in smart city planning and urban change detection. However, buildings in remote sensing images have complex characteristics, such as variable sizes, dense distributions, diverse topological shapes, complex backgrounds, occlusions, and shadows. Traditional building segmentation algorithms rely mainly on manually designed features such as shapes, edges, and shadows. These shallow features cannot express high-level semantic information well, which results in low recognition accuracy. By contrast, deep convolutional networks show excellent performance in pixel-level classification of natural images, and many image segmentation models based on fully convolutional networks have been proposed. Most of these models upsample the extracted features by deconvolution or bilinear interpolation and then classify the input image pixel by pixel. Deep features of buildings are extracted through highly nonlinear mapping and training on large amounts of data, which overcomes the shortcomings of traditional algorithms. However, upsampling cannot completely compensate for the information lost through repeated convolution and pooling operations in deep convolutional networks. The prediction results are therefore relatively coarse, with problems such as misclassified small targets and inaccurate boundaries. In addition, few public datasets exist in the field of remote sensing, so training strong deep convolutional networks is difficult and the robustness of the networks needs further improvement. To address these problems, this paper proposes Ra-CGAN, a conditional generative adversarial network with a multilevel channel attention mechanism, to segment buildings in remote sensing images.
Method A generative model with a multilevel channel attention mechanism is first built. The model is based on an encoder-decoder structure and reduces missed small targets by fusing deep semantics and shallow details with attention. Second, a discriminative network is built to distinguish whether its input comes from the real label map or from the segmentation map generated by the model. The accuracy and smoothness of the segmentation result are improved by correcting the difference between the two maps. The discriminator downsamples without pooling to improve gradient propagation. Finally, the generative model and the discriminative model are trained adversarially in alternation under the constraint of the conditional variable of the labelled image. Learning higher-order characteristics of the data distribution makes the segmented targets more spatially continuous. A hybrid loss function is used: it combines the cross-entropy loss between the generated map and the real label map in the generative model with the loss incurred when the discriminator predicts the generated map as a real label map. Experiments are performed on the WHU Building Dataset and Satellite Dataset II. The first dataset contains dense buildings of many types with accurate labels and supports a comprehensive, representative evaluation of the model. The second dataset is harder to segment, because the lighting and background of its buildings are more complex than those of the first dataset, and is used to verify the robustness and scalability of the model. The experiments use the PyTorch deep learning framework. The original images and label images are resized to 512×512 pixels for training. The Adam optimizer is used with a learning rate of 0.0002 and a momentum parameter of 0.5, the batch size is 12, and training runs for 200 epochs. Training is accelerated with an NVIDIA GTX TITAN Xp. Evaluation indicators include intersection over union (IOU), precision, recall, and F₁-score.
Result On both datasets, the proposed method is compared with recent methods from the literature. On the WHU dataset, the segmentation performance of the Ra-CGAN model is substantially better than that of models without the attention mechanism and adversarial training. The spatial continuity of complex buildings, the integrity of small buildings, and the smoothness of building edges are considerably improved. Compared with U-Net, the IOU increases by 3.75% and the F₁-score by 2.52%. Compared with the second-best model, the IOU increases by 1.1% and the F₁-score by 1.1%. On Satellite Dataset II, Ra-CGAN obtains better target integrity and smoothness than the other models, especially when data samples are insufficient. Compared with U-Net, the IOU increases by 7.26% and the F₁-score by 6.68%. Compared with the second-best model, the IOU increases by 1.7% and the F₁-score by 1.6%.
Conclusion A CGAN model with a multilevel channel attention mechanism is proposed for building segmentation in remote sensing images. It combines the advantages of a generative model with multilevel channel attention and of conditional generative adversarial networks. Experimental results show that the model outperforms several state-of-the-art segmentation methods and obtains more accurate building segmentation results on different datasets, which indicates better robustness and scalability.
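The hybrid loss described in the Method can be written out as a sketch. The abstract names only the two terms, so several details below are assumptions for illustration: the cross-entropy is taken in its binary per-pixel form (building versus background), the adversarial term uses the common non-saturating form, the conditional variable is written abstractly as c, and the weight λ is hypothetical because the abstract does not say how the two terms are balanced.

```latex
% x: input remote sensing image;  y: real label map;  G(x): generated segmentation map
% N: number of pixels;  c: conditional variable given to the discriminator D
% \lambda: weighting coefficient (hypothetical, not stated in the abstract)
\begin{align}
\mathcal{L}_{\mathrm{ce}}  &= -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i \log G(x)_i + (1 - y_i)\log\bigl(1 - G(x)_i\bigr) \Bigr] \\
\mathcal{L}_{\mathrm{adv}} &= -\log D\bigl(G(x) \mid c\bigr) \\
\mathcal{L}_{G}            &= \mathcal{L}_{\mathrm{ce}} + \lambda\, \mathcal{L}_{\mathrm{adv}}
\end{align}
```

Under this reading, the cross-entropy term ties each pixel to its label, while the adversarial term is small only when the discriminator accepts the whole generated map as a real label map, which is how higher-order properties such as spatial continuity and smooth boundaries would enter the objective.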

Keywords

deep convolutional neural network; remote sensing image segmentation; conditional generative adversarial network (CGAN); attention mechanism; multi-scale feature fusion
