Remote sensing building segmentation by CGAN with multilevel channel attention mechanism
2021, Vol. 26, No. 3, Pages: 686-699
Received: 2020-03-01; Revised: 2020-07-15; Accepted: 2020-07-22; Published in print: 2021-03-16
DOI: 10.11834/jig.200059
Objective
Building segmentation in remote sensing images is an important application in image processing. Convolutional neural networks have shown excellent performance in this task, but problems remain, such as missed and misclassified buildings, especially the omission of small buildings, and unsmooth building edges. To address these problems, this paper proposes Ra-CGAN, a conditional generative adversarial network (CGAN) model with a multilevel channel attention mechanism, for segmenting buildings in remote sensing images.
Method
First, a generative model G with a multilevel channel attention mechanism is constructed; by fusing attention-weighted deep semantic information with shallow detail information, the network extracts rich contextual information, copes better with the scale variation of buildings, and alleviates the omission of small buildings. Second, a discriminative network D is constructed, which improves the segmentation results by correcting the differences between the real label maps and the segmentation maps produced by the generative model. Finally, through conditionally constrained adversarial training between G and D, the model learns higher-order data-distribution characteristics, giving buildings stronger spatial continuity and improving the boundary accuracy and smoothness of the segmentation results.
Result
Experiments are conducted on the WHU Building Dataset and Satellite Dataset II and compared with leading methods. On the WHU dataset, segmentation performance improves markedly over models without the channel attention mechanism and adversarial training, with better spatial continuity for complex buildings, better completeness for small buildings, and more accurate and smoother building edges; compared with the second-best model, the intersection over union (IOU) improves by 1.1% and the F1-score by 1.1%. On the Satellite dataset, accuracy is higher than that of the other models; in particular, when data samples are insufficient, segmentation improves substantially thanks to the generative adversarial training; compared with the second-best model, the IOU improves by 1.7% and the F1-score by 1.6%.
Conclusion
The proposed CGAN building segmentation model with a multilevel channel attention mechanism combines the advantages of the attention-equipped generative model and the conditional generative adversarial network, and achieves more accurate building segmentation results on different remote sensing datasets.
Objective
Remote sensing building object segmentation is one of the important applications in image processing and plays a vital role in smart city planning and urban change detection. However, building objects in remote sensing images have many complex characteristics, such as variable sizes, dense distributions, diverse topological shapes, complex backgrounds, and the presence of occlusions and shadows. Traditional building segmentation algorithms are mainly based on manually designed features, such as shape, edge, and shadow features. These are shallow features of the building target and cannot express high-level semantic information well, resulting in low recognition accuracy. By contrast, deep convolutional networks show excellent performance in pixel-level classification of natural images, and various image segmentation models based on fully convolutional networks have been proposed. Most of these models apply deconvolution or bilinear interpolation after feature extraction, upsampling the features and classifying them pixel by pixel to segment the input image. Deep features of buildings are extracted through highly nonlinear mappings trained on large amounts of data, which overcomes the shortcomings of traditional algorithms. However, upsampling cannot completely compensate for the information loss caused by repeated convolution and pooling operations in deep convolutional network models; therefore, the prediction results are relatively rough, exhibiting issues such as misclassified small targets and inaccurate boundaries. In the field of remote sensing, few public datasets are available, so training excellent deep convolutional networks is difficult, and the robustness of such networks needs further improvement. To address these problems, this paper proposes Ra-CGAN, a conditional generative adversarial network (CGAN) model with a multilevel channel attention mechanism, to segment building objects in remote sensing images.
Method
A generative model G with a multilevel channel attention mechanism is first built. The model is based on an encoder-decoder structure that mitigates the omission of small targets by fusing attention-weighted deep semantic features with shallow detail features.
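The abstract does not spell out the attention block, but given that squeeze-and-excitation networks (Hu et al., 2020) are cited, a channel attention unit of that style is a reasonable reading. Below is a minimal PyTorch sketch under that assumption; the module name ChannelAttention and the reduction ratio r=16 are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative sketch)."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # squeeze to a bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),  # restore channel dimension
            nn.Sigmoid(),                        # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))          # global average pooling: (B, C)
        w = self.fc(w).view(b, c, 1, 1)
        return x * w                    # reweight the feature channels
```

In the generator, such a block would weight the deep semantic features before they are fused with the shallow encoder features at each decoding level.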
Second, a discriminative network D is built to distinguish whether its input comes from the real label map or from the segmentation map generated by the model; the accuracy and smoothness of the segmentation result are improved by correcting the difference between the two maps. The discriminator downsamples with strided convolutions instead of pooling to improve gradient propagation.
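A sketch of such a discriminator follows: downsampling is done with strided convolutions rather than pooling (in the spirit of the cited all-convolutional network of Springenberg et al.), leaky activations are used (rectified-activation variants are also cited), and the input image is concatenated with the label or segmentation map as the conditional input; the patch-level output is one common choice in conditional adversarial image work (Isola et al.), not a detail given in the abstract. Layer widths and depth here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Conditional discriminator: strided convolutions, no pooling (sketch)."""
    def __init__(self, in_ch: int = 3 + 1):  # RGB image + 1-channel mask
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in (64, 128, 256, 512):
            layers += [
                nn.Conv2d(ch, out_ch, kernel_size=4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),  # leaky slope keeps gradients flowing
            ]
            ch = out_ch
        layers += [nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1)]  # patch scores
        self.net = nn.Sequential(*layers)

    def forward(self, image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # condition on the input image by channel-wise concatenation
        return self.net(torch.cat([image, mask], dim=1))
```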
Finally, the generative and discriminative models are trained against each other in alternation, with the labeled image acting as the conditional constraint. By learning higher-order data-distribution characteristics, the target gains stronger spatial continuity. The training objective is a hybrid loss: the cross-entropy loss between the segmentation map produced by the generator and the real label map, plus the adversarial loss incurred when the discriminator judges the generated map against the real label map.
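A minimal sketch of one adversarial training step under the description above: Generator is any network producing a mask logit map (assumed here, not specified in the abstract), Discriminator is as sketched earlier, and the weight lambda_adv balancing the two loss terms is an assumption, since the abstract does not give it. The learning rate and momentum values are the paper's.

```python
import torch
import torch.nn.functional as F

G, D = Generator(), Discriminator()  # Generator is a hypothetical stand-in
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))  # lr/momentum from the paper
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
lambda_adv = 1.0  # assumed weight of the adversarial term

def train_step(image: torch.Tensor, label: torch.Tensor):
    fake = torch.sigmoid(G(image))  # segmentation map in [0, 1]

    # --- update D: push real label maps toward 1, generated maps toward 0 ---
    d_real = D(image, label)
    d_fake = D(image, fake.detach())  # detach so G gets no gradient here
    loss_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- update G: segmentation cross-entropy + adversarial (fool-D) term ---
    d_fake = D(image, fake)
    loss_g = F.binary_cross_entropy(fake, label) + \
             lambda_adv * F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```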
Experiments are performed on the WHU Building Dataset and Satellite Dataset II. The first dataset features dense buildings of many types with accurate labels, providing a comprehensive and representative evaluation of the model. The second dataset, whose lighting and background information are more complex than those of the first, has a higher segmentation difficulty and is used to verify the robustness and scalability of the model. The experiments use the PyTorch deep learning framework. The original images and label images are resized to 512 × 512 pixels for training; the Adam learning rate is set to 0.0002, the momentum parameter to 0.5, the batch size to 12, and the number of epochs to 200. Training is accelerated on an NVIDIA GTX TITAN Xp GPU. Evaluation indicators include intersection over union (IOU), precision, recall, and F1-score.
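For reference, the four evaluation indicators can be computed from binary masks as follows; this is the standard formulation of these metrics, not code from the paper.

```python
import torch

def evaluate(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8):
    """IOU, precision, recall, and F1-score for binary masks (standard definitions)."""
    pred, target = pred.bool(), target.bool()
    tp = (pred & target).sum().item()    # true positives
    fp = (pred & ~target).sum().item()   # false positives
    fn = (~pred & target).sum().item()   # false negatives
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return iou, precision, recall, f1
```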
Result
Experiments are performed on the WHU Building Dataset and Satellite Dataset II, and the method is compared with the latest approaches in the literature. Experimental results show that on the WHU dataset, the segmentation performance of the Ra-CGAN model is substantially improved compared with models without the attention mechanism and adversarial training: the spatial continuity and integrity of complex and small buildings and the smoothness of building edges improve considerably. Compared with U-Net, the IOU value increases by 3.75% and the F1-score by 2.52%; compared with the second-best model, the IOU value increases by 1.1% and the F1-score by 1.1%. On Satellite Dataset II, Ra-CGAN obtains better results in terms of target integrity and smoothness than the other models, especially when data samples are insufficient. Compared with U-Net, the IOU value increases by 7.26% and the F1-score by 6.68%; compared with the second-best model, the IOU value increases by 1.7% and the F1-score by 1.6%.
Conclusion
A CGAN-based remote sensing building object segmentation model with a multilevel channel attention mechanism, which combines the advantages of the attention-equipped generative model and conditional generative adversarial networks, is proposed. Experimental results show that the model is superior to several state-of-the-art segmentation methods. More accurate building object segmentation results are obtained on different datasets, demonstrating that the model has better robustness and scalability.
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495[DOI:10.1109/TPAMI.2016.2644615]
Chen H, Guo W and Yan J W. 2019. Synthetic aperture radar image target segmentation method based on boundary and texture information. Journal of Image and Graphics, 24(6): 882-889[DOI:10.11834/jig.180484]
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 833-851[DOI:10.1007/978-3-030-01234-2_49]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: NIPS: 2672-2680
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI:10.1109/CVPR.2016.90]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2020. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023[DOI:10.1109/TPAMI.2019.2913372]
Huang Y, Wang Q Q, Jia W J, Yue L and He X J. 2019. See more than once: kernel-sharing atrous convolution for semantic segmentation[EB/OL]. [2020-02-01]. https://arxiv.org/pdf/1908.09443.pdf
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976[DOI:10.1109/CVPR.2017.632]
Ji S P, Wei S Q and Lu M. 2019. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Transactions on Geoscience and Remote Sensing, 57(1): 574-586[DOI:10.1109/TGRS.2018.2858817]
Jin F, Wang F, Rui J, Liu Z, Wang C and Zhang H. 2017. Residential area extraction based on conditional generative adversarial networks//Proceedings of 2017 SAR in Big Data Era: Models, Methods and Applications. Beijing, China: IEEE: 1-5[DOI:10.1109/BIGSARDATA.2017.8124931]
Lin G S, Milan A, Shen C H and Reid I. 2017. RefineNet: multi-path refinement networks for high-resolution semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5168-5177[DOI:10.1109/CVPR.2017.549]
Maggiori E, Tarabalka Y, Charpiat G and Alliez P. 2017. High-resolution aerial image labeling with convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 55(12): 7092-7103[DOI:10.1109/TGRS.2017.2740362]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241[DOI:10.1007/978-3-319-24574-4_28]
Shelhamer E, Long J and Darrell T. 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651[DOI:10.1109/TPAMI.2016.2572683]
Springenberg J T, Dosovitskiy A, Brox T and Riedmiller M. 2014. Striving for simplicity: the all convolutional net[EB/OL]. [2020-02-01]. https://arxiv.org/pdf/1412.6806.pdf
Xu B, Wang N Y, Chen T Q and Li M. 2015. Empirical evaluation of rectified activations in convolutional network[EB/OL]. [2020-02-01]. https://arxiv.org/pdf/1505.00853.pdf
Yu Y L, Li X Z and Liu F X. 2020. Attention GANs: unsupervised deep feature learning for aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 58(1): 519-531[DOI:10.1109/TGRS.2019.2937830]
Zhang X M, Zhu X B, Zhang X Y, Zhang N G, Li P and Wang L. 2018a. SegGAN: semantic segmentation with generative adversarial network//Proceedings of the 4th IEEE International Conference on Multimedia Big Data. Xi'an, China: IEEE: 1-5[DOI:10.1109/BigMM.2018.8499105]
Zhang Z L, Zhang X Y, Peng C, Xue X Y and Sun J. 2018b. ExFuse: enhancing feature fusion for semantic segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 273-288[DOI:10.1007/978-3-030-01249-6_17]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239[DOI:10.1109/CVPR.2017.660]