Airborne image segmentation via progressive multi-scale causal intervention
Vol. 28, Issue 2, Pages 628-642 (2023)
Published: 16 February 2023
Accepted: 28 February 2022
DOI: 10.11834/jig.211036
Feng Zhou, Renlong Hang, Chao Xu, Qingshan Liu, Guowei Yang. Airborne image segmentation via progressive multi-scale causal intervention[J]. Journal of Image and Graphics, 28(2): 628-642 (2023)
Objective
Airborne image segmentation underpins many practical applications in remote sensing. Compared with traditional methods, deep learning methods adaptively learn task-relevant features and have greatly improved segmentation accuracy, but they ignore the bias problem in datasets. The interference of confounders induced by this bias makes segmentation methods prone to producing blurred object edges and to confusing similar objects. To address this problem, a model based on progressive multi-scale causal intervention is proposed.
Method
First, a deep convolutional neural network is used to extract convolutional features from the airborne image. Then, a de-confounding module introduces class latent features to approximately represent the confounders. Using these confounder features, the convolutional features are decomposed, by causal intervention, into feature representations conditioned on each confounder, which suppresses the interference of any specific confounder. Finally, the segmentation result derived from the deep de-confounded features guides, through a fusion module, the shallow de-confounded features to generate their own segmentation results; a segmentation result is thus obtained at each scale, and the final result is computed as their weighted sum.
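The stratified intervention described above, decomposing a convolutional feature under each class-latent confounder and averaging over the strata, can be sketched roughly as follows. This is a minimal illustration of the backdoor-adjustment idea, not the paper's exact implementation: the prototype-similarity assignment, the softmax, and the uniform prior over strata are all assumptions.

```python
# Hypothetical sketch of backdoor-style de-confounding: approximate
# P(Y|do(X)) = sum_z P(Y|X,z) P(z) by stratifying the feature over K
# class prototypes. Names and shapes are illustrative only.
import numpy as np

def deconfound(feat, prototypes, prior=None):
    """feat: (C, H, W) convolutional feature; prototypes: (K, C) class
    latent features approximating the confounders."""
    C, H, W = feat.shape
    K = prototypes.shape[0]
    if prior is None:
        prior = np.full(K, 1.0 / K)          # uniform P(z) over strata (assumed)
    flat = feat.reshape(C, -1)               # (C, H*W)
    # similarity of each spatial feature to each prototype -> soft assignment
    logits = prototypes @ flat               # (K, H*W)
    assign = np.exp(logits - logits.max(0, keepdims=True))
    assign /= assign.sum(0, keepdims=True)
    # feature representation under each confounder stratum
    strat = assign[:, None, :] * flat[None]  # (K, C, H*W)
    # expectation over strata weighted by the prior P(z)
    out = (prior[:, None, None] * strat).sum(0).reshape(C, H, W)
    return out
```

The key point the sketch shows is that the output feature is an average over per-stratum representations rather than a single context-entangled feature, so no one confounder dominates.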
Result
Experiments are conducted on the public airborne image datasets Potsdam and Vaihingen, against 6 state-of-the-art deep learning segmentation methods and 7 public benchmark methods. The proposed method achieves overall accuracies of 90.3% on Potsdam and 90.8% on Vaihingen, improvements of 0.6% and 0.8% over the second-best deep learning method. Compared with the second-best benchmark method, its overall accuracy is 1.3% higher on Potsdam and 0.5% higher on Vaihingen.
Conclusion
The proposed segmentation model effectively alleviates the bias problem in datasets and improves airborne image segmentation performance.
Objective
Airborne image segmentation, which assigns a semantic label to each pixel of an image, is an essential task in remote sensing, with applications in land use analysis, urban planning, and environmental surveillance. Most conventional methods rely on hand-crafted features such as the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG). Their performance is constrained by intensive feature engineering, and complex scenes remain challenging. Following its success in image classification, the deep convolutional neural network (DCNN) has been adapted to pixel-wise classification problems such as airborne image segmentation, since it automatically learns task-adaptive features from training data. The fully convolutional network (FCN) improved airborne image segmentation, and FCN-based models such as UNet and SegNet developed the idea further with an encoder-decoder design. These models employ fixed-size convolutional kernels to capture contextual information, so the learned representation is restricted to a single scale and to local context. In fact, two challenging issues must be handled in airborne image segmentation: 1) objects appear at multiple scales, and 2) multi-source images are heterogeneous. Methods addressing the first issue exploit multi-scale contexts for segmentation; methods addressing the second extract more discriminative global information. Both families improve on plain FCN-based methods, but they do not combine the two benefits, and neither suppresses the interference of confounders. We therefore develop a causal-intervention-based segmentation method to suppress this interference.
Method
In this study, a progressive multi-scale causal intervention model (PM-SCIM) is built. First, the PM-SCIM takes ResNet18 as the backbone network to extract convolutional features of airborne images. Then, a de-confounding module is designed to measure the average causal effect of the confounders on the convolutional features by stratifying the confounders into different cases; suppressing the interference of any specific confounder in this way makes it possible to recognize objects regardless of the context they appear in. Next, the de-confounded feature from the deepest layer is used to produce a segmentation result, and a fusion module uses this result to guide the de-confounded features from shallower layers, yielding a segmentation result at every scale. Finally, all segmentation results are fused by a weighted sum. The PM-SCIM is trained on two datasets, Potsdam and Vaihingen. For Potsdam,
we choose 24 images for training and the remaining 14 for testing; for Vaihingen, we select 16 images for training and the remaining 17 for testing. To make full use of computing resources, a 256×256 sliding window is used to crop the input images into training samples. At the inference phase, the same sliding window crops tiles from the original test image, and the tiles are processed sequentially. For training, the momentum is set to 0.9, the learning rate to 0.01, and the weight decay to 0.000 01. Stochastic gradient descent (SGD) is accelerated on an NVIDIA GTX TITAN X GPU, and a poly learning-rate schedule updates the learning rate after each iteration.
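The poly schedule mentioned above is conventionally implemented as lr = base_lr × (1 − iter/max_iter)^power. A minimal sketch follows; the power value 0.9 is a common default and an assumption here, since the abstract does not state it.

```python
# Sketch of a poly learning-rate schedule for SGD; power=0.9 is an assumed
# conventional default, not a value reported in the text.
def poly_lr(base_lr, iteration, max_iteration, power=0.9):
    """Decay the learning rate from base_lr toward 0 over max_iteration steps."""
    return base_lr * (1.0 - iteration / max_iteration) ** power

# hyper-parameters reported in the text
base_lr, momentum, weight_decay = 0.01, 0.9, 0.00001
```

Updating after every iteration (rather than every epoch) gives a smooth decay, which is the usual pairing with this schedule.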
Result
Our method is compared with 6 popular state-of-the-art deep learning methods and 7 public benchmark methods. The quantitative evaluation metrics are overall accuracy (OA) and F1 score, and several segmentation maps are provided for qualitative comparison. Specifically, compared with DANet, OA increases by 0.6% and 0.8% (higher is better) and mean F1 by 0.7% and 1% (higher is better) on Potsdam and Vaihingen, respectively. OA increases by 1.3% and mean F1 by 0.3% in comparison with CVEO2 on Potsdam, and OA increases by 0.5% and mean F1 by 0.5% in comparison with DLR_10 on Vaihingen. The segmentation maps show that our method is effective for small objects (e.g., car) and ambiguous objects (e.g., tree and lawn). Additionally, a series of ablation studies on Potsdam and Vaihingen are carried out to clarify the effectiveness of each module in the PM-SCIM.
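Both reported metrics can be computed from a pixel-level confusion matrix. A minimal sketch, not the benchmark's official evaluation script:

```python
# Compute the two reported metrics from a confusion matrix M, where M[i, j]
# counts pixels of true class i predicted as class j.
import numpy as np

def overall_accuracy(M):
    """Fraction of pixels on the diagonal (correctly classified)."""
    return np.trace(M) / M.sum()

def mean_f1(M):
    """Per-class F1 = 2*precision*recall/(precision+recall), averaged."""
    tp = np.diag(M).astype(float)
    precision = tp / np.maximum(M.sum(axis=0), 1)   # column sums: predicted counts
    recall = tp / np.maximum(M.sum(axis=1), 1)      # row sums: ground-truth counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return f1.mean()
```

OA weights every pixel equally, so it favors large classes (e.g., building, lawn); mean F1 treats classes equally, which is why it is reported alongside OA for small classes such as car.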
Conclusion
To suppress the interference of confounders through causal intervention, a novel segmentation method is proposed that integrates a de-confounding module and a fusion module into ResNet18.
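The fusion module's final step, a weighted sum of per-scale class score maps followed by a per-pixel argmax, might look roughly like the sketch below. The nearest-neighbour upsampling and the scale weights are illustrative assumptions; the paper's fusion module may combine scales differently.

```python
# Hypothetical sketch of multi-scale fusion: upsample each per-scale score
# map (num_classes, h, w) to the finest resolution, weight, sum, and argmax.
import numpy as np

def fuse_scales(score_maps, weights):
    """score_maps: list of (num_classes, h, w) arrays whose sizes divide the
    largest one; weights: one scalar per scale (assumed values)."""
    target_h = max(m.shape[1] for m in score_maps)
    target_w = max(m.shape[2] for m in score_maps)
    fused = np.zeros((score_maps[0].shape[0], target_h, target_w))
    for m, w in zip(score_maps, weights):
        # nearest-neighbour upsampling by index repetition
        ry, rx = target_h // m.shape[1], target_w // m.shape[2]
        fused += w * m.repeat(ry, axis=1).repeat(rx, axis=2)
    return fused.argmax(axis=0)              # per-pixel class labels
```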
Keywords: airborne image; semantic segmentation; convolutional neural network (CNN); causal intervention; de-confound
Audebert N, Le Saux B and Lefèvre S. 2016. Semantic segmentation of earth observation data using multimodal and multi-scale deep networks//Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer: 180-196[DOI: 10.1007/978-3-319-54181-5_12]
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495[DOI: 10.1109/TPAMI.2016.2644615]
Chen G Z, Zhang X D, Wang Q, Dai F, Gong Y F and Zhu K. 2018a. Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(5): 1633-1644[DOI: 10.1109/JSTARS.2018.2810320]
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018b. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 833-851[DOI: 10.1007/978-3-030-01234-2_49]
Ding L, Tang H and Bruzzone L. 2021. LANet: local attention embedding to improve the semantic segmentation of remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 59(1): 426-435[DOI: 10.1109/TGRS.2020.2994150]
Fu J, Liu J, Tian H J, Li Y, Bao Y J, Fang Z W and Lu H Q. 2019. Dual attention network for scene segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3141-3149[DOI: 10.1109/CVPR.2019.00326]
Gerke M. 2014. Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen)[EB/OL]. [2021-10-20]. https://research.utuente.cl/en/publications/use-of-the-stair-vision-library-within-the-isprs-2d-semantic-labe
Han B B, Zhang Y T, Pan Z X, Tai X Q and Li F F. 2020. Residual dense spatial pyramid network for urban remote sensing image segmentation. Journal of Image and Graphics, 25(12): 2656-2664[DOI: 10.11834/jig.190557]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
Huang Z L, Wang X G, Huang L C, Huang C, Wei Y C and Liu W Y. 2019. CCNet: criss-cross attention for semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 603-612[DOI: 10.1109/ICCV.2019.00069]
Kemker R, Salvaggio C and Kanan C. 2018. Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS Journal of Photogrammetry and Remote Sensing, 145: 60-77[DOI: 10.1016/j.isprsjprs.2018.04.014]
Li X W, Li Y S and Zhang Y J. 2021. Weakly supervised deep semantic segmentation network for water body extraction based on multi-source remote sensing imagery. Journal of Image and Graphics, 26(12): 3015-3026[DOI: 10.11834/jig.200192]
Liu W, Rabinovich A and Berg A C. 2015. ParseNet: looking wider to see better[EB/OL]. [2015-11-19]. https://arxiv.org/pdf/1506.04579v2.pdf
Liu Y C, Fan B, Wang L F, Bai J, Xiang S M and Pan C H. 2018. Semantic labeling in very high resolution images via a self-cascaded convolutional neural network. ISPRS Journal of Photogrammetry and Remote Sensing, 145: 78-95[DOI: 10.1016/j.isprsjprs.2017.12.007]
Liu Y S, Piramanayagam S, Monteiro S T and Saber E. 2017. Dense semantic labeling of very-high-resolution aerial imagery and lidar with fully-convolutional neural networks and higher-order CRFs//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Hawaii, USA: IEEE: 1561-1570[DOI: 10.1109/CVPRW.2017.200]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440[DOI: 10.1109/CVPR.2015.7298965]
Long Y, Gong Y P, Xiao Z F and Liu Q. 2017. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 55(5): 2486-2498[DOI: 10.1109/TGRS.2016.2645610]
Marmanis D, Schindler K, Wegner J D, Galliani S, Datcu M and Stilla U. 2018. Classification with an edge: improving semantic image segmentation with boundary detection. ISPRS Journal of Photogrammetry and Remote Sensing, 135: 158-172[DOI: 10.1016/j.isprsjprs.2017.11.009]
Mou L C, Ghamisi P and Zhu X X. 2017. Deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7): 3639-3655[DOI: 10.1109/TGRS.2016.2636241]
Mou L C, Hua Y S and Zhu X X. 2019. A relation-augmented fully convolutional network for semantic segmentation in aerial scenes//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 12408-12417[DOI: 10.1109/CVPR.2019.01270]
Nogueira K, Mura M D, Chanussot J, Schwartz W R and Dos Santos J A. 2019. Dynamic multicontext segmentation of remote sensing images based on convolutional networks. IEEE Transactions on Geoscience and Remote Sensing, 57(10): 7503-7520[DOI: 10.1109/TGRS.2019.2913861]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241[DOI: 10.1007/978-3-319-24574-4_28]
Sun W W and Wang R S. 2018. Fully convolutional networks for semantic segmentation of very high resolution remotely sensed images combined with DSM. IEEE Geoscience and Remote Sensing Letters, 15(3): 474-478[DOI: 10.1109/LGRS.2018.2795531]
Volpi M and Tuia D. 2017. Dense semantic labeling of subdecimeter resolution images with convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 55(2): 881-893[DOI: 10.1109/TGRS.2016.2616585]
Wang T, Huang J Q, Zhang H W and Sun Q R. 2020. Visual commonsense R-CNN//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10757-10767[DOI: 10.1109/CVPR42600.2020.01077]
Yang X, Zhang H W, Qi G J and Cai J F. 2021. Causal attention for vision-language tasks//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 9842-9852[DOI: 10.1109/CVPR46437.2021.00972]
Yu S and Wang X L. 2021. Remote sensing building segmentation by CGAN with multilevel channel attention mechanism. Journal of Image and Graphics, 26(3): 686-699[DOI: 10.11834/jig.200059]
Zhang H, Dana K, Shi J P, Zhang Z Y, Wang X G, Tyagi A and Agrawal A. 2018. Context encoding for semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7151-7160[DOI: 10.1109/CVPR.2018.00747]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE: 2881-2890[DOI: 10.1109/CVPR.2017.660]
Zheng J H, Liu X Y and Wang X D. 2021. Single image cloud removal using u-net and generative adversarial networks. IEEE Transactions on Geoscience and Remote Sensing, 59(8): 6371-6385[DOI: 10.1109/TGRS.2020.3027819]
Zhou F, Hang R L and Liu Q S. 2021. Class-guided feature decoupling network for airborne image segmentation. IEEE Transactions on Geoscience and Remote Sensing, 59(3): 2245-2255[DOI: 10.1109/TGRS.2020.3006872]