Airborne image segmentation via progressive multi-scale causal intervention

Zhou Feng1, Hang Renlong2, Xu Chao1, Liu Qingshan2, Yang Guowei1,3 (1. School of Computer Science, Nanjing Audit University, Nanjing 211815, China; 2. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China; 3. School of Electronic Information, Qingdao University, Qingdao 266071, China)

Abstract
Objective Airborne image segmentation supports many practical applications in remote sensing. Compared with traditional methods, deep learning methods can adaptively learn task-relevant features and greatly improve segmentation accuracy, but they ignore the bias problem in datasets. The interference of confounders caused by this bias makes segmentation methods prone to blurred object boundaries and makes it hard to distinguish easily confused objects. To address this problem, a model based on progressive multi-scale causal intervention is proposed. Method First, a deep convolutional neural network is used to extract convolutional features from airborne images. Then, a de-confounded module introduces class-specific latent features to approximately represent the confounder features. Using these confounder features, the convolutional features are decomposed, in the manner of causal intervention, into representations corresponding to each confounder, which suppresses the interference of any specific confounder. Finally, the segmentation result obtained from the deep de-confounded features guides, through a fusion module, the shallow de-confounded features to generate their own segmentation results; a segmentation result is thereby obtained at every scale, and the final result is obtained by weighted summation. Result Experiments are conducted on the public airborne image datasets Potsdam and Vaihingen, in comparison with 6 state-of-the-art deep learning segmentation methods and 7 published benchmark methods. The overall accuracy of our method on Potsdam and Vaihingen is 90.3% and 90.8%, respectively, 0.6% and 0.8% higher than the second-best deep learning method. Compared with the second-best benchmark method, the overall accuracy of our method on Potsdam and Vaihingen is improved by 1.3% and 0.5%, respectively. Conclusion The proposed segmentation model can effectively alleviate the bias problem in datasets and improves airborne image segmentation performance.
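In causal terms, the stratification over confounders described above amounts to backdoor adjustment. A minimal sketch, assuming a discrete confounder set Z approximated by the class-specific latent features, with X standing for the convolutional feature and Y for the pixel-wise prediction (these symbols are not from the paper), is

P(Y \mid do(X)) = \sum_{z \in Z} P(Y \mid X, z)\, P(z)

so the prediction is averaged over all confounder cases z rather than being dominated by the contexts that happen to co-occur with X in the biased dataset.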
Keywords

Abstract
Objective Airborne image segmentation, which assigns a semantic label to each pixel of an image, is an essential task in remote sensing, supporting applications such as land use analysis, urban planning, and environmental surveillance. Most conventional methods rely on hand-crafted features such as the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG); their performance is constrained by labor-intensive feature engineering, and they still struggle with complex scenes. Following their success in image classification, deep convolutional neural networks (DCNNs) have been adopted for pixel-wise classification problems such as airborne image segmentation, because they can automatically learn task-adaptive features from training data. The fully convolutional network (FCN) improved airborne image segmentation, and FCN-based models such as UNet and SegNet developed this line further with encoder-decoder designs, using fixed-size convolutional kernels to capture contextual information. Deep learning is thus beneficial for airborne image segmentation, but the learned representations remain single-scale and local. In practice, two challenges have to be handled: 1) remote sensing scenes contain objects at multiple scales, and 2) images acquired from multiple sources are heterogeneous. The first challenge calls for aggregating multi-scale context, and the second calls for extracting more discriminative global information. Methods built around either idea outperform plain FCN-based methods, yet they neither exploit the mutual benefits of the two ideas nor account for the interference of confounders caused by dataset bias. We therefore develop a causal-intervention-based segmentation method to suppress the interference of confounders. Method In this study, a progressive multi-scale causal intervention model (PM-SCIM) is built. First, the PM-SCIM takes ResNet18 as the backbone network to extract convolutional features from airborne images. Then, a de-confounded module estimates the average causal effect of the confounders on the convolutional features by stratifying the confounders into different cases; in effect, objects are observed under every context, which indirectly suppresses the interference of any specific confounder. Next, the de-confounded feature from the deepest layer produces a segmentation result, and a fusion module uses this result to guide the de-confounded features of the shallower layers to generate their own segmentation results, so that a prediction is obtained at every scale. Finally, all segmentation results are fused by weighted summation. The PM-SCIM is trained on two datasets, Potsdam and Vaihingen. For Potsdam, we choose 24 images for training and the remaining 14 images for testing. For Vaihingen, we select 16 images for training and the remaining 17 images for testing. To make full use of computing resources, a 256×256 sliding window is used to crop the input images into training samples. At inference, the same sliding window crops tiles from each test image, and the tiles are processed sequentially.
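A minimal PyTorch-style sketch of the deep-to-shallow guidance and weighted-sum fusion described above, not the authors' implementation: the module name ProgressiveFusion, the per-scale 1×1 classifiers, bilinear upsampling, and the fusion weights are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveFusion(nn.Module):
    """Sketch: deep-to-shallow guidance of de-confounded features.

    Each scale produces a segmentation logit map; the deeper prediction is
    upsampled and added to guide the shallower one, and all predictions are
    combined by a weighted sum (weights here are fixed assumptions).
    """

    def __init__(self, channels, num_classes, weights=(0.5, 0.3, 0.2)):
        super().__init__()
        # one 1x1 classifier per scale (channels listed deepest first)
        self.heads = nn.ModuleList([nn.Conv2d(c, num_classes, 1) for c in channels])
        self.weights = weights

    def forward(self, feats):
        # feats: list of de-confounded feature maps, deepest first
        preds, guide = [], None
        for feat, head in zip(feats, self.heads):
            logits = head(feat)
            if guide is not None:
                # guide the shallower prediction with the upsampled deeper one
                logits = logits + F.interpolate(
                    guide, size=logits.shape[-2:], mode="bilinear",
                    align_corners=False)
            preds.append(logits)
            guide = logits
        # upsample every prediction to the finest resolution and fuse
        size = preds[-1].shape[-2:]
        fused = sum(w * F.interpolate(p, size=size, mode="bilinear",
                                      align_corners=False)
                    for w, p in zip(self.weights, preds))
        return fused, preds

With ResNet18 as the backbone, feats could be the de-confounded features of the last three stages listed deepest first, e.g., ProgressiveFusion(channels=(512, 256, 128), num_classes=6); the stage choice and class count here are placeholders, not details from the paper.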
For training, the momentum is set to 0.9, the learning rate to 0.01, and the weight decay to 0.000 01. Stochastic gradient descent (SGD) is used for optimization, accelerated on an NVIDIA GTX TITAN X GPU, and a poly learning rate policy updates the learning rate after each iteration. Result Our method is compared with 6 popular state-of-the-art deep learning methods and 7 published benchmark methods. The quantitative evaluation metrics are overall accuracy (OA) and F1 score, and several segmentation maps are provided for qualitative comparison. Specifically, compared with DANet, the OA is increased by 0.6% and 0.8% (higher is better) and the mean F1 by 0.7% and 1% (higher is better) on Potsdam and Vaihingen, respectively. Compared with CVEO2 on Potsdam, the OA is increased by 1.3% and the mean F1 by 0.3%. Compared with DLR_10 on Vaihingen, the OA is increased by 0.5% and the mean F1 by 0.5%. The segmentation maps show that our method handles small objects (e.g., car) and easily confused objects (e.g., tree and lawn) well. In addition, a series of ablation studies on Potsdam and Vaihingen verifies the effectiveness of the individual modules of the PM-SCIM. Conclusion A novel segmentation method is proposed that integrates the de-confounded module and the fusion module into ResNet18 and suppresses the interference of confounders through causal intervention, thereby alleviating dataset bias and improving airborne image segmentation performance.
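A sketch of the reported optimization setup (SGD with momentum 0.9, learning rate 0.01, weight decay 0.000 01, and a poly schedule updated every iteration); the iteration budget and the poly exponent of 0.9 are assumptions, and the model here is a placeholder.

import torch
from torch import nn

# hyperparameters reported in the abstract
base_lr, momentum, weight_decay = 0.01, 0.9, 1e-5
max_iter, power = 100_000, 0.9  # iteration budget and poly exponent are assumptions

model = nn.Conv2d(3, 6, 3, padding=1)  # stand-in for the segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=momentum, weight_decay=weight_decay)

# poly policy: lr = base_lr * (1 - it / max_iter) ** power, applied per iteration
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1 - it / max_iter) ** power)

# inside the training loop, call scheduler.step() after optimizer.step()
# so the learning rate is updated after each iteration.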
Keywords
