Published: 2021-11-16
DOI: 10.11834/jig.200461
2021 | Volume 26 | Number 11

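Image analysis and recognition

L-UNet: lightweight network for road extraction in cloud occlusion scene

Xu Miao¹, Li Yuanxiang¹, Zhong Juanjuan², Zuo Zongcheng¹, Xiong Wei²

1. School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China;

2. Leihua Electronic Technology Research Institute, the Aviation Industry Corporation of China, Wuxi 214063, China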

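Received: 2020-08-24; Revised: 2020-12-21; Preprint: 2020-12-28

Funding: Civil Aircraft Special Project of Ministry of Industry and Information Technology (MJZ-2016-S-44)

About the authors: Xu Miao, born in 1996, female, master's student; research interests: remote sensing image segmentation and cloud removal. E-mail: miaoxusjtu@163.com
Li Yuanxiang, corresponding author, male, associate professor and doctoral supervisor; research interests: image recognition and fault prediction. E-mail: yuanxli@sjtu.edu.cn
Zhong Juanjuan, female, senior engineer; research interest: radar target recognition. E-mail: zhongjuan19821211@tom.com
Zuo Zongcheng, male, doctoral student; research interest: intelligent information extraction from high-resolution remote sensing imagery. E-mail: whuzzc@163.com
Xiong Wei, male, senior engineer; research interest: radar signal processing. E-mail: xiongweiwhumath@sina.com
*Corresponding author: Li Yuanxiang, yuanxli@sjtu.edu.cn

CLC number: TP751

Document code: A

Article ID: 1006-8961(2021)11-2670-10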

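Abstract

Objective Road extraction is a common remote sensing application. Existing road extraction methods based on deep convolutional networks rarely consider the effect of cloud occlusion, their models are large and hard to deploy on mobile devices, and no road extraction dataset exists for cloud-occluded scenes. This paper therefore proposes a lightweight UNet (L-UNet) that extracts roads under cloud occlusion efficiently. Method Cloud layers are simulated with Perlin noise to extend an existing road extraction dataset, which is then used to train L-UNet. Mobile inverted bottleneck convolution blocks serve as the main feature extraction structure; on top of depthwise separable convolution they add expand convolution and squeeze-and-excitation blocks, which reduces the parameter count while substantially improving segmentation. Result On the extended DeepGlobe road extraction dataset, the intersection over union (IoU) of L-UNet is 1.97% higher than that of D-LinkNet, with only 1/5 of the parameters. On real cloud-occluded remote sensing images, L-UNet still performs best, with IoU 19.47% and 31.87% higher than D-LinkNet and UNet, respectively. Conclusion L-UNet can generate road labels under cloud-occluded regions to some extent; although trained on a simulated cloud occlusion dataset, it remains robust to real cloud occlusion. Its small parameter count makes it easy to embed on mobile devices.

Keywords

road extraction; lightweight UNet (L-UNet); remote sensing image; cloud occlusion simulation; deep learning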

L-UNet: lightweight network for road extraction in cloud occlusion scene

Xu Miao¹, Li Yuanxiang¹, Zhong Juanjuan², Zuo Zongcheng¹, Xiong Wei²

1. School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China;

2. Leihua Electronic Technology Research Institute, the Aviation Industry Corporation of China, Wuxi 214063, China

Supported by: Civil Aircraft Special Project of Ministry of Industry and Information Technology (MJZ-2016-S-44)

Abstract

Objective Road extraction is one of the primary tasks in remote sensing. It has been applied in many areas, such as urban planning, route optimization, and navigation. In disasters such as mudslides, floods, and earthquakes, road information changes suddenly, so embedding road extraction models on mobile terminals has essential application value for rapid rescue. In recent years, deep learning has provided new ideas for pixel-level road extraction, such as the classic image segmentation network UNet and the improved road extraction networks based on it, such as Residual UNet, LinkNet, and D-LinkNet. They extract roads better than traditional methods based on primary image features. However, these methods that rely on deep convolutional networks still have two problems. 1) Cloud occlusion seriously affects the retrieval of information about ground objects in remote sensing images. At present, these convolutional network models are all trained on clear remote sensing images and do not consider the effect of cloud occlusion on road extraction, so their performance on cloudy remote sensing images is substantially reduced. 2) Lightweight network design has been an active area of research for several years, but none of the above models considers it, which adds considerable difficulty to their deployment. To address these problems, a lightweight UNet (L-UNet) is proposed, and road extraction is implemented in an end-to-end manner in the cloud occlusion scene.
Method 1) To address cloud occlusion, Perlin noise is used to simulate a cloud layer image, and the artificial cloud layer image is then merged with an RGB remote sensing image through the alpha coefficient to simulate the cloud occlusion scene. This simulation method is used to extend the cloudless road extraction dataset. Specifically, 20 000 artificial cloud layer images are generated before network training. During training, cloud layer images are randomly sampled with replacement, and each selected cloud layer image is merged with a clear remote sensing image in the training dataset, thereby simulating continually changing cloud occlusion scenes. 2) For a lightweight network, UNet, a fully convolutional neural network, is improved to obtain L-UNet. The main improvement is the use of mobile inverted bottleneck convolutional blocks (MBConv) in the encoder. MBConv first uses depthwise separable convolution, which considerably reduces the number of network parameters. However, road extraction using only depthwise separable convolution does not perform well; thus, expand convolution is added. Expand convolution with several 1×1 convolution kernels increases the number of feature channels of each layer in the encoder, so each layer of the network can learn richer features. MBConv also uses a squeeze-and-excitation block, which consists of two parts: global pooling for squeeze and 1×1 convolution with the swish function for excitation. The squeeze-and-excitation block rationalizes the relative weights between the output feature maps of each layer and highlights the feature information related to roads and clouds, which benefits the segmentation task. Moreover, swish is selected as the activation function rather than the rectified linear unit (ReLU). The L-UNet model has fewer parameters than the original UNet model and achieves better results. 3) The training loss function is the sum of the binary cross-entropy loss and the dice coefficient loss. The optimizer is Adam, with an initial learning rate of 2E-4. The encoder parameters of L-UNet are initialized from a model pretrained on ImageNet, and the network is then fine-tuned. L-UNet is built and evaluated in the PyTorch deep learning framework; it is trained for 233 epochs on two NVIDIA GTX 1080 Ti GPUs until convergence.
Result Network training and comparison experiments are carried out on the extended DeepGlobe road extraction dataset. 1) In the comparison of network structures, the baseline network is UNet#, which contains only depthwise separable convolution. When the squeeze-and-excitation block and the expand convolution are added separately, the intersection over union (IoU) increases by 1.12% and 8.45%, respectively. Adding both simultaneously increases the IoU by 16.24%. 2) L-UNet is compared with other networks on the extended test dataset. Its IoU is 4.65% higher than that of UNet and 1.97% higher than that of D-LinkNet, the second-best network. L-UNet has only 22.28 M parameters, which is 1/7 of UNet and 1/5 of D-LinkNet. Its Mask-IoU and Mask-P indices, which measure road prediction performance in the cloud occlusion area, are also higher than those of the other networks. 3) In road extraction tests on several real cloudy remote sensing images from the Sentinel-2 satellite, L-UNet remains the best. The average IoU of its detection results is 19.47% higher than that of D-LinkNet and 31.87% higher than that of UNet.
Conclusion This paper studies road extraction from remote sensing images in cloud occlusion scenes. Simulated cloud layers are added to existing datasets, and the extended datasets are used to improve the robustness of existing deep learning-based methods against cloud occlusion. The proposed L-UNet architecture dramatically reduces the number of parameters and performs well for road extraction under cloud cover. It can even predict road labels under thick clouds from the visible road edges and trends; thus, its road detection results are more coherent. In future work, the method can also be applied to extracting other ground objects from cloud-covered remote sensing images.

Key words

road extraction; lightweight UNet (L-UNet); remote sensing image; cloud occlusion simulation; deep learning

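0 Introduction

Road extraction from remote sensing images is widely used in urban planning, route optimization, and navigation. Methods fall roughly into two categories (Shi et al., 2001): generating pixel-level road labels and detecting road centerlines (Cheng et al., 2017). The first category is an image segmentation problem, for which deep learning offers new approaches such as the classic UNet (Ronneberger et al., 2015). Building on it, Zhang et al. (2018) proposed Residual UNet, which replaces plain convolutions with residual convolution blocks in both the encoding and decoding paths and uses three-stage feature extraction blocks; it is shallower and has fewer parameters than the original UNet yet outperforms it. Zhang et al. (2020) applied transfer learning to LinkNet for rapid road extraction. Zhou et al. (2018) proposed D-LinkNet, which improves LinkNet (Woo et al., 2018) with dilated convolution and achieved strong results in the 2018 DeepGlobe challenge (Demir et al., 2018).

When sudden disasters such as earthquakes, floods, and mudslides occur, road information changes accordingly. Embedding a road extraction model on a mobile device is therefore valuable for rapid rescue. None of the deep learning models above, however, considers lightweight design of the deep convolutional network, which makes them harder to embed on mobile devices. Lightweight network design is an active research topic (Cheng et al., 2018), and many hand-designed lightweight architectures have been applied to image classification. Howard et al. (2017) built MobileNet on depthwise separable convolution, greatly reducing the parameter count while preserving classification accuracy. EfficientNet (Tan and Le, 2019) obtains lightweight, efficient designs through neural architecture search (NAS) (Zoph and Le, 2016): building on MobileNet and related networks, it reconsiders how network depth, width, and feature map resolution relate to classification accuracy and efficiency, sets suitable constraints, and searches with NAS for a family of models that are both accurate and efficient. How to draw on these architectures to design a lightweight segmentation network for extracting roads from remote sensing images deserves further study.

In addition, cloud occlusion seriously hampers the extraction of ground objects from remote sensing images (Lin et al., 2014). Ground objects are usually extracted after cloud removal (Sarukkai et al., 2020). Some roads, however, such as narrow ones, occupy only a small fraction of the pixels in a remote sensing image; removing clouds first and then extracting roads is cumbersome and easily loses such roads. Current road extraction methods are all developed on cloud-free remote sensing images and ignore the effect of cloud occlusion, so their performance drops sharply on cloudy images.

To extract roads under cloud occlusion, this paper does two things. On the data side, it augments the original dataset with simulated cloud layers, which makes an end-to-end deep learning solution possible. On the model side, it designs a lightweight UNet (L-UNet) that mainly uses mobile inverted bottleneck convolution blocks (Tan and Le, 2019) to improve the UNet encoder, making the whole model smaller while improving performance. On the synthetic dataset the network outperforms D-LinkNet with only 1/5 of its model size.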

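1 Cloud occlusion simulation for remote sensing images

No public dataset exists for road extraction under cloud occlusion, so this paper creates one by compositing simulated clouds onto cloud-free remote sensing images, which strengthens the robustness of network models to cloud occlusion. Cloud layers are first simulated with Perlin noise (Perlin, 2002) following Enomoto et al. (2017); the simulated layer is then merged with the RGB remote sensing image through an alpha coefficient to produce a simulated cloud-occluded image, as shown in Fig. 1. The RGB remote sensing image (Fig. 1(a)) and the Perlin-noise cloud layer (Fig. 1(b)) are combined by alpha blending (Porter and Duff, 1984) into the simulated image (Fig. 1(c)), and thresholding then yields the cloud mask of its occluded part (Fig. 1(d)). The cloud mask makes it possible to further evaluate how well a model predicts roads inside simulated cloud-occluded regions. Because the simulated cloud layers vary in transparency, the clouds appear visually thicker or thinner. Across the simulated cloud dataset, cloud coverage is about 40%~70%, of which simulated thick cloud covers 3%~23% on average and simulated thin cloud 28%~59%. Training on such images is expected to make the final model robust to regions occluded by both thick and thin clouds.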
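The blending and masking steps can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the opacity map is written by hand instead of being generated from Perlin noise, and the function names, the uniform white cloud value of 255, and the mask threshold of 0.3 are all assumptions made for illustration.

```python
def alpha_blend(image, alpha_map, cloud_value=255):
    """Composite a uniform white cloud layer over an RGB image.

    image:     H x W nested list of (r, g, b) tuples, values in 0..255
    alpha_map: H x W nested list of cloud opacities in [0, 1]
    cloud_value is an assumed brightness for the cloud layer.
    """
    blended = []
    for pixel_row, alpha_row in zip(image, alpha_map):
        out_row = []
        for pixel, a in zip(pixel_row, alpha_row):
            # "Over" compositing: out = a * cloud + (1 - a) * background
            out_row.append(tuple(round(a * cloud_value + (1.0 - a) * c) for c in pixel))
        blended.append(out_row)
    return blended


def cloud_mask(alpha_map, threshold=0.3):
    """Label a pixel as cloud-occluded (1) when its opacity reaches the
    threshold. The value 0.3 is hypothetical; the paper does not state it."""
    return [[1 if a >= threshold else 0 for a in row] for row in alpha_map]


def coverage(mask):
    """Fraction of pixels flagged as cloud."""
    flat = [v for row in mask for v in row]
    return sum(flat) / len(flat)


# Toy 1 x 3 image: clear pixel, thin cloud, opaque cloud.
image = [[(100, 50, 0), (100, 50, 0), (100, 50, 0)]]
alpha_map = [[0.0, 0.2, 1.0]]
merged = alpha_blend(image, alpha_map)
mask = cloud_mask(alpha_map)
print(merged, mask, coverage(mask))
```

Varying the opacity per pixel is what produces the thick and thin cloud appearance described above; the mask only records where opacity is high enough to count as occlusion.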

Fig. 1 An example of cloud occlusion simulation

((a) clear image; (b) cloud simulation; (c) merged image; (d) cloud mask)

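2 Lightweight UNet design

UNet (Ronneberger et al., 2015) is a classic end-to-end segmentation network whose skip connections let low-level information effectively compensate high-level features, greatly improving the final segmentation. The lightweight UNet (L-UNet) follows the overall structure of EfficientNet-B0 (Tan and Le, 2019) and substantially changes the feature extraction path: ordinary convolutions are replaced by mobile inverted bottleneck convolutional blocks (MBConv), and the number of convolutional layers and feature channels in each stage's feature extraction block follows the optimal combination found by NAS. In addition, to preserve more complete information, the center-cropping operation of the original UNet is dropped when low-level feature maps are copied and concatenated. The overall network structure is shown in Fig. 2. The MBConv block mainly consists of expand convolution, a squeeze-and-excitation block, and depthwise separable convolution, as shown in Fig. 3.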

Fig. 2 Architecture of the lightweight UNet

Fig. 3 A mobile inverted bottleneck convolutional block

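Depthwise separable convolution (Howard et al., 2017) consists of a depthwise convolution and a pointwise convolution. The depthwise convolution extracts features from the input feature map channel by channel, and the pointwise convolution then uses 1×1 kernels to combine these features linearly along the channel dimension into a new feature map. Let the input feature map have size $(H, W, C)$, the convolution kernel size $(k, k)$, and the output feature map size $\left(H, W, C^{\prime}\right)$. The parameter counts of an ordinary convolution, $P$, and of a depthwise separable convolution, $P^{\prime}$, are then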

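$ P=k^{2} \times C \times C^{\prime} $

(1)

$ P^{\prime}=C \times\left(k^{2}+C^{\prime}\right) $

(2)

The ratio is $P^{\prime} / P=1 / C^{\prime}+1 / k^{2}$, so when $C^{\prime}$ is large a depthwise separable convolution has only about $1 / k^{2}$ of the parameters of an ordinary convolution, which greatly reduces the parameter count of the network.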
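As a worked check of the two parameter counts, the short sketch below evaluates them for one layer. A standard convolution with a k×k kernel has k²·C·C′ weights, and the depthwise separable version has C·k² depthwise weights plus C·C′ pointwise weights. The layer size (k = 3, C = 64, C′ = 128) is an arbitrary example, the function names are illustrative, and bias terms are ignored as in the formulas.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k weights (one filter per input channel)
    plus pointwise 1 x 1 weights (no bias)."""
    return c_in * k * k + c_in * c_out


k, c_in, c_out = 3, 64, 128  # arbitrary example layer
p = conv_params(k, c_in, c_out)
p_sep = depthwise_separable_params(k, c_in, c_out)
ratio = p_sep / p

print(p, p_sep, ratio)
# The ratio equals 1/c_out + 1/k^2, which approaches 1/k^2 for wide layers.
print(1 / c_out + 1 / k ** 2)
```

For this layer the separable form needs 8 768 weights against 73 728, a ratio of roughly 0.119, slightly above 1/9 because of the 1/C′ term.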

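The squeeze-and-excitation block (Hu et al., 2020) captures the correlation between feature channels, as shown in Fig. 4. The squeeze step aggregates the features of each channel by global pooling. The excitation step, implemented with 1×1 convolutions and nonlinear activation functions, uses a self-gating mechanism to produce one adjustment weight per feature channel. These weights capture the dependencies between channels, emphasizing the channels that carry important road- and cloud-related information and suppressing the remaining, less useful features.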
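The squeeze, excitation, and rescaling steps can be sketched in plain Python as follows. This is an illustrative toy, not the network code: the layer shapes, the omission of biases, the use of Swish in the reduction layer, and the final sigmoid gate are assumptions in the spirit of the standard squeeze-and-excitation design, and a 1×1 convolution applied to a 1×1 pooled map is written as a dense layer.

```python
import math


def swish(x):
    return x / (1.0 + math.exp(-x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Re-weight channels of a feature map.

    feature_maps: list of C channels, each an H x W nested list of floats
    w_reduce:     R x C weights of the reducing 1x1 convolution (hypothetical)
    w_expand:     C x R weights of the expanding 1x1 convolution (hypothetical)
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation: reduce, apply a nonlinearity, expand, and gate with a sigmoid.
    hidden = [swish(sum(w * s for w, s in zip(w_row, squeezed))) for w_row in w_reduce]
    gates = [sigmoid(sum(w * h for w, h in zip(w_row, hidden))) for w_row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]


# Two channels of size 1 x 2; all-zero weights give a gate of 0.5 everywhere.
features = [[[2.0, 4.0]], [[1.0, 3.0]]]
scaled = squeeze_excite(features, w_reduce=[[0.0, 0.0]], w_expand=[[0.0], [0.0]])
print(scaled)
```

With trained weights the gates differ per channel, which is how informative channels are emphasized and the rest suppressed.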

Fig. 4 A squeeze-and-excitation block

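In addition, the mobile inverted bottleneck convolution adopts residual connections to alleviate vanishing gradients during training. Its activation function is Swish, which is smooth and non-monotonic and outperforms ReLU in deep networks: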

$ f(\boldsymbol{x})=\boldsymbol{x} \cdot \operatorname{Sigmoid}(\boldsymbol{x}) $

(3)

where $\boldsymbol{x}$ is the input feature of the activation function.
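A few sample values make the smooth, non-monotonic shape of Eq. (3) concrete. The sketch below is a scalar illustration only; the sample points are chosen arbitrarily.

```python
import math


def swish(x):
    """Eq. (3): x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:+.1f}  swish={swish(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, Swish is slightly negative for negative inputs and dips before returning toward zero: its value at -1 is lower than at -5, which is the non-monotonic behavior mentioned above.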

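3 Training the lightweight UNet

Besides L-UNet, this paper also compares UNet (Ronneberger et al., 2015), Residual UNet (Zhang et al., 2018), ResUNet, LinkNet (Woo et al., 2018), and D-LinkNet (Zhou et al., 2018). ResUNet here differs from the Residual UNet mentioned above: it uses residual convolution blocks only in the encoding path of UNet and has five-stage feature extraction blocks. All networks are implemented in PyTorch and trained on two NVIDIA GTX 1080 Ti GPUs.

3.1 Dataset

Training and testing use the DeepGlobe road extraction dataset (Demir et al., 2018), which is broad and varied, covering urban and rural areas of Thailand, India, and Indonesia; all images are 1 024×1 024 pixel RGB satellite images. For the experiments, 5 226, 1 012, and 1 000 cloud-free images are selected as the base images of the training, validation, and test sets, respectively. In each training epoch, 5 226 cloud images are drawn at random with replacement from the 20 000 pre-generated simulated cloud images (https://drive.google.com/file/d/1UuUmhtVy0gCl6TI bOHW-DA0o2RlraGKZ/view) and merged with the training images, simulating constantly changing cloud occlusion. The merged images are then augmented by translation and scaling, horizontal, vertical, and diagonal flipping, and jitter on the three HSI (hue-saturation-intensity) components, so that the trained model generalizes better. The validation and test images are instead merged with a fixed set of 2 012 distinct simulated cloud layers to keep the evaluation standard consistent. The ground truth labels ignore cloud occlusion; the labels of the original dataset are used.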
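The per-epoch pairing of training images with cloud layers can be sketched as below. Only the counts (20 000 cloud images, 5 226 training images) come from the text; the function name, the seeded generator, and the index-based pairing are assumptions for illustration.

```python
import random

NUM_CLOUD_IMAGES = 20000  # pre-generated simulated cloud layers
NUM_TRAIN_IMAGES = 5226   # cloud-free training images


def sample_cloud_indices(rng, num_clouds=NUM_CLOUD_IMAGES, num_train=NUM_TRAIN_IMAGES):
    """Draw one cloud-layer index per training image, with replacement."""
    return rng.choices(range(num_clouds), k=num_train)


rng = random.Random(0)  # seeded only to make the sketch repeatable
for epoch in range(2):
    cloud_indices = sample_cloud_indices(rng)
    # pairs[i] = (training image index, cloud layer index) for this epoch
    pairs = list(enumerate(cloud_indices))
    print(epoch, len(pairs), pairs[:3])
```

Because a fresh draw is made every epoch, the same road image is seen under different clouds over the course of training, whereas a fixed pairing would be used for the validation and test sets.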

3.2 Loss function and other settings

All networks use the combined loss $ {L_{{\rm{total }}}}$, which is the sum of the binary cross-entropy loss ${L_{{\rm{BCE }}}} $ and the Dice coefficient loss $ {L_{{\rm{dice }}}}$ (Milletari et al., 2016):

$ L_{\text {total }}=L_{\text {BCE }}+L_{\text {dice }} $

(4)

The binary cross-entropy loss ${L_{{\rm{BCE }}}} $ is computed as

$ L_{\mathrm{BCE}}=-\sum\limits_{i=1}^{N}\left[\boldsymbol{y}^{(i)} \log \hat{\boldsymbol{y}}^{(i)}+\left(1-\boldsymbol{y}^{(i)}\right) \log \left(1-\hat{\boldsymbol{y}}^{(i)}\right)\right] $

(5)

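where $N$ is the batch size, $i$ is the sample index, $\boldsymbol{y}$ is the ground truth, and $\hat{\boldsymbol{y}}$ is the probability predicted by the network.

The Dice coefficient loss handles class imbalance better. Roads usually occupy only a small fraction of the pixels in a remote sensing image, so road extraction can be treated as a class-imbalanced problem. The Dice coefficient loss is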

$ L_{\text {dice }}=1-\frac{\sum\limits_{i=1}^{N} 2\left|\boldsymbol{Y}_{i} \cap \boldsymbol{G}_{i}\right|}{\sum\limits_{i=1}^{N}\left(\left|\boldsymbol{Y}_{i}\right|+\left|\boldsymbol{G}_{i}\right|\right)} $

(6)

where $\boldsymbol{G}$ is the ground-truth road label, $\boldsymbol{Y}$ is the road label finally output by the network, and $N$ and $i$ have the same meaning as in Eq. (5).
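The combined loss of Eq. (4) can be sketched in plain Python on flattened pixel lists. This is a hedged illustration rather than the training code: the clamping constant `eps`, the soft (probability-weighted) intersection, and the use of the standard Dice denominator |Y| + |G| are assumptions, and a real implementation would operate on batched tensors.

```python
import math


def bce_loss(y_true, y_prob, eps=1e-7):
    """Summed binary cross-entropy; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def dice_loss(y_true, y_prob):
    """1 - 2|Y ∩ G| / (|Y| + |G|), with a soft intersection sum(y * p)."""
    intersection = sum(y * p for y, p in zip(y_true, y_prob))
    denominator = sum(y_true) + sum(y_prob)
    if denominator == 0:
        return 0.0  # no road in truth or prediction: treat as a perfect match
    return 1.0 - 2.0 * intersection / denominator


def total_loss(y_true, y_prob):
    """Eq. (4): sum of the two terms."""
    return bce_loss(y_true, y_prob) + dice_loss(y_true, y_prob)


truth = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]
bad = [0.2, 0.8, 0.3, 0.9]
print(total_loss(truth, good), total_loss(truth, bad))
```

The Dice term depends only on overlap with the road pixels, so it is not dominated by the large background class the way a pixel-wise loss alone can be.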

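The optimizer is Adam. For networks that require a pretrained model (all except UNet and Residual UNet), transfer learning is used: the encoder parameters are initialized from an ImageNet-pretrained model, and the network is then fine-tuned. The initial learning rate is $2 \times 10^{-4}$; it is lowered automatically when training progress slows, and training ends once it has decayed to 0. L-UNet converged after 233 training epochs.

4 Experimental results and analysis

To validate the proposed method, two groups of experiments are carried out: a comparison of lightweight network modules, and network comparisons under simulated and real cloud occlusion. The experiments in Sections 4.2 and 4.3 take as a reference baseline the detection results of D-LinkNet_or, the best model on the original DeepGlobe dataset (Zhou et al., 2018), which was trained without cloud occlusion simulation.

4.1 Comparison of lightweight network modules

Compared with an ordinary or residual convolution block, the mobile inverted bottleneck convolution block mainly adds expand convolution (EX) and the squeeze-and-excitation block (SE). To verify the effect of these two modules, a comparison is run with the intersection over union (IoU) of the road labels and the number of network parameters (params) as the main indicators; convolutional parameter counts follow Eqs. (1) and (2). UNet# is the baseline: the convolution blocks in its encoding path contain only depthwise separable convolution with residual connections, and the rest is identical to L-UNet. The MBConv block in L-UNet resembles the basic block of MobileNetV2 (Sandler et al., 2018), except that it uses the SE module and the Swish activation instead of ReLU. A UNet modified with MobileNetV2 (M-UNet) is also included in the comparison. Table 1 and Fig. 5 give the results of these networks on the simulated test set.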

Table 1 Comparison of IoU and params among different network modules

Network	IoU/%	Params/M
UNet#	50.07	9.65
UNet#+SE	51.19	10.06
UNet#+EX	58.52	19.85
M-UNet	58.81	25.29
L-UNet(UNet#+EX+SE)	66.31	22.28
Note: Bold indicates the best result in each column.

Fig. 5 Comparison of comprehensive performances among lightweight network modules

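Table 1 shows that: 1) adding EX on top of depthwise separable convolution effectively improves road extraction, because EX increases the number of feature channels in each layer, that is, the network width, so each layer learns richer features; 2) with or without EX, adding SE to the convolution block further improves road extraction (by 1.12 percentage points without EX and 7.79 with it), because SE rebalances the relative weights of each layer's output feature maps and highlights road- and cloud-related features, which benefits the subsequent segmentation.

Fig. 5 shows that: 1) within a certain range, the SE module (dashed and dash-dotted lines) needs fewer parameters than EX (solid line) to achieve the same gain in road extraction performance; 2) the gain along the dash-dotted line is clearly larger than along the dashed line: EX increases the number of input feature channels, and combining EX with SE raises IoU substantially; 3) the point for M-UNet lies to the lower right of L-UNet, so it is inferior to L-UNet in both model size and performance.

4.2 Road extraction under simulated cloud occlusion

The experiment is run on the cloud-simulation extended dataset and compares six networks: UNet, Residual UNet, ResUNet, LinkNet, D-LinkNet, and D-LinkNet_or. The evaluation metrics are precision, recall, and IoU. Two further metrics are defined for cloud occlusion: Mask-IoU, the mean IoU inside cloud-occluded regions, and Mask-P, the probability of detecting roads inside cloud-occluded regions. Mask-IoU computes IoU from only those pixels inside the cloud mask (Fig. 1(d)) that are labeled road in the detection result and in the ground truth. Mask-P is the ratio of the number of test images in which road labels are detected inside the cloud-occluded region to the number of images whose ground truth contains road labels in that region; it reflects the robustness of detection under cloud occlusion. The number of network parameters is also reported.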
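The five metrics can be sketched on flattened binary masks as follows. The function names are illustrative, and one point is an interpretation rather than something the text fixes: for Mask-P, this sketch counts a detection only for images whose ground truth actually has road pixels under the cloud mask, so the ratio cannot exceed 1.

```python
def confusion(pred, truth, region=None):
    """Count true positives, false positives, and false negatives over
    flattened binary masks, optionally restricted to pixels where region == 1."""
    tp = fp = fn = 0
    for idx, (p, t) in enumerate(zip(pred, truth)):
        if region is not None and not region[idx]:
            continue
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
    return tp, fp, fn


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def precision(pred, truth):
    tp, fp, _ = confusion(pred, truth)
    return _ratio(tp, tp + fp)


def recall(pred, truth):
    tp, _, fn = confusion(pred, truth)
    return _ratio(tp, tp + fn)


def iou(pred, truth, region=None):
    tp, fp, fn = confusion(pred, truth, region)
    return _ratio(tp, tp + fp + fn)


def mask_iou(pred, truth, cloud):
    """IoU computed only inside the cloud mask."""
    return iou(pred, truth, region=cloud)


def mask_p(samples):
    """samples: list of (pred, truth, cloud) triples, one per test image."""
    detected = with_road = 0
    for pred, truth, cloud in samples:
        if any(t and c for t, c in zip(truth, cloud)):
            with_road += 1
            if any(p and c for p, c in zip(pred, cloud)):
                detected += 1
    return _ratio(detected, with_road)


pred = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
cloud = [1, 1, 0, 0]
print(precision(pred, truth), recall(pred, truth), iou(pred, truth))
print(mask_iou(pred, truth, cloud), mask_p([(pred, truth, cloud)]))
```

Mask-IoU is stricter than Mask-P: the former scores pixel-level overlap under the cloud, while the latter only asks whether any road was found there at all.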

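Fig. 6 shows the predictions of each network on representative images of farmland, residential areas, desert, and building clusters under simulated cloud occlusion. 1) The road extraction results of D-LinkNet_or are generally worse than those of UNet, ResUNet, LinkNet, D-LinkNet, and L-UNet, which indicates that a model trained on the original cloud-free dataset resists simulated cloud occlusion poorly. 2) Among the models trained on the cloud simulation dataset, L-UNet predicts best; in cloud-occluded regions in particular, it can reasonably infer the likely position of roads under the clouds from the unoccluded parts of the image, which makes the prediction more coherent across the whole image.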

Fig. 6 Examples of road detection results on simulation dataset by different networks

((a) merged images; (b) ground truth labels; (c) D-LinkNet_or; (d) UNet; (e) ResUNet; (f) LinkNet; (g) D-LinkNet; (h) L-UNet(ours))

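Table 2 lists the quantitative results of each network on the simulated test set. 1) L-UNet has the best recall, IoU, Mask-IoU, and Mask-P; compared with the strong D-LinkNet, its IoU and Mask-IoU are higher by 1.97 and 2.00 percentage points, respectively, which shows that it predicts roads better both over the whole image and inside cloud-occluded regions and is more robust. 2) The precision of L-UNet is lower than that of D-LinkNet_or (80.13% versus 90.33%). L-UNet can predict road labels under fairly thick clouds from the visible road positions and trends, but there is no pixel information to rely on beneath thick clouds, so this "imagination"-like inference still carries some error.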

Table 2 Results on simulation test dataset (%)

Network	Precision	Recall	IoU	Mask-IoU	Mask-P
D-LinkNet_or	90.33	57.51	53.69	33.63	93.37
Residual UNet	83.83	61.71	54.31	50.15	97.04
UNet	78.69	74.31	61.66	59.10	98.52
ResUNet	83.41	72.31	62.51	60.19	97.86
LinkNet	82.56	74.38	63.80	60.98	97.96
D-LinkNet	80.93	76.43	64.34	61.80	97.96
L-UNet (ours)	80.13	79.21	66.31	63.80	98.98
Note: Bold indicates the best result in each column.

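Fig. 7 visualizes the parameter count and IoU of each network. 1) Residual UNet has the second-fewest parameters after L-UNet, but it performs poorly on cloud-occluded road extraction, even worse than UNet. It has only three-stage feature extraction blocks; such a shallow network is just adequate for cloud-free road extraction, but its nonlinear fitting capacity is limited for the more complex cloud-occluded case. 2) L-UNet has the fewest parameters yet the best IoU. Rather than shrinking the model by reducing the number of layers, it applies the design elements of lightweight networks: depthwise separable convolution keeps the whole network at only 22.28 M parameters, and the EX and SE modules substantially improve road extraction performance.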

Fig. 7 Comparison of comprehensive performance among network modules for road extraction

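4.3 Road extraction under real cloud occlusion

The experiments above introduce a dataset extended with simulated cloud occlusion to make the models more robust to clouds. To further verify how these models perform under real cloud occlusion, road extraction on cloud-free and cloudy images is compared. All real cloudy images come from the cloud removal dataset compiled by Sarukkai et al. (2020), which consists of cloudy and cloud-free remote sensing images acquired by the Sentinel-2 satellite.

Fig. 8 shows the road extraction results on real cloudy images. Fig. 8(a) shows the cloud-free images, Fig. 8(b) the road detection results on them, Fig. 8(c) the images of the same areas as Fig. 8(a) with real cloud occlusion, and Fig. 8(d)-(h) the road detection results of the different networks on the real cloudy images.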

Fig. 8 Road detection results of images with real clouds by different networks

((a) clear images; (b) results of (a); (c) cloudy images; (d) UNet; (e) ResUNet; (f) LinkNet; (g) D-LinkNet; (h) L-UNet(ours))

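The model trained on the ordinary cloud-free dataset (D-LinkNet_or) detects roads poorly under cloud occlusion: on the real cloudy images it finds only part of the roads in image A of Fig. 8(c). Fig. 8 shows that: 1) the models trained on the dataset synthesized with simulated clouds all extract roads from real cloud-occluded images better to some degree, and L-UNet, which detects roads in all four cloudy images, is more robust than the other networks; 2) in images A, B, and C of Fig. 8(c), clouds cover the middle section of the roads, yet the road labels detected by L-UNet remain more coherent; 3) in image D of Fig. 8(c) the networks differ little: thick cloud completely covers one end of the road at its upper endpoint and has little effect elsewhere, but the overall road prediction of L-UNet is still coherent.

To further clarify how well the models generalize to different cloud conditions, the cloud type and cloud coverage of the four cloudy images in Fig. 8(c) are summarized in Table 3.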

Table 3 Cloud classes and cloud coverage of real cloudy images

Image	Cloud type	Cloud coverage/%
Fig. 8(c)A	Thick cloud	20.35
Fig. 8(c)B	Thick cloud	27.67
Fig. 8(c)C	Thick and thin cloud	100
Fig. 8(c)D	Thick and thin cloud	100

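Fig. 8 and Table 3 together show that in real cloud-occluded scenes, whether the image contains more than 20% thick cloud or is 100% covered by a mix of thick and thin cloud, L-UNet performs stably as long as the image contains some road information to refer to.

Furthermore, taking the road detection results on the cloud-free images as reference labels, the IoU for the four cloudy images in Fig. 8(c) is computed, as listed in Table 4. The mean IoU of D-LinkNet is 28.09 percentage points higher than that of D-LinkNet_or, which confirms that cloud occlusion simulation is effective; the mean IoU of L-UNet is 19.47 and 31.87 percentage points higher than those of D-LinkNet and UNet, respectively, which demonstrates the advantage of the L-UNet architecture.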
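The mean values and differences quoted above can be reproduced from the per-image IoU values reported in Table 4. The sketch below simply averages the four images per network; the dictionary keys are shorthand labels chosen for this example.

```python
# Per-image IoU (%) on Fig. 8(c) A-D, copied from Table 4.
table4 = {
    "D-LinkNet_or": [53.70, 0.0, 0.0, 0.0],
    "UNet": [50.13, 11.10, 0.0, 55.27],
    "D-LinkNet": [42.41, 29.82, 32.94, 60.90],
    "L-UNet": [86.82, 41.31, 54.19, 61.65],
}

means = {name: sum(values) / len(values) for name, values in table4.items()}

# Gains in percentage points between mean IoU values.
simulation_gain = means["D-LinkNet"] - means["D-LinkNet_or"]
gain_over_dlinknet = means["L-UNet"] - means["D-LinkNet"]
gain_over_unet = means["L-UNet"] - means["UNet"]

for name, value in means.items():
    print(f"{name}: {value:.2f}")
print(f"{simulation_gain:.2f} {gain_over_dlinknet:.2f} {gain_over_unet:.2f}")
```

The differences are absolute gaps between mean IoU values, which is why they are best read as percentage points rather than relative improvements.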

Table 4 IoU of road prediction results from real cloudy images by different networks (%)

Network	Fig. 8(c)A	Fig. 8(c)B	Fig. 8(c)C	Fig. 8(c)D	Mean
D-LinkNet_or	53.70	0	0	0	13.43
Residual UNet	25.38	0.99	18.34	19.12	15.96
UNet	50.13	11.10	0	55.27	29.12
ResUNet	46.87	5.57	25.85	58.89	34.30
LinkNet	59.05	0	0	62.13	30.30
D-LinkNet	42.41	29.82	32.94	60.90	41.52
L-UNet (ours)	86.82	41.31	54.19	61.65	60.99
Note: Bold indicates the best result in each column.

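5 Conclusion

This paper studies road extraction from remote sensing images in cloud-occluded scenes. Simulated cloud layers are merged into an existing dataset, and the extended dataset is used to make existing deep learning methods more robust to cloud occlusion. Comparative experiments show that the networks trained on the extended dataset all outperform the best model trained on the original dataset. The proposed L-UNet architecture not only greatly reduces the number of model parameters but also performs well for road extraction under cloud occlusion, with more coherent road detection results.

Because thick clouds leave the road information beneath them unknown, L-UNet still makes errors when it relies on "imagination" to infer road pixels under thick cloud. In future work, different simulated cloud images can be combined with the same cloud-free image to simulate a set of cloud-occluded scenes of the same location at different times, and time series can be introduced into the deep neural network so that the scenes complement one another, further improving the accuracy of road extraction under cloud occlusion.

References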

Cheng G L, Wang Y, Xu S B, Wang H Z, Xiang S M, Pan C H. 2017. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 55(6): 3322-3337 [DOI:10.1109/TGRS.2017.2669341]

Cheng Y, Wang D, Zhou P, Zhang T. 2018. Model compression and acceleration for deep neural networks: the principles, progress, and challenges. IEEE Signal Processing Magazine, 35(1): 126-136 [DOI:10.1109/MSP.2017.2765695]

Demir I, Koperski K, Lindenbaum D, Pang G, Huang J, Basu S, Hughes F, Tuia D and Raskar R. 2018. DeepGlobe 2018: a challenge to parse the earth through satellite images//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 172-181[DOI: 10.1109/CVPRW.2018.00031]

Enomoto K, Sakurada K, Wang W M, Fukui H, Matsuoka M, Nakamura R and Kawaguchi N. 2017. Filmy cloud removal on satellite imagery with multispectral conditional generative adversarial nets//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1533-1541[DOI: 10.1109/CVPRW.2017.197]

Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M and Adam H. 2017. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2020-08-11]. https://arxiv.org/pdf/1704.04861.pdf

Hu J, Shen L, Albanie S, Sun G, Wu E H. 2020. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023 [DOI:10.1109/TPAMI.2019.2913372]

Lin C H, Lai K H, Chen Z B, Chen J Y. 2014. Patch-based information reconstruction of cloud-contaminated multitemporal images. IEEE Transactions on Geoscience and Remote Sensing, 52(1): 163-174 [DOI:10.1109/TGRS.2012.2237408]

Milletari F, Navab N and Ahmadi S A. 2016. V-Net: fully convolutional neural networks for volumetric medical image segmentation//Proceedings of the 4th International Conference on 3D Vision (3DV). Stanford, USA: IEEE: 565-571[DOI: 10.1109/3DV.2016.79]

Perlin K. 2002. Improving noise//Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques. San Antonio, USA: ACM: 681-682[DOI: 10.1145/566570.566636]

Porter T and Duff T. 1984. Compositing digital images//Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques. New York, USA: ACM: 253-259[DOI: 10.1145/800031.808606]

Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of 2015 International Conference on Medical Image Computing and Computer-assisted Intervention. Munich, Germany: Springer: 234-241[DOI: 10.1007/978-3-319-24574-4_28]

Sandler M, Howard A, Zhu M, Zhmoginov A and Chen L C. 2018. MobileNetV2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4510-4520[DOI: 10.1109/CVPR.2018.00474]

Sarukkai V, Jain A, Uzkent B and Ermon S. 2020. Cloud removal in satellite images using spatiotemporal generative networks//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass, USA: IEEE: 1785-1794[DOI: 10.1109/WACV45572.2020.9093564]

Shi W Z, Zhu C Q, Wang Y. 2001. Road feature extraction from remotely sensed image: review and prospects. Acta Geodaetica et Cartographica Sinica, 30(3): 257-262 (史文中, 朱长青, 王昱. 2001. 从遥感影像提取道路特征的方法综述与展望. 测绘学报, 30(3): 257-262) [DOI:10.3321/j.issn:1001-1595.2001.03.014]

Tan M X and Le Q V. 2019. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL]. [2020-08-11]. https://arxiv.org/pdf/1905.11946.pdf

Woo S, Kim D, Cho D and Kweon I S. 2018. LinkNet: relational embedding for scene graph//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 558-568[DOI: 10.5555/3326943.3326995]

Zhang J J, Wan G T, Zhang H Q, Li S S, Feng X X. 2020. Rapid road extraction from quick view imagery of high-resolution satellites with transfer learning. Journal of Image and Graphics, 25(7): 1501-1512 (张军军, 万广通, 张洪群, 李山山, 冯旭祥. 2020. 迁移学习下高分快视数据道路快速提取. 中国图象图形学报, 25(7): 1501-1512) [DOI:10.11834/jig.190441]

Zhang Z X, Liu Q J, Wang Y H. 2018. Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters, 15(5): 749-753 [DOI:10.1109/LGRS.2018.2802944]

Zhou L C, Zhang C and Wu M. 2018. D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 182-186[DOI: 10.1109/CVPRW.2018.00034]

Zoph B and Le Q V. 2016. Neural architecture search with reinforcement learning[EB/OL]. [2020-08-11]. https://arxiv.org/pdf/1611.01578.pdf