Xu Miao, Li Yuanxiang, Zhong Juanjuan, Zuo Zongcheng, Xiong Wei (School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China; Leihua Electronic Technology Research Institute, Aviation Industry Corporation of China, Wuxi 214063, China)
Objective Road extraction is a common remote sensing application. Existing road extraction methods based on deep convolutional networks generally do not consider the effect of cloud occlusion, and their models are large, which hinders deployment on mobile terminals; moreover, no dataset exists for road extraction under cloud occlusion. To address these issues, this paper proposes a lightweight UNet (L-UNet) that efficiently extracts roads under cloud occlusion. Method Perlin noise is used to simulate cloud layers and thereby extend an existing road extraction dataset, on which L-UNet is trained. Mobile inverted bottleneck convolution blocks serve as the main feature extraction structure; on top of depthwise separable convolution, expand convolution and squeeze-and-excitation modules are added, which greatly improves segmentation while reducing the number of parameters. Result On the DeepGlobe road extraction extended dataset, L-UNet improves the intersection over union (IoU) by 1.97% over D-LinkNet with only 1/5 of its parameters. On real cloud-occluded remote sensing images, L-UNet remains the best performer, improving IoU by 19.47% and 31.87% over D-LinkNet and UNet, respectively. Conclusion L-UNet can generate road labels in cloud-occluded regions; although trained on a simulated cloud occlusion dataset, it remains robust to real cloud occlusion. The L-UNet model has a small parameter count and is easy to embed on mobile terminals.
L-UNet: lightweight network for road extraction in cloud occlusion scene
Xu Miao,Li Yuanxiang,Zhong Juanjuan,Zuo Zongcheng,Xiong Wei(School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China;Leihua Electronic Technology Research Institute, the Aviation Industry Corporation of China, Wuxi 214063, China)
Objective Road extraction is one of the primary tasks in the field of remote sensing. It has been applied in many areas, such as urban planning, route optimization, and navigation. In the event of disasters such as mudslides, floods, and earthquakes in particular, road information changes suddenly, so embedding road extraction models on mobile terminals has essential application value for rapid rescue. In recent years, deep learning has provided new ideas for pixel-level road extraction, such as the classic image segmentation network UNet and road extraction networks improved from it, including Residual UNet, LinkNet, and D-LinkNet. These networks extract roads better than traditional methods based on elementary image features. However, methods that rely on deep convolutional networks still have two problems. 1) Cloud occlusion seriously hinders the retrieval of information about ground objects in remote sensing images. The existing convolutional network models are all trained on clear remote sensing images and do not consider the effect of cloud occlusion on road extraction, so their performance on cloudy remote sensing images drops substantially. 2) Lightweight network design has been an active area of research for several years, yet none of the above deep learning models considers it, which adds considerable difficulty to their deployment. To address these problems, a lightweight UNet (L-UNet) is proposed, and road extraction is implemented in an end-to-end manner in the cloud occlusion scene. Method 1) To address cloud occlusion, Perlin noise is used to simulate a cloud layer image, and the artificial cloud layer image is then merged with an RGB remote sensing image through an alpha coefficient to simulate the cloud occlusion scene.
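The cloud-simulation step can be sketched as follows. The paper's exact Perlin implementation and blending coefficients are not specified here; the snippet below substitutes a simplified multi-octave value noise (a common stand-in for Perlin noise) and a per-pixel alpha blend, and all function names, thresholds, and parameter values are illustrative assumptions.

```python
import numpy as np

def fractal_noise(size, octaves=5, persistence=0.5, seed=None):
    """Multi-octave value noise: a simplified stand-in for Perlin noise,
    summing bilinearly upsampled random grids at doubling frequencies."""
    rng = np.random.default_rng(seed)
    noise = np.zeros((size, size))
    amplitude, total = 1.0, 0.0
    for o in range(octaves):
        freq = 2 ** o                              # coarse grid resolution
        grid = rng.random((freq + 1, freq + 1))
        xs = np.linspace(0, freq, size, endpoint=False)
        x0 = xs.astype(int)                        # integer cell index
        t = xs - x0                                # fractional position
        # bilinear interpolation of the coarse grid (square image,
        # same coordinates along both axes)
        g00 = grid[np.ix_(x0, x0)]
        g10 = grid[np.ix_(x0 + 1, x0)]
        g01 = grid[np.ix_(x0, x0 + 1)]
        g11 = grid[np.ix_(x0 + 1, x0 + 1)]
        layer = (g00 * (1 - t)[:, None] * (1 - t)[None, :]
                 + g10 * t[:, None] * (1 - t)[None, :]
                 + g01 * (1 - t)[:, None] * t[None, :]
                 + g11 * t[:, None] * t[None, :])
        noise += amplitude * layer
        total += amplitude
        amplitude *= persistence
    return noise / total                           # normalised to [0, 1]

def add_cloud(rgb, alpha=0.6, seed=None):
    """Blend a synthetic white cloud layer over an RGB image of shape
    (H, W, 3) with values in [0, 1]; `alpha` scales cloud opacity."""
    cloud = fractal_noise(rgb.shape[0], seed=seed)
    cloud = np.clip((cloud - 0.4) / 0.6, 0, 1)     # threshold: partial cover
    a = (alpha * cloud)[..., None]                 # per-pixel alpha mask
    return (1 - a) * rgb + a * 1.0                 # clouds rendered as white
```

In this sketch, sampling a fresh `seed` for every training image reproduces the "randomly sampled cloud layer" behaviour described below.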
This simulation method is used to extend the cloudless road extraction dataset. Specifically, 20 000 artificial cloud layer images are generated before network training. During training, cloud layer images are randomly sampled with replacement, and each selected cloud layer image is merged with a clear remote sensing image in the training dataset, thereby simulating continually changing cloud occlusion scenes. 2) For network lightweighting, UNet, a fully convolutional neural network, is improved to obtain L-UNet. The main improvement is the use of mobile inverted bottleneck convolution blocks (MBConv) in the encoder. MBConv first uses depthwise separable convolution, which considerably reduces the number of network parameters. However, road extraction using depthwise separable convolution alone is not ideal; thus, expand convolution is added. Expand convolution, with several 1×1 convolution kernels, increases the number of feature channels at each layer of the encoder, so each layer of the network can learn richer features. MBConv also uses a squeeze-and-excitation block, which consists of two parts: global pooling for squeeze and 1×1 convolution with the swish function for excitation. The squeeze-and-excitation block rationalizes the relative weights among the output feature maps of each layer and highlights the feature information related to roads and clouds, which benefits the segmentation task. Moreover, swish is selected as the activation function rather than the rectified linear unit (ReLU). The L-UNet model reduces the parameter count of the original UNet and achieves better results. 3) The training loss function is the sum of the binary cross-entropy loss and the dice coefficient loss. The optimizer is Adam with an initial learning rate of 2×10⁻⁴. The encoder parameters of L-UNet are initialized with ImageNet-pretrained model parameters.
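The MBConv structure described above can be sketched in PyTorch, the framework the paper uses. The channel widths, expansion ratio, and squeeze-and-excitation reduction ratio below are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global pooling (squeeze) followed by two
    1x1 convolutions with swish (excitation) that reweight the channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),                              # swish activation
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))            # per-channel reweighting

class MBConv(nn.Module):
    """Mobile inverted bottleneck block: 1x1 expand conv -> depthwise conv
    -> SE block -> 1x1 project conv, with a residual when shapes match."""
    def __init__(self, in_ch, out_ch, expand_ratio=4, stride=1):
        super().__init__()
        mid = in_ch * expand_ratio
        self.expand = nn.Sequential(                # widen channel dimension
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU())
        self.depthwise = nn.Sequential(             # one filter per channel
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU())
        self.se = SEBlock(mid)
        self.project = nn.Sequential(               # back to out_ch, linear
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        y = self.project(self.se(self.depthwise(self.expand(x))))
        return x + y if self.use_residual else y
```

Setting `groups=mid` in the 3×3 convolution is what makes it depthwise: each channel is filtered independently, which is the main source of the parameter savings.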
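The combined loss described in step 3) can likewise be written as a short PyTorch sketch; the smoothing constant `eps` is an assumption, as the paper does not state one.

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, eps=1.0):
    """Training loss: binary cross-entropy plus (1 - dice coefficient),
    computed on per-pixel road probabilities."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return bce + (1 - dice)                         # lower is better
```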
The network is then fine-tuned from this initialization. The PyTorch deep learning framework is used to implement the L-UNet model and the experiments. L-UNet is trained for 233 epochs on two NVIDIA GTX 1080 Ti GPUs until convergence. Result Network training and comparison experiments are carried out on the DeepGlobe road extraction extended dataset. 1) In the network-structure comparison, the baseline is UNet#, which contains only depthwise separable convolution. When the expand convolution and the squeeze-and-excitation block are added separately, the intersection over union (IoU) increases by 1.12% and 8.45%, respectively; adding both simultaneously increases the IoU by 16.24%. 2) L-UNet is compared with other networks on the extended test dataset. Its IoU is 4.65% higher than UNet's and 1.97% higher than that of D-LinkNet, the second-best network. L-UNet has only 22.28 M parameters, 1/7 of UNet's and 1/5 of D-LinkNet's. Its Mask-IoU and Mask-P indices, which measure road prediction performance in the cloud occlusion area, are also higher than those of the other networks. 3) In road extraction tests on several real cloudy remote sensing images from the Sentinel-2 satellite, L-UNet again performs best: its average IoU is 19.47% higher than D-LinkNet's and 31.87% higher than UNet's. Conclusion This paper studies road extraction from remote sensing images in cloud occlusion scenes. Simulated cloud layers are added to existing datasets, and the extended datasets are used to improve the robustness of existing deep learning-based methods against cloud occlusion. The proposed L-UNet architecture dramatically reduces the parameter count and performs well for road extraction under cloud cover.
It can even predict road labels under thick clouds from the visible road edges and trends, so its road detection results have better consistency. In future work, the method can also be applied to other tasks of extracting ground objects from cloud-covered remote sensing images.