Road extraction model derived from integrated attention mechanism and dilated convolution
2022, Vol. 27, No. 10, pp. 3102-3115
Received: 2021-04-02; Revised: 2021-07-08; Accepted: 2021-07-15; Published in print: 2022-10-16
DOI: 10.11834/jig.210226

Objective
To address the problems common to current road extraction methods for remote sensing images, namely low automation, limited extraction accuracy, and unstable model training caused by imbalanced sample numbers, this paper proposes a road extraction model that integrates an attention mechanism and dilated convolution (attention and dilated convolutional U-Net, A & D-UNet).
Method
The A & D-UNet aggregation network is built on the classical U-Net structure. Residual learning units (RLUs) are introduced in the encoder to reduce the training complexity of the deep convolutional neural network; a convolutional block attention module (CBAM) redistributes weights along both the channel and spatial dimensions to highlight road feature information; and a dilated convolutional unit (DCU) perceives a larger feature region to integrate the contextual information of roads. A compound loss function combining binary cross entropy (BCE) and Dice is used to train the model, alleviating the instability caused by the imbalanced number of samples in remote sensing images.
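The overall layout just described, a U-Net-style encoder-decoder with a dilated bottleneck, can be sketched at a high level. This is a minimal PyTorch sketch: the channel widths, number of stages, and plain convolution blocks (standing in for the residual and attention components) are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # plain 3x3 conv -> BN -> ReLU; stands in for the paper's residual unit
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class ADUNetSketch(nn.Module):
    """Illustrative U-Net-style skeleton with a dilated bottleneck."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)   # encoder stage 1 (full resolution)
        self.enc2 = conv_block(32, 64)      # encoder stage 2 (1/2 resolution)
        self.pool = nn.MaxPool2d(2)
        # dilated bottleneck: enlarges the receptive field without down-sampling
        self.bridge = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
        )
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec2 = conv_block(64 + 64, 32)  # concatenated skip from enc2
        self.dec1 = conv_block(32 + 32, 32)  # concatenated skip from enc1
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        s1 = self.enc1(x)                  # skip connection 1
        s2 = self.enc2(self.pool(s1))      # skip connection 2
        b = self.bridge(self.pool(s2))     # bottleneck features
        d2 = self.dec2(torch.cat([self.up(b), s2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), s1], dim=1))
        return torch.sigmoid(self.head(d1))  # per-pixel road probability
```

In the full model, the residual units would replace the plain encoder blocks and CBAM would reweight the skip-connection features before concatenation.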
Result
Validation experiments are conducted on the public Massachusetts and DeepGlobe road datasets, with comparisons against the traditional U-Net, LinkNet, and D-LinkNet image segmentation models. On the Massachusetts road test set, the proposed A & D-UNet model achieves an overall accuracy of 95.27%, an F1-score of 77.96%, and an intersection over union of 79.89%, all better than the comparison algorithms, and it recognizes road regions with obvious linear features, missing label annotations, or tree occlusion more reliably. On the DeepGlobe road test set, A & D-UNet reaches 94.01%, 77.06%, and 78.44% on the same three metrics, and it extracts main roads with obvious linear features, narrow roads unmarked in the labels, and shadow-occluded urban roads well.
Conclusion
The proposed A & D-UNet road extraction model combines the advantages of residual learning, attention mechanisms, and dilated convolution, effectively improving segmentation performance; it is an aggregation network model with good extraction results that merits wider adoption.
Objective
Existing road extraction methods for remote sensing images suffer from low automation, limited extraction accuracy, and unstable model training caused by sample imbalance. We propose a deep convolutional aggregation network that integrates an attention mechanism and dilated convolution (A & D-UNet) to address these issues.
Method
Based on the classical U-Net structure, the A & D-UNet model uses residual learning units (RLUs) in the encoder to reduce the complexity of training a deep network. To highlight road feature information, the convolutional block attention module (CBAM) assigns weights along both the channel and spatial dimensions, and a dilated convolutional unit (DCU) enlarges the receptive field to capture contextual road information. A & D-UNet thus takes advantage of residual learning, dilated convolution, and attention mechanisms to simplify model training, obtain more global information, and improve the utilization of shallow features, respectively. First,
the RLU, as a component of the backbone feature extraction network, uses identity mapping to avoid the training difficulty and model degradation caused by stacking many consecutive convolutional layers. Second,
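A residual learning unit of this kind can be sketched as follows. This is a minimal PyTorch sketch in the He et al. style; the exact convolution arrangement inside A & D-UNet's encoder is an assumption.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Two 3x3 convolutions plus an identity shortcut. A 1x1 projection
    handles channel changes; the internals are illustrative assumptions."""
    def __init__(self, cin, cout):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, padding=1),
            nn.BatchNorm2d(cout),
        )
        # identity shortcut when channels match, 1x1 projection otherwise
        self.shortcut = nn.Identity() if cin == cout else nn.Conv2d(cin, cout, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # the identity mapping lets gradients flow through the shortcut
        # unchanged, easing optimization of deep stacks
        return self.act(self.body(x) + self.shortcut(x))
```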
the DCU takes the road feature map produced after the model's fourth down-sampling and integrates contextual road information through cascaded dilated convolutions with different dilation rates. Finally,
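Such a cascade of dilated convolutions with growing rates (in the spirit of D-LinkNet's center part) might look like the following sketch; the rates (1, 2, 4, 8) and the summed aggregation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedConvUnit(nn.Module):
    """Cascaded 3x3 dilated convolutions applied to the bottleneck feature
    map; setting padding equal to the dilation rate preserves spatial size."""
    def __init__(self, ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # sum the cascade outputs so that both near and far context
        # contribute to the final bottleneck representation
        out = x
        total = x
        for conv in self.stages:
            out = self.act(conv(out))
            total = total + out
        return total
```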
the CBAM weights road features sequentially along the channel dimension and then the spatial dimension, which strengthens attention to shallow features and suppresses interference from background noise. The binary cross-entropy (BCE) loss function is commonly used to train image segmentation models; however,
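The two sequential attention steps can be sketched as follows, based on the published CBAM design (Woo et al., 2018); the reduction ratio of 8 is an illustrative choice.

```python
import torch
import torch.nn as nn

class CBAMSketch(nn.Module):
    """Minimal CBAM: channel attention from pooled descriptors, then
    spatial attention from channel-wise statistics."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        # shared MLP for channel attention
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch),
        )
        # 7x7 convolution for spatial attention
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention: shared MLP over average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention: conv over channel-wise mean and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```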
with the heavily imbalanced number of road samples in remote sensing images it often traps the model in local minima. To improve road segmentation performance, the BCE and Dice loss functions are therefore combined to train A & D-UNet. To validate the effectiveness of the model,
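A compound BCE-plus-Dice loss of this kind can be sketched as follows; equal weighting of the two terms is an assumption, since the weights are not given here.

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(pred, target, eps=1.0):
    """Compound segmentation loss: binary cross-entropy plus a soft Dice
    term. `pred` holds probabilities in (0, 1); `eps` smooths the Dice
    ratio to avoid division by zero on empty masks."""
    bce = F.binary_cross_entropy(pred, target)
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    # equal weighting of the two terms is an illustrative assumption
    return bce + dice
```

The Dice term directly rewards overlap between predicted and true road pixels, which keeps the gradient informative even when road pixels are a small fraction of the image.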
experiments are conducted on the publicly available Massachusetts road dataset (MRDS) and the DeepGlobe road dataset. Because MRDS contains many blank areas and computing resources are limited, the images are cropped to 256 × 256 pixels and crops containing blank areas are discarded, yielding 2 230 training images and 161 test images. To compare performance on the road extraction task, road extraction experiments are also carried out with three reference models, the classical U-Net, LinkNet, and D-LinkNet, and their results are analyzed visually. In addition,
five evaluation metrics, overall accuracy (OA), precision (P), recall (R), F1-score (F1), and intersection over union (IoU), are used to quantitatively assess the extraction effectiveness of the four models.
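These pixel-level metrics follow their standard confusion-matrix definitions, which can be computed from binary prediction and ground-truth masks as:

```python
import numpy as np

def road_metrics(pred, truth):
    """OA, P, R, F1, and IoU from binary masks (standard definitions;
    the paper's exact evaluation protocol is an assumption)."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = np.sum(pred & truth)    # road pixels correctly predicted
    fp = np.sum(pred & ~truth)   # background predicted as road
    fn = np.sum(~pred & truth)   # road predicted as background
    tn = np.sum(~pred & ~truth)  # background correctly predicted
    oa = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"OA": oa, "P": p, "R": r, "F1": f1, "IoU": iou}
```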
Result
From the comparison of road extraction maps and the quantitative analysis of the evaluation metrics, the following results are obtained. 1) The proposed model performs better in three cases: roads with obvious linear characteristics, incomplete road label data, and roads blocked by trees. For roads with clear linear structure, A & D-UNet extracts results close to the ground-truth label images; by learning road features from a large training set of remote sensing images, it avoids wrong extractions where labels are incomplete; and with the DCU and CBAM, it extracts tree-occluded roads more completely, improving the accuracy of classification prediction. 2) A & D-UNet outperforms the comparison algorithms in OA, F1, and IoU, reaching 95.27%, 77.96%, and 79.89% on the Massachusetts road test set, respectively. By using RLUs as the encoder to alleviate the degradation caused by deeper convolutional stacks, A & D-UNet improves OA, F1, and IoU over the classical U-Net by 0.99%, 6.40%, and 4.08%, respectively; with the DCU and CBAM, it improves OA, F1, and IoU over LinkNet by 1.21%, 5.12%, and 3.93%, respectively. 3) Training with the compound loss function further improves the F1-score and IoU of A & D-UNet by 0.26% and 0.18%, indicating that combining BCE and Dice handles the imbalance between positive and negative samples and improves prediction accuracy. These comparisons across models and loss functions show that the proposed A & D-UNet road extraction model has the stronger extraction capability. 4) On the DeepGlobe road test set, A & D-UNet reaches an OA of 94.01%, an F1-score of 77.06%, and an IoU of 78.44%, showing good extraction of main roads with obvious linear characteristics, narrow roads unmarked in the label data, and shadow-occluded roads.
Conclusion
The proposed A & D-UNet aggregation network model combines RLUs with a DCU and CBAM, and is trained on MRDS with a combination of the BCE and Dice loss functions, showing better extraction results. Integrating residual learning, an attention mechanism, and dilated convolution, this aggregation network model offers high automation, high extraction accuracy, and good extraction effect. Compared with current classical algorithms, it alleviates the training difficulties of deep convolutional networks through the RLUs, integrates detailed road feature information with the DCU, and raises the utilization of shallow information with the CBAM. Additionally, the combined BCE and Dice loss function mitigates the sample imbalance between road regions and background regions.