Double vision full convolution network for object extraction in remote sensing imagery
2020, Vol. 25, No. 3, pp. 535-545
Received: 2019-06-14; Revised: 2019-09-16; Accepted: 2019-09-23; Published in print: 2020-03-16
DOI: 10.11834/jig.190276
Objective
Ground-object extraction from remote sensing imagery is a research hotspot in the remote sensing field. Because backgrounds and object types are complex and diverse, traditional methods alone can hardly distinguish and identify object classes accurately, which often leads to false and missed extractions. Methods based on convolutional neural networks (CNNs) now generally outperform traditional methods in object extraction, but they require considerable training time and may converge slowly or even fail to converge. To address this, a double-vision fully convolutional network is proposed based on the complementarity of multi-vision feature information.
Method
The network uses VGG (visual geometry group) 16 and AlexNet to extract local and global visual features, respectively, and processes the two kinds of features through a fusion network so as to fully exploit the complementary information they contain. The local feature extraction network serves as the main network to reduce computational complexity, while the global feature extraction network serves as an auxiliary network to raise prediction confidence, speed up convergence, and shorten training time.
Result
Experiments were conducted on public building and road datasets, with comparisons against U-Net, which excels at binary classification, and the lightweight Mnih network. The results show that the average convergence time of the proposed double-vision fully convolutional network is only 15.46% of that of U-Net; its extraction accuracy is comparable to U-Net's and far higher than Mnih's; and at the 95% confidence level, its confidence interval is clearly better than that of U-Net.
Conclusion
The proposed double-vision fully convolutional network fuses the local detail features and the global features of ground objects in imagery, maintains high extraction accuracy and confidence, and is easier to train and converge. It offers a reference direction for the design of neural networks for subsequent remote sensing object extraction.
Objective
Object extraction is a fundamental task in remote sensing. The accurate extraction of ground objects, such as buildings and roads, benefits change detection, the updating of geographic databases, land use analysis, and disaster relief. Many methods for object extraction, such as for roads or buildings, have been developed over the past years. Some of these methods are based on the geometric features of objects, such as lines and line intersections. Most traditional approaches can obtain satisfactory results in rural areas and suburbs, with high identification and positional accuracy, but their accuracy is low in complex urban areas. With the rise of deep learning and computer vision technology, a growing number of researchers have attempted to solve the related problems through deep learning methods, which have been shown to greatly improve the precision of object extraction. However, due to memory capacity limitations, most of these deep learning methods are patch-based, an operation that cannot fully utilize contextual information. At the edge region of a patch, the prediction confidence is much lower than in the central region because of the lack of relevant information. Therefore, additional epochs are needed for feature extraction and training. In addition, objects often appear at extremely different scales in remote sensing images; thus, determining the right size of the vision area or the sliding window is difficult. Using larger patches to predict small label maps is an effective solution: the confidence of the predicted label map is greatly increased, and the network becomes easier to train and converge.
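The larger-patch, smaller-label strategy described above can be sketched as follows. The 64-pixel patch and 16-pixel label sizes are illustrative assumptions, not values stated in the paper:

```python
import numpy as np

def central_label_crop(patch, label_size):
    """Return the central label_size x label_size window of a larger input patch.

    Predicting only this central window means every predicted pixel is
    surrounded by context on all sides, which raises prediction confidence
    at what would otherwise be low-context patch edges.
    """
    h, w = patch.shape[:2]
    top = (h - label_size) // 2
    left = (w - label_size) // 2
    return patch[top:top + label_size, left:left + label_size]

# A hypothetical 64 x 64 input patch whose central 16 x 16 labels are predicted.
patch = np.arange(64 * 64).reshape(64, 64)
label_window = central_label_crop(patch, 16)
```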
Method
This study proposes a novel network architecture called the double-vision full convolution network (DVFCN). The architecture mainly includes three parts: an encoder of local vision (ELV), an encoder of global vision (EGV), and a fusion decoding part (FD). The ELV extracts the detailed features of buildings, the EGV provides confidence over a larger field of vision, and the FD restores the feature maps to the original patch size. Visual geometry group (VGG) 16 and AlexNet are applied as the backbones of the encoder networks in the ELV and EGV, respectively. To combine the information of the two pathways, the feature maps are concatenated and fed into the FD. After the last level of the FD, a smooth layer and a sigmoid activation layer are used to improve the feature processing ability and project the multichannel feature maps into the desired segmentation. Finally, skip connections are applied to the DVFCN structure so that low-level finer details can compensate the high-level semantic features. The model was trained on an NVIDIA 1080ti GPU with 11 GB onboard memory. The loss is minimized by an Adam optimizer with mini-batches of size 16, an initial learning rate of 0.001, and L2 weight decay of 0.000 5. The learning rate drops by 0.5 every 10 epochs.
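The two-pathway fusion and the training schedule can be sketched with NumPy as below. The channel counts (512 local, 256 global) and the spatial size are hypothetical placeholders, since the paper does not list the exact fused tensor shapes:

```python
import numpy as np

# Hypothetical encoder outputs, shaped channels x height x width.
local_feats = np.random.rand(512, 16, 16)   # ELV (VGG16-style) detail features
global_feats = np.random.rand(256, 16, 16)  # EGV (AlexNet-style) features, matched to the same H x W
# Channel-wise concatenation of the two pathways, as fed into the fusion decoder (FD).
fused = np.concatenate([local_feats, global_feats], axis=0)

def learning_rate(epoch, base_lr=0.001, drop=0.5, step=10):
    """Step schedule from the paper: the rate is halved every 10 epochs."""
    return base_lr * drop ** (epoch // step)
```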
Result
To verify the effectiveness of DVFCN, we conducted experiments on two public datasets: the European building dataset and the Massachusetts road dataset. In addition, two variants of DVFCN were tested, and U-Net and Mnih were also run for comparison. To comprehensively evaluate the classification performance of the models, we plotted receiver operating characteristic (ROC) curves and precision-recall curves, taking the area under the ROC curve (AUC) and the F1 score as evaluation metrics. The experimental results show that DVFCN and U-Net achieve almost the same superior classification performance; however, the total training time of DVFCN was only 15.4% of that of U-Net. The AUCs of U-Net on the building and road datasets were 0.965 3 and 0.983 7, which were only 0.002 1 and 0.005 5 higher than those of DVFCN, respectively. The extraction results on roads and buildings were better than those of Mnih. In addition, the confidence rates of the two networks were calculated; the results show that the confidence interval of DVFCN is better than that of U-Net at the 95% confidence level. The importance of the ELV and EGV was also studied. The results show that the ELV is more important than the EGV because it provides more detailed local information, whereas the EGV performs poorly by itself because it can only provide global information. Nevertheless, the global information is important for the convergence of DVFCN.
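For reference, the two evaluation metrics used above can be computed from binary pixel labels and scores as sketched below, a minimal NumPy version; the rank-based AUC assumes no tied scores:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_score(y_true, scores):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive pixel is scored above a random negative one (no tie handling)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(np.sum(y_true == 1))
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```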
Conclusion
The DVFCN is proposed for object extraction from remote sensing imagery. The proposed network achieves nearly the same extraction performance as U-Net, but its training time is much shorter and its confidence is higher. In addition, DVFCN provides a new fully convolutional architecture that combines local and global information from different fields of vision. The proposed model can be further improved, and a more effective method of combining local and global context information will be developed in the future; in particular, exploiting global information through a dedicated global pathway deserves further study.
Alshehhi R, Marpu P R, Woon W L and Mura M D. 2017. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS Journal of Photogrammetry and Remote Sensing, 130:139-149[DOI:10.1016/j.isprsjprs.2017.05.002]
Cheng G and Han J W. 2016. A survey on object detection in optical remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing, 117:11-28[DOI:10.1016/j.isprsjprs.2016.03.014]
Fan R S, Chen Y, Xu Q H and Wang J X. 2019. A high-resolution remote sensing image building extraction method based on deep learning. Acta Geodaetica et Cartographica Sinica, 48(1):34-41[DOI:10.11947/j.AGCS.2019.20170638]
Ghosh A, Ehrlich M, Shah S, Davis L and Chellappa R. 2018. Stacked U-nets for ground material segmentation in remote sensing imagery//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake: IEEE: 252-2524[DOI:10.1109/CVPRW.2018.00047]
Glorot X and Bengio Y. 2010. Understanding the difficulty of training deep feedforward neural networks//Proceedings of AISTATS. Sardinia, Italy: [s.n.]: 249-256
Huang Z M, Cheng G L, Wang H Z, Li H C, Shi L M and Pan C H. 2016. Building extraction from multi-source remote sensing images via deep deconvolution neural networks//International Geoscience and Remote Sensing Symposium. Beijing, China: IEEE: 1835-1838[DOI:10.1109/IGARSS.2016.7729471]
Li Q, Li Y, Wang Y and Zhao Q H. 2017. Building extraction from high resolution remote sensing image by using Gestalt. Journal of Image and Graphics, 22(8):1162-1174[DOI:10.11834/jig.160588]
Li R R, Liu W J, Yang L, Sun S H, Hu W, Zhang F and Li W. 2018. DeepUNet: a deep fully convolutional network for pixel-level sea-land segmentation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(11):3954-3962[DOI:10.1109/JSTARS.2018.2833382]
Liu X D and Liu Y. 2012. Urban road extraction based on Hough transform and path morphology. Computer Engineering, 38(6):265-268[DOI:10.3969/j.issn.1000-3428.2012.06.088]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition: 3431-3440[DOI:10.1109/TPAMI.2016.2572683]
Marcu A and Leordeanu M. 2016. Dual local-global contextual pathways for recognition in aerial imagery[EB/OL].[2019-10-06]. https://arxiv.org/pdf/1605.05462.pdf
Mnih V. 2013. Machine Learning for Aerial Image Labeling. Toronto, Canada: University of Toronto
Mnih V and Hinton G E. 2010. Learning to detect roads in high-resolution aerial images//Proceedings of the 11th European Conference on Computer Vision. Heraklion, Crete, Greece: Springer: 210-223[DOI:10.1007/978-3-642-15567-3_16]
Panboonyuen T, Jitkajornwanich K, Lawawirojwong S, Srestasathiern P and Vateekul P. 2017. Road segmentation of remotely-sensed images using deep convolutional neural networks with landscape metrics and conditional random fields. Remote Sensing, 9(7):680[DOI:10.3390/rs9070680]
Qiao C, Luo J C, Shen Z F, Zhu Z W and Ming D P. 2012. Adaptive thematic object extraction from remote sensing image based on spectral matching. International Journal of Applied Earth Observation and Geoinformation, 19:248-251[DOI:10.1016/j.jag.2012.05.012]
Rakhlin A, Davydow A and Nikolenko S. 2018. Land cover classification from satellite imagery with U-Net and Lovász-Softmax loss//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake: IEEE: 257-2574[DOI:10.1109/CVPRW.2018.00048]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241[DOI:10.1007/978-3-319-24574-4_28]
Saito S and Aoki Y. 2015. Building and road detection from large aerial imagery//Proceedings of SPIE 9405, Image Processing: Machine Vision Applications Ⅷ. San Francisco, California, the United States: SPIE: 94050K[DOI:10.1117/12.2083273]
Song M J and Civco D. 2004. Road extraction using SVM and image segmentation. Photogrammetric Engineering and Remote Sensing, 70(12):1365-1371[DOI:10.14358/PERS.70.12.1365]
Wang J, Song J W, Chen M Q and Yang Z. 2015. Road network extraction: a neural-dynamic framework based on deep learning and a finite state machine. International Journal of Remote Sensing, 36(12):3144-3169[DOI:10.1080/01431161.2015.1054049]
Wu G M, Chen Q, Ryosuke S, Guo Z L, Shao X W and Xu Y W. 2018. High precision building detection from aerial imagery using a U-Net like convolutional architecture. Acta Geodaetica et Cartographica Sinica, 47(6):864-872[DOI:10.11947/j.AGCS.2018.20170651]
You Y F, Wang S Y, Wang B, Ma Y X, Shen M, Liu W H and Xiao L. 2019. Study on hierarchical building extraction from high resolution remote sensing imagery. Journal of Remote Sensing, 23(1):125-136[DOI:10.11834/jrs.20197500]
Yuan J Y. 2018. Learning building extraction in aerial scenes with convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(11):2793-2798[DOI:10.1109/TPAMI.2017.2750680]
Zhang Z X, Liu Q J and Wang Y H. 2018. Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters, 15(5):749-753[DOI:10.1109/LGRS.2018.2802944]