Rapid road extraction from quick view imagery of high-resolution satellites with transfer learning
2020, Vol. 25, No. 7, pp. 1501-1512
Received: 2019-09-09; Revised: 2019-12-27; Accepted: 2020-01-03; Published in print: 2020-07-16
DOI: 10.11834/jig.190441
Objective
Traditional road extraction methods are not highly automated and cannot meet the demand for rapid acquisition of road information. Deep learning road extraction methods mostly focus on improving accuracy and tend to carry considerable network redundancy. Transfer learning, by transferring knowledge from a source domain to a target domain, can complete the target learning task quickly. Exploiting the rapid availability of quick view data from high-resolution satellites, this paper therefore constructs a transfer-learning-based deep neural network for rapid road extraction.
Method
With a transfer learning approach based on a pretrained network, the road extraction process is divided into two stages. First, the source network is trained on the large open-source database ImageNet, and the best model of this stage is saved. In the second stage, the saved pretrained model is transferred to the target network, and its weight parameters guide the continued training of the target network. At this point the quick view data serve as input and only task-oriented fine-tuning is performed, which accelerates network training. In general, the pretraining stage extracts general-purpose feature parameters, while the target training stage specializes the network for the road extraction task.
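A minimal PyTorch sketch of this two-stage idea, using torchvision's ImageNet-pretrained ResNet34 as the source network; the freezing policy shown is illustrative, not the paper's exact recipe:

```python
import torch
from torchvision.models import resnet34

# Stage 1: the "source network" is ResNet34 already trained on ImageNet.
source = resnet34(pretrained=True)

# Stage 2: initialize the target network's encoder from the source weights
# instead of random values, then fine-tune only for the road task.
target = resnet34(pretrained=False)
target.load_state_dict(source.state_dict())   # transfer the learned weights

# Illustrative choice: freeze the earliest, most generic layer, tune the rest.
for p in target.conv1.parameters():
    p.requires_grad = False

x = torch.randn(1, 3, 256, 256)               # one quick-view-sized patch
features = target(x)                          # pretrained features, ready for fine-tuning
```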
Result
In the rapid road extraction network built in this paper, transferring the pretrained model improves validation accuracy by 6.0% over training without transfer, and the test time for a single 256×256-pixel image is reduced by 49.4%. The average accuracy on the quick view test set reaches 88.3%. For a 7 304×6 980-pixel quick view scene of the Tianjin Binhai New Area cropped from a single orbit, road extraction is completed within 54 s. Compared with other transfer models, the proposed method predicts roads rapidly while achieving high accuracy.
Conclusion
The experimental results show that, for high-resolution satellite quick view data, initializing the network with a pretrained model makes effective use of the weight parameters and keeps the model lightweight, so that accuracy improves while extraction speeds up, enabling fast and accurate acquisition of road information.
Objective
Quick view data generated by high-resolution satellites provide real-time reception and full resolution for quick view imaging. Such imaging offers a timely source of data for practical applications such as fire detection, moving window display, disaster observation, and military information acquisition. Road extraction from remote sensing images has been a popular research topic in the field of remote sensing image analysis. Traditional object-oriented methods are not highly automated, and road features require prior knowledge for manual selection and design. These limitations hinder real-time road information acquisition. Popular deep learning road extraction methods mainly focus on improving precision and largely ignore the timeliness of road information extraction. Transfer learning can rapidly complete the task in the target domain through weight sharing between fields and tailors the model to the target task. We therefore construct a deep transfer learning network that rapidly extracts roads from the quick view data of high-resolution satellites.
Method
First, we propose a least-squares fitting method of devignetting to solve the most serious radiometric problem in raw quick view data, the vignetting phenomenon of TDICCD (time delay and integration charge-coupled device) imaging. The preprocessed quick view data serve as our training dataset; a sketch of the devignetting step follows.
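Since the abstract does not give the fitting model, the NumPy sketch below shows one plausible reading: fit a least-squares line to the column means of a band and divide the trend out. The function name, the linear model, and the 8-bit grayscale assumption are all ours, not the paper's exact algorithm.

```python
import numpy as np

def devignette_columns(img: np.ndarray) -> np.ndarray:
    """Fit a least-squares line to the column means and flatten it out.

    Schematic only: assumes an 8-bit grayscale strip whose vignetting
    varies linearly across columns.
    """
    img = img.astype(np.float64)
    cols = np.arange(img.shape[1])
    col_means = img.mean(axis=0)
    a, b = np.polyfit(cols, col_means, deg=1)     # col_means ~ a*cols + b
    trend = a * cols + b
    gain = trend.mean() / np.maximum(trend, 1e-6)  # divide out the trend
    return np.clip(img * gain[None, :], 0, 255).astype(np.uint8)
```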
Then, we choose LinkNet as the target network after comparing the performance of several real-time semantic segmentation networks, namely, ENet, U-Net, LinkNet, and D-LinkNet. LinkNet is efficient in computation and memory, can learn from a relatively small training set, and its residual units ease the training of deep networks. Rich bypass connections link each encoder block with its corresponding decoder block, so the network can be designed with few parameters. The encoder starts with a 7×7 kernel; the subsequent encoder blocks capture context along the contracting path with 3×3 convolutions. Each convolutional layer is followed by batch normalization and a ReLU nonlinearity. Reflection padding extrapolates the missing context in the training data so that pixels in the border region of the input image can be predicted. The input of each encoder layer of LinkNet is bypassed to the output of its corresponding decoder, so the spatial information lost to max pooling can be recovered by the decoder and its upsampling operations.
Finally, we modify LinkNet to keep it consistent with the layer features of ResNet34, the so-called fine-tuning, to accelerate the LinkNet training process. Fine-tuning is a simple and efficient transfer learning method: initializing LinkNet34 with ResNet34 weights pretrained on ImageNet accelerates network convergence and improves performance at almost no additional cost. A schematic implementation is sketched below.
Result
In the process of devignetting the quick view data, the least-squares linear fitting method proposed in this study efficiently removes the vignetting strips of the original image and meets the needs of practical applications.
In our road extraction experiment, LinkNet34 with a ResNet34 encoder pretrained on ImageNet improves Dice accuracy on the validation dataset by 6% over the same network with a ResNet34 encoder that is not pretrained. The test time for a single feature map is reduced by 39 ms, and the test Dice accuracy reaches 88.3%. Pretrained networks substantially reduce training time and also help prevent overfitting. Consequently, we achieve over 88% test accuracy and a 40 ms test time on the quick view dataset. With an input feature map of 3×256×256 pixels, a Tianjin Binhai scene of 7 304×6 980 pixels takes 54 s.
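Processing a 7 304×6 980-pixel scene with a fixed 3×256×256 input implies tiling the scene into patches. A minimal sketch under the assumption of non-overlapping tiles with border padding (the paper's actual tiling and stitching scheme is not stated in this abstract):

```python
import numpy as np
import torch

@torch.no_grad()
def predict_scene(model, scene: np.ndarray, tile: int = 256) -> np.ndarray:
    """Tile a large (H, W, 3) scene into 256x256 patches and stitch the masks."""
    model.eval()
    h, w, _ = scene.shape
    ph, pw = -h % tile, -w % tile                       # pad up to a tile multiple
    padded = np.pad(scene, ((0, ph), (0, pw), (0, 0)))
    mask = np.zeros(padded.shape[:2], dtype=np.uint8)
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            patch = padded[y:y + tile, x:x + tile]
            t = torch.from_numpy(patch).permute(2, 0, 1).float()[None] / 255.0
            prob = torch.sigmoid(model(t))[0, 0].numpy()
            mask[y:y + tile, x:x + tile] = (prob > 0.5).astype(np.uint8)
    return mask[:h, :w]                                 # crop the padding back off
```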
The original LinkNet, which uses ResNet18 as its encoder, reaches a Dice coefficient of only 85.7%. We also evaluate ResNet50 and ResNet101 as pretrained encoders: the former does not improve Dice accuracy, whereas the latter takes too much test time. We further compare LinkNet34 with three other popular deep transfer models: two modifications of U-Net, TernausNet and AlbuNet, which use VGG11 (visual geometry group network) and ResNet34 as encoders, respectively, and a modification of D-LinkNet. The two U-Net modifications tend to misclassify roads as background or to recognize nonroad objects, such as trees, as roads. D-LinkNet obtains a higher Dice score than LinkNet34 on the validation set, but its testing time is 59 ms longer. LinkNet34 avoids the weaknesses of TernausNet and AlbuNet and makes better predictions than both; it also preserves the small nonroad gap between two adjacent roads, which many methods merge into one. The method proposed in this study generally achieves good connectivity, accurate edges, and clear outlines, extracting the entire road completely and locating it precisely. It is especially suitable for linear rural roads and for the extraction of wide roads in towns. However, the extraction of complex urban road networks remains incomplete.
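For reference, the Dice accuracy quoted throughout is the overlap score Dice = 2|A ∩ B| / (|A| + |B|) between the predicted and ground-truth road masks; a minimal NumPy implementation:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary road masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)
```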
Conclusion
In this study, we build a deep transfer learning neural network, LinkNet34, which uses a pretrained network, ResNet34, as its encoder. ResNet34 allows LinkNet34 to learn without any significant increase in the number of parameters, overcomes the problem that the randomly initialized bottom-layer features of a neural network are insufficiently rich, and accelerates network convergence. Our experiments demonstrate the improvement that the pretrained encoder brings to LinkNet34 and the better performance of LinkNet34 relative to other real-time segmentation architectures. The results show that LinkNet34 can handle road properties such as narrowness, connectivity, complexity, and long span to some extent. The architecture proves useful for binary classification with limited data and realizes fast and accurate acquisition of road information. Future research should consider expanding the quick view database: pretraining LinkNet34 on the expanded database before transfer would reduce the "semantic gap" between the source and target networks and make their data distributions more similar, both of which are conducive to model initialization.
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]
Chaurasia A and Culurciello E. 2017. LinkNet: exploiting encoder representations for efficient semantic segmentation//Proceedings of 2017 IEEE Visual Communications and Image Processing. St. Petersburg, FL, USA: IEEE: 1-4 [DOI: 10.1109/VCIP.2017.8305148]
Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]
Gan J Y, Qi L, Qin C B and He G H. 2019. Lightweight fingerprint classification model combined with transfer learning. Journal of Image and Graphics, 24(7): 1086-1095 [DOI: 10.11834/jig.180499]
Gu J X, Yang R Z, Shi L and Wei H W. 2014. HJ-1C real-time image processing technology based on GPU. Journal of University of Chinese Academy of Sciences, 31(5): 708-713 [DOI: 10.7523/j.issn.2095-6134.2014.05.018]
He K M, Zhang X Y, Ren S Q and Sun J. 2016a. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
He K M, Zhang X Y, Ren S Q and Sun J. 2016b. Identity mappings in deep residual networks//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 630-645 [DOI: 10.1007/978-3-319-46493-0_38]
Iglovikov V and Shvets A. 2018. TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation [EB/OL]. 2018-01-17 [2019-07-09]. https://arxiv.org/pdf/1801.05746.pdf
Mnih V. 2013. Machine learning for aerial image labeling [EB/OL]. 2013-08-09 [2019-07-09]. https://www.cs.utoronto.ca/~vmnih/docs/Mnih_Volodymyr_PhD_Thesis.pdf
Paszke A, Chaurasia A, Kim S and Culurciello E. 2016. ENet: a deep neural network architecture for real-time semantic segmentation [EB/OL]. 2016-06-07 [2019-07-09]. https://arxiv.org/pdf/1606.02147.pdf
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Shelhamer E, Long J and Darrell T. 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651 [DOI: 10.1109/TPAMI.2016.2572683]
Shi W Z, Zhu C Q and Wang Y. 2001. Road feature extraction from remotely sensed image: review and prospects. Acta Geodaetica et Cartographica Sinica, 30(3): 257-262 [DOI: 10.3321/j.issn:1001-1595.2001.03.014]
Shvets A A, Rakhlin A, Kalinin A A and Iglovikov V I. 2018. Automatic instrument segmentation in robot-assisted surgery using deep learning//Proceedings of the 17th IEEE International Conference on Machine Learning and Applications. Orlando, FL, USA: IEEE: 624-628 [DOI: 10.1109/ICMLA.2018.00100]
Su T. 2014. Research on the Registration and Mosaic Technology of TDICCD Stitching Images Based on Reflectors. Changchun: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences
Tan C Q, Sun F C, Kong T, Zhang W C, Yang C and Liu C F. 2018. A survey on deep transfer learning//Proceedings of the 27th International Conference on Artificial Neural Networks. Rhodes, Greece: Springer: 270-279 [DOI: 10.1007/978-3-030-01424-7_27]
Yosinski J, Clune J, Bengio Y and Lipson H. 2014. How transferable are features in deep neural networks? [EB/OL]. 2014-09-06 [2019-07-09]. https://arxiv.org/pdf/1411.1792.pdf
Zhang Z X, Liu Q J and Wang Y H. 2018. Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters, 15(5): 749-753 [DOI: 10.1109/LGRS.2018.2802944]
Zhou L C, Zhang C and Wu M. 2018. D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, UT, USA: IEEE: 192-196 [DOI: 10.1109/CVPRW.2018.00034]