Deformable atrous convolution nearshore SAR small ship detection incorporating mixed attention
Journal of Image and Graphics, 2022, 27(12): 3663-3676
Received: 2021-09-13; Revised: 2021-11-26; Accepted: 2021-12-02; Published in print: 2022-12-16
DOI: 10.11834/jig.210866
Objective
In nearshore synthetic aperture radar (SAR) ship detection, small ships are easily confused with similar-looking buildings and islands because of complex backgrounds such as land structures and islets. Existing methods usually extract image features with square convolution kernels of fixed size. Small ships, however, occupy only a small fraction of the image and appear as thin, obliquely oriented strips, so a fixed square kernel introduces excessive background information that interferes with classification. To address this, this paper proposes a backbone network based on deformable atrous convolution for ship targets in SAR images.
Method
First, deformable atrous convolution kernels replace conventional kernels so that the feature-sampling locations fit the target shape more closely, strengthening extraction of the ship's own region and edge features while reducing the amount of background extracted. Then, a three-channel mixed attention mechanism is proposed to strengthen the extraction of local details, highlight the differences between small ships and distractors such as reefs and islands, and improve the model's fine-grained classification.
Result
Experiments on the SAR ship dataset HRSID (high-resolution SAR images dataset) show that, applied to three detection models, Cascade-RCNN (cascade region convolutional neural network), YOLOv4 (you only look once v4), and BorderDet (border detection), the proposed method improves small-ship detection accuracy by 3.5%, 2.6%, and 2.9% over the original models, respectively, with an overall accuracy of 89.9%. On the SSDD (SAR ship detection dataset) dataset, the overall accuracy reaches 95.9%, outperforming existing methods.
Conclusion
By improving the backbone network, the model can change the shape and size of its convolution kernels, concentrate on target information, and suppress background interference, effectively reducing false alarms and missed detections for small ships against complex nearshore backgrounds in SAR images.
Objective
Ship detection in synthetic aperture radar (SAR) images is essential for maritime surveillance and administration. Traditional constant false alarm rate (CFAR) algorithms suffer from reliance on hand-crafted features, slow speed, and susceptibility to interference from ship-like objects such as roofs and containers. Convolutional neural network (CNN) based detectors have substantially improved detection accuracy. However, in high-resolution SAR images, ships dock in complicated directions and vary widely in size, so the recognition rate remains low for some targets, especially small ships in complex nearshore scenes. When a convolution kernel extracts features, its weights are multiplied with the values at the corresponding locations of the feature map. The degree to which the kernel shape matches the target shape therefore determines, to a certain extent, the efficiency and quality of feature extraction. If the kernel shape closely resembles the target shape, the extracted feature map contains the complete information of the target; otherwise, it contains many background features that interfere with classification and localization. The square convolution kernel of traditional methods does not fit the long, thin shape of a ship with an arbitrary docking direction, so we develop a backbone network based on deformable atrous convolution.
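As a toy illustration of the multiply-and-sum argument above (not code from the paper), consider a fixed square kernel applied over a patch containing a thin diagonal "ship": most of the weighted samples come from background, which is exactly the interference the proposed backbone tries to avoid.

```python
import numpy as np

# Hypothetical 5x5 patch: a thin, diagonal "ship" (value 1) on a
# dark sea/land background (value 0).
patch = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float)

# A fixed square kernel weights every position in its window, so only
# 5 of the 25 weighted samples actually come from the target.
square_kernel = np.ones((5, 5)) / 25.0
response = float(np.sum(square_kernel * patch))  # plain multiply-and-sum
target_fraction = patch.sum() / patch.size       # share of ship pixels

print(response)         # 0.2 -> dominated by the 20 background positions
print(target_fraction)  # only 0.2 of the sampled positions are ship
```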
Method
Weighted fusion deformable atrous convolution (WFDAC) adaptively changes the shape and size of its convolution kernels and weights the features extracted by the different kernels according to learned weights. In this way, the network actively learns which kernels better match the target shape, enhancing the extraction of target-region information while suppressing background. The WFDAC module consists of two parallel deformable convolution kernels with different atrous rates and a 1 × 1 convolution kernel that computes their fusion weights. Because the two parallel deformable kernels have different atrous rates, they produce different receptive fields. In deep feature extraction, the deformable kernel with the smaller atrous rate may revisit features that, in shallow layers, fell within the receptive field of the kernel with the larger atrous rate. That is, features within the same receptive field are extracted and fused by at least two cross-layer deformable convolution kernels, which improves the feature-extraction efficiency of the network.
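The WFDAC fusion described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: ordinary dilated convolutions stand in for the deformable kernels (which would additionally apply learned per-position sampling offsets), and the softmax fusion weights are fixed here rather than produced by the learned 1 × 1 convolution.

```python
import numpy as np

def dilated_conv(x, kernel, rate):
    """Single-channel 'same' convolution with a dilated (atrous) 3x3 kernel.
    Standard dilation stands in for the paper's deformable sampling."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample every `rate`-th row/column inside the dilated window.
            win = xp[i:i + rate * k:rate, j:j + rate * k:rate]
            out[i, j] = np.sum(win * kernel)
    return out

def wfdac_block(x, k1, k2, w_logits):
    """WFDAC-style fusion sketch: two parallel branches with atrous rates
    1 and 3, fused by per-branch weights that the paper derives from a
    learned 1x1 convolution (here: fixed logits for illustration)."""
    b1 = dilated_conv(x, k1, rate=1)
    b2 = dilated_conv(x, k2, rate=3)
    w = np.exp(w_logits) / np.sum(np.exp(w_logits))  # softmax fusion weights
    return w[0] * b1 + w[1] * b2

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = np.full((3, 3), 1 / 9.0)
y = wfdac_block(x, k, k, w_logits=np.array([0.0, 0.0]))
print(y.shape)  # (8, 8) -- same spatial size, as in a backbone stage
```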
In addition, to capture the discrepancies between small targets and nearshore reefs and coastal buildings, we propose a three-channel mixed attention (TMA) mechanism. It uses three parallel branches to obtain cross-dimension interactions among the feature dimensions by means of rotation and residual connections, and from these interactions it computes weights over the feature values. Multiplying the weights with the original values sharpens the differences between small ships and ship-like buildings or islands and reduces the weight of their shared features in classification, thereby improving the model's fine-grained classification.
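The TMA branch structure can be sketched in the same spirit. This is an assumed reconstruction modeled on the triplet-attention idea (Misra et al., 2021) that the mechanism builds on; the pooling, gating, and averaging choices here are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def branch_attention(x, perm):
    """One branch: rotate the (C, H, W) tensor so a different pair of
    dimensions interacts, pool over the leading axis to get a weight map,
    gate the rotated tensor, then rotate back."""
    xr = np.transpose(x, perm)                          # rotate dimensions
    pooled = 0.5 * (xr.max(axis=0) + xr.mean(axis=0))   # simple pooling stand-in
    w = sigmoid(pooled)                                 # attention weights in (0, 1)
    inv = np.argsort(perm)                              # inverse permutation
    return np.transpose(xr * w, inv)                    # gate and rotate back

def tma(x):
    """Three-channel mixed attention sketch: average three rotated
    branches; adding x back provides the residual connection."""
    perms = [(0, 1, 2), (1, 0, 2), (2, 1, 0)]           # identity + two rotations
    branches = [branch_attention(x, p) for p in perms]
    return x + sum(branches) / 3.0

x = np.random.default_rng(1).standard_normal((4, 8, 8))  # (C, H, W)
y = tma(x)
print(y.shape)  # (4, 8, 8) -- attention preserves the feature-map shape
```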
Result
Ablation and comparative experiments are conducted on two SAR ship datasets: the high-resolution SAR images dataset (HRSID) and the SAR ship detection dataset (SSDD). The model is trained on the training set and evaluated on the test set. Several evaluation metrics are used, defined in terms of intersection over union (IoU) and target pixel size. The results show that our method effectively improves the detection accuracy of SAR ship targets, especially small ones. Replacing ResNet-50 with our backbone feature extraction network (FEN), the detection accuracy on HRSID increases by 3.5%, 2.6%, and 2.9%, respectively, on the three detection models Cascade-RCNN (cascade region convolutional neural network), YOLOv4 (you only look once v4), and BorderDet (border detection). For small ships, the overall accuracy reaches 89.9%. To verify that the models improve small-ship detection in complex nearshore backgrounds, we split the HRSID test set into two scenarios, nearshore and offshore; the accuracy improves by 3.5% and 1.2% in the two scenarios, respectively. Additionally, we designed a set of experiments to study the effect of the atrous rate on the WFDAC module, in which the atrous rate of one of the two parallel deformable convolutions is fixed to 1 and the atrous rate of the other is set to 1, 3, and 5 in turn. The results show that WFDAC performs best when one branch has atrous rate 1 and the other atrous rate 3. The overall accuracy on SSDD reaches 95.9%.
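One plausible reading of the atrous-rate result uses the standard effective-extent formula for dilated convolutions (assuming 3 × 3 branch kernels, which the paper does not state explicitly): a k × k kernel with rate r spans k + (k − 1)(r − 1) pixels, so rate 5 already covers an 11-pixel extent that may exceed the footprint of a small ship.

```python
# Effective spatial extent of a k x k convolution with atrous rate r:
#   k_eff = k + (k - 1) * (r - 1)   (standard dilated-convolution formula)
def effective_extent(k, r):
    return k + (k - 1) * (r - 1)

for r in (1, 3, 5):
    print(r, effective_extent(3, r))  # rate 1 -> 3, rate 3 -> 7, rate 5 -> 11
```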
Conclusion
The model with our improved backbone network can change the shape and size of its convolution kernels to concentrate on acquiring target information while suppressing background interference. It effectively reduces the false-alarm and missed-detection rates for small ships in SAR images with complex nearshore backgrounds.
Ao W, Xu F, Li Y C and Wang H P. 2018. Detection and discrimination of ship targets in complex background from spaceborne ALOS-2SAR images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(2): 536-550 [DOI: 10.1109/JSTARS.2017.2787573]
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2021-04-23]. https://arxiv.org/pdf/2004.10934.pdf
Cai Z W and Vasconcelos N. 2018. Cascade R-CNN: delving into high quality object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6154-6162 [DOI: 10.1109/CVPR.2018.00644]
Chen K, Wang J Q, Pang J M, Cao Y H, Xiong Y, Li X X, Sun S Y, Feng W S, Liu Z W, Xu J R, Zhang Z, Cheng D Z, Zhu C C, Cheng T H, Zhao Q J, Li B Y, Lu X, Zhu R, Wu Y, Dai J F, Wang J D, Shi J P, Ouyang W L, Loy C C and Lin D H. 2019. MMDetection: open MMLab detection toolbox and benchmark [EB/OL]. [2021-06-17]. https://arxiv.org/pdf/1906.07155v1.pdf
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 833-851 [DOI: 10.1007/978-3-030-01234-2_49]
Dai H, Du L, Wang Y and Wang Z C. 2016. A modified CFAR algorithm based on object proposals for ship target detection in SAR images. IEEE Geoscience and Remote Sensing Letters, 13(12): 1925-1929 [DOI: 10.1109/LGRS.2016.2618604]
Dai W X, Mao Y Q, Yuan R A, Liu Y J, Pu X M and Li C. 2020. A novel detector based on convolution neural networks for multiscale SAR ship detection in complex background. Sensors, 20(9): #2547 [DOI: 10.3390/s20092547]
Gui Y C, Li X H and Xue L. 2019. A multilayer fusion light-head detector for SAR ship detection. Sensors, 19(5): #1124 [DOI: 10.3390/s19051124]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Heiselberg P and Heiselberg H. 2017. Ship-iceberg discrimination in sentinel-2 multispectral imagery by supervised classification. Remote Sensing, 9(11): #1156 [DOI: 10.3390/rs9111156]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2020. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023 [DOI: 10.1109/TPAMI.2019.2913372]
Huang Z J, Huang L C, Gong Y C, Huang C and Wang X G. 2019. Mask scoring R-CNN//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6402-6411 [DOI: 10.1109/CVPR.2019.00657]
Li J W, Qu C W and Shao J Q. 2017. Ship detection in SAR images based on an improved faster R-CNN//Proceedings of 2017 SAR in Big Data Era: Models, Methods and Applications. Beijing, China: IEEE: 1-6 [DOI: 10.1109/BIGSARDATA.2017.8124934]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Lin Z, Ji K F, Leng X G and Kuang G Y. 2019. Squeeze and excitation rank faster R-CNN for ship detection in SAR images. IEEE Geoscience and Remote Sensing Letters, 16(5): 751-755 [DOI: 10.1109/LGRS.2018.2882551]
Liu L, Gao Y S, Wang F and Liu X Z. 2019. Real-time optronic beamformer on receive in phased array radar. IEEE Geoscience and Remote Sensing Letters, 16(3): 387-391 [DOI: 10.1109/LGRS.2018.2875461]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Misra D, Nalamada T, Arasanipalai A U and Hou Q B. 2021. Rotate to attend: convolutional triplet attention module//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 3138-3147 [DOI: 10.1109/WACV48630.2021.00318]
Qiu H, Ma Y C, Li Z M, Liu S T and Sun J. 2020. BorderDet: border feature for dense object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 549-564 [DOI: 10.1007/978-3-030-58452-8_32]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Ruan C, Guo H and An J B. 2021. SAR inshore ship detection algorithm in complex background. Journal of Image and Graphics, 26(5): 1058-1066 [DOI: 10.11834/jig.200266]
Wang F, Jiang M Q, Qian C, Yang S, Li C, Zhang H G, Wang X G and Tang X O. 2017. Residual attention network for image classification//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6450-6458 [DOI: 10.1109/CVPR.2017.683]
Wang Y Y, Wang C, Zhang H, Dong Y B and Wei S S. 2019. Automatic ship detection based on RetinaNet using multi-resolution gaofen-3 imagery. Remote Sensing, 11(5): #531 [DOI: 10.3390/rs11050531]
Wei S J, Zeng X F, Qu Q Z, Wang M, Su H and Shi J. 2020. HRSID: a high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access, 8: 120234-120254 [DOI: 10.1109/ACCESS.2020.3005861]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Zhao J P, Guo W W, Zhang Z H and Yu W X. 2019. A coupled convolutional neural network for small and densely clustered ship detection in SAR images. Science China Information Sciences, 62(4): #42301 [DOI: 10.1007/s11432-017-9405-6]
Zhao Y, Zhao L J, Xiong B L and Kuang G Y. 2020. Attention receptive pyramid network for ship detection in SAR images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13: 2738-2756 [DOI: 10.1109/JSTARS.2020.2997081]
Zhu X Z, Hu H, Lin S and Dai J F. 2019. Deformable ConvNets V2: more deformable, better results//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9300-9308 [DOI: 10.1109/CVPR.2019.00953]