Deformable atrous convolution nearshore SAR small ship detection incorporating mixed attention
Journal of Image and Graphics, 2022, 27(12): 3663-3676
Received: 2021-09-13; Revised: 2021-11-26; Accepted: 2021-12-02; Published in print: 2022-12-16
DOI: 10.11834/jig.210866
Objective
In nearshore synthetic aperture radar (SAR) ship detection, small ships are easily confused with similar-looking buildings and islands because of complex backgrounds such as land structures and islets. Existing methods usually extract image features with square convolution kernels of fixed size. Small ships, however, occupy only a small fraction of the image and appear as thin, obliquely oriented strips, so a fixed square kernel introduces excessive background information that interferes with classification. To address this, this paper proposes a backbone network based on deformable atrous convolution for ship targets in SAR images.
Method
First, deformable atrous convolution kernels replace conventional kernels so that the feature-sampling locations fit the target shape more closely, strengthening extraction of the ship's own region and edge features while reducing the amount of background extracted. Then, a three-channel mixed attention mechanism is proposed to strengthen the extraction of local details, highlight the differences between small ships and distractors such as reefs and islands, and improve the model's fine-grained classification.
Result
Experiments on the SAR ship dataset HRSID (high-resolution SAR images dataset) show that, applied to three detection models, Cascade-RCNN (cascade region convolutional neural network), YOLOv4 (you only look once v4), and BorderDet (border detection), the proposed method improves small-ship detection accuracy by 3.5%, 2.6%, and 2.9% over the original models, respectively, with an overall accuracy of 89.9%. On the SSDD (SAR ship detection dataset) dataset, the overall accuracy reaches 95.9%, outperforming existing methods.
Conclusion
By improving the backbone network, the model can change the shape and size of its convolution kernels, concentrate on target information, and suppress background interference, effectively reducing false alarms and missed detections for small ships against complex nearshore backgrounds in SAR images.
Objective
Ship detection in synthetic aperture radar (SAR) images is essential for maritime surveillance and administration. Traditional constant false alarm rate (CFAR) algorithms suffer from reliance on hand-crafted features, slow speed, and susceptibility to interference from ship-like objects such as roofs and containers. Convolutional neural network (CNN) based detectors have substantially improved detection accuracy. However, in high-resolution SAR images, ships dock in complicated directions and vary widely in size, so the recognition rate remains low for some targets, especially small ships in complex nearshore scenes. When a convolution kernel extracts features, its weights are multiplied with the values at the corresponding locations of the feature map. The degree to which the kernel shape matches the target shape therefore determines, to a certain extent, the efficiency and quality of feature extraction. If the kernel shape closely resembles the target shape, the extracted feature map contains the complete information of the target; otherwise, it contains many background features that interfere with classification and localization. The square convolution kernel of traditional methods does not fit the long, thin shape of a ship with an arbitrary docking direction, so we develop a backbone network based on deformable atrous convolution.
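As a toy illustration of the multiply-and-sum argument above (not code from the paper), consider a fixed square kernel applied over a patch containing a thin diagonal "ship": most of the weighted samples come from background, which is exactly the interference the proposed backbone tries to avoid.

```python
import numpy as np

# Hypothetical 5x5 patch: a thin, diagonal "ship" (value 1) on a
# dark sea/land background (value 0).
patch = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float)

# A fixed square kernel weights every position in its window, so only
# 5 of the 25 weighted samples actually come from the target.
square_kernel = np.ones((5, 5)) / 25.0
response = float(np.sum(square_kernel * patch))  # plain multiply-and-sum
target_fraction = patch.sum() / patch.size       # share of ship pixels

print(response)         # 0.2 -> dominated by the 20 background positions
print(target_fraction)  # only 0.2 of the sampled positions are ship
```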
Method
Weighted fusion deformable atrous convolution (WFDAC) adaptively changes the shape and size of its convolution kernels and weights the features extracted by the different kernels according to learned weights. In this way, the network actively learns which kernels better match the target shape, enhancing the extraction of target-region information while suppressing background. The WFDAC module consists of two parallel deformable convolution kernels with different atrous rates and a 1 × 1 convolution kernel that computes their fusion weights. Because the two parallel deformable kernels have different atrous rates, they produce different receptive fields. In deep feature extraction, the deformable kernel with the smaller atrous rate may revisit features that, in shallow layers, fell within the receptive field of the kernel with the larger atrous rate. That is, features within the same receptive field are extracted and fused by at least two cross-layer deformable convolution kernels, which improves the feature-extraction efficiency of the network.
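The WFDAC fusion described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: ordinary dilated convolutions stand in for the deformable kernels (which would additionally apply learned per-position sampling offsets), and the softmax fusion weights are fixed here rather than produced by the learned 1 × 1 convolution.

```python
import numpy as np

def dilated_conv(x, kernel, rate):
    """Single-channel 'same' convolution with a dilated (atrous) 3x3 kernel.
    Standard dilation stands in for the paper's deformable sampling."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample every `rate`-th row/column inside the dilated window.
            win = xp[i:i + rate * k:rate, j:j + rate * k:rate]
            out[i, j] = np.sum(win * kernel)
    return out

def wfdac_block(x, k1, k2, w_logits):
    """WFDAC-style fusion sketch: two parallel branches with atrous rates
    1 and 3, fused by per-branch weights that the paper derives from a
    learned 1x1 convolution (here: fixed logits for illustration)."""
    b1 = dilated_conv(x, k1, rate=1)
    b2 = dilated_conv(x, k2, rate=3)
    w = np.exp(w_logits) / np.sum(np.exp(w_logits))  # softmax fusion weights
    return w[0] * b1 + w[1] * b2

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = np.full((3, 3), 1 / 9.0)
y = wfdac_block(x, k, k, w_logits=np.array([0.0, 0.0]))
print(y.shape)  # (8, 8) -- same spatial size, as in a backbone stage
```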
In addition, to capture the discrepancies between small targets and nearshore reefs and coastal buildings, we propose a three-channel mixed attention (TMA) mechanism. It uses three parallel branches to obtain cross-dimension interactions among the feature dimensions by means of rotation and residual connections, and from these interactions it computes weights over the feature values. Multiplying the weights with the original values sharpens the differences between small ships and ship-like buildings or islands and reduces the weight of their shared features in classification, thereby improving the model's fine-grained classification.
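The TMA branch structure can be sketched in the same spirit. This is an assumed reconstruction modeled on the triplet-attention idea (Misra et al., 2021) that the mechanism builds on; the pooling, gating, and averaging choices here are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def branch_attention(x, perm):
    """One branch: rotate the (C, H, W) tensor so a different pair of
    dimensions interacts, pool over the leading axis to get a weight map,
    gate the rotated tensor, then rotate back."""
    xr = np.transpose(x, perm)                          # rotate dimensions
    pooled = 0.5 * (xr.max(axis=0) + xr.mean(axis=0))   # simple pooling stand-in
    w = sigmoid(pooled)                                 # attention weights in (0, 1)
    inv = np.argsort(perm)                              # inverse permutation
    return np.transpose(xr * w, inv)                    # gate and rotate back

def tma(x):
    """Three-channel mixed attention sketch: average three rotated
    branches; adding x back provides the residual connection."""
    perms = [(0, 1, 2), (1, 0, 2), (2, 1, 0)]           # identity + two rotations
    branches = [branch_attention(x, p) for p in perms]
    return x + sum(branches) / 3.0

x = np.random.default_rng(1).standard_normal((4, 8, 8))  # (C, H, W)
y = tma(x)
print(y.shape)  # (4, 8, 8) -- attention preserves the feature-map shape
```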
Result
Ablation and comparative experiments are conducted on two SAR ship datasets: the high-resolution SAR images dataset (HRSID) and the SAR ship detection dataset (SSDD). The model is trained on the training set and evaluated on the test set. Several evaluation metrics are used, defined in terms of intersection over union (IoU) and target pixel size. The results show that our method effectively improves the detection accuracy of SAR ship targets, especially small ones. Replacing ResNet-50 with our backbone feature extraction network (FEN), the detection accuracy on HRSID increases by 3.5%, 2.6%, and 2.9%, respectively, on the three detection models Cascade-RCNN (cascade region convolutional neural network), YOLOv4 (you only look once v4), and BorderDet (border detection). For small ships, the overall accuracy reaches 89.9%. To verify that the models improve small-ship detection in complex nearshore backgrounds, we split the HRSID test set into two scenarios, nearshore and offshore; the accuracy improves by 3.5% and 1.2% in the two scenarios, respectively. Additionally, we designed a set of experiments to study the effect of the atrous rate on the WFDAC module, in which the atrous rate of one of the two parallel deformable convolutions is fixed to 1 and the atrous rate of the other is set to 1, 3, and 5 in turn. The results show that WFDAC performs best when one branch has atrous rate 1 and the other atrous rate 3. The overall accuracy on SSDD reaches 95.9%.
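One plausible reading of the atrous-rate result uses the standard effective-extent formula for dilated convolutions (assuming 3 × 3 branch kernels, which the paper does not state explicitly): a k × k kernel with rate r spans k + (k − 1)(r − 1) pixels, so rate 5 already covers an 11-pixel extent that may exceed the footprint of a small ship.

```python
# Effective spatial extent of a k x k convolution with atrous rate r:
#   k_eff = k + (k - 1) * (r - 1)   (standard dilated-convolution formula)
def effective_extent(k, r):
    return k + (k - 1) * (r - 1)

for r in (1, 3, 5):
    print(r, effective_extent(3, r))  # rate 1 -> 3, rate 3 -> 7, rate 5 -> 11
```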
Conclusion
The model with our improved backbone network can change the shape and size of its convolution kernels to concentrate on acquiring target information while suppressing background interference. It effectively reduces the false-alarm and missed-detection rates for small ships in SAR images with complex nearshore backgrounds.
Ao W, Xu F, Li Y C and Wang H P. 2018. Detection and discrimination of ship targets in complex background from spaceborne ALOS-2SAR images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(2): 536-550 [DOI: 10.1109/JSTARS.2017.2787573]
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2021-04-23]. https://arxiv.org/pdf/2004.10934.pdf
Cai Z W and Vasconcelos N. 2018. Cascade R-CNN: delving into high quality object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6154-6162 [DOI: 10.1109/CVPR.2018.00644]
Chen K, Wang J Q, Pang J M, Cao Y H, Xiong Y, Li X X, Sun S Y, Feng W S, Liu Z W, Xu J R, Zhang Z, Cheng D Z, Zhu C C, Cheng T H, Zhao Q J, Li B Y, Lu X, Zhu R, Wu Y, Dai J F, Wang J D, Shi J P, Ouyang W L, Loy C C and Lin D H. 2019. MMDetection: open MMLab detection toolbox and benchmark [EB/OL]. [2021-06-17]. https://arxiv.org/pdf/1906.07155v1.pdf
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 833-851 [DOI: 10.1007/978-3-030-01234-2_49]
Dai H, Du L, Wang Y and Wang Z C. 2016. A modified CFAR algorithm based on object proposals for ship target detection in SAR images. IEEE Geoscience and Remote Sensing Letters, 13(12): 1925-1929 [DOI: 10.1109/LGRS.2016.2618604]
Dai W X, Mao Y Q, Yuan R A, Liu Y J, Pu X M and Li C. 2020. A novel detector based on convolution neural networks for multiscale SAR ship detection in complex background. Sensors, 20(9): #2547 [DOI: 10.3390/s20092547]
Gui Y C, Li X H and Xue L. 2019. A multilayer fusion light-head detector for SAR ship detection. Sensors, 19(5): #1124 [DOI: 10.3390/s19051124]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Heiselberg P and Heiselberg H. 2017. Ship-iceberg discrimination in sentinel-2 multispectral imagery by supervised classification. Remote Sensing, 9(11): #1156 [DOI: 10.3390/rs9111156]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2020. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023 [DOI: 10.1109/TPAMI.2019.2913372]
Huang Z J, Huang L C, Gong Y C, Huang C and Wang X G. 2019. Mask scoring R-CNN//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6402-6411 [DOI: 10.1109/CVPR.2019.00657]
Li J W, Qu C W and Shao J Q. 2017. Ship detection in SAR images based on an improved faster R-CNN//Proceedings of 2017 SAR in Big Data Era: Models, Methods and Applications. Beijing, China: IEEE: 1-6 [DOI: 10.1109/BIGSARDATA.2017.8124934]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Lin Z, Ji K F, Leng X G and Kuang G Y. 2019. Squeeze and excitation rank faster R-CNN for ship detection in SAR images. IEEE Geoscience and Remote Sensing Letters, 16(5): 751-755 [DOI: 10.1109/LGRS.2018.2882551]
Liu L, Gao Y S, Wang F and Liu X Z. 2019. Real-time optronic beamformer on receive in phased array radar. IEEE Geoscience and Remote Sensing Letters, 16(3): 387-391 [DOI: 10.1109/LGRS.2018.2875461]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Misra D, Nalamada T, Arasanipalai A U and Hou Q B. 2021. Rotate to attend: convolutional triplet attention module//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 3138-3147 [DOI: 10.1109/WACV48630.2021.00318]
Qiu H, Ma Y C, Li Z M, Liu S T and Sun J. 2020. BorderDet: border feature for dense object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 549-564 [DOI: 10.1007/978-3-030-58452-8_32]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Ruan C, Guo H and An J B. 2021. SAR inshore ship detection algorithm in complex background. Journal of Image and Graphics, 26(5): 1058-1066 [DOI: 10.11834/jig.200266]
Wang F, Jiang M Q, Qian C, Yang S, Li C, Zhang H G, Wang X G and Tang X O. 2017. Residual attention network for image classification//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6450-6458 [DOI: 10.1109/CVPR.2017.683]
Wang Y Y, Wang C, Zhang H, Dong Y B and Wei S S. 2019. Automatic ship detection based on RetinaNet using multi-resolution gaofen-3 imagery. Remote Sensing, 11(5): #531 [DOI: 10.3390/rs11050531]
Wei S J, Zeng X F, Qu Q Z, Wang M, Su H and Shi J. 2020. HRSID: a high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access, 8: 120234-120254 [DOI: 10.1109/ACCESS.2020.3005861]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Zhao J P, Guo W W, Zhang Z H and Yu W X. 2019. A coupled convolutional neural network for small and densely clustered ship detection in SAR images. Science China Information Sciences, 62(4): #42301 [DOI: 10.1007/s11432-017-9405-6]
Zhao Y, Zhao L J, Xiong B L and Kuang G Y. 2020. Attention receptive pyramid network for ship detection in SAR images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13: 2738-2756 [DOI: 10.1109/JSTARS.2020.2997081]
Zhu X Z, Hu H, Lin S and Dai J F. 2019. Deformable ConvNets V2: more deformable, better results//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9300-9308 [DOI: 10.1109/CVPR.2019.00953]