
Published: 2021-03-16
DOI: 10.11834/jig.200111
2021 | Volume 26 | Number 3




Remote Sensing Image Processing









Fast detection of arbitrarily oriented ships based on dense sub-region cutting
Chen Huajie, Wu Dong, Gu Yu
Key Laboratory of Fundamental Science for National Defense-Communication Information Transmission and Fusion Technology Laboratory, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract

Objective The detection of arbitrarily oriented ships in remote sensing images aims to output the minimum circumscribed rectangular bounding box of each ship on the image. Arbitrarily oriented ship detection algorithms based on two-stage deep networks are relatively slow; algorithms based on one-stage deep networks are faster, but the large length-width ratio of ships leads to a high false alarm rate. To reduce the false alarm rate of one-stage detection and further increase the detection speed, a fast detection algorithm based on dense sub-region cutting is proposed that exploits the shape characteristics of ship targets. Method Along the long-axis direction, the whole ship is densely cut into several local sub-regions contained in square annotation boxes, which ensures the best effective proportion of sub-region area within each annotation box and thus preserves the generalization ability of the core detection network. The core network is trained with the sub-regions as detection targets, and overlapping sub-regions are integrated during training. The detected sub-regions are then merged on the basis of subgraph segmentation, from which key ship parameters such as the orientation angle are estimated. Sub-region merging post-processing replaces non-maximum suppression post-processing, which preserves the detection speed. Result On the HRSC2016 (high resolution ship collections) dataset, the proposed algorithm is compared with five recent algorithms: improved YOLOv3 (you only look once), RRCNN (rotated region convolutional neural network), RRPN (rotation region proposal networks), R-DFPN-3 (rotation dense feature pyramid network), and R-DFPN-4. Compared with R-DFPN-4, the most accurate comparison algorithm, the mAP (mean average precision) at IOU (intersection over union) = 0.5 of the proposed algorithm is 1.9% higher and the average time consumed is 57.9% lower; compared with the improved YOLOv3, the fastest comparison algorithm, the mAP (IOU = 0.5) is 3.6% higher and the average time consumed is 31.4% lower. Conclusion The proposed arbitrarily oriented ship detection algorithm exploits the shape characteristics of ship targets, outperforms current mainstream arbitrarily oriented ship detection algorithms in both detection accuracy and detection speed, and improves the detection speed markedly.

Keywords

arbitrarily oriented ship detection; dense sub-region cutting; subgraph segmentation; sub-region merging; fast detection

Fast detection algorithm for ship in arbitrary direction with dense subregion cutting
Chen Huajie, Wu Dong, Gu Yu
Key Laboratory of Fundamental Science for National Defense-Communication Information Transmission and Fusion Technology Laboratory, Hangzhou Dianzi University, Hangzhou 310018, China
Supported by: National Defense Basic Scientific Research Program of China(JCKY2018415C004);National Defense Science and Technology Key Laboratory Foundation of China(6142804180407); Provincial Key Research and Development Projects(2019C0505)

Abstract

Objective Ship detection based on remotely sensed images aims to locate ships, which is of great significance in national water surveillance and territorial security. The rectangular bounding boxes for target location in the typical deep learning method are usually in the horizontal-vertical direction, whereas the distribution of ships on remotely sensed images is arbitrarily oriented or in varying directions. For narrow and long ships with arbitrary directions, the vertical-horizontal bounding box is fairly rough. When the ship deviates from the vertical or horizontal direction, the bounding box is inaccurate, and the bounding box has many nonship pixels. If multiple ships are close to one another on the image, several ships may not be located because they are overlapped by the bounding boxes of the neighboring ships. Therefore, using a finer bounding box in detection is beneficial for detecting ship targets, and more precise ship positioning information is helpful for subsequent ship target recognition. For this reason, the classical deep-learning-based target detection is extended, and a finer minimum circumscribed rectangular bounding box is utilized to locate the ship target. Existing extended detection algorithms can be divided into two categories: one-stage detection and two-stage detection. One-stage detection directly outputs the target's location estimation, whereas two-stage detection classifies the proposed regions to eliminate the false targets. The disadvantage of two-stage detection is its slower speed. One-stage detection is faster, but its false alarm rate is higher for narrow and long ships. A fast detection algorithm based on dense sub-region segmentation is proposed according to the shape characteristics of ship targets to reduce the false alarm rate of one-stage detection and further improve the detection speed. Method The basic idea of our algorithm is to segment a ship into several sub-regions on which detection and combination are carried out, according to the long and narrow shape characteristic of ships. First, the whole ship is densely segmented along its long axis direction into several local sub-regions contained in square annotation boxes to maximize the proportion of the pixel area belonging to the ship, namely, the effective area ratio in every annotation box. The influence of background noise on a sub-region annotation box can thus be suppressed, and a reliable generalization ability of the sub-region detection network is obtained. The multi-resolution structure is applied to the core detection network, which contains three output branches from coarse resolution to fine resolution. The density of sub-region segmentation is set according to the minimum spatial compression ratio of the output branches to ensure that the sub-regions of the same ship are connected in each output branch. Second, the core sub-region detection network is trained, and several overlapping sub-regions in the coarse branches are reorganized during training. In the output layers with coarser resolution, spatially adjacent sub-regions may be mapped to the same point in the output grid because the sub-regions are densely distributed. This process is called sub-region overlapping. Each point in the output grid can correspond to at most one sub-region target; thus, these sub-regions should be reorganized into a new pseudo sub-region.
The center point of the pseudo sub-region is the average of the center points of the original sub-regions, and the size of the pseudo sub-region is consistent with that of the original sub-regions. With different resolutions in the output layers, the center points of the pseudo sub-regions are slightly different, but the overall difference is not large. Lastly, the detected sub-regions are merged based on the subgraph segmentation method. The whole remotely sensed image is modeled as a graph, where each detected sub-region is treated as a single node. The connectivity between every two sub-regions is established according to their spatial distance and size difference. Subgraph segmentation then clusters the sub-regions belonging to the same ship. Based on the spatial distribution of the clustered sub-regions, the key parameters of the corresponding ship, such as length, width, and rotation angle, are estimated. Compared with conventional deep learning target detection methods, the core detection network structure of the proposed algorithm remains unchanged, and the post-processing of sub-region merging replaces the common non-maximum suppression post-processing. Result Our algorithm is compared with five state-of-the-art detection algorithms, namely, improved YOLOv3 (you only look once), RRCNN (rotated region convolutional neural network), RRPN (rotation region proposal network), R-DFPN-3 (rotation dense feature pyramid network), and R-DFPN-4, on the HRSC2016 (high resolution ship collections) dataset. The improved YOLOv3 belongs to one-stage detection, and the four other algorithms belong to two-stage detection. The quantitative evaluation metrics include mean average precision (mAP) and mean consuming time (mCT). Experiment results show that our algorithm outperforms all other algorithms on the HRSC2016 dataset. Compared with the result of R-DFPN-4, the comparison algorithm with the highest detection accuracy, mAP (higher is better) increases by 1.9%, and mCT (less is better) decreases by 57.9%. Compared with the result of the improved YOLOv3, the comparison algorithm with the fastest detection speed, mAP increases by 3.6%, and mCT decreases by 31.4%. The running speeds of our algorithm and the conventional YOLOv3 algorithm are further analyzed and compared. The core detection network applied in our algorithm is the same as that of the conventional YOLOv3 algorithm; thus, the running speed differs only in the post-processing phase. The sub-region merging of our algorithm takes about 11 ms, and the non-maximum suppression (NMS) of the conventional YOLOv3 takes approximately 5 ms on the HRSC2016 dataset. Compared with the conventional YOLOv3 algorithm, our algorithm obtains finer positioning information for the rotated ships, and the running time increases by only 9%. Conclusion A dense sub-region segmentation based, arbitrarily oriented ship detection algorithm that uses the long-and-narrow shape characteristics of the ship target is proposed. The experiment results show that our algorithm outperforms several state-of-the-art arbitrarily oriented ship detection algorithms, especially in detection speed.

Key words

arbitrary direction ship detection; dense sub-region segmentation; sub-graph segmentation; sub-region merging; fast detection

0 Introduction

Ship detection in remote sensing images is of great significance for national water-area surveillance and territorial security. Traditional ship detection methods (Eldhuset, 1996; Wang et al., 2017; Fingas and Brown, 2001) mainly perform sea-land segmentation with texture and shape features to extract regions of interest, obtain candidate object regions with approaches such as the contrast box algorithm (Yu et al., 2012) and semi-supervised classification (Zhu et al., 2010), and finally filter out the candidate regions of false ship targets to obtain the true detection results. Deep-learning-based ship detection generally defines target locations with rectangular bounding boxes and locates targets by regressing the bounding-box parameters. Mainstream object detection falls into two types: one-stage detection, represented by SSD (single shot multibox detector) (Liu et al., 2016), YOLO (you only look once) (Redmon et al., 2016; Redmon and Farhadi, 2018), and RetinaNet (Lin et al., 2017), and two-stage detection, represented by Fast RCNN (region convolutional neural network) (Girshick, 2015) and Faster RCNN (Ren et al., 2015).

Ships are distributed in arbitrary directions on remote sensing images. If the above deep learning methods are applied directly and an arbitrarily oriented ship is enclosed with a horizontal-vertical rectangular bounding box, the following problems arise: the rectangular box of one ship easily encloses neighboring ships as well, which interferes with their detection; and, for subsequent tasks such as target recognition, an additional estimate of the orientation angle may be required.

Therefore, for arbitrarily oriented ship detection, researchers have extended existing deep learning object detection methods to output the minimum circumscribed rectangular bounding box of the ship on the image, for example, the extended YOLOv3 method (Wu et al., 2019), the extended Faster RCNN algorithm (rotated region convolutional neural network, RRCNN) (Liu et al., 2017a), the RRPN (rotation region proposal network) algorithm (Ma et al., 2018), and the R-DFPN (rotation dense feature pyramid network) algorithm (Yang et al., 2018). The main idea of these algorithms is to additionally annotate the angle of the ship target and to add a new branch to the core detection network for estimating the target rotation angle.

Existing extended methods, whether one-stage or two-stage, have certain shortcomings. Ship targets are long and narrow, with a large length-width ratio, so the proportion of target area within the receptive field of the core detection network is low and contains a large amount of background noise, which harms the generalization ability of the trained core detection network. One-stage methods are fast, but their false alarm rate is relatively hard to control; two-stage methods obtain regions of interest in the first stage and classify and filter them again in the second stage to control the false alarm rate, which makes detection slower.

To improve the combined accuracy-speed performance of arbitrarily oriented ship detection, a ship detection algorithm for remote sensing images based on dense sub-region cutting is proposed. The core idea is to cut the long, narrow ship target into several sub-regions contained in square annotation boxes and to detect the sub-regions with a one-stage detector; the sub-regions are then assembled into complete ship targets by graph clustering, and the ship angle is naturally estimated from the distribution of the sub-regions.

In terms of detection accuracy, a dense sub-region cutting strategy is introduced. Every sub-region is contained in a square annotation box, which guarantees a high proportion of target sub-region area within the annotation box, preserves the generalization ability of the trained core detection network, and effectively controls the false alarm rate.

In terms of detection speed, sub-region merging post-processing is introduced to replace non-maximum suppression (NMS) post-processing. Measured data show that the time consumed by sub-region merging is comparable to that of conventional NMS, which keeps detection fast.

1 Ship target detection with dense sub-region cutting

As shown in Fig. 1, the ship target detection algorithm based on dense sub-region cutting comprises a training phase and a testing phase. In the training phase, compared with conventional object detection algorithms, the new algorithm adds a dense sub-region segmentation preprocessing step: the whole ship target is first segmented into several local sub-regions and the annotation information of the sub-regions is generated automatically; the core detection network is then trained on this sub-region annotation information. For detection speed, YOLOv3 is chosen as the core detection network. The classical YOLOv3 network has three output layers whose compression ratios are, from largest to smallest, 32, 16, and 8. The density of sub-region cutting is determined by the smallest compression ratio, so sub-regions overlap on the output layers with larger compression ratios, and the overlapping sub-regions have to be integrated during training. Finally, NMS is applied to the candidate boxes output by the core detection network.

Fig. 1 General scheme of ship target detection based on dense subregion cutting

In the testing phase, the scheme uses sub-region merging post-processing instead of NMS. Sub-region merging reassembles the detected local region targets into whole targets by clustering and estimates the detection parameters of the whole targets, including their angles.

1.1 Dense sub-region segmentation

Decomposing a complete target into several local regions requires considering two factors. 1) Distinguishability from the background: object detection, in a sense, distinguishes the target (whether a complete target or a local region) from the background and locates it. A local region carries less target feature information than the complete target, so it is less distinguishable from the background. In general, the smaller the local region, the lower the distinguishability; in the extreme case where the local region is a single pixel, the distinguishability drops to its minimum. From this viewpoint, the local region should be as large as possible. 2) Generalization ability of the learned detector: ideally, the proportion of background area inside the local annotation box should be as small as possible to preserve the generalization ability of the core detector.

Considering both factors, the basic idea of sub-region segmentation is to cover the whole target with several dense square local annotation boxes along the long axis of the ship, as shown in Fig. 2.

Fig. 2 Dense subregion cutting

1.1.1 Size of the local annotation box

The local annotation box should cover the short side (width) of the ship at any angle. As shown in Fig. 3, the rotation angle $\theta $ is defined as the angle between the horizontal direction and the long axis of the ship, and its range is restricted to (-90°, 90°]. The side length $ L$ of the square annotation box depends on the ship width and the rotation angle, namely

$ L = \frac{{{W_s}}}{{\cos \theta }} $ (1)

Fig. 3 Rotation angle

where $ {{W_s}}$ is the ship width. For a given ship, $ {{W_s}}$ is fixed and $ L$ depends on $\theta $; $ L$ ranges over $ \left[ {{W_s}, {W_s} \times \sqrt 2 } \right]$ and reaches its maximum at $\theta $ = ±45°.
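As a quick numerical check (an illustrative example, not taken from the original text): with $W_s = 20$ pixels, $\theta = 30°$ gives $L = 20/\cos 30° \approx 23.1$ pixels, and $\theta = 45°$ gives $L = 20\sqrt{2} \approx 28.3$ pixels, the maximum of the stated range. Note that, as printed, Eq. (1) grows beyond $\sqrt{2}W_s$ once $|\theta| > 45°$; the stated range together with Eq. (29) suggests the bounded reading $L = W_s/\max(|\cos\theta|, |\sin\theta|)$, and that form is what the code sketches below assume.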

1.1.2 Shape of the local annotation box

The local annotation box is set to be a square so that the effective target area ratio inside the box is as large as possible. The effective target area ratio is defined as

$ R = \frac{{{S_t}}}{{{S_b}}} $ (2)

where ${S_b} $ is the area of the local annotation box (taken to be rectangular), ${S_t} $ is the area of the local target region inside the box, and $R $ is the effective target ratio. $R $ depends on the rotation angle and reaches its minimum at $\theta $ = 45°; it also depends on the aspect ratio of the rectangular box and is largest when the aspect ratio is 1.

As shown in Fig. 4, for a ship rotated by $\theta $ = 45° with an assumed length-width ratio of 5 (a typical value), a conventional annotation box such as $B $ gives an effective area ratio of $R $ ≤ 0.2, whereas an annotation box such as $A $ used in this paper gives $R $ = 0.75. The higher the effective target ratio, the more target information and the less background noise the positive samples contain, and the smaller the adverse effect of background noise on learning; conversely, the more background noise there is, the harder it is to extract useful information and the weaker the generalization ability under the same algorithm. Based on the principle of maximizing the effective target area ratio within a local region, a square is adopted as the shape of the local annotation box.
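As a sketch of where the $R = 0.75$ value for box $A$ comes from (our derivation, assuming an axis-aligned square box of side $L = \sqrt{2}W_s$ centered on the hull axis at $\theta = 45°$): the box area is $S_b = L^2 = 2W_s^2$; the hull is a strip of width $W_s$ running along the box diagonal, and the two corners of the box outside the strip are right triangles of height $W_s/2$ and area $(W_s/2)^2$ each, so

$ S_t = 2W_s^2 - 2{\left( \frac{W_s}{2} \right)}^2 = \frac{3}{2}W_s^2, \quad R = \frac{S_t}{S_b} = 0.75 $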

Fig. 4 Comparison of effective percentage of target

1.1.3 Density of local region segmentation

Dense sub-region segmentation is based on the principle that the mapped center points of all local regions of the same target remain connected on the feature layers of the core detection network.

Take the typical YOLOv3 network shown in Fig. 5 as an example. The input image is processed at multiple scales, and there are three output layers y1, y2, and y3 with spatial compression ratios of 32, 16, and 8, respectively. If the center point of a sub-region is $ (x, {\rm{ }}y)$, the grid point it maps to on a feature layer is

$ {{x_g} = [x/{b_1}]} $ (3)

$ {{y_g} = [y/{b_1}]} $ (4)

Fig. 5 YOLOv3 network structure

where [·] denotes rounding down to an integer and ${b_1} $ is the compression ratio. The center points of the sub-regions of the same target are $\left(x_{1}, y_{1}\right), \left(x_{2}, y_{2}\right), \cdots, \left(x_{n}, y_{n}\right) $, and their mapped grid points on the feature layer are $\left({{x_{g1}}, {y_{g1}}} \right), \left({{x_{g2}}, {y_{g2}}} \right), \cdots \left({{x_{gn}}, {y_{gn}}} \right) $. The distance $ d$ between the center points of adjacent sub-regions is

$ d = \sqrt {{{({x_{n - 1}} - {x_n})}^2} + {{({y_{n - 1}} - {y_n})}^2}} $ (5)

To ensure that the mapped points on the feature layer are 8-neighborhood connected, the following condition must be satisfied:

$ {({x_{g(n - 1)}} - {x_{gn}})^2} + {({y_{g(n - 1)}} - {y_{gn}})^2} \le 2 $ (6)

In practice, keeping the distance $ d$ between adjacent sub-region centers smaller than the smallest compression ratio of the core detection network guarantees connectivity on the output layers. For YOLOv3, $ d$ is kept below 8 pixels.
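A minimal sketch of this connectivity check (our illustration; sub-region centers are assumed to be (x, y) pixel tuples and the strides are the YOLOv3 compression ratios 32/16/8):

import math

def grid_point(x, y, stride):
    # Map an image-space center to its feature-grid cell, Eqs. (3)(4).
    return int(x // stride), int(y // stride)

def centers_connected(centers, stride):
    # True if consecutive centers map to 8-neighborhood connected cells, Eq. (6).
    cells = [grid_point(x, y, stride) for x, y in centers]
    return all((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= 2
               for a, b in zip(cells, cells[1:]))

# Centers cut every 6 pixels along a 45-degree hull stay connected on all three
# output layers, since the step is below the smallest stride (8).
step = 6
centers = [(100 + i * step * math.cos(math.pi / 4),
            200 + i * step * math.sin(math.pi / 4)) for i in range(10)]
print([centers_connected(centers, s) for s in (32, 16, 8)])  # [True, True, True]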

1.1.4 Summary of the dense sub-region segmentation strategy

The detailed procedure of dense sub-region segmentation is as follows:

Input: a ship in an optical remote sensing image

Output: the set M of dense ship sub-regions

1) Obtain the rotated annotation box of the ship: r = [cx, cy, w, h, a]

(cx and cy are the coordinates of the ship center, w and h are the ship length and width, respectively, and a is the ship tilt angle);

set the cutting step size step.

2) Number of sub-regions generated by cutting: N = w/step + 1;

side length of the square local annotation boxes: L = h/cos a;

initial cutting center: Cx = cx - w·cos a/2, Cy = cy - w·sin a/2;

cutting increments: dex = step·cos a, dey = step·sin a;

square local annotation boxes: p_i = (Cx + i·dex, Cy + i·dey, L, L), i = 0, 1, …, N-1;

set of dense ship sub-regions: M = {p_0, p_1, …, p_(N-1)}.

3) Output M.
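A runnable sketch of the procedure above (our illustration; it assumes the rotated-box convention r = [cx, cy, w, h, a] with w the ship length, h the ship width, and the angle a in radians):

import math

def dense_subregion_cut(cx, cy, w, h, a, step=6):
    # Cut a rotated ship box [cx, cy, w, h, a] into dense square sub-regions;
    # returns a list of (x, y, L, L) tuples (sub-region center and side length).
    n = int(w // step) + 1                               # number of sub-regions
    # Square side length; the max(|cos|, |sin|) form keeps L within [h, sqrt(2)*h]
    # for any angle (Eq. (1) as printed, h/cos a, covers |a| <= 45 degrees).
    side = h / max(abs(math.cos(a)), abs(math.sin(a)))
    x0 = cx - w * math.cos(a) / 2                        # first cutting center
    y0 = cy - w * math.sin(a) / 2
    dex, dey = step * math.cos(a), step * math.sin(a)    # cutting increments
    return [(x0 + i * dex, y0 + i * dey, side, side) for i in range(n)]

# Example: a 180 x 36 pixel ship tilted by 30 degrees yields 31 sub-regions.
boxes = dense_subregion_cut(400, 300, 180, 36, math.radians(30))
print(len(boxes))  # 31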

Introducing the dense sub-region segmentation strategy into a conventional object detection framework has both drawbacks and advantages. The main drawback is that, in terms of distinguishability from the surroundings, a sub-region carries relatively little information, so it is less distinguishable than the whole ship. The main advantage is that the effective proportion of the sub-region within the square annotation box increases, which strengthens the generalization ability of the core detection network. While this advantage is retained, post-processing steps such as sub-region merging can further filter out false alarms and compensate for the drawback.

1.2 Integration of overlapping sub-regions

In YOLOv3 detection, the target parameters are estimated for every grid point. A sub-region with center $ (x, {\rm{ }}y)$ maps to the grid point $x_{g}=\left[x / b_{1}\right], y_{g}= \left[y / b_{1}\right]$ on a feature layer, and the corresponding target parameters to be estimated for that grid point are

$ {{d_x} = x - [x/{b_1}] \times {b_1}} $ (7)

$ {{d_y} = y - [y/{b_1}] \times {b_1}} $ (8)

With the dense segmentation strategy, the cutting density guarantees the connectivity of the mapped sub-region centers on the feature layer with the smallest compression ratio (the y3 layer of YOLOv3); as a result, on the feature layers with larger compression ratios (the y1 and y2 layers of YOLOv3), the mapped centers overlap: the center points of several different sub-regions map to the same grid point, so a single grid point on the feature layer corresponds to multiple sub-regions.

Suppose the grid point $\left({{x_g}, {y_g}} \right) $ corresponds to several sub-region center points $\left({{x_1}, {y_1}} \right), \left({{x_2}, {y_2}} \right), \cdots, \left({{x_n}, {y_n}} \right) $; they can be integrated as follows:

$ {d_x} = {x_{{\rm{mean }}}} - \left[ {{x_{{\rm{mean }}}}/{b_1}} \right] \times {b_1} $ (9)

${d_y} = {y_{{\rm{mean }}}} - \left[ {{y_{{\rm{mean }}}}/{b_1}} \right] \times {b_1} $ (10)

where $ {x_{{\rm{mean }}}}$ and $ {y_{{\rm{mean }}}}$ are the means of the sub-region center points that map to the same grid point.
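A minimal sketch of this integration step (our illustration; centers are (x, y) tuples and stride is the compression ratio b1 of the layer concerned):

from collections import defaultdict

def integrate_overlaps(centers, stride):
    # Group centers that fall into the same grid cell and replace each group by a
    # pseudo sub-region centered at the group mean, Eqs. (9)(10).
    cells = defaultdict(list)
    for x, y in centers:
        cells[(int(x // stride), int(y // stride))].append((x, y))
    offsets = {}
    for cell, pts in cells.items():
        x_mean = sum(p[0] for p in pts) / len(pts)
        y_mean = sum(p[1] for p in pts) / len(pts)
        # Regression offsets d_x, d_y of the pseudo sub-region inside its cell.
        offsets[cell] = (x_mean - int(x_mean // stride) * stride,
                         y_mean - int(y_mean // stride) * stride)
    return offsets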

1.3 Sub-region merging

The sub-regions detected by the core detection network, as shown in Fig. 6, have to be merged in the testing phase so that sub-regions belonging to the same ship are grouped into one class. Sub-region merging has three goals. 1) Accurate partitioning: an image may contain several ships, possibly close to one another; a reasonable partition assigns the sub-regions of the same target to the same class. 2) Removal of false targets: the sub-regions of a ship appear as multiple sub-regions arranged along a line, and this characteristic can be used to filter out some false targets. 3) Accurate estimation of target parameters: the detection parameters of the complete target (center point, length, width, angle) are estimated from the detection parameters of the sub-regions (center point, length, width).

Fig. 6 Results of sub-region detection

The core step of sub-region merging is a graph-based sub-region clustering strategy: each sub-region is treated as a node of a graph, edges represent the connection relations between sub-regions, and independent subgraphs are obtained by graph partitioning, which completes the division.

1.3.1 Graph model

1) Spatial constraint, namely

$ {D = \sqrt {{{({x_1} - {x_2})}^2} + {{({y_1} - {y_2})}^2}} } $ (11)

$ {D < (w{h_1} + w{h_2})/2 \times c} $ (12)

where $D $ is the distance between the center points of two sub-regions and $w h_{1}, w h_{2} $ are the side lengths of the sub-regions. The larger the coefficient $c $, the more easily sub-regions are connected; the smaller $c $, the stricter the connection condition.

The distance constraint accounts for the possibility of multiple neighboring targets in the image, whose sub-regions must not be mixed together. It is therefore reasonable to set the connection range of a sub-region to a fraction of the ship width.

2) Size constraint on the sub-regions. To prevent the sub-regions of a neighboring large target and a small target from being connected, two connected sub-regions must satisfy

$ {R = w{h_1}/w{h_2}} $ (13)

$ {(1 - {c_{{\rm{diff}}}}) < R < 1/(1 - {c_{{\rm{diff}}}})} $ (14)

where $w h_{1}, w h_{2} $ are the side lengths of the sub-regions, and $ c_{\text {diff }}$ is a fixed coefficient that keeps the side lengths of two connected regions from differing too much.

An edge is established only between sub-regions that satisfy both constraints. In this way, an undirected graph $\mathit{\boldsymbol{G}} $ with the sub-regions as nodes is built for each detected image.
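A sketch of the graph construction with NetworkX (the toolbox named in Section 2.2); detections are assumed to be (x, y, wh) tuples with wh the square side length, and c and c_diff take the values reported in Section 2.2:

import math
import networkx as nx

def build_graph(dets, c=0.25, c_diff=0.3):
    # Nodes are detected sub-regions; an edge is added only when both the spatial
    # constraint (Eqs. (11)(12)) and the size constraint (Eqs. (13)(14)) hold.
    g = nx.Graph()
    g.add_nodes_from(range(len(dets)))
    for i, (x1, y1, wh1) in enumerate(dets):
        for j in range(i):
            x2, y2, wh2 = dets[j]
            close = math.hypot(x1 - x2, y1 - y2) < (wh1 + wh2) / 2 * c
            similar = (1 - c_diff) < wh1 / wh2 < 1 / (1 - c_diff)
            if close and similar:
                g.add_edge(i, j)
    return g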

1.3.2 Sub-region partitioning based on subgraph segmentation

The undirected graph $\mathit{\boldsymbol{G}} $ is segmented into subgraphs so that the sub-region nodes belonging to the same ship fall into the same maximal connected subgraph, i.e., connected component:

$ \{ {\mathit{\boldsymbol{G}}_i} = ({V_i},{E_i})|i = 1, \cdots ,n\} $ (15)

where ${\mathit{\boldsymbol{G}}_i} $ is the $ i$-th subgraph, ${V_i} $ is its node set, and ${E_i} $ is its edge set.

False targets can be further eliminated from the distribution of the nodes within the connected subgraphs. For a genuine ship target, the number of nodes in the corresponding subgraph is

$ num = {L_{{\rm{ship}}}}/step \times P $ (16)

where $ num$ is the number of nodes, $L_{\text {ship }} $ is the ship length in pixels, $step $ is the cutting step size, and $P $ is the sub-region detection probability. For example, with $L_{\text {ship }} $ = 100, $step $ = 6, and $P $ = 0.9, $ num$ = 15. False targets, by contrast, most likely appear as a few scattered detections. The maximal connected subgraphs obtained by segmentation are therefore screened, and subgraphs that do not satisfy the following condition are regarded as interference and removed:

$ num \ge t{h_{{\rm{num}}}} $ (17)
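Continuing the sketch above, the maximal connected subgraphs are extracted with NetworkX and screened by node count; the threshold th_num is left as a free parameter here because its value is not reported in the original text:

import networkx as nx

def ship_clusters(g, dets, th_num=5):
    # Keep only connected components with at least th_num nodes, Eq. (17);
    # each surviving component collects the sub-region centers of one ship.
    return [[(dets[i][0], dets[i][1]) for i in comp]
            for comp in nx.connected_components(g)
            if len(comp) >= th_num]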

1.3.3 Ship target parameter estimation

For each connected subgraph that passes the screening, the coordinates of its nodes, i.e., the center positions of the sub-regions, are fitted with a straight line, from which key parameters of the whole ship such as angle, length, and width are estimated.

A first-order polynomial fit of the center points $ \left({{x_1}, {y_1}} \right), \left({{x_2}, {y_2}} \right), \cdots, \left({{x_n}, {y_n}} \right)$ of all sub-regions belonging to the same maximal connected subgraph gives the function

$ reg = wx + b $ (18)

All sub-region center points $ \left({{x_1}, {y_1}} \right), \left({{x_2}, {y_2}} \right), \cdots, \left({{x_n}, {y_n}} \right)$ are projected onto this line, namely

$ \begin{array}{c} x_{i}^{\prime}=\left(w \times y_{i}+x_{i}-w \times b\right) /(w \times w+1) \\ (i=1,2,3, \cdots, n) \end{array} $ (19)

$ \begin{array}{c} y_{i}^{\prime}=\left(w^{2} \times y_{i}+w \times x_{i}+b\right) /(w \times w+1) \\ (i=1,2,3, \cdots, n) \end{array} $ (20)

1) Estimate the ship center point $\left(c_{x}, c_{y}\right) $:

$ x_{\min }^{\prime} =\min \left(x_{1}^{\prime}, x_{2}^{\prime}, x_{3}^{\prime}, \cdots, x_{n}^{\prime}\right) $ (21)

$ x_{\max }^{\prime} =\max \left(x_{1}^{\prime}, x_{2}^{\prime}, x_{3}^{\prime}, \cdots, x_{n}^{\prime}\right) $ (22)

$ y_{\min }^{\prime} =\min \left(y_{1}^{\prime}, y_{2}^{\prime}, y_{3}^{\prime}, \cdots, y_{n}^{\prime}\right) $ (23)

$ y_{\max }^{\prime} =\max \left(y_{1}^{\prime}, y_{2}^{\prime}, y_{3}^{\prime}, \cdots, y_{n}^{\prime}\right) $ (24)

$ c_{x} =\left(x_{\min }^{\prime}+x_{\max }^{\prime}\right) / 2 $ (25)

$ c_{y} =\left(y_{\min }^{\prime}+y_{\max }^{\prime}\right) / 2 $ (26)

where $ \left(x_{1}^{\prime}, x_{2}^{\prime}, \cdots, x_{n}^{\prime}\right), \left(y_{1}^{\prime}, y_{2}^{\prime}, \cdots, y_{n}^{\prime}\right)$ are the projected center points, and $x_{\min }^{\prime}, y_{\min }^{\prime}, x_{\max }^{\prime}, y_{\max }^{\prime} $ are the minimum and maximum projected values.

2) Estimate the ship angle:

$ a=\arctan w $ (27)

3) Estimate the long side (length) of the ship:

$ l=\sqrt{\left(x_{\max }^{\prime}-x_{\min }^{\prime}\right)^{2}+\left(y_{\max }^{\prime}-y_{\min }^{\prime}\right)^{2}} $ (28)

4) Estimate the short side (width) of the ship:

$ w_{s}=\max (\cos a, \sin a) \times w_{s-\text { mean }} $ (29)

where ${w_{s - {\rm{ mean }}}} $ is the mean width of all sub-regions.
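A sketch of the parameter estimation in Eqs. (18)-(29) (our illustration, using a degree-1 numpy fit; a nearly vertical ship would need a guarded fit, which is omitted here):

import numpy as np

def estimate_ship(points, mean_side):
    # points: sub-region centers of one cluster; mean_side: their mean side length.
    pts = np.asarray(points, dtype=float)
    w, b = np.polyfit(pts[:, 0], pts[:, 1], 1)                     # reg = w*x + b, Eq. (18)
    xp = (w * pts[:, 1] + pts[:, 0] - w * b) / (w * w + 1)         # Eq. (19)
    yp = (w * w * pts[:, 1] + w * pts[:, 0] + b) / (w * w + 1)     # Eq. (20)
    cx, cy = (xp.min() + xp.max()) / 2, (yp.min() + yp.max()) / 2  # Eqs. (21)-(26)
    angle = np.arctan(w)                                           # Eq. (27)
    length = np.hypot(xp.max() - xp.min(), yp.max() - yp.min())    # Eq. (28)
    # Eq. (29); abs() added so the scaling also covers negative angles.
    width = max(abs(np.cos(angle)), abs(np.sin(angle))) * mean_side
    return cx, cy, length, width, angle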

2 Experimental validation

2.1 Dataset

The experiments use the HRSC2016 (high resolution ship collections) dataset (Liu et al., 2017b), which contains 1 573 optical remote sensing images with resolutions between 800×800 and 1 600×1 600 pixels. The images contain ship targets at various rotation angles in scenes such as civilian ports, military bases, inshore areas, and open-sea areas. Of these, 1 200 images are selected as the training set and 375 as the test set, both containing ships of different types and rotation angles; the training images are cropped to 416×416 pixels around the ship target positions.

2.2 Experimental setup

Arbitrarily oriented ship targets are annotated with the roLabelImg annotation tool, as shown in Fig. 7(a). The annotated targets are then cut into sub-regions with the dense sub-region segmentation strategy, with the cutting step size set to 6; Fig. 7(b) shows an example after cutting.

Fig. 7 Ship dense sub-region cutting
((a) ship marking; (b) dense sub-region segmentation)

The proposed algorithm runs on the PyTorch framework, with the PyTorch version of the YOLOv3 detection network as the core detector. The training settings are as follows: nine groups of anchor values are obtained by clustering the sizes of the sub-regions in the training samples; Adam is used as the optimizer, with weight decay $ {w}_{\text {decay }}$ = 0.000 5, learning rate $ lr$ = 0.001, denominator constant $ eps$ = $10^{-8}$ for numerical stability, and coefficients ${\beta _1} $ = 0.9 and $ {\beta _2}$ = 0.999 for the running averages of the gradient and its square. The testing settings are as follows: the confidence threshold for sub-region detection is $c_{\rm{thres}} $ = 0.8, the spatial and size constraint constants in sub-region merging are $c $ = 0.25 and $ c_{\text {diff }}$ = 0.3, and subgraph segmentation is implemented with the NetworkX toolbox.
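A sketch of the optimizer configuration described above in PyTorch (the model variable is only a placeholder; the actual core network is the YOLOv3 detector):

import torch

model = torch.nn.Conv2d(3, 255, 1)  # placeholder for the YOLOv3 core detection network

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,              # learning rate
    betas=(0.9, 0.999),   # coefficients for the gradient and squared-gradient averages
    eps=1e-8,             # denominator constant for numerical stability
    weight_decay=5e-4)    # weight decay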

2.3 Experimental results

The image to be detected first passes through the core detection network, which outputs the detected sub-regions. Fig. 8(a) shows examples of sub-region detection for ships of different types and rotation angles; the sub-regions are densely distributed and effectively cover the ship target areas. On this basis, sub-region merging locates the ship targets and estimates their parameters, as shown in Fig. 8(b).

Fig. 8 Detection example ((a) results of sub-region detection; (b) results of sub-region merging)

Comparative experiments are conducted against several state-of-the-art arbitrarily oriented ship detection algorithms: the improved YOLOv3 model (Wu et al., 2019), RRCNN (Liu et al., 2017a), RRPN (Ma et al., 2018), R-DFPN-3 (Yang et al., 2018), and R-DFPN-4 (Yang et al., 2018). The improved YOLOv3 model is a one-stage detection algorithm, whereas RRCNN, RRPN, R-DFPN-3, and R-DFPN-4 are two-stage detection algorithms.

mAP (mean average precision) at IOU (intersection over union) = 0.5 is used as the accuracy metric, and the average time consumed on multiple images of fixed size is used as the speed metric. Table 1 compares these metrics; the figures for the improved YOLOv3 model, RRCNN, RRPN, R-DFPN-3, and R-DFPN-4 are quoted directly from the corresponding references.

All comparison algorithms in Table 1 run on an NVIDIA 1080Ti GPU platform; except for the improved YOLOv3 model, the test image sizes are kept between 600×600 and 800×800 pixels. The proposed algorithm runs on the same NVIDIA 1080Ti GPU platform, with the test image size limited to 800×800 pixels. In terms of average time consumed, the proposed algorithm is more than twice as fast as RRCNN, RRPN, R-DFPN-3, and R-DFPN-4.

Table 1 Comparison of detection algorithm indicators

Detection algorithm | Test image size/pixels | Average time/ms | Training sample ratio/% | mAP/%
Improved YOLOv3 | unknown | 102 | 80 | 87.9
RRCNN | 600×600 to 800×800 | 246 | 80 | 84.6
RRPN | 600×600 to 800×800 | 175 | 80 | 74.2
R-DFPN-3 | 600×600 to 800×800 | 190 | 80 | 86.9
R-DFPN-4 | 600×600 to 800×800 | 190 | 80 | 89.6
Ours | 800×800 | 80 | 76 | 91.5
Note: bold font indicates the best result.

The training sample ratio in Table 1 is the proportion of training samples among all samples in the dataset; the comparison algorithms and the proposed algorithm use comparable training sample ratios. Under this condition, in terms of mAP (IOU = 0.5), the proposed algorithm improves on R-DFPN-4, the best comparison algorithm, by about 2%.

Methodologically, all comparison experiments use common NMS post-processing and obtain the final detection results directly, whereas the proposed algorithm skips NMS, obtains intermediate dense sub-region results, and then obtains the final detection results through sub-region merging.

Detection speed is further compared between the proposed algorithm and the classical YOLOv3 algorithm. Both run on the PyTorch framework and share the same core detection network, so the comparison mainly concerns the time consumed by sub-region merging versus conventional NMS post-processing. For 800×800-pixel images, the proposed algorithm spends an average of 69 ms in the core detection network and 11 ms in sub-region merging; under the same hardware conditions, the classical YOLOv3 algorithm applies NMS directly to the sub-regions in about 5 ms on average. The core detection network thus accounts for most of the total time (86%); sub-region merging takes longer than NMS, but its effect on the total time is limited.

3 Conclusion

For ship targets distributed in arbitrary directions on remote sensing images, and exploiting the large length-width ratio characteristic of ships, this paper proposes a detection algorithm based on dense sub-region cutting. The long, narrow ship target is cut into several dense square sub-regions, the sub-regions are detected with YOLOv3 and then merged into complete ship targets by subgraph segmentation, and parameters such as the ship angle are estimated from the spatial distribution of the sub-regions. Comparative experiments on the HRSC2016 dataset show that the proposed algorithm outperforms current mainstream arbitrarily oriented ship detection algorithms, with a particularly clear improvement in detection speed.

Although the proposed algorithm achieves good detection speed and accuracy for arbitrarily oriented ships, it cannot clearly separate multiple ships that are moored end to end. Future work will further improve the algorithm to enhance its detection capability.

References

  • Eldhuset K. 1996. An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions. IEEE Transactions on Geoscience and Remote Sensing, 34(4): 1010-1019 [DOI:10.1109/36.508418]
  • Fingas M F, Brown C E. 2001. Review of ship detection from airborne platforms. Canadian Journal of Remote Sensing, 27(4): 379-385 [DOI:10.1080/07038992.2001.10854880]
  • Girshick R. 2015. Fast R-CNN[EB/OL].[2020-03-02].https://arxiv.org/pdf/1504.08083.pdf
  • Lin T Y, Goyal P, Girshick R, He K M and Dollár P.2017. Focal loss for dense object detection. Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007[DOI: 10.1109/TPAMI.2018.2858826]
  • Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37[DOI: 10.1007/978-3-319-46448-0_2]
  • Liu Z K, Hu J G, Weng L B and Yang Y P. 2017a. Rotated region based CNN for ship detection//Proceedings of 2017 IEEE International Conference on Image Processing. Beijing, China: IEEE: 900-904[DOI: 10.1109/ICIP.2017.8296411]
  • Liu Z K, Yuan L, Weng L B and Yang Y Q. 2017b. A high resolution optical satellite image dataset for ship recognition and some new baselines//Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods. Porto, Portugal: ICPRAM: 324-331[DOI: 10.5220/0006120603240331]
  • Ma J Q, Shao W Y, Ye H, Wang L, Wang H, Zheng Y B, Xue X Y. 2018. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 20(11): 3111-3122 [DOI:10.1109/TMM.2018.2818020]
  • Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788[DOI: 10.1109/CVPR.2016.91]
  • Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement[EB/OL].[2020-03-02]. https://arxiv.org/pdf/1804.02767.pdf
  • Ren S Q, He K M, Girshick R and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks//Proceedings of Advances in Neural Information Processing Systems. Montreal, Canada: NIPS: 91-99
  • Wang C L, Bi F K, Zhang W P, Chen L. 2017. An intensity-space domain CFAR method for ship detection in HR SAR images. IEEE Geoscience and Remote Sensing Letters, 14(4): 529-533 [DOI:10.1109/LGRS.2017.2654450]
  • Wu Z H, Li L, Gao Y M. 2019. Rotation convolution ensemble YOLOv3 model for ship detection in remote sensing images. Computer Engineering and Applications, 55(22): 146-151 (吴止锾, 李磊, 高永明. 2019. 遥感图像舰船检测的旋转卷积集成YOLOv3模型. 计算机工程与应用, 55(22): 146-151) [DOI:10.3778/j.issn.1002-8331.1902-0144]
  • Yang X, Sun H, Fu K, Yang J R, Sun X, Yan M L, Guo Z. 2018. Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sensing, 10(1): #132 [DOI:10.3390/rs10010132]
  • Yu Y D, Yang X B, Xiao S J, Lin J L. 2012. Automated ship detection from optical remote sensing images. Key Engineering Materials, 500: 785-791 [DOI:10.4028/www.scientific.net/KEM.500.785]
  • Zhu C R, Zhou H, Wang R S, Guo J. 2010. A novel hierarchical method of ship detection from spaceborne optical image based on shape and texture features. IEEE Transactions on Geoscience and Remote Sensing, 48(9): 3446-3456 [DOI:10.1109/TGRS.2010.2046330]