Objective Deep learning has been widely applied to SAR image target recognition, but most work has been conducted under the standard operating conditions of the MSTAR dataset. When deep learning is applied to same-class targets with variants, such as the T72 subclasses, considerable challenges remain because the differences among targets are small. To preserve the input features of SAR images to the greatest extent, this paper designs a deep convolutional neural network architecture suited to SAR variant target recognition. Method The network is mainly composed of a multi-scale spatial feature extraction module together with the dense blocks and transition layers of DenseNet. The multi-scale feature extraction module is placed at the bottom of the network; using convolution kernels of sizes 1×1, 3×3, 5×5, 7×7 and 9×9, it extracts rich spatial features while preserving the input image information. To pass the input image information backward more effectively, the subsequent network layers are designed on the basis of the dense blocks and transition layers of DenseNet. After augmenting the training samples, the influences of input image resolution, target translation and different noise levels on the model's recognition accuracy are analyzed, and its accuracy under standard operating conditions is compared with that of deep models for SAR target recognition in the literature. Result Experimental results show that the designed model achieves 95.48% accuracy in classifying the 8 types of T72 variant targets, with average accuracies of 94.61% and 86.36% under target translation and different noise levels, respectively. For 10-class recognition (both without and with variants), the model trained and tested with data augmentation reaches 99.38% and 98.81% accuracy, slightly better than the model architectures proposed in the current literature. Conclusion The proposed model can fully exploit the input image and the features output by each convolutional layer, and learns the fine differences among target images. It is not only suitable for SAR variant target recognition but also achieves high accuracy under standard operating conditions.
SAR target recognition with variants based on convolutional neural network
Objective Deep learning has been widely used in the field of synthetic aperture radar (SAR) target recognition, but most work addresses recognition under the standard operating conditions (SOC) of the MSTAR dataset. When it is applied to target recognition with variants, such as the T72 subclasses, considerable challenges remain because the differences among targets are small. To preserve the input features of SAR images as fully as possible, this paper designs a deep convolutional neural network (CNN) architecture suitable for SAR target recognition with variants. Method The proposed network is composed of one multi-scale feature extraction module followed by the dense blocks and transition layers introduced in DenseNet. The multi-scale feature extraction module, placed at the bottom of the network, uses convolution kernels of sizes 1×1, 3×3, 5×5, 7×7 and 9×9 to extract rich spatial features. The 1×1 kernels preserve the detailed information of the input image, while the larger kernels suppress the influence of speckle noise, a main factor degrading recognition performance, on the extracted features. To propagate the input image information backward effectively and to make full use of the features learned by all layers, dense blocks and transition layers are adopted for the subsequent layers of the network. A fully convolutional layer after the three dense blocks and transition layers transforms the learned features into vectors, and a softmax layer performs classification. Finally, the training datasets are augmented by translating the original images and adding speckle noise, and the proposed model is implemented in TensorFlow and trained on these samples.
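The multi-scale feature extraction step described above could be sketched as follows. This is a minimal NumPy illustration only: the kernel sizes (1×1 through 9×9) follow the text, but the random single-kernel branches stand in for learned, multi-channel filters, whose counts the abstract does not specify.

```python
import numpy as np

def conv2d_same(img, kernel):
    """2-D 'same' convolution with zero padding (single channel)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def multi_scale_features(img, kernel_sizes=(1, 3, 5, 7, 9), seed=0):
    """Apply one (here: random) kernel per scale and stack the
    resulting feature maps channel-wise, preserving spatial size."""
    rng = np.random.default_rng(seed)
    maps = [conv2d_same(img, rng.standard_normal((k, k)) * 0.1)
            for k in kernel_sizes]
    return np.stack(maps, axis=-1)  # shape (H, W, num_scales)

img = np.ones((8, 8))
feats = multi_scale_features(img)
print(feats.shape)  # (8, 8, 5)
```

Because every branch uses 'same' padding, all five feature maps keep the input's spatial dimensions and can be concatenated along the channel axis before entering the dense blocks.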
The influences of input image resolution, target translation and different noise levels on the recognition accuracy of the proposed network are then studied on the augmented training datasets, and performance comparisons with other deep learning models in the literature are presented under standard operating conditions. Result The experimental results demonstrate that input image resolution has a strong influence on recognition accuracy for the eight types of T72 targets, and that accuracy improves significantly as the input resolution increases. The input resolution has little effect on recognition accuracy under SOC, however, because the differences among targets are large in that case. The input resolution of the proposed model is set to 88×88×1, considering the need to preserve both target and shadow information during data augmentation. To verify the performance of the proposed multi-scale feature extraction module, tests are performed with different multi-scale feature extraction strategies; the proposed model obtains a classification accuracy of 95.48% in classifying the 8 subclasses of the T72 target with variants. Beyond recognizing test samples under SOC, the classification accuracy of the proposed model is also studied under target translation and different noise levels. It achieves a recognition accuracy above 90% even when the target is displaced 16 pixels from the center of the original image. The proposed model still performs well when the noise intensity is set to 0.5 or 1, but recognition accuracy declines significantly when the noise intensity exceeds 1. The average classification accuracies reach 94.61% and 86.36% under target translation and different noise levels, respectively.
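The translation and speckle perturbations used for augmentation and for the robustness tests above could be sketched as below. The multiplicative speckle model, and reading "noise intensity" as the standard deviation of the multiplicative term, are assumptions; the abstract does not define either, and the 88×88 chip and 16-pixel offset follow the values reported in the text.

```python
import numpy as np

def translate(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, zero-filling exposed borders."""
    h, w = img.shape
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def add_speckle(img, intensity, rng):
    """Multiplicative speckle (assumed model): img * (1 + intensity * n),
    with n ~ N(0, 1) drawn per pixel."""
    return img * (1.0 + intensity * rng.standard_normal(img.shape))

rng = np.random.default_rng(42)
chip = np.zeros((88, 88))
chip[40:48, 40:48] = 1.0            # a notional 8x8 "target" near the center
shifted = translate(chip, 16, 0)    # the largest offset tested in the paper
noisy = add_speckle(chip, 0.5, rng) # noise intensity 0.5, as in the text
print(shifted[56:64, 40:48].sum())  # -> 64.0: target mass moved 16 rows down
```

Applying such perturbations to the training chips both enlarges the dataset and exposes the network to the displaced and noisy conditions it is later evaluated on.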
Recognition accuracies of 99.38% (SOC1-10), 99.50% (SOC1-14) and 98.81% (SOC-2) are achieved with augmented training datasets when training the models for 10-class target recognition under SOC (without and with variants). The proposed model achieves recognition performance comparable to that of other deep models presented in the literature. Conclusion Our model makes full use of the input information and the features of each convolutional layer, and can capture the detailed differences among targets from the images. It can not only be applied to target recognition tasks with variants, but also achieves satisfactory recognition results under standard operating conditions.