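Gu Yu, Xu Ying (Fundamental Science on Communication Information Transmission and Fusion Technology Laboratory, Hangzhou Dianzi University, Hangzhou 310018, China; College of Life Information Science & Instrument Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)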
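Objective Optimizing the architecture of deep convolutional neural network models for SAR (synthetic aperture radar) target recognition is difficult. This work analyzes how convolution kernel width affects classification performance and, based on that analysis, designs a deep convolutional neural network architecture suited to SAR target recognition. Method First, the influence of convolution kernel width on SAR image target classification performance is analyzed using two-dimensional random convolution features and the extreme learning machine, a neural network model with a single hidden layer. Then, based on this analysis, the convolutional layer responsible for spatial feature extraction uses multiple convolution kernels of different widths to extract multi-scale local features of the target, yielding a deep model architecture suited to SAR image target recognition. Finally, after the training samples of the MSTAR (moving and stationary target acquisition and recognition) dataset are augmented, the training hyperparameters are set, the model parameters are trained, and classification performance is verified. Result The experimental results show that, for SAR images with strong speckle noise, wider convolution kernels extract the local features of targets more effectively. Because the proposed model extracts multi-scale local features from the input image, its 10-class classification results (covering both non-deformable and deformable target configurations) approach or exceed the best results reported in the literature, with overall classification accuracies of 98.39% and 97.69%, respectively, which verifies the effectiveness of the proposed architecture. Conclusion Because SAR images are formed by a different imaging mechanism from visible-light images, larger convolution kernels should be used to extract the spatial features of targets for SAR image target recognition, and optimizing the design of the deep model can improve SAR target recognition accuracy.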
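The random-feature analysis described in the Method can be illustrated with a minimal sketch in pure Python (no third-party packages). It assumes Gaussian random kernels, one pooled value per kernel (the mean of the rectified response map), a sigmoid hidden layer, and the standard regularized ELM solution for the output weights, β = (HᵀH + λI)⁻¹HᵀT. The abstract does not state these details, and every function name here (`random_kernel`, `random_conv_features`, `elm_train`, `elm_predict`, and so on) is hypothetical.

```python
import math
import random


def random_kernel(width, rng):
    """Untrained width x width kernel with i.i.d. standard-normal weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(width)]


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, as computed by CNN convolution layers."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)]


def random_conv_features(image, kernels):
    """ASSUMPTION: one feature per kernel, the mean of the rectified response map."""
    feats = []
    for kern in kernels:
        vals = [max(0.0, v) for row in conv2d_valid(image, kern) for v in row]
        feats.append(sum(vals) / len(vals))
    return feats


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(x, w, bias):
    """Outputs of the ELM's single hidden layer (random, never-trained weights)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(w, bias)]


def solve(a, b):
    """Solve A X = B for square A by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def elm_train(xs, ys, n_classes, n_hidden=50, reg=1e-3, seed=0):
    """Random hidden layer; only the output weights beta are fitted (ridge solution)."""
    rng = random.Random(seed)
    d = len(xs[0])
    w = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_hidden)]
    bias = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    h = [hidden_layer(x, w, bias) for x in xs]
    t = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in ys]
    # (H^T H + reg * I) beta = H^T T
    hth = [[sum(r[i] * r[j] for r in h) + (reg if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [[sum(r[i] * tr[c] for r, tr in zip(h, t)) for c in range(n_classes)]
           for i in range(n_hidden)]
    return w, bias, solve(hth, htt)


def elm_predict(model, x):
    """Class index with the largest output score."""
    w, bias, beta = model
    h = hidden_layer(x, w, bias)
    scores = [sum(hi * beta[i][c] for i, hi in enumerate(h))
              for c in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)
```

To mimic the kernel-width study, one would compute `random_conv_features` with kernels of several widths (for example 3, 5, 7, ...) on the same image chips and compare the ELM accuracy obtained for each width; this sketch does not reproduce any result from the paper.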
Architecture design of deep convolutional neural network for SAR target recognition
Gu Yu,Xu Ying(Fundamental Science on Communication Information Transmission and Fusion Technology Laboratory, Hangzhou Dianzi University, Hangzhou 310018, China;College of Life Information Science & Instrument Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)
Objective To address the difficulty of optimizing the architecture of deep convolutional neural network (DCNN) models for synthetic aperture radar (SAR) target recognition, a DCNN architecture for SAR target recognition is presented based on an analysis of how convolution kernel size influences classification performance. Method First, two-dimensional random convolution features and the extreme learning machine (ELM), a single-hidden-layer neural network, are used to analyze the influence of convolution kernel size on SAR target recognition performance. The experimental results show that, even though the convolution kernels are generated randomly, recognition performance improves as kernel size increases, and that 3×3 kernels are unsuitable for SAR image recognition. Second, a DCNN architecture for SAR target recognition is presented; it is built as a directed acyclic graph (DAG) and takes 88×88-pixel input images. The spatial-feature-extraction convolutional layer uses multiple kernels of different sizes (5×5, 7×7, 9×9, and 11×11) to extract multi-scale local features from the input image, and the subsequent convolutional layers use large kernels (7×7, 5×5, and 6×6) to extract semantic features. A fully connected layer serves as the classifier for the various target types, and the softmax loss function is used to train the parameters of the convolutional layers. Dropout, which improves regularization, is applied between the fully connected layer and the output layer. A rectified linear unit (ReLU) activation follows each convolutional layer, and pooling with a width of 3 and a stride of 2 follows each activation layer to perform downsampling.
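The spatial sizes implied by the layer list above can be checked with a short shape calculation. This sketch assumes "same" padding in the multi-scale layer (so the four branch outputs have equal size and can be concatenated along the channel axis of the DAG), unpadded convolutions afterwards, unpadded pooling, and no pooling after the final 6×6 layer. None of these choices is stated in the abstract, channel counts are omitted because they are not given, and the function names are hypothetical.

```python
def out_size(n, k, stride=1, pad=0):
    """Side length after a conv or pooling window of width k (floor rounding)."""
    return (n + 2 * pad - k) // stride + 1


def trace_shapes(input_size=88):
    """Spatial size after each stage, under the padding assumptions stated above."""
    sizes = []
    # Multi-scale layer: parallel 5x5, 7x7, 9x9 and 11x11 branches.
    # ASSUMPTION: 'same' padding, pad = (k - 1) // 2, so every branch keeps the
    # input size and the branches can be concatenated along the channel axis.
    branches = [out_size(input_size, k, pad=(k - 1) // 2) for k in (5, 7, 9, 11)]
    if len(set(branches)) != 1:
        raise ValueError("branch outputs must match to be concatenated")
    n = branches[0]
    sizes.append(("multi-scale conv + ReLU", n))
    n = out_size(n, 3, stride=2)
    sizes.append(("pool 3, stride 2", n))
    # ASSUMPTION: the later convolutions are unpadded ('valid').
    for k in (7, 5):
        n = out_size(n, k)
        sizes.append((f"conv {k}x{k} + ReLU", n))
        n = out_size(n, 3, stride=2)
        sizes.append(("pool 3, stride 2", n))
    # ASSUMPTION: no pooling after the last 6x6 layer.
    n = out_size(n, 6)
    sizes.append(("conv 6x6 + ReLU", n))
    return sizes


for name, side in trace_shapes():
    print(f"{name:<24} {side}x{side}")
```

Under these assumptions the sizes run 88, 43, 37, 18, 14, 6 and finally 1, so the last 6×6 convolution covers the whole remaining map and acts like a fully connected layer over the pooled features. Different padding choices would change the intermediate sizes.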
Finally, after appropriate training hyperparameters are set, the parameters of the proposed architecture are trained on the MSTAR database, whose training samples are randomly augmented by sampling and adding speckle noise, and recognition performance is tested under standard operating conditions covering both non-deformable and deformable target configurations. Result The proposed DCNN architecture is implemented with the MatConvNet toolbox. Of the augmented training samples, 90% are used to train the parameters of each convolutional layer, and the remaining 10% are used to validate the trained parameters. The dropout rate is set to 0.1. Training stops after 28 epochs, and the trained parameters are then used to test recognition performance. The experimental results demonstrate that using large kernels to extract spatial features from the input image yields superior SAR image recognition performance because it overcomes the influence of strong speckle noise. This differs from natural-scene classification with visible-light images, where small kernels such as 3×3, 3×1, and 1×3 are used to achieve high recognition performance. The 10-class classification results of the proposed architecture (including non-deformable and deformable target configurations) are compared with those of two DCNN models. The proposed architecture achieves results comparable to or better than those of state-of-the-art deep model architectures, with overall recognition accuracies of 98.39% and 97.69% for the two scenarios, respectively. A deep model using 3×3 convolution kernels achieves only a 93.16% recognition rate, which confirms the analysis of how convolution kernel size influences SAR image recognition performance.
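The data preparation and training settings reported above can be sketched as follows. Only the 88×88 input size, the 90%/10% training/validation split, the 0.1 dropout rate, and the idea of overall accuracy come from the text. The random crop from a larger chip (for example 128×128; the chip size is an assumption), the unit-mean gamma-distributed multiplicative speckle with `looks` looks, the inverted-dropout formulation, and all function names are illustrative assumptions, not the authors' MatConvNet code.

```python
import random


def random_crop(chip, size, rng):
    """Random size x size window from a larger 2-D chip (list of rows)."""
    top = rng.randrange(len(chip) - size + 1)
    left = rng.randrange(len(chip[0]) - size + 1)
    return [row[left:left + size] for row in chip[top:top + size]]


def add_speckle(image, looks, rng):
    """ASSUMPTION: multiply each pixel by Gamma(looks, 1/looks) noise (mean 1),
    a common model of multi-look intensity speckle."""
    return [[v * rng.gammavariate(looks, 1.0 / looks) for v in row] for row in image]


def augment(chips, labels, copies, size=88, looks=4, seed=0):
    """Make `copies` randomly cropped, speckled versions of every chip."""
    rng = random.Random(seed)
    out = []
    for chip, label in zip(chips, labels):
        for _ in range(copies):
            out.append((add_speckle(random_crop(chip, size, rng), looks, rng), label))
    return out


def train_val_split(samples, train_fraction=0.9, seed=0):
    """Shuffle, then keep 90% for training and 10% for validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def inverted_dropout(values, rate, rng):
    """Training-time dropout: zero each unit with probability `rate` and
    rescale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


def overall_accuracy(confusion):
    """Overall accuracy: correctly classified samples (the diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total
```

For example, `train_val_split(augment(chips, labels, copies=...))` would produce the training and validation partitions, and `inverted_dropout(features, 0.1, rng)` applies the reported dropout rate during training only.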
The recognition performance of the proposed DCNN architecture is also better than that obtained with random convolution features and ELM. This indicates that a DCNN can achieve satisfactory performance when its architecture is carefully designed and more training samples are used to train its parameters. Conclusion Because SAR images are formed by a different imaging mechanism from visible-light images, large convolution kernels should be used to extract spatial features for SAR target recognition, and better performance can be achieved by optimizing the deep model architecture and augmenting the training samples.