Aircraft recognition of remote sensing image based on sample generated by CGAN

Wang Yaoling1,2, Wang Hongqi1, Xu Tao1,2(1.Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China;2.University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract
Objective Deep-learning-based aircraft recognition methods have made great progress in remote sensing image interpretation, but their generalization ability depends on large-scale datasets. A conditional generative adversarial network (CGAN) can produce realistic synthetic samples to enlarge a real dataset, but its ability to model complex remote sensing scenes is limited and the generated samples are of low quality. To address these problems, an aircraft recognition framework combining CGAN-based sample generation is proposed.

Method The CGAN is improved in two ways: a perceptual loss is used to enhance the generator's ability to model remote sensing images, and a masked structural similarity (SSIM) loss (masked-SSIM loss) is proposed to improve the image quality of the aircraft regions in the synthetic samples; this loss is combined with the aircraft mask so that it acts only on the aircraft region of the image and leaves the background unaffected. A residual-network-based recognition model is selected and combined with the improved generative model to form the aircraft recognition framework; during training, synthetic samples replace real satellite images, reducing the demand for real satellite data.

Result Recognition models trained on synthetic samples and on real samples are both evaluated on real samples; the accuracy of the former is 0.33% lower than that of the latter. For the generative model, adding the perceptual loss raises the peak signal-to-noise ratio (PSNR) of the synthetic samples by 0.79 dB and the SSIM by 0.094; adding the masked SSIM loss further raises the PSNR by 0.09 dB and the SSIM by 0.252.

Conclusion The proposed sample-generation-based aircraft recognition framework produces higher-quality samples that can replace real samples in training the recognition model, effectively alleviating the sample shortage in aircraft recognition tasks.
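In symbols, the generator objective described above can be sketched as follows; the weights $\lambda_p$ and $\lambda_m$ and the normalization by $\sum_i m_i$ are notation introduced here for illustration and are not specified in the abstract:

```latex
\mathcal{L}_G
= \mathcal{L}_{\mathrm{CGAN}}
+ \lambda_p \left\lVert \phi(x) - \phi(\hat{x}) \right\rVert_2^2
+ \lambda_m \left( 1 - \frac{\sum_i m_i \, \mathrm{SSIM}_i(x, \hat{x})}{\sum_i m_i} \right)
```

where $x$ is a real image, $\hat{x}$ the corresponding synthetic image, $\phi(\cdot)$ the VGG-16 feature extractor, $m_i \in \{0,1\}$ the aircraft mask at pixel $i$, and $\mathrm{SSIM}_i$ the local SSIM computed at pixel $i$.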
Keywords

Abstract
Objective Aircraft type recognition is a fundamental problem in remote sensing image interpretation, which aims to identify the type of aircraft in an image. Aircraft type recognition algorithms have been widely studied and continually improved. Traditional recognition algorithms are efficient, but their accuracy is limited by small model capacity and poor robustness. Deep-learning-based methods have been widely adopted because of their good robustness and generalization, especially in object recognition tasks. In remote sensing scenes, objects are sparsely distributed; hence, the available samples are few. In addition, labeling is time consuming, resulting in a modest number of labeled samples. Deep-learning-based models generally rely on a large amount of labeled data because of the large number of weights they must learn. Consequently, these models suffer from scarce data that are insufficient to meet the demand for large-scale datasets, especially in remote sensing scenes. Generative adversarial networks (GANs) can produce realistic synthetic data and enlarge the scale of a real dataset. However, these algorithms usually take random noise as input; therefore, they cannot control the position, angle, size, and category of objects in the synthetic images. Conditional GANs (CGANs) have been proposed by previous researchers to generate synthetic images with designated content in a controlled scheme. CGANs take pixel-wise labeled images as input and output generated images that satisfy the constraints imposed by the corresponding input images. However, these generative adversarial models have mainly been studied on natural scenes and are not well suited to remote sensing imagery because of its complex scenes and low resolution. Hence, GANs perform poorly when adopted to generate remote sensing images. To alleviate the lack of real samples and address the problems above, an aircraft recognition framework for remote sensing images based on sample generation is proposed, which consists of an improved CGAN and a recognition model.

Method In this framework, the masks of real aircraft images are labeled pixel by pixel. The masks serve as the conditions of the CGAN, which is trained on pairs of real aircraft images and their corresponding masks. In this manner, the location, scale, and type of the aircraft in the synthetic images can be controlled. A perceptual loss is introduced to improve the ability of the CGAN to model remote sensing scenes; it is measured by the L2 distance between the features of the real and synthetic images extracted by the VGG-16 (Visual Geometry Group 16-layer) network. A masked structural similarity (SSIM) loss is also proposed, which forces the CGAN to focus on the masked region and improves the quality of the aircraft region in the synthetic images. SSIM measures image quality in terms of structure and texture, and the masked SSIM loss is the pixel-wise sum of the product between the mask and the SSIM map. The overall loss function of the CGAN thus consists of the perceptual loss, the masked SSIM loss, and the original CGAN loss. The recognition model in this framework is ResNet-50, which outputs the type and recognition score of an aircraft. In this paper, the recognition model trained on synthetic images is compared with the model trained on real images.
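A minimal PyTorch sketch of the two auxiliary losses described above is given below. The VGG-16 cut-off layer (through relu3_3), the uniform-window SSIM approximation, and the mask normalization are illustrative assumptions rather than the authors' exact configuration; in training, these terms would be added to the standard CGAN adversarial loss with suitable weights.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    """L2 distance between VGG-16 features of real and synthetic images."""
    def __init__(self, layer_idx=16):  # assumed cut-off: through relu3_3
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.extractor = vgg[:layer_idx].eval()
        for p in self.extractor.parameters():
            p.requires_grad_(False)  # the feature extractor stays frozen

    def forward(self, fake, real):
        # Inputs are assumed to be normalized as the VGG-16 backbone expects.
        return F.mse_loss(self.extractor(fake), self.extractor(real))

def masked_ssim_loss(fake, real, mask, window=11):
    """1 - SSIM averaged over the aircraft region; `mask` is (N,1,H,W) in {0,1}."""
    # Local means, variances, and covariance via a uniform window
    # (a common simplification of the Gaussian-window SSIM).
    mu_f = F.avg_pool2d(fake, window, 1, window // 2)
    mu_r = F.avg_pool2d(real, window, 1, window // 2)
    var_f = F.avg_pool2d(fake * fake, window, 1, window // 2) - mu_f ** 2
    var_r = F.avg_pool2d(real * real, window, 1, window // 2) - mu_r ** 2
    cov = F.avg_pool2d(fake * real, window, 1, window // 2) - mu_f * mu_r
    c1, c2 = 0.01 ** 2, 0.03 ** 2  # standard SSIM constants for [0,1] images
    ssim_map = ((2 * mu_f * mu_r + c1) * (2 * cov + c2)) / (
        (mu_f ** 2 + mu_r ** 2 + c1) * (var_f + var_r + c2))
    ssim_map = ssim_map.mean(dim=1, keepdim=True)  # average over channels
    # Pixel-wise product with the mask, so only the aircraft region contributes.
    return 1 - (ssim_map * mask).sum() / mask.sum().clamp(min=1)
```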
Remote sensing images from QuickBird are cropped to build the real dataset, in which 800 images per type are used for training and 1 000 images are used for testing. After data augmentation, the training dataset consists of 40 000 images, and the synthetic dataset consists of images generated by the generation module with flipped, rotated, and scaled masks. Generators from different training stages are used to generate 2 000 synthetic images per type in order to determine the best stopping point of the training procedure. The chosen generator is then used to produce different numbers of images for the 10 aircraft types to find an acceptable number of synthetic images. These synthetic images serve as the training set for the recognition model, whose performance is compared across these settings. All experiments are carried out on a single NVIDIA K80 GPU with the PyTorch framework, and the Adam optimizer is used to train the CGAN and ResNet-50 for 100 epochs.

Result The quality of the synthetic images from the generator with and without our proposed loss functions is compared on the training dataset. The quantitative evaluation metrics are peak signal-to-noise ratio (PSNR) and SSIM. Results show that PSNR and SSIM increase by 0.88 dB and 0.346, respectively, with our method. In addition, recognition accuracy increases with the training epoch of the generator and the number of synthetic images. Finally, the accuracy of the recognition model trained on the synthetic dataset is 0.33% lower than that of the model trained on the real dataset.

Conclusion An aircraft recognition framework of remote sensing images based on sample generation is proposed. The experimental results show that our method effectively improves the ability of the CGAN to model remote sensing scenes and alleviates the shortage of data.
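For reference, the PSNR metric used in the evaluation above can be computed as follows; this is a minimal sketch assuming images scaled to [0, 1], not the authors' evaluation code.

```python
import torch

def psnr(fake: torch.Tensor, real: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB between two image tensors."""
    mse = torch.mean((fake - real) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)
```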
Keywords
