Deep semantic Hashing retrieval of remote sensing images

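Chen Cheng1, Zou Huanxin1, Shao Ningyuan1, Sun Jiachi1, Qin Xianxiang2 (1. College of Electronic Science, National University of Defense Technology, Changsha 410073, China; 2. School of Information and Navigation, Air Force Engineering University, Xi'an 710077, China)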

Abstract
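Objective Hashing retrieval maps high-dimensional data from a massive data space to compact binary Hash codes and quickly computes the Hamming distance between any two binary codes with bit and XOR operations, enabling effective similarity-preserving retrieval over big data. However, besides image features, remote sensing images carry rich semantic information, and traditional Hashing methods that extract image features to generate Hash codes cannot exploit this semantic information effectively, which limits the accuracy of remote sensing image retrieval. To exploit the semantic information in remote sensing images, we propose a remote sensing image retrieval method based on deep semantic Hashing. Method Starting from a training set of remote sensing images with multiple semantic labels, two deep convolutional networks with different configurations extract the image features and the semantic features of the remote sensing images, respectively. The backpropagation algorithm then learns the parameters of the deep networks from these two types of features and generates binary Hash codes for the remote sensing images. The generated binary Hash codes effectively preserve the similarity of the original high-dimensional remote sensing images. Result Experiments were conducted on a GF-2 and Google Earth remote sensing image dataset, the CIFAR-10 dataset, and the FLICKR-25K dataset, with comparison and analysis against several methods. With 64-bit codes, compared with DPSH (deep supervised Hashing with pairwise labels), the mAP (mean average precision) improved by approximately 2%, 6% to 7%, and 0.6% on the GF-2 and Google Earth, CIFAR-10, and FLICKR-25K datasets, respectively. Conclusion The proposed end-to-end deep learning framework uses semantic features to effectively improve retrieval performance on datasets of remote sensing images with one or more semantic labels.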
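The Hamming-distance computation described in the Objective can be illustrated with a short sketch. It assumes each binary Hash code is packed into a Python integer, so that XOR marks the differing bits and a bit count gives the distance; the helper names and the 8-bit example codes are hypothetical and not taken from the paper.

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two binary Hash codes stored as ints.

    XOR sets a 1 exactly where the codes differ; counting those 1 bits
    (popcount) gives the Hamming distance.
    """
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: list[int]) -> list[int]:
    """Database indices sorted by Hamming distance to the query (ties keep order)."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


# Hypothetical 8-bit codes for illustration; the reported results use 64 bits.
query = 0b10110010
database = [0b10110011, 0b01001101, 0b10100010, 0b10110010]
print([hamming_distance(query, c) for c in database])
print(rank_by_hamming(query, database))
```

Because the distance is a single XOR plus a bit count per pair, ranking a large archive reduces to cheap integer operations, which is the efficiency argument the abstract makes.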
Keywords
Deep semantic Hashing retrieval of remote sensing images

Chen Cheng1, Zou Huanxin1, Shao Ningyuan1, Sun Jiachi1, Qin Xianxiang2 (1. College of Electronic Science, National University of Defense Technology, Changsha 410073, China; 2. School of Information and Navigation, Air Force Engineering University, Xi'an 710077, China)

Abstract
Objective Hashing methods aim to map high-dimensional data to compact binary Hashing codes in Hamming space and to compute Hamming distances rapidly with bit operations such as XOR, which enables efficient similarity-preserving search and retrieval over big data. However, a massive number of remote sensing (RS) images are associated with semantic information. Traditional methods that extract image features and generate Hash codes cannot effectively use this semantic information, which limits the accuracy of remote sensing image retrieval. This study proposes an image retrieval method based on DSH (deep semantic Hashing) that mines the semantic information of remote sensing images with tags or other semantic annotations. The contribution of this study is a Hashing method for RS images that encodes high-dimensional image feature vectors into binary bits using a limited number of labeled (annotated) images. Furthermore, DSH learns the discrete Hashing codes directly, without the relaxation that would otherwise degrade the accuracy of the learned codes. Hence, DSH provides efficient (in terms of storage and speed) and accurate search within huge data archives. Method The DSH model performs feature learning and Hashing code learning simultaneously in an end-to-end framework organized into two main parts: feature learning and Hashing learning. In feature learning, we use two deep neural networks, one for images and one for semantic annotations. The deep neural network for images is a convolutional neural network (CNN) adapted from vgg_net. Specifically, it uses the first seven layers of a vgg_16 network pretrained on ImageNet, and the eighth layer is replaced with a fully-connected layer whose output is the learned image feature. The first seven layers use the rectified linear unit (ReLU) as the activation function, and the eighth layer uses the identity function.
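The replaced eighth layer and the binarization step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the pretrained layers have already produced a 4096-dimensional ReLU feature (the size of VGG-16's fc7 output), that codes are obtained by taking the sign of the eighth-layer output, and that weights are random stand-ins rather than trained parameters. All function names are hypothetical.

```python
import random

FC7_DIM = 4096   # assumed: size of VGG-16's fc7 output feeding the new layer
CODE_BITS = 64   # c, the code length used for the reported results


def fully_connected(v, weights, bias):
    """y = W v + b, with W stored as one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def image_feature(fc7_output, w8, b8):
    """Replacement eighth layer: fully connected with identity activation.

    Returns the real-valued feature f(x_i; theta_x), one entry per hash bit.
    """
    return fully_connected(fc7_output, w8, b8)  # identity: no nonlinearity applied


def to_hash_code(feature):
    """Binarize with the sign function; mapping 0 to +1 is an arbitrary choice here."""
    return [1 if x >= 0 else -1 for x in feature]


rng = random.Random(0)
# Random stand-in for the ReLU-activated output of the pretrained layers.
fc7 = relu([rng.gauss(0, 1) for _ in range(FC7_DIM)])
w8 = [[rng.gauss(0, 0.01) for _ in range(FC7_DIM)] for _ in range(CODE_BITS)]
b8 = [0.0] * CODE_BITS
code = to_hash_code(image_feature(fc7, w8, b8))
print(len(code), set(code) <= {-1, 1})
```

The identity activation keeps the eighth-layer output real-valued and sign-symmetric, so the sign function can split each dimension into one bit.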
For semantic annotations, we use semantic vectors as the input to a deep neural network with two fully-connected layers, which use ReLU and the identity function, respectively, as activation functions. In Hashing learning, let $f(x_i;\theta_x)$ denote the learned feature for image $x_i$, i.e., the output of the CNN for images, and let $g(y_j;\theta_y)$ denote the learned feature for semantic vector $y_j$, i.e., the output of the deep neural network for semantic vectors. Here, $\theta_x$ is the network parameter of the CNN for images, and $\theta_y$ is the network parameter of the deep neural network for semantic vectors. For the binary codes $B=\{b_i\}_{i=1}^{n}$, we then define the similarities through a likelihood and an optimization function, and learn the parameters of the networks through an alternating learning strategy that updates one parameter while fixing the others. Result We conducted experiments on three archives. The first archive consists of 2 000 images acquired from the GF-2 satellite and Google Earth. Each image in the archive is a 224×224-pixel section and is associated with several textual tags. In our experiments, we treat several similar tags as one semantic annotation. The second archive is the CIFAR-10 dataset, a single-label dataset consisting of 60 000 color images of 32×32 pixels, each belonging to one of ten classes. The third archive is the FLICKR-25K dataset, which consists of 25 000 images associated with several textual tags; as with the first archive, we treat several similar tags as one semantic annotation. Each image in this archive is a 224×224-pixel section. On the GF-2 satellite and Google Earth remote sensing image dataset, with 64-bit Hashing codes, the mean average precision (mAP) improves by approximately 2% compared with DPSH (deep supervised Hashing with pairwise labels).
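The abstract does not state the likelihood or the alternating updates explicitly, so the following is only a sketch under stated assumptions. It assumes a DPSH-style pairwise negative log-likelihood with $\Theta_{ij}=\tfrac12 f_i^{\top} g_j$, where $S_{ij}=1$ marks a similar pair, plus a quantization penalty $\gamma\sum_i(\lVert b_i-f_i\rVert^2+\lVert b_i-g_i\rVert^2)$ weighted by a hypothetical hyperparameter `gamma`. With both networks fixed, that penalty makes the code step closed-form, $b_i=\operatorname{sign}(f_i+g_i)$, because $\lVert b_i\rVert^2$ is constant for $\pm1$ codes. The network steps (backpropagation through $\theta_x$ and $\theta_y$) are omitted, and all names are illustrative.

```python
import math


def semantic_feature(y, w1, b1, w2, b2):
    """Two fully connected layers (ReLU, then identity): g(y_j; theta_y)."""
    h = [max(0.0, sum(w * v for w, v in zip(row, y)) + b) for row, b in zip(w1, b1)]
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]


def pairwise_nll(F, G, S):
    """-sum_ij [S_ij * Theta_ij - log(1 + exp(Theta_ij))], Theta_ij = 0.5 * f_i . g_j."""
    loss = 0.0
    for i, f in enumerate(F):
        for j, g in enumerate(G):
            theta = 0.5 * sum(a * b for a, b in zip(f, g))
            # Numerically stable log(1 + exp(theta)).
            softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
            loss += softplus - S[i][j] * theta
    return loss


def objective(F, G, B, S, gamma=1.0):
    """Likelihood term plus a quantization penalty tying B to both feature sets."""
    quant = sum((b - f) ** 2 + (b - g) ** 2
                for Bi, Fi, Gi in zip(B, F, G)
                for b, f, g in zip(Bi, Fi, Gi))
    return pairwise_nll(F, G, S) + gamma * quant


def update_codes(F, G):
    """B-step with theta_x and theta_y fixed: b_i = sign(f_i + g_i), 0 mapped to +1."""
    return [[1 if f + g >= 0 else -1 for f, g in zip(Fi, Gi)] for Fi, Gi in zip(F, G)]


# Toy 2-bit features for two image/semantic pairs; S_ij = 1 marks similar pairs.
F = [[0.9, -0.4], [-0.7, 0.8]]   # image features f(x_i; theta_x)
G = [[1.1, -0.2], [-0.5, 0.6]]   # semantic features g(y_j; theta_y)
S = [[1, 0], [0, 1]]
# In full training, theta_x and theta_y would be updated by backpropagation
# between B-steps (not shown here).
B = update_codes(F, G)
print(B, round(objective(F, G, B, S), 4))
```

Alternating in this way keeps the discrete codes exact at every step, which matches the abstract's point that DSH learns discrete codes without relaxation.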
On the CIFAR-10 dataset, the proposed method improves mAP by 6% to 7% over DPSH with 64-bit codes. On the FLICKR-25K dataset, it improves mAP by approximately 0.6% over DPSH with 64-bit codes. Conclusion In this study, we propose an end-to-end deep learning framework that learns visual and semantic features of images and generates Hashing functions for Hashing codes by utilizing the semantic information, thereby improving the accuracy of RS image retrieval. Experimental results show that the proposed method improves image retrieval accuracy over the compared methods. Notably, the archives used in the experiments are benchmarks composed of a moderate number of images, whereas in many practical applications the search is expected to run on considerably larger archives.
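The mAP figures above are computed over a Hamming ranking of the database. A minimal sketch of that metric follows, assuming that a database item is relevant to a query when the two share at least one semantic label (a common convention for multi-label sets such as FLICKR-25K, not stated in the abstract), that AP is averaged over the full ranking rather than a top-k cutoff, and that ties keep database order. The toy codes and tag sets are hypothetical.

```python
def hamming(a, b):
    """Number of positions where two +/-1 codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code, query_labels, db_codes, db_labels):
    """AP over the full Hamming ranking; ties keep database order (sorted is stable)."""
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if query_labels & db_labels[i]:  # relevant: at least one shared label
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries, db_codes, db_labels):
    """Mean of per-query AP; each query is a (code, label_set) pair."""
    aps = [average_precision(code, labels, db_codes, db_labels) for code, labels in queries]
    return sum(aps) / len(aps)


# Hypothetical 4-bit codes and tag sets, for illustration only.
db_codes = [(1, 1, -1, -1), (1, -1, -1, -1), (-1, -1, 1, 1), (-1, 1, 1, 1)]
db_labels = [{"building"}, {"building", "road"}, {"water"}, {"road"}]
queries = [((1, 1, -1, -1), {"road"}), ((-1, -1, 1, 1), {"water"})]
print(mean_average_precision(queries, db_codes, db_labels))
```

Tie handling matters in practice: with short codes many items share a Hamming distance, so different tie-breaking rules can shift reported mAP slightly.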
Keywords
