Deep semantic Hashing retrieval of remote sensing images

Cheng Chen; Huanxin Zou; Ningyuan Shao; Jiachi Sun; Xianxiang Qin

doi:10.11834/jig.180420

Remote sensing image processing


2019, Vol. 24, No. 4, pp. 655-663
Received: 2018-07-04

Revised: 2018-08-10

Published in print: 2019-04-24

Cheng Chen, Huanxin Zou, Ningyuan Shao, Jiachi Sun, Xianxiang Qin. Deep semantic Hashing retrieval of remote sensing images[J]. Journal of Image and Graphics, 2019, 24(4): 655-663. DOI: 10.11834/jig.180420.

Summary

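Objective

Hashing retrieval maps high-dimensional data from a massive data space to compact binary Hash codes and computes the Hamming distance between any two binary codes rapidly through bit and XOR operations, thereby enabling efficient similarity-preserving retrieval over big data. However, besides image features, remote sensing images also carry rich semantic information. Traditional Hashing methods, which extract image features and generate Hash codes from them, cannot effectively exploit this semantic information, which limits the accuracy of remote sensing image retrieval. To exploit the semantic information in remote sensing images, a remote sensing image retrieval method based on deep semantic Hashing is proposed.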
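The Hamming-distance computation mentioned above can be illustrated with a short sketch. It assumes that each Hash code is stored as a Python integer whose bits are the code bits. The helper names `pack_bits` and `hamming_distance` are hypothetical and do not come from the paper: XOR marks the positions where two codes differ, and counting the 1 bits gives the distance.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into an int (first element = most significant bit)."""
    value = 0
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        value = (value << 1) | b
    return value


def hamming_distance(code_a, code_b):
    """Number of differing bits: XOR marks differing positions, then count the 1s."""
    return bin(code_a ^ code_b).count("1")


a = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
b = pack_bits([1, 1, 1, 0, 0, 0, 1, 1])
print(hamming_distance(a, b))  # → 3
```

Because a 64-bit code fits into one machine word, this comparison costs only a few instructions per database item. That is the source of the speed advantage claimed for Hashing.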

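Method

First, on a training set of remote sensing images annotated with multiple semantic labels, two deep networks with different configurations (a convolutional network for images and a fully-connected network for semantic vectors) extract the image features and the semantic features of the remote sensing images, respectively. Then, the backpropagation algorithm learns the network parameters from the two types of extracted features and generates binary Hash codes for the remote sensing images. The generated binary Hash codes effectively preserve the similarity of the original high-dimensional remote sensing images.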
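The sketch below illustrates only the semantic branch. The image branch is a pretrained CNN and is not reproduced here. The sketch is a forward pass through two fully-connected layers, with ReLU after the first and an identity activation after the second, which follows the method description. Everything else is an assumption made for illustration: the layer sizes, the Gaussian initialization, the multi-hot input encoding, and all function names.

```python
import random


def dense(x, weights, bias):
    """Fully-connected layer: out[k] = sum_j weights[k][j] * x[j] + bias[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, t) for t in v]


def semantic_branch(y, params):
    """Forward pass: FC + ReLU, then FC + identity (second output returned unchanged)."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(y, w1, b1))
    return dense(hidden, w2, b2)


def init_params(in_dim, hidden_dim, out_dim, seed=0):
    """Hypothetical small Gaussian initialization; not the paper's scheme."""
    rng = random.Random(seed)

    def layer(n_out, n_in):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        return weights, [0.0] * n_out

    return [layer(hidden_dim, in_dim), layer(out_dim, hidden_dim)]


# Hypothetical multi-hot semantic vector over 5 annotations, mapped to a 64-dim feature.
params = init_params(in_dim=5, hidden_dim=16, out_dim=64)
feature = semantic_branch([1, 0, 1, 0, 0], params)
print(len(feature))  # → 64
```

Using the identity activation in the last layer keeps the output real-valued and signed. A later binarization step needs both positive and negative values.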

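Result

Experiments were conducted on the GF-2 and Google Earth remote sensing image dataset, the CIFAR-10 dataset, and the FLICKR-25K dataset, and the proposed method was compared with several other methods. With 64-bit codes, the proposed method improves the mAP (mean average precision) over DPSH (deep supervised Hashing with pairwise labels) by approximately 2%, 6% to 7%, and 0.6% on the GF-2 and Google Earth dataset, the CIFAR-10 dataset, and the FLICKR-25K dataset, respectively.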
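The mAP metric reported above can be computed roughly as follows. This is a hedged sketch that rests on several assumptions. Codes are stored as integers. Following a common convention, a database item counts as relevant when it shares at least one label with the query. The database is ranked by Hamming distance, with ties kept in database order. The paper's exact evaluation protocol, such as any top-k cutoff, is not stated in this abstract.

```python
def hamming(a, b):
    """Hamming distance between two integer-packed codes."""
    return bin(a ^ b).count("1")


def average_precision(query_code, query_labels, db_codes, db_labels, top_k=None):
    """AP of one query over the database ranked by Hamming distance.

    Relevance assumption: an item is relevant if it shares at least one label.
    sorted() is stable, so ties keep the original database order.
    """
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))
    if top_k is not None:
        order = order[:top_k]
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if query_labels & db_labels[idx]:  # non-empty intersection -> relevant
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries, database, top_k=None):
    """Mean of per-query AP; queries and database are lists of (code, label_set)."""
    db_codes = [code for code, _ in database]
    db_labels = [labels for _, labels in database]
    aps = [average_precision(code, labels, db_codes, db_labels, top_k)
           for code, labels in queries]
    return sum(aps) / len(aps)


database = [(0b0000, {"a"}), (0b0001, {"b"}), (0b0011, {"a"}), (0b1111, {"b"})]
queries = [(0b0000, {"a"}), (0b1111, {"b"})]
print(round(mean_average_precision(queries, database), 4))  # → 0.8333
```

Here both queries retrieve a relevant item at ranks 1 and 3, so each AP is (1/1 + 2/3) / 2 = 5/6.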

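Conclusion

For remote sensing images carrying one or more semantic labels, the proposed end-to-end deep learning framework exploits semantic features to effectively improve retrieval performance on the datasets.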

Abstract

Objective

Hashing methods map high-dimensional data to compact binary Hashing codes in Hamming space and compute Hamming distances rapidly with bit and XOR operations. This makes them effective for similarity-preserving search and retrieval over big data. However, a massive number of remote sensing (RS) images are associated with semantic information. Traditional methods that extract image features and generate Hash codes from them cannot effectively use this semantic information, which limits the accuracy of RS image retrieval. This study proposes an image retrieval method based on DSH (deep semantic Hashing) that mines the semantic information of RS images carrying tags or other semantic annotations. The main contribution of this study is a Hashing method for RS images that encodes high-dimensional image feature vectors into binary bits using a limited number of labeled (annotated) images. Furthermore, DSH learns the discrete Hashing codes directly, without the continuous relaxation that would otherwise degrade the accuracy of the learned codes. Hence, DSH provides search within huge data archives that is both efficient (in terms of storage and speed) and accurate.
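Two facts show why learning discrete codes directly matters. The sketch below assumes that codes take values in {-1,+1}, which is common in deep supervised Hashing but not spelled out in this abstract. First, relaxed methods optimize real-valued outputs and only afterwards binarize them with a sign function, which introduces quantization error. Second, for such codes the Hamming distance is a linear function of the inner product, d_H = (c - <b_i, b_j>)/2 for c-bit codes. The function names and example values are hypothetical.

```python
def sign_code(u):
    """Binarize real-valued network outputs to {-1, +1} (0 maps to +1 here)."""
    return [1 if x >= 0 else -1 for x in u]


def hamming_pm1(b1, b2):
    """Count positions where two {-1, +1} codes differ."""
    return sum(1 for x, y in zip(b1, b2) if x != y)


def hamming_from_inner(b1, b2):
    """For c-bit {-1, +1} codes, d_H = (c - <b1, b2>) / 2 (always an integer)."""
    inner = sum(x * y for x, y in zip(b1, b2))
    return (len(b1) - inner) // 2


def quantization_error(u, b):
    """Squared gap between relaxed outputs u and their binarized codes b."""
    return sum((x - y) ** 2 for x, y in zip(u, b))


u1 = [0.8, -0.3, 0.05, -1.2]  # hypothetical relaxed outputs for two images
u2 = [0.1, 0.4, -0.7, -0.9]
b1, b2 = sign_code(u1), sign_code(u2)
print(b1, b2)                                            # → [1, -1, 1, -1] [1, 1, -1, -1]
print(hamming_pm1(b1, b2), hamming_from_inner(b1, b2))  # → 2 2
```

The inner-product identity is what lets a likelihood defined on inner products act as a surrogate for Hamming distance. The quantization error is the gap that direct discrete learning tries to avoid.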

Method

The DSH model performs feature learning and Hashing-code learning simultaneously in an end-to-end framework, which is organized into two main parts, namely feature learning and Hashing learning. In feature learning, we use two deep neural networks, one for images and one for semantic annotations. The deep neural network for images is a convolutional neural network (CNN) adapted from vgg_net. Specifically, it keeps the first seven layers of a vgg_16 network pretrained on ImageNet and replaces the eighth layer with a fully-connected layer whose output is the learned image feature. The first seven layers use the rectified linear unit (ReLU) as the activation function, and the eighth layer uses the identity function. For semantic annotations, we feed semantic vectors into a deep neural network with two fully-connected layers, which use ReLU and the identity function as their activation functions, respectively.

In Hashing learning, let $f(\boldsymbol{x}_i;\theta_x)$ denote the learned feature of image $i$, which corresponds to the output of the CNN for images, and let $g(\boldsymbol{y}_j;\theta_y)$ denote the learned feature of semantic vector $j$, which corresponds to the output of the deep neural network for semantic vectors. Here, $\theta_x$ is the network parameter of the CNN for images, and $\theta_y$ is the network parameter of the deep neural network for semantic vectors. The binary codes are denoted by $\boldsymbol{B}\in\{-1,+1\}^{c\times n}$. We then define the similarities through a likelihood and an optimization function. The network parameters are learned with an alternating learning strategy, which updates one parameter while keeping the others fixed.
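This abstract does not write out the likelihood-based similarity. The sketch below assumes the pairwise likelihood of DPSH (Li et al., 2016, in the references), applied to pairs made of one image feature and one semantic feature. With Θ_ij = ½⟨F_i, G_j⟩, the probability that pair (i, j) is similar is σ(Θ_ij), and the loss is the negative log-likelihood of the similarity labels S_ij ∈ {0, 1}. F, G, S and the function names are illustrative assumptions. The paper's actual objective may add quantization or other terms.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def pairwise_nll(F, G, S):
    """DPSH-style negative log-likelihood, assumed here for image/semantic features.

    logit_ij = 0.5 * <F[i], G[j]> and p(S_ij = 1) = sigmoid(logit_ij), so
    -log p(S_ij | logit_ij) = -(S_ij * logit_ij - log(1 + exp(logit_ij))).
    """
    total = 0.0
    for i, f in enumerate(F):
        for j, g in enumerate(G):
            logit = 0.5 * sum(a * b for a, b in zip(f, g))
            total -= S[i][j] * logit - softplus(logit)
    return total


# Hypothetical 2-dim features for two images (F) and two semantic vectors (G).
F = [[1.0, 1.0], [1.0, -1.0]]
G = [[1.0, 1.0], [-1.0, 1.0]]
S = [[1, 0], [0, 0]]
print(pairwise_nll(F, G, S))
```

Minimizing this loss pulls the features of similar pairs toward a large inner product and pushes dissimilar pairs apart. Under the alternating strategy described above, one would minimize it over the image-network parameters with the semantic-network parameters fixed, and then the other way round. How the binary codes enter the objective is not specified in this abstract.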

Result

We conducted experiments on three archives. The first archive consists of 2 000 images acquired from the GF-2 satellite and Google Earth. Each image in this archive is a 224×224-pixel section associated with several textual tags, and in our experiments several similar tags are treated as one semantic annotation. The second archive is the CIFAR-10 dataset, a single-label dataset consisting of 60 000 color images of 32×32 pixels, each belonging to one of ten classes. The third archive is the FLICKR-25K dataset, which consists of 25 000 images associated with several textual tags. As with the first archive, several similar tags are treated as one semantic annotation, and each image is a 224×224-pixel section.

With 64-bit Hashing codes, the proposed method improves the mean average precision (mAP) over DPSH (deep supervised Hashing with pairwise labels) by approximately 2% on the GF-2 satellite and Google Earth remote sensing image dataset, by 6% to 7% on the CIFAR-10 dataset, and by approximately 0.6% on the FLICKR-25K dataset.
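Treating several similar tags as one semantic annotation suggests a way to derive pairwise supervision for the multi-label archives. The sketch below makes two assumptions. The first is a hand-made tag-to-annotation mapping. The second is a convention widely used for multi-label Hashing benchmarks: two images are similar when they share at least one annotation. The mapping, the tag names, and the function names are hypothetical.

```python
def merge_tags(tags, tag_to_annotation):
    """Map raw textual tags to semantic annotations; unmapped tags are dropped."""
    return {tag_to_annotation[t] for t in tags if t in tag_to_annotation}


def similarity_matrix(labels_a, labels_b):
    """S[i][j] = 1 if item i and item j share at least one annotation, else 0."""
    return [[1 if a & b else 0 for b in labels_b] for a in labels_a]


# Hypothetical mapping: several similar tags collapse into one annotation.
tag_to_annotation = {
    "house": "building", "roof": "building",
    "road": "road", "street": "road",
    "tree": "vegetation", "grass": "vegetation",
}
image_tags = [["house", "tree"], ["street"], ["roof", "grass"]]
labels = [merge_tags(t, tag_to_annotation) for t in image_tags]
print(similarity_matrix(labels, labels))  # → [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
```

For a single-label dataset such as CIFAR-10, each label set holds exactly one class, so the same rule reduces to "similar if and only if same class".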

Conclusion

In this study, we propose an end-to-end deep learning framework that learns both the visual and the semantic features of images with deep networks. It uses the semantic information to learn the Hashing functions that generate the Hashing codes, thereby providing high accuracy for remote sensing image retrieval. Experimental results show that the proposed method improves retrieval accuracy (mAP) over DPSH on all three datasets. Notably, the archives used in the experiments are benchmarks composed of a moderate number of images, whereas in many practical applications search is expected to operate on considerably larger archives.


References

Yang X L, Yao J L, Wang X H, et al. Image copy detection method based on contextual descriptor[J]. Journal of Image and Graphics, 2017, 22(8): 1098-1105. [DOI: 10.11834/jig.160562]

Yu J Q, Wu Z B, Wu F, et al. Multimedia technology 2016: advances and trends in image retrieval[J]. Journal of Image and Graphics, 2017, 22(11): 1467-1485. [DOI: 10.11834/jig.170503]

Chen F, Lyu S H, Li J, et al. Multi-label image retrieval by Hashing with object proposal[J]. Journal of Image and Graphics, 2017, 22(2): 232-240. [DOI: 10.11834/jig.20170211]

Cao Y, Long M S, Wang J M, et al. Deep visual-semantic quantization for efficient image retrieval[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. [DOI: 10.1109/CVPR.2017.104]

Bahmanyar R, de Oca A M M, Datcu M, et al. The semantic gap: an exploration of user and computer perspectives in earth observation images[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(10): 2046-2050. [DOI: 10.1109/LGRS.2015.2444666]

Gong Y C, Lazebnik S. Iterative quantization: a procrustean approach to learning binary codes[C]//Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE, 2011: 817-824. [DOI: 10.1109/CVPR.2011.5995432]

Andoni A, Indyk P. Near-optimal Hashing algorithms for approximate nearest neighbor in high dimensions[C]//47th Annual IEEE Symposium on Foundations of Computer Science. Berkeley, CA, USA: IEEE, 2006: 117-129. [DOI: 10.1109/FOCS.2006.49]

Weiss Y, Fergus R, Torralba A. Multidimensional spectral Hashing[C]//Proceedings of the 12th European Conference on Computer Vision-ECCV 2012. Florence, Italy: Springer, 2012: 340-353. [DOI: 10.1007/978-3-642-33715-4_25]

Raginsky M, Lazebnik S. Locality-sensitive binary codes from shift-invariant kernels[C]//Advances in Neural Information Processing Systems 22-Proceedings of the 2009 Conference. Vancouver, BC, Canada: Neural Information Processing Systems, 2009: 1509-1517.

Gong Y C, Lazebnik S, Gordo A, et al. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2916-2929. [DOI: 10.1109/TPAMI.2012.193]

Yang H F, Lin K, Chen C S. Supervised learning of semantics-preserving Hash via deep convolutional neural networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(2): 437-451. [DOI: 10.1109/TPAMI.2017.2666812]

Xia R K, Pan Y, Lai H J, et al. Supervised Hashing for image retrieval via image representation learning[C]//Proceedings of the 28th AAAI Conference on Artificial Intelligence. Quebec City, Canada: AAAI, 2014.

Lai H J, Pan Y, Liu Y, et al. Simultaneous feature learning and Hash coding with deep neural networks[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3270-3278. [DOI: 10.1109/CVPR.2015.7298947]

Li W J, Wang S, Kang W C. Feature learning based deep supervised Hashing with pairwise labels[C]//Proceedings of the 25th International Joint Conference on Artificial Intelligence. New York: AAAI, 2016: 1711-1717.

Zhao F, Huang Y Z, Wang L, et al. Deep semantic ranking based Hashing for multi-label image retrieval[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 1556-1564. [DOI: 10.1109/CVPR.2015.7298763]

Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2018-05-23]. https://arxiv.org/pdf/1409.1556.pdf

Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012: 1097-1105.

Nguyen V A, Do M N. Deep learning based supervised Hashing for efficient image retrieval[C]//Proceedings of 2016 IEEE International Conference on Multimedia and Expo. Seattle, WA, USA: IEEE, 2016. [DOI: 10.1109/ICME.2016.7552927]

Vedaldi A, Lenc K. MatConvNet: convolutional neural networks for MATLAB[C]//Proceedings of the 23rd ACM International Conference on Multimedia. Brisbane, Australia: ACM, 2015: 689-692. [DOI: 10.1145/2733373.2807412]

Cakir F, He K, Bargal S A, et al. MIHash: online Hashing with mutual information[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 437-445. [DOI: 10.1109/ICCV.2017.55]