Multi-label image retrieval by hashing with object proposal

Chen Fei; Lyu Shaohe; Li Jun; Wang Xiaodong; Dou Yong

doi:10.11834/jig.20170211

Column: 11th Conference on Image and Graphics Technologies and Applications


Multi-label image retrieval by hashing with object proposal
2017, Vol. 22, No. 2, pp. 232-240
Published online: 2017-01-21

Published in print: 2017

Chen Fei, Lyu Shaohe, Li Jun, Wang Xiaodong, Dou Yong. Multi-label image retrieval by hashing with object proposal[J]. Journal of Image and Graphics, 2017, 22(2): 232-240. DOI: 10.11834/jig.20170211.

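Summary

Hashing is an effective approach to large-scale image retrieval. To improve retrieval accuracy, hash codes should preserve semantic information: the more similar two images are, the closer their hash codes should be. Existing methods first extract a feature that describes the image as a whole and then generate the hash code. This cannot precisely describe the multiple objects an image contains, which limits the accuracy of multi-label image retrieval. We therefore propose a hash generation method based on convolutional neural networks and object proposals. The method first extracts a series of regions that may contain objects, then uses a deep convolutional neural network to extract and fuse the features of each region, producing a group of features that characterizes each object in the image, and finally generates the hash code of the whole image. Training uses a triplet loss so that the hash codes preserve as much semantic information as possible. Multi-label image retrieval was evaluated on the VOC2012, Flickr25K, and NUSWIDE datasets. In terms of NDCG (normalized discounted cumulative gain) with 1 000 returned images, on VOC2012 the method improves on DSRH (deep semantic ranking hashing) by 2 to 4 percentage points and on ITQ-CCA (iterative quantization-canonical correlation analysis) by 3 to 6 percentage points; on Flickr25K it improves on DSRH by about 2 percentage points; on NUSWIDE it improves on DSRH by about 4 percentage points. In terms of mean average precision, it improves by 2 to 5 percentage points on NUSWIDE and Flickr25K. Across several evaluation metrics, the method describes images precisely at a finer granularity and significantly improves multi-label image retrieval performance. The new feature learning model shows that fine-grained feature encoding of images is feasible and can effectively improve retrieval performance on these datasets.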
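To make the NDCG figures above concrete, the following is a minimal sketch of how NDCG at a cutoff k could be computed for multi-label retrieval. It is not taken from the paper: the relevance grade (number of labels shared by the query and a returned image), the gain `2^r - 1`, the log2 discount, and the function names are all assumptions chosen for illustration.

```python
import math

def shared_labels(query_labels, image_labels):
    """Relevance grade (assumed): number of labels the two images share."""
    return len(set(query_labels) & set(image_labels))

def dcg(relevances):
    """Discounted cumulative gain with gain 2^r - 1 and a log2 rank discount."""
    return sum((2 ** r - 1) / math.log2(rank + 2)
               for rank, r in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the top-k results over the DCG of the ideal ordering.

    `ranked_relevances` holds the relevance grade of every database image,
    in the order the retrieval system returned them.
    """
    top_k = ranked_relevances[:k]
    ideal = sorted(ranked_relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top_k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical query and ranked results, each described by its label set.
query = {"person", "dog"}
ranked = [{"dog"}, {"person", "dog"}, {"car"}]
grades = [shared_labels(query, labels) for labels in ranked]
score = ndcg_at_k(grades, k=3)
```

In this toy ranking the image sharing both labels is returned second rather than first, so the score falls below 1; a ranking sorted by shared-label count would score exactly 1.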

Abstract

Hashing is an effective means for large-scale image retrieval. To improve retrieval performance, hash codes should preserve semantic similarity, i.e., the distance between the hash codes of two images should be small when the images are similar. Conventional methods first extract an overall image feature and then generate a single hash code. Such methods cannot characterize the image content for multiple objects, which results in low accuracy for multi-label image retrieval. This study proposes a new hash generation method with object proposals. We propose a new deep-network-based framework to construct hash functions that learn directly from images that contain multiple labels. The model first derives a series of regions of interest that may contain objects and then generates the features of each region through deep convolutional neural networks. It generates a group of hash codes to describe the objects in an image and finally produces a compact hash code that represents the entire image. A novel triplet-loss-based training method is adopted to preserve the semantic order of the hash codes. Image retrieval experiments were run on the VOC2012, Flickr25K, and NUSWIDE datasets. On VOC2012, the NDCG (normalized discounted cumulative gain) of our method is 2 to 4 percentage points higher than that of DSRH (deep semantic ranking hashing) and 3 to 6 percentage points higher than that of ITQ-CCA (iterative quantization-canonical correlation analysis). Compared with DSRH, our method improves NDCG by approximately 2 percentage points on Flickr25K and 4 percentage points on NUSWIDE. In mean average precision (mAP), our method gains 2 to 5 percentage points over DSRH on the Flickr25K and NUSWIDE datasets. Thus, the new method can describe an image accurately in a fine-grained way, and the performance of multi-label image retrieval is improved significantly. This study proposes a new model to learn compact features, and the experimental results show that fine-grained feature embedding of an image is practicable. Our method outperforms the compared hashing methods in these image retrieval experiments.
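The pipeline described in the abstract (per-region features, fusion, a compact hash code, and triplet-based training) can be illustrated with a small sketch. The abstract does not specify the network, the fusion rule, or the exact loss, so everything below is an assumption for illustration: element-wise max fusion, a linear projection with hypothetical weights, thresholding at zero, and a hinge-style triplet loss on squared Euclidean distance.

```python
def fuse_regions(region_features):
    """Fuse per-region feature vectors by element-wise max (assumed rule)."""
    return [max(column) for column in zip(*region_features)]

def project(feature, weights):
    """Linear projection to real-valued (relaxed) outputs, one per hash bit."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def binarize(relaxed):
    """Threshold the relaxed outputs at zero to obtain a binary hash code."""
    return [1 if v > 0 else 0 for v in relaxed]

def hamming(code_a, code_b):
    """Hamming distance: number of bit positions where two codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def squared_distance(u, v):
    """Squared Euclidean distance between two relaxed codes."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the positive is closer than the negative by `margin`."""
    return max(0.0, margin
               + squared_distance(anchor, positive)
               - squared_distance(anchor, negative))

# Hypothetical features for two proposed regions of one image.
regions = [[1.0, -2.0], [0.0, 3.0]]
weights = [[1.0, 0.0], [0.0, -1.0]]   # hypothetical 2-bit projection
image_feature = fuse_regions(regions)
code = binarize(project(image_feature, weights))
```

At query time, database images would be ranked by `hamming` distance between their codes and the query code; during training, `triplet_loss` would be applied to the relaxed outputs of an anchor image, a more similar image, and a less similar image.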