Multi-label image retrieval by hashing with object proposal

Chen Fei; Lyu Shaohe; Li Jun; Wang Xiaodong; Dou Yong

Published: 2017-01-21
DOI: 10.11834/jig.20170211
2017 | Volume 22 | Number 2

Special column: The 11th Conference on Image and Graphics Technologies and Applications

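Chen Fei, Lyu Shaohe, Li Jun, Wang Xiaodong, Dou Yong (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China)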

Abstract

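Objective: Hashing is an effective method for large-scale image retrieval. To improve retrieval accuracy, hash codes should preserve semantic information: the more similar two images are, the closer their hash codes should be. Existing methods first extract a feature that describes the image as a whole and then generate the hash code. This approach cannot precisely describe the multiple objects an image contains, which limits the accuracy of multi-label image retrieval. We therefore propose a hash generation method based on convolutional neural networks and object extraction. Method: The method first extracts a series of regions in the image that may contain objects, then uses a deep convolutional neural network to extract and fuse the features of each region, producing a group of features that characterize each object in the image, and finally generates the hash code for the whole image. Training uses a triplet loss so that the hash codes preserve as much semantic information as possible. Result: Multi-label image retrieval experiments were carried out on the VOC2012, Flickr25K, and NUSWIDE datasets. On the NDCG (normalized discounted cumulative gain) metric with 1 000 returned images, on VOC2012 the proposed method improves on DSRH (deep semantic ranking hashing) by 2 to 4 percentage points and on ITQ-CCA (iterative quantization-canonical correlation analysis) by 3 to 6 percentage points; on Flickr25K it improves on DSRH by about 2 percentage points; on NUSWIDE it improves on DSRH by about 4 percentage points. For mean average precision, the proposed method gains 2 to 5 percentage points on NUSWIDE and Flickr25K. Across these evaluation metrics, the proposed method describes images more precisely at a finer granularity and significantly improves multi-label image retrieval performance. Conclusion: The new feature learning model shows that fine-grained feature encoding of images is a feasible approach that effectively improves retrieval performance on these datasets.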
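The NDCG metric named in the results can be illustrated with a short sketch. The abstract does not define the relevance grade or the gain function, so the choices here are assumptions: relevance is taken as the number of labels a returned image shares with the query, the gain is 2^r - 1, and the ideal ranking is computed only from the returned list rather than the whole database. The helper names (`dcg_at_k`, `ndcg_at_k`, `shared_labels`) and the toy labels are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (assumed gain 2^r - 1)."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the returned order divided by the DCG of the best possible order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def shared_labels(query_labels, item_labels):
    """Assumed relevance grade: number of labels shared with the query."""
    return len(set(query_labels) & set(item_labels))

# Toy query with two labels and four returned images, in ranked order.
query = {"person", "dog"}
returned = [{"person"}, {"person", "dog"}, {"car"}, {"dog"}]

rels = [shared_labels(query, labels) for labels in returned]
score = ndcg_at_k(rels, k=4)
print(rels, round(score, 4))
```

A ranking that places the image sharing both labels first would score 1.0; the toy ranking above scores lower because that image appears second.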

Keywords

image retrieval; convolutional neural network; hashing; multi-label

Multi-label image retrieval by hashing with object proposal

Chen Fei, Lyu Shaohe, Li Jun, Wang Xiaodong, Dou Yong (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China)

Abstract

Objective: Hashing is an effective means for large-scale image retrieval. To improve retrieval performance, hash codes should preserve semantic similarity; that is, the distance between the hash codes of two images should be small when the images are similar. Conventional methods first extract a feature of the whole image and then generate a single hash code. Such methods cannot characterize the multiple objects in an image, which lowers the accuracy of multi-label image retrieval. This study proposes a new hash generation method based on object proposals. Method: We propose a deep-network-based framework to construct hash functions that learn directly from images with multiple labels. The model first derives a series of regions of interest that may contain objects and then extracts the features of each region through deep convolutional neural networks. It then generates a group of hash codes to describe the objects in the image, and finally a compact hash code is generated to represent the entire image. A triplet-loss-based training method is adopted to preserve the semantic order of the hash codes. Result: Image retrieval experiments on the VOC2012, Flickr25K, and NUSWIDE datasets show that, on VOC2012, the NDCG (normalized discounted cumulative gain) value of our method is 2 to 4 percentage points higher than that of DSRH (deep semantic ranking hashing) and 3 to 6 percentage points higher than that of ITQ-CCA (iterative quantization-canonical correlation analysis). Compared with DSRH, our method improves NDCG by approximately 2 percentage points on Flickr25K and 4 percentage points on NUSWIDE. For mean average precision (mAP), our method gains 2 to 5 percentage points over DSRH on the Flickr25K and NUSWIDE datasets. Thus, the new method describes an image accurately in a fine-grained way and significantly improves multi-label image retrieval performance. Conclusion: This study proposes a new model to learn compact features, and the experimental results show that fine-grained feature embedding of an image is practicable. On the reported metrics, our method outperforms the compared hashing methods in image retrieval.
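To make the training objective concrete, the following is a minimal sketch of a generic triplet hinge loss on relaxed (real-valued) hash outputs, followed by binarization and Hamming-distance ranking. It is not the authors' formulation: the abstract gives no loss equation, so the squared Euclidean distance, the margin of 1.0, the 0.5 threshold, and all function names and vectors below are assumptions chosen for illustration.

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the positive is closer to the anchor than the negative by `margin`."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_pos - d_neg)

def binarize(values, threshold=0.5):
    """Turn relaxed outputs in [0, 1] into a binary hash code (assumed threshold)."""
    return [1 if v >= threshold else 0 for v in values]

def hamming(code_a, code_b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))

# Hypothetical 4-bit relaxed outputs for a query, a similar image, and a dissimilar image.
anchor = [0.9, 0.1, 0.8, 0.2]
positive = [0.8, 0.2, 0.9, 0.1]
negative = [0.1, 0.9, 0.2, 0.7]

loss_ok = triplet_loss(anchor, positive, negative)   # ordering already satisfied
loss_bad = triplet_loss(anchor, negative, positive)  # roles swapped, so it is penalized

# Retrieval: rank database codes by Hamming distance to the query code.
database = {"similar": binarize(positive), "dissimilar": binarize(negative)}
query_code = binarize(anchor)
ranking = sorted(database, key=lambda name: hamming(query_code, database[name]))
print(loss_ok, round(loss_bad, 2), ranking)
```

The loss is zero only when the semantic ordering is already respected with the required margin, which is the sense in which such training preserves similarity in the resulting codes.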

Keywords

image retrieval; convolutional neural networks; hash; multi-label