Image copy retrieval based on neighboring context

Yang Xinglong1, Yao Jinliang1, Wang Xiaohua1,2, Fang Xiaofei1 (1. Hangzhou Dianzi University, Hangzhou 310018, China; 2. China Jiliang University, Hangzhou 310018, China)

Abstract
Objective Copy image retrieval methods based on the bag-of-words (BoW) model are currently the most effective. However, the quantization of local features incurs information loss, which weakens the discriminative power of visual words and increases visual-word mismatches, degrading copy retrieval performance. To address the mismatch problem, this paper proposes a copy image retrieval method based on neighboring context. The method uses the contextual relations of local features to resolve visual-word ambiguity and improve the discriminative power of visual words, thereby improving copy retrieval performance.

Method First, the feature points around a given local feature point are selected as its context according to distance and scale relations; the selected points are called neighboring feature points. A contextual descriptor is then constructed for the local feature from the information of its neighboring feature points and their relations to it. Next, candidate matches of local features are verified by computing the similarity of their contextual descriptors. Finally, the similarity between two images is measured by the number of correctly matched feature points, and candidate images are selected and returned according to this similarity.

Result Experiments are conducted on the Copydays dataset and compared with the baseline method. With 100 k distractor images, the mAP of the proposed method is 63% higher than that of the baseline. When the number of distractor images increases from 100 k to 1 M, the mAP of the baseline drops by 9%, whereas that of the proposed method drops by only 3%.

Conclusion The proposed copy image retrieval method is highly robust to image editing operations such as rotation, image overlay, scaling, and cropping, and can be effectively applied to fields such as image copyright protection and image deduplication.
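The neighbor selection and descriptor construction summarized above can be pictured with a minimal Python sketch. It assumes local features have already been detected and quantized into visual words; the Keypoint fields, the function names select_neighbors and build_context, and the parameters k and scale_ratio are hypothetical choices for illustration, not the paper's actual implementation or thresholds.

```python
import math
from dataclasses import dataclass


@dataclass
class Keypoint:
    x: float            # image coordinates of the local feature
    y: float
    scale: float        # characteristic scale of the local feature
    orientation: float  # dominant orientation in radians
    word: int           # visual word ID assigned by quantization


def select_neighbors(center, keypoints, k=8, scale_ratio=2.0):
    """Select up to k neighboring keypoints for `center`, keeping only points
    whose scale is comparable to the center's and ranking them by Euclidean
    distance (the rule and the values of k / scale_ratio are illustrative)."""
    candidates = []
    for kp in keypoints:
        if kp is center:
            continue
        ratio = kp.scale / center.scale
        if 1.0 / scale_ratio <= ratio <= scale_ratio:
            dist = math.hypot(kp.x - center.x, kp.y - center.y)
            candidates.append((dist, kp))
    candidates.sort(key=lambda item: item[0])
    return [kp for _, kp in candidates[:k]]


def build_context(center, neighbors):
    """Build a contextual descriptor: each neighbor contributes its visual word
    and the angle of the center-to-neighbor direction, taken relative to the
    center's dominant orientation so the descriptor tolerates rotation."""
    descriptor = []
    for kp in neighbors:
        angle = math.atan2(kp.y - center.y, kp.x - center.x) - center.orientation
        angle %= 2.0 * math.pi
        descriptor.append((kp.word, angle))
    return descriptor
```

Expressing neighbor angles relative to the center point's dominant orientation is one way to keep such a descriptor stable under rotation, in line with the robustness to rotation reported in the results.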
Image copy detection method based on contextual descriptor

Yang Xinglong1, Yao Jinliang1, Wang Xiaohua1,2, Fang Xiaofei1 (1. Hangzhou Dianzi University, Hangzhou 310018, China; 2. China Jiliang University, Hangzhou 310018, China)

Abstract
Objective With the rapid growth of Web images, image retrieval has become an important application. Web images are easily downloaded, edited, and re-uploaded, so a great number of image copies can be found on the Web. Finding and filtering these copies can improve the effectiveness of image search engines. In this study, two images are considered copies if they are generated from the same original image by certain image editing operations, including cropping, scaling, rotation, changes in compression rate, addition of other content, framing, and other non-affine geometric transformations. Over the last few years, several methods have been proposed to detect image copies in large-scale image datasets. In the early stage of image copy detection research, an entire image was represented as a single feature vector or descriptor, and the similarity of these descriptors was measured to verify whether two images are copies. Such methods are efficient and require little storage, but they are not robust to common image editing operations such as cropping and the addition of objects. Local features are more robust for image copy detection than global features. However, a local feature such as SIFT is a high-dimensional vector, so matching local features is computationally expensive, especially in large-scale datasets. To address this problem, the bag-of-words (BoW) model has been applied to image copy detection and is used by state-of-the-art methods. In these methods, an image is represented as a bag of local features, which are quantized into visual words, and inverted file indexing is used to register images by their visual words, improving retrieval efficiency. However, visual words have much less discriminative power than text words because of quantization loss. The quantization loss of local features causes a large number of mismatched local features, which reduces the precision of image copy detection. Several methods have been proposed to eliminate the influence of visual word mismatches and improve detection performance. Geometric verification has become a popular post-verification step for rejecting visual word mismatches. However, geometric verification first needs to obtain the matched pairs of visual words between the query and a candidate image and then computes the spatial similarity of the matched visual words to reject mismatches. Because of its high computational cost and the large number of candidate images in large-scale datasets, this post-verification is usually applied only to some top-ranked candidate images. To address the limitations of post-verification, one basic idea is to design a contextual descriptor that can filter visual word mismatches immediately according to descriptor similarity. This paper proposes an image copy detection method based on such a contextual descriptor. The contextual descriptor encodes information about neighboring local features and improves the discriminative power of visual words.

Method In the proposed method, the contextual descriptor consists of the neighboring visual words and the spatial relations of the local features; the neighboring visual words represent the neighboring local features, and the spatial relations are represented as angles in the descriptor. If two matching visual words have similar neighboring local features, the pair is considered a true match. The procedure is as follows. The Euclidean distance and the scale of local features are used to select the neighbors that form the context of a local feature in an image. The information about these neighbors, such as position, dominant orientation, and visual word, is used to construct the contextual descriptor. Each candidate match of local features is then verified according to the similarity of the contextual descriptors: if the matching visual words share the same neighbors and have similar spatial relations, the matching pair is considered a true match. Finally, the similarity between images is measured by the number of true matches of visual words. Because both neighboring visual words and spatial relations are used for verification, the measure is strict, and most mismatched visual words are eliminated.

Result Experiments are performed on the Copydays dataset, and the proposed method is compared with the baseline method. With 100 k distractor images, the mean average precision (mAP) of the proposed method is 63% higher than that of the baseline. When the number of distractor images increases from 100 k to 1 M, the mAP of the baseline decreases by 9%, whereas that of the proposed method decreases by only 3%. In the proposed method, neighboring visual words are indexed into the contextual descriptor before image copy detection; the method is therefore a pre-verification method and, as confirmed by the experiments, requires less detection time than post-verification methods.

Conclusion This study proposes an image copy detection method that is robust to most image editing operations, such as rotation, image overlay, scaling, and cropping. The method achieves a high mAP on a public test dataset and is efficient in real application scenarios, such as image copyright protection and image deduplication.
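Continuing the illustrative sketch given after the Chinese abstract, the fragment below shows how the verification and image-level scoring described here could look: a candidate match of visual words is accepted only if the two contextual descriptors share enough neighboring visual words with consistent relative angles, and image similarity is the count of accepted matches. The thresholds angle_tol and min_shared and the function names are assumptions for illustration, not values from the paper.

```python
import math


def match_context(ctx_query, ctx_candidate, angle_tol=math.pi / 8, min_shared=2):
    """Verify one candidate match of visual words: count neighbors that share the
    same visual word and have a consistent relative angle, and accept the match
    only if enough such neighbors exist (both thresholds are illustrative)."""
    shared = 0
    for word_q, angle_q in ctx_query:
        for word_c, angle_c in ctx_candidate:
            if word_q == word_c:
                diff = abs(angle_q - angle_c)
                diff = min(diff, 2.0 * math.pi - diff)  # wrap-around difference
                if diff <= angle_tol:
                    shared += 1
                    break
    return shared >= min_shared


def image_similarity(candidate_matches):
    """Image-level score: the number of candidate visual-word matches whose
    contextual descriptors agree. `candidate_matches` is a list of
    (query_context, candidate_context) pairs built for the matched features."""
    return sum(1 for ctx_q, ctx_c in candidate_matches
               if match_context(ctx_q, ctx_c))
```

Requiring agreement on both the neighboring words and their angles makes the test strict, which mirrors the strict verification measure described in the abstract.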