Published: 2017-08-16
DOI: 10.11834/jig.160562
2017 | Volume 22 | Number 8

Image Analysis and Recognition

Copy image retrieval by constructing neighbor context

Yang Xinglong¹, Yao Jinliang¹, Wang Xiaohua¹,², Fang Xiaofei¹

1. Hangzhou Dianzi University, Hangzhou 310018;

2. China Jiliang University, Hangzhou 310018

Received: 2016-11-04; Revised: 2017-05-04

Supported by: National Natural Science Foundation of China (61202280)

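About the first author: Yang Xinglong (born 1993), male, is a master's student in computer technology at Hangzhou Dianzi University. His main research interest is image retrieval. E-mail: 735396307@qq.com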

CLC number: TP391.4

Document code: A

Article ID: 1006-8961(2017)08-1098-08

Abstract

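Objective Copy image retrieval methods based on the bag-of-words model are currently the most effective. However, quantizing local features loses information. As a result, visual words lack discriminative power and mismatch more often, which degrades copy image retrieval. To address visual word mismatches, this paper proposes a copy image retrieval method based on neighbor context. The method uses the contextual relations among local features to disambiguate visual words and make them more discriminative, and this improves copy image retrieval.

Method First, the feature points around a local feature point are selected as its context according to distance and scale. The local feature points in this context are called neighbor feature points. Next, a contextual descriptor is built for the local feature from the information of its neighbor feature points and their relations to the local feature. Then, local feature matches are verified by computing the similarity of their contextual descriptors. Finally, the similarity between images is measured by the number of correctly matched feature points, and several candidate images are returned according to this similarity.

Result Experiments were conducted on the Copydays dataset and compared with the Baseline method. With 100 k distractor images, the mAP is about 63 percentage points higher than that of the Baseline method. When the distractor set grows from 100 k to 1 M images, the mAP of the Baseline decreases by 9%, whereas that of the proposed method decreases by only 3%.

Conclusion The proposed copy image retrieval method is highly robust to image editing operations such as rotation, image overlay, scaling, and cropping. It can be applied effectively in fields such as image anti-counterfeiting and image deduplication.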

Key words

local feature; visual word; copy image retrieval; bag-of-words model; image retrieval

Image copy detection method based on contextual descriptor

Yang Xinglong¹, Yao Jinliang¹, Wang Xiaohua¹,², Fang Xiaofei¹

1. Hangzhou Dianzi University, Hangzhou 310018, China;

2. China Jiliang University, Hangzhou 310018, China

Supported by: National Natural Science Foundation of China (61202280)

Abstract

Objective With the rapid growth of Web images, image retrieval has become an important application. However, Web images are easily downloaded, edited, and re-uploaded, so a great number of image copies can be found on the Web. Finding and filtering image copies can improve the effectiveness of image search engines. In this study, two images are considered copies if both are generated from the same original image by certain image editing operations. Editing operations include cropping, scaling, rotating, changing the compression rate, adding other content, framing, and other non-affine geometric transformations. Over the last few years, several methods have been proposed to detect image copies in large-scale image datasets. In the early stage of image copy detection research, an entire image was represented as one feature vector or descriptor. Two images were judged to be copies by measuring the similarity of these vectors or descriptors. These methods are efficient and have low storage cost. However, they are not robust to certain common image editing operations, such as cropping and adding objects. Local features are more robust than global features in image copy detection. However, a local feature such as SIFT is a high-dimensional vector, which makes local feature matching expensive, especially in large-scale image datasets. To solve this problem, the bag-of-words (BoW) model was applied to image copy detection, and state-of-the-art methods now use it. In these methods, an image is represented as a bag of local features, and these features are quantized into visual words. Inverted file indexing registers images via these visual words and improves retrieval efficiency. However, quantization loss gives visual words significantly less discriminative power than text words. This loss causes a large number of mismatched local features, which reduces the precision of image copy detection. Some methods have been proposed to eliminate the influence of visual word mismatches and improve image copy detection performance. Geometric verification has become a popular post-verification step for rejecting visual word mismatches. However, geometric verification methods first need to obtain the matched pairs of visual words between the query and candidate images. They then calculate the spatial similarity of the matched visual words between the two images to reject mismatches. Computation is expensive and large-scale datasets contain many candidate images. For these reasons, mismatch rejection is usually applied only to some top-ranked candidate images. One basic idea for addressing these problems of post-verification is to design a contextual descriptor, so that visual word mismatches can be filtered immediately according to descriptor similarity. This paper proposes an image copy detection method based on a contextual descriptor. The contextual descriptor consists of information about the neighboring local features and improves the discriminative power of visual words.
Method In the proposed method, the contextual descriptor consists of neighboring visual words and the spatial relations of local features. The neighboring visual words represent the neighboring local features, and the spatial relations are represented as angles. If two matching visual words have similar neighboring local features, the pair is considered a true match. The proposed method proceeds as follows. The Euclidean distance and scale of local features are used to select the neighbors that form the context of a local feature in an image. Information about the neighbors, such as position, dominant orientation, and visual word, is used to construct the contextual descriptor. Each candidate match of a local feature is then verified as true or false according to the similarity of the contextual descriptors. Under this measure, a matching pair of visual words is considered a true match if the two features share a neighbor with the same visual word and similar spatial relations. Finally, the similarity between images is measured by the number of true matches of visual words. Because both neighboring visual words and spatial relations are used to verify matching visual words, the verification is very strict, and most mismatched visual words are eliminated.
Result Experiments were performed on the Copydays dataset, and the proposed method was compared with the baseline method. With 100 k distractor images, the mean average precision (mAP) of the proposed method is about 63 percentage points higher than that of the baseline method. When the distractor set grows from 100 k to 1 M images, the mAP of the baseline decreases by 9%, whereas that of the proposed method decreases by only 3%. In the proposed method, neighboring visual words are indexed into the contextual descriptor before image copy detection. The proposed method is therefore a pre-verification method, and the experimental results confirm that it needs less detection time than post-verification methods.

Conclusion This study proposes an image copy detection method that is robust to most image editing operations, such as rotation, image overlay, scaling, and cropping. The proposed method obtains a high mAP on a public test dataset. It is also efficient in real application scenarios, such as image copyright protection and duplicate image removal.

Key words

local feature; visual word; image copy detection; Bag-of-Words; image retrieval

0 Introduction

Fig. 1 Examples of image copies ((a) original image; (b) cropping; (c) rotation; (d) image overlay)

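To detect image copies such as those in Fig. 1, early researchers proposed copy image retrieval methods based on global features. These methods represent an image as a single global feature. Examples include methods based on ordinal measures^[1-2] and methods based on image edges^[3]. Global-feature methods are computationally simple and fast, but they are not very robust to operations such as rotation and cropping. Local features are robust to rotation and scaling, and the scale-invariant feature transform (SIFT)^[4] is the best-known example. After such features were introduced, local-feature-based image retrieval methods came into wide use, including for copy image retrieval.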

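Local-feature-based copy image retrieval methods represent images with the bag-of-words model^[5]. The common way to match local features quickly is to quantize them into visual words. However, quantization loses information, so visual words lack discriminative power, and this degrades copy image retrieval.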

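To address visual word mismatches, researchers have proposed methods that filter out mismatched feature points and thereby improve copy image retrieval^[6-9]. The RANSAC algorithm^[9] is a highly accurate post-verification method. However, it must perform affine estimation many times on randomly sampled matches. Its time complexity is therefore high, and it can hardly meet the requirements of real-time retrieval. To verify visual words more efficiently, some researchers verify candidate feature points through the contextual relations among local features^[6-8], such as spatial position, dominant orientation, or scale relations. These methods are much more efficient than RANSAC and achieve good retrieval results. However, they are still post-verification methods: candidate images must first be retrieved and then verified one by one, which limits retrieval efficiency.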

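To solve these problems, this paper exploits the stable geometric relations among local features and proposes a copy image retrieval method based on neighbor context. At indexing time, the method builds a contextual descriptor for each feature point from its relations to its neighbor feature points. This descriptor is stored in the inverted index together with the feature point's visual word. At retrieval time, the method first queries the inverted file with visual words to find the local feature points whose visual words match. It then verifies each match according to the similarity of the contextual descriptors. Experimental results show that the method achieves high precision and recall.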

1 Related work

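The key problem in local-feature-based copy image retrieval today is improving the accuracy of visual word matching. One intuitive idea is to attach additional information to visual words to make them more discriminative. Mortensen^[10] proposed an improved SIFT algorithm that adds global context to local features by appending a global texture vector to each SIFT feature point. Other researchers have focused on how local features are quantized. Zheng^[11] proposed quantizing local features with multiple vocabularies to improve retrieval recall. However, multiple vocabularies introduce data redundancy and increase retrieval time and storage space. To solve this, Zheng proposed a Bayes merging method for multiple vocabularies, which reduces the correlation between vocabularies and the amount of redundant data. The method effectively improves recall, but quantizing with multiple vocabularies lowers retrieval efficiency.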

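Another way to improve the accuracy of visual word matching is to verify local feature matches through the contextual relations among local features. Such methods fall into two groups: post-verification methods and pre-verification methods.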

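Post-verification methods first retrieve candidate images by matching visual words. They then verify the candidate images by checking candidate feature points against consistency constraints on the relations between feature points^[6-8, 12]. Zheng^[12] proposed refining the spatial constraints of visual phrases for large-scale image retrieval, which addresses the lack of spatial constraints in visual phrase methods. Zhou^[6] proposed encoding the spatial position relations among local feature points. The method uses the Hamming distance to check whether these relations are consistent and rejects feature points that violate the spatial constraints. Both methods consider the spatial relations among visual phrases or visual words and retrieve well, but making them robust to rotation requires relatively complex computation. Jegou^[8] proposed Hamming embedding (HE) and a post-verification method based on weak geometric consistency (WGC). Hamming embedding makes visual words more discriminative, while weak geometric consistency verifies feature points through the scale and dominant orientation relations between local features. Combining the two greatly improves retrieval performance.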

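Pre-verification methods^[13-16] build a contextual descriptor for each local feature at indexing time and store it in the index together with the feature's visual word. Liu^[17] proposed indexing the spatial context information of local features. The method sets up a coordinate system centered at a local feature, with the feature's dominant orientation as the axis, and computes weights for the neighbor feature points. It then uses an orthogonal projection matrix to compress the neighbor feature descriptors into binary signatures. Because the method must compute weighted sums over the surrounding feature points, those points easily disturb it, and it is not robust enough to operations such as scaling. Yao^[13] proposed a copy image retrieval method based on local contextual descriptors, which quantizes the geometric relations between a local feature and its neighbor feature points into a contextual descriptor. During retrieval, candidate feature points are verified by the relative order of the neighbor feature descriptors. The method is very efficient but rather sensitive to the geometric relations between feature points. Pre-verification methods do not need to retrieve candidate images first. They verify candidate feature points using only the features' contextual descriptors, so they are more efficient than post-verification methods.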

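Pre-verification methods are efficient in large-scale image datasets, but they still have the problems noted above. This paper therefore proposes a copy image retrieval method based on neighbor context. The method builds a contextual descriptor for each local feature at indexing time. At retrieval time, it verifies local feature matches by computing the similarity of their contextual descriptors, which improves the precision of copy image retrieval.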

2 Copy image retrieval based on neighbor context

2.1 Framework and index structure

Fig. 2 shows the framework of the proposed method, which consists of two processes: indexing and retrieval.

Fig. 2 The framework of our method

The indexing process consists of the following steps:

1) Extract the local features of the image to be indexed.

2) Quantize the local features into visual words by product quantization with a visual vocabulary.

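3) Build a contextual descriptor for the visual word of each local feature. The descriptor is used to verify visual word matches at retrieval time.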

4) Store the local feature information in the inverted index.
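Step 2) names product quantization but does not give the number of sub-vectors or the codebook sizes. The following is a minimal sketch under these assumptions: the descriptor is split into equal sub-vectors, each sub-vector is quantized with its own small codebook, and the sub-indices are combined into one visual word ID. The codebooks and the function names `nearest_centroid` and `pq_visual_word` are hypothetical. In practice the codebooks would be trained offline, for example by k-means.

```python
from typing import List, Sequence

def nearest_centroid(vec: Sequence[float], centroids: List[Sequence[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance to vec."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])))

def pq_visual_word(descriptor: Sequence[float],
                   codebooks: List[List[Sequence[float]]]) -> int:
    """Map one descriptor (e.g. a 128-d SIFT vector) to a visual word ID.

    The descriptor is cut into len(codebooks) equal sub-vectors. Each one is
    assigned to its nearest sub-centroid, and the sub-indices are combined as
    digits of a mixed-radix number, so there are prod(len(cb)) possible words.
    """
    m = len(codebooks)
    if len(descriptor) % m != 0:
        raise ValueError("descriptor length must be divisible by the number of codebooks")
    sub_dim = len(descriptor) // m
    word_id = 0
    for s, codebook in enumerate(codebooks):
        sub = descriptor[s * sub_dim:(s + 1) * sub_dim]
        word_id = word_id * len(codebook) + nearest_centroid(sub, codebook)
    return word_id
```

Take two sub-quantizers with 1,000 centroids each as an example. This design gives up to 10^6 distinct visual words while storing only 2,000 centroids, which is the usual reason to build a large vocabulary this way.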

The retrieval process consists of the following steps:

Steps 1) to 3) are the same as steps 1) to 3) of the indexing process.

4) Match the visual words obtained in step 2) against the visual words in the inverted index. This yields a large number of visual word matches.

5) Verify the visual words matched from the index according to their contextual descriptors.

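6) For each candidate image, count the local features that step 5) found to be correctly matched. Use this count to measure the similarity between the candidate image and the query image.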

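In the implementation, local features are indexed with an inverted index whose structure is shown in Fig. 3. The index uses visual words as keys and records the offset and count of each visual word in the index. Each indexed entry contains the following information: ItemId (visual word), ImgId (unique image identifier), and Contexts (contextual descriptor). Each contextual descriptor holds information about several neighbor feature points. For each neighbor, it stores the orientation difference, the angular position difference, and the neighbor's visual word.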

Fig. 3 The storage structure of our method
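The index layout described above can be sketched as follows. This is an illustration only. The paper stores postings in a flat array addressed by a per-word offset and count, whereas this sketch uses a Python dict of lists. The names `NeighborInfo`, `Posting`, and `InvertedIndex` are hypothetical. The visual word (ItemId) is the dictionary key, so it is not repeated inside each posting.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List

@dataclass(frozen=True)
class NeighborInfo:
    ori_diff: float  # orientation difference beta (Eq. (1)), degrees
    pos_diff: float  # angular position difference alpha (Eq. (2)), degrees
    item_id: int     # visual word of the neighbor feature

@dataclass
class Posting:
    img_id: int                   # ImgId: unique identifier of the indexed image
    contexts: List[NeighborInfo]  # Contexts: the contextual descriptor

class InvertedIndex:
    """Maps a visual word (ItemId) to the postings of all features quantized to it."""

    def __init__(self) -> None:
        self._lists: DefaultDict[int, List[Posting]] = defaultdict(list)

    def add(self, item_id: int, img_id: int, contexts: List[NeighborInfo]) -> None:
        """Register one local feature of image img_id under its visual word."""
        self._lists[item_id].append(Posting(img_id, list(contexts)))

    def lookup(self, item_id: int) -> List[Posting]:
        """All postings for a query visual word (empty list if the word is unseen)."""
        return self._lists.get(item_id, [])
```

At query time, each query feature's visual word is looked up. Every returned posting is a candidate match, and its `contexts` are then verified as described in Section 2.3.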

2.2 Contextual descriptor generation

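1) Neighbor selection. To build a contextual descriptor for a local feature point to be indexed, neighbor feature points must first be selected as its context. Local features with larger scales are more robust to scaling, so features with large scales are preferred as neighbors. Distant feature points are easily affected by operations such as cropping and adding other objects, so nearby points are also preferred. To find neighbors efficiently, the method proceeds as follows: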

(1) Extract the SIFT feature points of the image and sort them by scale in descending order.

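(2) From the sorted feature points, add the points whose scale is larger than that of the feature point to be indexed to a candidate list. If the candidate list has too few points, fill it up in order of scale.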

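(3) From the candidate list, select the feature points with the smallest Euclidean distances to the feature point to be indexed as its neighbors.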
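Steps (1) to (3) can be sketched as shown below. The paper does not say how long the candidate list must be before it is topped up, so `min_candidates` is a hypothetical parameter. The default `n_neighbors = 7` follows the setting chosen in Section 3.2. For clarity the sketch sorts once per call. A real implementation would sort the image's features by scale once and reuse the order for every feature.

```python
import math
from typing import List, NamedTuple

class Keypoint(NamedTuple):
    x: float
    y: float
    scale: float

def select_neighbors(features: List[Keypoint], target_idx: int,
                     n_neighbors: int = 7, min_candidates: int = 20) -> List[int]:
    """Return the indices of the neighbors chosen as context for features[target_idx]."""
    target = features[target_idx]
    # (1) all other features, sorted by scale from large to small
    order = sorted((i for i in range(len(features)) if i != target_idx),
                   key=lambda i: features[i].scale, reverse=True)
    # (2) candidates: features with a larger scale than the target. In the
    # descending order these form a prefix, so topping up "by scale" means
    # taking a longer prefix when there are too few of them.
    candidates = [i for i in order if features[i].scale > target.scale]
    if len(candidates) < min_candidates:
        candidates = order[:min_candidates]
    # (3) keep the n_neighbors candidates closest to the target (Euclidean distance)
    candidates.sort(key=lambda i: math.hypot(features[i].x - target.x,
                                             features[i].y - target.y))
    return candidates[:n_neighbors]
```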

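2) Contextual descriptor generation. The method represents the neighbor feature points of a local feature point as a set. To describe how these neighbors relate to the feature point to be indexed more precisely, the relations are expressed through dominant orientations and relative positions. Each neighbor is represented by three items. The first is its orientation difference. The second is its angular position difference, that is, the angle between the line joining the feature point to be indexed and the neighbor, and the dominant orientation of the feature point to be indexed. The third is its visual word. The orientation difference and the angular position difference are computed as follows.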

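The orientation difference is the angle between the dominant orientation of the feature point to be indexed and that of its neighbor. As shown in Fig. 4, the dominant orientations of the feature point to be indexed, $P_0$, and its neighbor, $P_1$, form the angle $\beta$, so the orientation difference is $\beta$: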

Fig. 4 The angular position difference $\alpha$ and the orientation difference $\beta$ between a SIFT feature $P_0$ and its neighbor $P_1$

$$\beta = \left| \mathrm{Ori}_{P_0} - \mathrm{Ori}_{P_1} \right| \qquad (1)$$

where $\mathrm{Ori}_{P_0}$ and $\mathrm{Ori}_{P_1}$ are the dominant orientations of the feature point to be indexed, $P_0$, and of its neighbor, $P_1$, respectively.

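The angular position difference is the angle between the line joining the feature point to be indexed and its neighbor, and the dominant orientation of the feature point to be indexed ($\alpha$ in Fig. 4). It describes the bearing of the neighbor as seen from the feature point to be indexed. Because the dominant orientation serves as the reference angle, image rotation does not affect it: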

$$\alpha = \left| \mathrm{fun}(P_0, P_1) - \mathrm{Ori}_{P_0} \right| \qquad (2)$$

where $\mathrm{fun}(P_0, P_1)$ computes the angle between the line $P_0P_1$ and the horizontal direction, and $\mathrm{Ori}_{P_0}$ is the dominant orientation of $P_0$.
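The following is a minimal sketch of Eqs. (1) and (2). It assumes angles in degrees, dominant orientations measured in the same coordinate frame as `fun`, and points given as (x, y) tuples. It also makes one deliberate deviation, which is an assumption and not part of the paper: the equations write a plain absolute difference, but here each difference is folded into [0°, 180°]. As a result, a rotation that pushes one orientation past 360° leaves β and α unchanged. The helper name `wrapped_abs_diff` is hypothetical, and `fun` is named after the paper's $\mathrm{fun}(P_0, P_1)$.

```python
import math

def wrapped_abs_diff(a_deg: float, b_deg: float) -> float:
    """|a - b| folded into [0, 180] degrees (assumption: 350 vs 10 gives 20, not 340)."""
    d = abs(a_deg - b_deg) % 360.0
    return 360.0 - d if d > 180.0 else d

def fun(p0, p1) -> float:
    """Angle of the line P0 -> P1 against the horizontal axis, in [0, 360) degrees."""
    return math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0])) % 360.0

def orientation_difference(ori_p0: float, ori_p1: float) -> float:
    """Eq. (1): beta, the difference between the two dominant orientations."""
    return wrapped_abs_diff(ori_p0, ori_p1)

def angular_position_difference(p0, ori_p0: float, p1) -> float:
    """Eq. (2): alpha, the angle between line P0P1 and P0's dominant orientation."""
    return wrapped_abs_diff(fun(p0, p1), ori_p0)
```

Rotating both points and both orientations by the same angle leaves α and β unchanged (up to floating-point error). This is the rotation invariance that the text attributes to using the dominant orientation as the reference.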

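In this method, the indexed information for a feature point consists of its visual word (ItemId), the unique image identifier (ImgId), and one contextual descriptor (Contexts). Each contextual descriptor holds information about several neighbor feature points.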

2.3 Contextual descriptor similarity verification

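The method first uses the visual words of the query image's feature points to retrieve a large number of candidate feature points from the inverted index. It then verifies the candidates by computing the similarity between the contextual descriptors of each feature point match. The verification proceeds as follows.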

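For matched feature points (those with equal visual words), the neighbor feature points in their contextual descriptors are compared one by one. Suppose a neighbor of the query feature point and a neighbor of the candidate feature point have the same visual word. If the difference of their orientation differences and the difference of their angular position differences are both below given thresholds, this neighbor of the candidate visual word match is considered similar. In the proposed method, a candidate feature match is considered correct if at least one neighbor is similar. The difference of orientation differences and the difference of angular position differences are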

$$D\_\beta_{i,i'} = \left| Q\_\beta_i - C\_\beta_{i'} \right| \qquad (3)$$

$$D\_\alpha_{i,i'} = \left| Q\_\alpha_i - C\_\alpha_{i'} \right| \qquad (4)$$

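where $Q\_\beta_i$ is the orientation difference between the query feature point and its neighbor $i$, and $C\_\beta_{i'}$ is the orientation difference between the corresponding candidate feature point and its neighbor $i'$. $D\_\beta_{i,i'}$ is therefore the angular difference between these two orientation differences. Likewise, $D\_\alpha_{i,i'}$ is the difference between the angular position differences $Q\_\alpha_i$ and $C\_\alpha_{i'}$. If the local feature match is correct, both $D\_\beta_{i,i'}$ and $D\_\alpha_{i,i'}$ should be close to 0. That is,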

$$P_j = \begin{cases} 1, & \text{if } \mathrm{itemId}_i = \mathrm{itemId}_{i'},\ D\_\beta_{i,i'} < T_1 \text{ and } D\_\alpha_{i,i'} < T_2 \text{ for some neighbor pair } (i, i') \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $T_1$ and $T_2$ are the angle thresholds used for verification, and $\mathrm{itemId}_i$ is the visual word of neighbor $i$.
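The check in Eqs. (3) to (5) can be sketched as follows. The defaults follow Section 3.2: both thresholds are T = 7, and one similar neighbor is required. Treating the thresholds as degrees is an assumption, and the names `Neighbor` and `is_true_match` are hypothetical. The sketch also assumes that angle wrap-around was handled when the descriptors were built, so Eqs. (3) and (4) are applied as plain absolute differences.

```python
from typing import NamedTuple, Sequence

class Neighbor(NamedTuple):
    beta: float   # orientation difference, degrees
    alpha: float  # angular position difference, degrees
    word: int     # visual word (itemId) of the neighbor

def is_true_match(query_ctx: Sequence[Neighbor], cand_ctx: Sequence[Neighbor],
                  t1: float = 7.0, t2: float = 7.0, min_similar: int = 1) -> bool:
    """Eqs. (3)-(5): decide P_j for one visual-word match.

    A query neighbor i is 'similar' if some candidate neighbor i' has the same
    visual word, D_beta = |Q_beta_i - C_beta_i'| < t1 and
    D_alpha = |Q_alpha_i - C_alpha_i'| < t2. The match passes once
    min_similar query neighbors are similar (the paper uses 1).
    """
    similar = 0
    for q in query_ctx:
        if any(q.word == c.word
               and abs(q.beta - c.beta) < t1
               and abs(q.alpha - c.alpha) < t2
               for c in cand_ctx):
            similar += 1
            if similar >= min_similar:
                return True
    return similar >= min_similar
```

Setting `min_similar=2` reproduces the stricter variant discussed in Section 3.2, which the paper reports lowers mAP.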

The number of correctly matched local features is counted for each candidate image:

$$C_{\mathrm{img}} = \sum_{j=1}^{n} P_j \qquad (6)$$

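where $n$ is the total number of feature points of the candidate image, and $P_j$ indicates whether the $j$-th local feature passes verification. The similarity between images is measured by the number of correctly matched local features. The candidate images are sorted by this count, and a higher rank means a stronger similarity to the query image. The top-ranked candidates are returned as copies.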
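Eq. (6) and the final ranking amount to a per-image count followed by a sort. Two parts of this sketch are assumptions. The first is the input format, one `(img_id, passed)` pair per checked match. The second is `top_k = 50`, borrowed from the evaluation cutoff in Section 3.1, because the text only says that several images are returned. Images with no verified match do not appear in the result.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def rank_candidates(verified: Iterable[Tuple[int, bool]],
                    top_k: int = 50) -> List[Tuple[int, int]]:
    """Eq. (6) plus ranking.

    `verified` yields one (img_id, P_j) pair per visual-word match that was
    checked. C_img is the number of pairs with P_j true for that image.
    Returns up to top_k (img_id, C_img) pairs, highest score first.
    """
    scores = Counter(img_id for img_id, passed in verified if passed)
    return scores.most_common(top_k)
```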

3 Experimental analysis

3.1 Experimental setup

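To test the effectiveness and efficiency of the method, the Copydays dataset^[8] is used as the copy image test set. It contains 157 original images, each with 19 copies. The copies are produced mainly by three kinds of editing operations: cropping, JPEG compression, and image enhancement. To test the stability of the method, the Flickr dataset^[18], which contains 1 M images, is used as the distractor set.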

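Results are evaluated with mAP (mean average precision)^[5]. mAP accounts for the recall of copy images and also for the ranks of the correct copies in the retrieval results, so it reflects retrieval performance well. When computing mAP, the top 50 candidate images in the ranked results are taken as the returned results.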
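For reference, the sketch below shows one common way to compute AP with a rank cutoff. The paper does not give its exact formula. This non-interpolated version averages the precision at the ranks of the relevant images and divides by the total number of relevant images. It is an assumption, and it may differ slightly from the precision-recall-area computation used in [5].

```python
from typing import Iterable, List, Sequence, Tuple

def average_precision(ranked_ids: Sequence[int], relevant_ids: Iterable[int],
                      cutoff: int = 50) -> float:
    """Non-interpolated AP over the top `cutoff` results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, img_id in enumerate(ranked_ids[:cutoff], start=1):
        if img_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    # relevant images missing from the top `cutoff` contribute zero
    return precision_sum / len(relevant)

def mean_average_precision(runs: List[Tuple[Sequence[int], Iterable[int]]],
                           cutoff: int = 50) -> float:
    """runs: one (ranked_ids, relevant_ids) pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel, cutoff) for r, rel in runs) / len(runs)
```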

3.2 Parameter tests

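The parameters that mainly affect retrieval performance are the number of indexed neighbor feature points ($N$) and the verification thresholds for the orientation difference and the angular position difference. Here the two thresholds are set to the same value ($T$). To test the effect of these parameters, different settings were evaluated with 100 k distractor images. The results are shown in Table 1.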

Table 1 The mAP results with different numbers of neighbor features ($N$) and angle difference thresholds ($T$) on the 100 k database

$N$ \ $T$	3	5	7	9	11
3	0.793	0.810	0.817	0.817	0.816
5	0.824	0.838	0.842	0.841	0.841
7	0.839	0.850	0.854	0.854	0.853
9	0.843	0.853	0.857	0.857	0.856

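Table 1 shows that mAP increases with the number of indexed neighbor feature points. For every $N$, mAP peaks at $T = 7$ and does not improve for larger $T$. The method verifies candidate local feature matches with two angle thresholds plus the visual words. This verification is very strict and filters out the vast majority of mismatches. When $N$ is small, however, the method is not robust enough to operations such as compression and image enhancement. These operations can remove some SIFT feature points or create new ones, which disturbs neighbor matching. Increasing $N$, that is, indexing more neighbor feature points, therefore makes the method more robust to compression and image enhancement. As Fig. 5 shows, the mean search time also increases with $N$. As a trade-off between retrieval efficiency and accuracy, $N = 7$ and $T = 7$ are used in the subsequent experiments.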

Fig. 5 The mean search time with different numbers of neighbor features

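Requiring more neighbors of a candidate local feature to be similar makes verification stricter. This can raise precision but lowers recall. Experiments show that when two similar neighbors are required for a candidate local feature match, the mAP drops from 0.854 to 0.743. The subsequent experiments therefore use a neighbor similarity threshold of 1, an angle threshold of 7, and 7 indexed neighbors.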

3.3 Evaluation

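To evaluate the method, it was compared with three other methods. The first is a post-verification method that encodes the spatial position relations between feature points (SC)^[6], with parameters $r$ and $S$ set to 1 and 0.5, respectively. The second is a pre-verification method based on contextual descriptors (CD)^[13], with parameters $Ts$ and $N$ set to 4 and 8, respectively. The last is the Baseline method^[5], which only counts matched visual words and does not verify candidate feature points.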

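As Fig. 6 and Table 2 show, the proposed method achieves a higher mAP than SC, CD, and Baseline (85.4% vs 81.0%, 82.3%, and 21.5%). Its mean search time (3.7 s) is lower than that of the post-verification method SC (4.3 s) and only slightly higher than those of CD (3.6 s) and Baseline (3.2 s). The method was also tested on Web copy images, all collected from the Internet. As shown in Fig. 7, a group of Web images was searched against 100 k distractor images. The results show that the method handles cropping, rotation, image overlay, and color and contrast changes well.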

Fig. 6 Comparison of mAP for different methods on the 100 k database

Table 2 The mAP and mean search time of different methods on the 100 k database

Method	mAP/%	Search time/s
Proposed	85.4	3.7
SC	81.0	4.3
CD	82.3	3.6
Baseline	21.5	3.2

Fig. 7 The proposed method on Web images (the first image is the query; the others are returned images)

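To test retrieval performance at different distractor scales, the three methods were run with 100 k, 200 k, 500 k, and 1 M distractor images. Fig. 8 shows the results. The proposed method is more robust to the size of the distractor set than Baseline. When the distractor set grows from 100 k to 1 M, the mAP of the proposed method drops by about 3%, whereas that of Baseline drops by 9%.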

Fig. 8 The change of mAP with different sizes of distractor databases

4 Conclusions

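This paper proposes a copy image retrieval method based on neighbor context. The method selects several contextual feature points as neighbors to build a contextual descriptor, and it verifies local feature matches by the similarity of these descriptors. Experiments show that the method is highly robust to operations such as cropping, rotation, and image overlay, and that it is much more efficient than post-verification methods. However, the method stores information such as the neighbors' visual words when it indexes local features, so it needs considerable storage space. Future work could compress the contextual descriptors to reduce the index size.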

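The method also has limits outside copy image retrieval. It performs only moderately on images whose viewpoint has changed, such as images under perspective transformation. The method applies strict relational constraints at retrieval time, and perspective transformation alters the dominant orientations and visual words of SIFT features. The method is therefore difficult to apply directly to general image retrieval.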

References

[1] Xu Z H, Ling H F, Zou F H, et al. Fast and robust video copy detection scheme using full DCT coefficients[C]//Proceedings of IEEE International Conference on Multimedia and Expo. New York, NY, USA: IEEE, 2009: 434-437. [DOI:10.1109/ICME.2009.5202527]

[2] Xu Z H, Ling H F, Zou F H, et al. Robust image copy detection using multi-resolution histogram[C]//Proceedings of the International Conference on Multimedia Information Retrieval. Philadelphia, Pennsylvania, USA: ACM, 2010: 129-136. [DOI:10.1145/1743384.1743410]

[3] Lin C C, Klara N, Hung C J. An image copy detection scheme based on edge features[C]//Proceedings of 2008 IEEE International Conference on Multimedia and Expo. Hannover, Germany: IEEE, 2008: 665-668. [DOI:10.1109/ICME.2008.4607522]

[4] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110. [DOI:10.1023/B:VISI.0000029664.99615.94]

[5] Philbin J, Chum O, Isard M, et al. Object retrieval with large vocabularies and fast spatial matching[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, Minnesota, USA: IEEE, 2007: 1-8. [DOI:10.1109/CVPR.2007.383172]

[6] Zhou W G, Lu Y J, Li H Q, et al. Spatial coding for large scale partial-duplicate web image search[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze, Italy: ACM, 2010: 511-520. [DOI:10.1145/1873951.1874019]

[7] Zhou W G, Li H Q, Lu Y J, et al. SIFT match verification by geometric coding for large-scale partial-duplicate web image search[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2013, 9(1): #4.

[8] Jegou H, Douze M, Schmid C. Hamming embedding and weak geometric consistency for large scale image search[C]//Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer-Verlag, 2008: 304-317. [DOI:10.1007/978-3-540-88682-2_24]

[9] Fischler M A, Bolles R C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography[J]. Communications of the ACM, 1981, 24(6): 381-395. [DOI:10.1145/358669.358692]

[10] Mortensen E N, Deng H L, Shapiro L. A SIFT descriptor with global context[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, California, USA: IEEE, 2005, 1: 184-190. [DOI:10.1109/CVPR.2005.45]

[11] Zheng L, Wang S J, Zhou W G, et al. Bayes merging of multiple vocabularies for scalable image retrieval[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 1963-1970. [DOI:10.1109/CVPR.2014.252]

[12] Zheng L, Wang S J. Visual phraselet: refining spatial constraints for large scale image search[J]. IEEE Signal Processing Letters, 2013, 20(4): 391-394. [DOI:10.1109/LSP.2013.2249513]

[13] Yao J L, Yang B, Zhu Q M. Near-duplicate image retrieval based on contextual descriptor[J]. IEEE Signal Processing Letters, 2015, 22(9): 1404-1408. [DOI:10.1109/LSP.2014.2377795]

[14] Wang X Y, Yang M, Cour T, et al. Contextual weighting for vocabulary tree based image retrieval[C]//Proceedings of IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain: IEEE, 2011: 209-216. [DOI:10.1109/ICCV.2011.6126244]

[15] Cao Y, Wang C H, Li Z W, et al. Spatial-bag-of-features[C]//Proceedings of 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA: IEEE, 2010: 3352-3359. [DOI:10.1109/CVPR.2010.5540021]

[16] Zhang Y M, Jia Z Y, Chen T. Image retrieval with geometry-preserving visual phrases[C]//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2011: 809-816. [DOI:10.1109/CVPR.2011.5995528]

[17] Liu Z, Li H Q, Zhou W G, et al. Embedding spatial context information into inverted file for large-scale image retrieval[C]//Proceedings of the 20th ACM International Conference on Multimedia. Nara, Japan: ACM, 2012: 199-208. [DOI:10.1145/2393347.2393380]

[18] Huiskes M J, Thomee B, Lew M S. New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative[C]//Proceedings of the International Conference on Multimedia Information Retrieval. Philadelphia, Pennsylvania, USA: ACM, 2010: 527-536. [DOI:10.1145/1743384.1743475]