Published: 2018-05-16 | DOI: 10.11834/jig.170297 | 2018, Volume 23, Number 5 | Image Processing and Coding

Received: 2017-06-26; Revised: 2017-11-14. Supported by: Key Research and Development Program of the Science and Technology Department of Sichuan Province (2017SZ0010, 2016JZ0035); West Light Talent Training Program of the Chinese Academy of Sciences. First author: Liu Heng (1993-), male, master's student in computer science at the University of Chinese Academy of Sciences; his research interests include large-scale image retrieval and pattern recognition. E-mail: 2396076208@qq.com. CLC number: TP391.41. Document code: A. Article number: 1006-8961(2018)05-0652-10

Applying multi-index additive quantization encoding method for approximate nearest neighbor search
Liu Heng1,2, Yao Yu2, Zeng Ling1,2, Tao Pan1,2
1. University of Chinese Academy of Sciences, Beijing 100049, China;
2. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China

# Abstract

Objective As the amount of image data produced every day increases, large-scale image retrieval has become one of the hot topics in computer vision. The basic idea is to extract features from all images in the database and define a similarity measure under which nearest neighbor search is performed. The key to massive image retrieval is designing a nearest neighbor search algorithm that meets both efficiency and storage requirements. An approximate nearest neighbor search method based on multi-index additive quantization is presented to improve the approximate representation accuracy and reduce the storage requirements of image visual features.

Method If each image is described by a set of local descriptors, an exhaustive search is prohibitive, as billions of descriptors must be indexed and multiple queries performed. The image descriptors should be stored in memory to ensure real-time retrieval; however, this creates a storage problem. Approximate nearest neighbor (ANN) algorithms, which mainly comprise index structures and quantization methods, are typically compared by the trade-off they achieve between search quality and efficiency. Given the superiority of non-exhaustive search, we employ an inverted multi-index structure to avoid exhaustive search. The multi-index structure divides the original data space into multiple subspaces: each entry of the multi-index table corresponds to a part of the original vector space and stores an inverted list of the vectors that fall within that part, i.e., that lie in the proximity of the corresponding codeword. The purpose of the multi-index structure is to efficiently generate, for any query vector, a list of data vectors that lie close to it by searching only a small subset of the data in which the near neighbors of the query most likely lie, thus ensuring a substantial speed-up over exhaustive search. As a solution to the storage problem, compact code representations of descriptors are used. Vector quantization is an effective and efficient ANN search technique: it quantizes data using codewords to reduce the cardinality of the data space. Among vector quantization methods, additive quantization, which approximates a vector by a sum of M codewords drawn from M different codebooks, generalizes product quantization and further improves its accuracy while largely retaining its computational efficiency. In this study, we use additive quantization to encode the residual data produced by the multi-index structure, which further reduces the quantization loss over the original space. The resulting method is a two-stage quantizer: each stage approximates the residual vector of the preceding stage by one of the centroids of its codebook and passes a new residual vector to the succeeding stage. The multi-index structure serves as the first-order quantizer that approximates the vectors, and additive quantization serves as the second-order quantizer that approximates the residuals. The non-exhaustive search strategy examines only the near neighbors in a few inverted lists, which significantly reduces retrieval time. With additive quantization, the original data need not be stored in memory during retrieval; only the indices of the codewords whose sum best approximates each data item are stored, significantly reducing memory consumption.
Result Experiments on three datasets, i.e., SIFT1M, GIST1M, and MNIST, were conducted to verify the effectiveness of the proposed algorithm. The recall of the proposed algorithm is approximately 4% to 15% higher and its average precision approximately 12% higher than those of existing algorithms, while its search time matches that of the fastest competing algorithm.

Conclusion The approximate nearest neighbor search based on multi-index additive quantization proposed in this study effectively improves the approximate representation accuracy and reduces the storage requirements of image visual features. The method also improves retrieval precision and recall on large-scale datasets. It focuses on nearest neighbor search and is suitable for large-scale image and other multimedia data.

# Key words

multi-index; compact encoding; additive quantization; approximate nearest neighbor search; vector quantization

# 0 Introduction

The main contributions of this study are as follows:

1) A multi-index structure is used to partition the original data space into many dense small regions, so that only a small number of regions need to be searched at retrieval time.

2) Additive quantization is used to encode the residuals produced by the multi-index stage, so the dataset itself need not be stored at search time, greatly reducing memory consumption.

# 1 Definition of the quantization encoding problem

$q(\boldsymbol{x}_i) \in \boldsymbol{C} = \{\boldsymbol{c}_i \mid \boldsymbol{c}_i \in \mathbb{R}^D;\ i = 1, 2, \cdots, K\}$ (1)

$E = \sum\limits_{\boldsymbol{x} \in \boldsymbol{X}} \left\| \boldsymbol{x} - \boldsymbol{c}(i(\boldsymbol{x})) \right\|_2^2$ (2)

Minimizing $E$ requires the two classic Lloyd optimality conditions (cf. [4]):

1) $\boldsymbol{x}$ must be encoded to the codeword in the codebook that is nearest to $\boldsymbol{x}$.

2) With the encoder $i(\cdot)$ fixed, the codeword $\boldsymbol{c}(i)$ is the mean of all vectors indexed to the $i$-th Voronoi cell.
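Alternating these two conditions yields Lloyd's algorithm, i.e., k-means, as a codebook trainer. Below is a minimal sketch in Python/NumPy; function and variable names are illustrative, not from the paper:

```python
import numpy as np

def train_codebook(X, K, n_iter=20, seed=0):
    """Lloyd's algorithm: alternate the two optimality conditions above."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)].astype(float)  # initial codewords
    for _ in range(n_iter):
        # Condition 1: encode each x to its nearest codeword i(x).
        dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)
        # Condition 2: each codeword becomes the mean of its Voronoi cell.
        for i in range(K):
            cell = X[idx == i]
            if len(cell):
                C[i] = cell.mean(0)
    return C, idx

def distortion(X, C, idx):
    """Quantization error E of Eq. (2)."""
    return ((X - C[idx]) ** 2).sum()
```

For a $D$-dimensional dataset this reproduces the quantizer $q(\cdot)$ of Eq. (1) with $K$ codewords.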

# 2.1 Building the multi-index structure

$\begin{array}{c} \boldsymbol{L}_{i, j} = \{\boldsymbol{x} = [\boldsymbol{x}_1, \boldsymbol{x}_2] \in \boldsymbol{X} \mid \\ i = \arg \mathop{\min}\limits_k d_1(\boldsymbol{x}_1, \boldsymbol{u}_k)\ \wedge \\ j = \arg \mathop{\min}\limits_k d_2(\boldsymbol{x}_2, \boldsymbol{v}_k)\} \end{array}$ (3)

$\begin{array}{c} \boldsymbol{L}_{i, j} = \{\boldsymbol{r}(\boldsymbol{x}) = [\boldsymbol{x}_1 - \boldsymbol{u}_i, \boldsymbol{x}_2 - \boldsymbol{v}_j] \mid \\ \boldsymbol{x} = [\boldsymbol{x}_1, \boldsymbol{x}_2] \in \boldsymbol{X}, \\ i = \arg \mathop{\min}\limits_k d_1(\boldsymbol{x}_1, \boldsymbol{u}_k)\ \wedge \\ j = \arg \mathop{\min}\limits_k d_2(\boldsymbol{x}_2, \boldsymbol{v}_k)\} \end{array}$ (4)
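To make Eqs. (3) and (4) concrete, here is a minimal sketch of populating the inverted lists with residuals, assuming the two sub-codebooks $\boldsymbol{U}$ and $\boldsymbol{V}$ have already been trained on the two halves of the vectors (all names are illustrative):

```python
import numpy as np
from collections import defaultdict

def build_multi_index(X, U, V):
    """Populate the inverted lists L[i, j] of Eq. (4) with residuals r(x).

    X: (N, D) data matrix; U, V: (K, D/2) sub-codebooks for the two halves.
    """
    D = X.shape[1]
    x1, x2 = X[:, :D // 2], X[:, D // 2:]
    # Nearest sub-codeword in each half-space, Eq. (3).
    i = ((x1[:, None, :] - U[None]) ** 2).sum(-1).argmin(1)
    j = ((x2[:, None, :] - V[None]) ** 2).sum(-1).argmin(1)
    lists = defaultdict(list)
    for n in range(len(X)):
        r = np.concatenate([x1[n] - U[i[n]], x2[n] - V[j[n]]])
        lists[(i[n], j[n])].append((n, r))  # vector id and its residual
    return lists
```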

# 2.2 Additive quantization encoding

$\boldsymbol{C} = \{\boldsymbol{c} = \sum\limits_{m = 1}^{M} \boldsymbol{C}_m(i_m) \mid i_m \in \{1, \cdots, K\}\}$ (5)

$\tilde{\boldsymbol{x}} = \sum\limits_{m = 1}^{M} \boldsymbol{C}_m(i_m(\boldsymbol{x}))$ (6)

$\mathop{\min}\limits_{\boldsymbol{C}_1, \boldsymbol{C}_2, \cdots, \boldsymbol{C}_M} \sum\limits_{j = 1}^{N} \left\| \boldsymbol{x}_j - \sum\limits_{m = 1}^{M} \boldsymbol{C}_m(i_m(\boldsymbol{x}_j)) \right\|_2^2$ (7)
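Minimizing Eq. (7) over the index assignments is a hard combinatorial search; Babenko and Lempitsky [7] use beam search over candidate codeword combinations. The greedy sketch below, which picks one codeword per codebook against a running residual, is a simplification that conveys the encoding idea rather than the paper's exact encoder:

```python
import numpy as np

def aq_encode(x, codebooks):
    """Greedily pick one codeword per codebook, approximating Eqs. (5)-(6).

    codebooks: list of M arrays, each of shape (K, D).
    Returns the M indices i_m and the reconstruction sum_m C_m(i_m).
    """
    residual = x.astype(float).copy()
    codes = []
    for Cm in codebooks:
        im = int(((residual[None, :] - Cm) ** 2).sum(-1).argmin())
        codes.append(im)
        residual -= Cm[im]
    return codes, x - residual  # x - residual == sum of chosen codewords
```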

# 2.3 Analysis of multi-index additive quantization encoding

$\begin{array}{c} q_{{\rm{IMI}}}(\boldsymbol{x}) = \{[\boldsymbol{u}_i, \boldsymbol{v}_j] \mid \\ i = \arg \mathop{\min}\limits_k d_1(\boldsymbol{x}_1, \boldsymbol{u}_k)\ \wedge \\ j = \arg \mathop{\min}\limits_k d_2(\boldsymbol{x}_2, \boldsymbol{v}_k)\} \end{array}$ (8)

$\boldsymbol{r}(\boldsymbol{x}) = [\boldsymbol{x}_1 - \boldsymbol{u}_i, \boldsymbol{x}_2 - \boldsymbol{v}_j]$ (9)

$q_{{\rm{aq}}}(\boldsymbol{r}(\boldsymbol{x})) = \sum\limits_{m = 1}^{M} \boldsymbol{C}_m(i_m(\boldsymbol{r}(\boldsymbol{x})))$ (10)

$\tilde{\boldsymbol{x}} \buildrel \Delta \over = q_{{\rm{IMI}}}(\boldsymbol{x}) + q_{{\rm{aq}}}(\boldsymbol{x} - q_{{\rm{IMI}}}(\boldsymbol{x}))$ (11)
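Putting Eqs. (8)-(11) together, a sketch of the full two-stage encoder, reusing the illustrative `aq_encode` helper from Section 2.2; per vector, only the cell pair $(i, j)$ and the $M$ codeword indices need to be kept:

```python
import numpy as np

def imi_aq_encode(x, U, V, codebooks):
    """Two-stage quantizer of Eq. (11): x ~ q_IMI(x) + q_aq(x - q_IMI(x))."""
    D = len(x)
    x1, x2 = x[:D // 2], x[D // 2:]
    # Stage 1: coarse quantization by the inverted multi-index, Eq. (8).
    i = int(((x1[None, :] - U) ** 2).sum(-1).argmin())
    j = int(((x2[None, :] - V) ** 2).sum(-1).argmin())
    r = np.concatenate([x1 - U[i], x2 - V[j]])          # residual, Eq. (9)
    # Stage 2: additive quantization of the residual, Eq. (10).
    codes, r_approx = aq_encode(r, codebooks)
    x_approx = np.concatenate([U[i], V[j]]) + r_approx  # reconstruction, Eq. (11)
    return (i, j, codes), x_approx
```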

# 2.4 Similarity measure

$\begin{array}{c} \tilde{d}(\boldsymbol{x}, \boldsymbol{y})^2 = d(\tilde{\boldsymbol{x}}, \boldsymbol{y})^2 = \left\| \tilde{\boldsymbol{x}} - \boldsymbol{y} \right\|_2^2 = \\ d(\boldsymbol{y} - q_{{\rm{IMI}}}(\boldsymbol{x}),\ q_{{\rm{aq}}}(\boldsymbol{x} - q_{{\rm{IMI}}}(\boldsymbol{x})))^2 \end{array}$ (12)
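Because the query $\boldsymbol{y}$ is left unquantized (asymmetric distance computation), Eq. (12) can be evaluated from the stored codes alone by shifting $\boldsymbol{y}$ into the residual space of the visited cell. A sketch of scoring one inverted list follows (illustrative names; the lookup-table acceleration used in practice is omitted for clarity):

```python
import numpy as np

def score_list(y, i, j, U, V, entries, codebooks):
    """Evaluate Eq. (12) for every encoded vector in one list L[i, j].

    entries: [(vector_id, codes), ...] stored in cell (i, j).
    """
    D = len(y)
    # Shift the query into the cell's residual space: y - q_IMI(x).
    y_res = np.concatenate([y[:D // 2] - U[i], y[D // 2:] - V[j]])
    scored = []
    for n, codes in entries:
        r_approx = sum(Cm[im] for Cm, im in zip(codebooks, codes))
        scored.append((n, float(((y_res - r_approx) ** 2).sum())))
    return sorted(scored, key=lambda t: t[1])  # nearest candidates first
```

In practice, the inner products between the query and every codeword are precomputed as lookup tables, and the norm of each encoded vector can be stored or derived from pairwise codeword products, so scoring a candidate needs only O(M) lookups; the sketch recomputes everything for clarity.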

# 3.1 Experimental datasets

Table 1 Description of the datasets

| | MNIST | SIFT | GIST |
| --- | --- | --- | --- |
| Dimension | 784 | 128 | 960 |
| Descriptor type | pixel | local descriptor | global descriptor |
| Dataset size | 60 000 | 1 000 000 | 1 000 000 |
| Number of queries | 1 000 | 10 000 | 1 000 |

# 3.3 Comparison of quantization loss

Table 2 Comparison of distortion with 64-bit codes on the SIFT1M dataset

| Algorithm | Quantization loss |
| --- | --- |
| TC[10] | 33 070.2 |
| OPQ[9] | 22 239.8 |
| MPCA-E[11] | 16 339.1 |
| KSSQ[12] | 15 253.1 |
| **IMI-AQ** | **13 719.1** |

Note: bold indicates the proposed algorithm, which has the lowest quantization loss.

# 3.4 Recall and precision

Table 3 Comparison of recall@r on SIFT1M with 32- and 64-bit codes

| Algorithm | recall@1 | recall@10 | recall@100 |
| --- | --- | --- | --- |
| PQ | 0.052, 0.224 | 0.230, 0.599 | 0.595, 0.924 |
| TC | 0.057, 0.205 | 0.197, 0.535 | 0.519, 0.877 |
| CKM/OPQ | 0.068, 0.243 | 0.273, 0.638 | 0.658, 0.940 |
| KSSQ | 0.145, 0.325 | 0.434, 0.754 | 0.802, 0.923 |
| IVFADC[5] | 0.155, 0.296 | 0.461, 0.714 | 0.825, 0.960 |
| CompQ[13] | 0.135, 0.352 | 0.435, 0.795 | 0.831, **0.987** |
| IMI-AQ | **0.224**, **0.388** | **0.608**, **0.834** | **0.926**, 0.986 |

Note: in each cell the two values are for 32- and 64-bit codes, respectively; bold indicates the best result.

Table 4 Comparison of recall@r on GIST1M with $M$ = 4 and 8

| Algorithm | recall@1 | recall@10 | recall@100 |
| --- | --- | --- | --- |
| PQ | 0.023, 0.076 | 0.068, 0.218 | 0.176, 0.504 |
| TC | 0.053, 0.096 | 0.104, 0.223 | 0.291, 0.547 |
| CKM/OPQ | 0.054, 0.118 | 0.142, 0.334 | 0.396, 0.715 |
| KSSQ | 0.078, 0.136 | 0.149, 0.396 | 0.345, 0.741 |
| IVFADC | 0.059, 0.091 | 0.155, 0.238 | 0.397, 0.511 |
| CompQ | 0.080, **0.157** | 0.214, **0.421** | 0.534, **0.808** |
| IMI-AQ | **0.100**, 0.148 | **0.252**, 0.401 | **0.560**, 0.734 |

Note: in each cell the two values are for $M$ = 4 and $M$ = 8, respectively; bold indicates the best result.

# 3.6 Memory consumption analysis

Table 5 Comparison of memory consumption for different algorithms

| Algorithm | Memory consumption |
| --- | --- |
| OPQ | ${\rm{O}}(D^2 + KD) + {\rm{O}}(MN)$ |
| AQ | ${\rm{O}}(MKD) + {\rm{O}}(MN)$ |
| IVFADC | ${\rm{O}}(\upsilon D + KD) + {\rm{O}}(MN)$ |
| IMI-AQ | ${\rm{O}}(\upsilon D + MKD) + {\rm{O}}(MN)$ |
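As a rough sanity check of the entries above, here is a back-of-the-envelope estimate for IMI-AQ on SIFT1M; all parameter values are assumptions chosen for illustration, not the paper's experimental settings:

```python
# Back-of-the-envelope memory estimate for IMI-AQ (assumed, illustrative values).
D, N = 128, 10**6          # SIFT1M: dimension and dataset size
v, K, M = 2**14, 256, 8    # coarse codewords per half, AQ codebook size, codebooks
codebook_bytes = (v * D + M * K * D) * 4  # O(vD + MKD), 32-bit floats
code_bytes = M * N * 1                    # O(MN), one byte per codeword index
print(codebook_bytes / 2**20)  # ~9.0 MiB for all codebooks
print(code_bytes / 2**20)      # ~7.6 MiB for the compact codes
```

Under these assumptions the per-vector codes, O(MN), dominate as N grows, while the codebook overhead stays fixed.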

# References

• [1] Silpa-Anan C, Hartley R. Optimised KD-trees for fast image descriptor matching[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK: IEEE, 2008: 1-8. [DOI: 10.1109/CVPR.2008.4587638]
• [2] Guo Q Z, Zeng Z, Zhang S W, et al. Adaptive bit allocation product quantization[J]. Neurocomputing, 2016, 171: 866–877. [DOI:10.1016/j.neucom.2015.07.062]
• [3] Gray R M. Vector quantization[J]. IEEE ASSP Magazine, 1984, 1(2): 4–29. [DOI:10.1109/MASSP.1984.1162229]
• [4] Du Q, Emelianenko M, Ju L L. Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations[J]. SIAM Journal on Numerical Analysis, 2006, 44(1): 102–119. [DOI:10.1137/040617364]
• [5] Jégou H, Douze M, Schmid C. Product quantization for nearest neighbor search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 117–128. [DOI:10.1109/TPAMI.2010.57]
• [6] Babenko A, Lempitsky V. The inverted multi-index[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(6): 1247–1260. [DOI:10.1109/TPAMI.2014.2361319]
• [7] Babenko A, Lempitsky V. Additive quantization for extreme vector compression[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH: IEEE, 2014: 931-938. [DOI: 10.1109/CVPR.2014.124]
• [8] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. [DOI:10.1109/5.726791]
• [9] Ge T Z, He K M, Ke Q F, et al. Optimized product quantization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(4): 744–755. [DOI:10.1109/TPAMI.2013.240]
• [10] Brandt J. Transform coding for fast approximate nearest neighbor search in high dimensions[C]//Proceedings of 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, CA: IEEE, 2010: 1815-1822. [DOI: 10.1109/CVPR.2010.5539852]
• [11] Ozan E C, Kiranyaz S, Gabbouj M. M-PCA binary embedding for approximate nearest neighbor search[C]//Proceedings of 2015 IEEE Trustcom/BigDataSE/ISPA. Helsinki, Finland: IEEE, 2015: 1-5. [DOI: 10.1109/Trustcom.2015.554]
• [12] Ozan E C, Kiranyaz S, Gabbouj M. K-subspaces quantization for approximate nearest neighbor search[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(7): 1722–1733. [DOI:10.1109/TKDE.2016.2535287]
• [13] Ozan E C, Kiranyaz S, Gabbouj M. Competitive quantization for approximate nearest neighbor search[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(11): 2884–2894. [DOI:10.1109/TKDE.2016.2597834]