Published: 2019-03-16. DOI: 10.11834/jig.180264. 2019 | Volume 24 | Number 3 | Image Analysis and Recognition

1. The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, China;
2. School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China

Received: 2018-04-17; revised: 2018-09-07. Supported by: National Natural Science Foundation of China (61573273, 61603289). First author: Luan Tingting, born in 1992, female, master's student; research interest: computer vision. E-mail: ltingtingdl@163.com. Xu Siyu, female, master's student; research interest: computer vision. E-mail: xsy_xjtu@163.com. Wang Jiaxing, male, master's student; research interest: machine learning. E-mail: csuwjx@stu.xjtu.edu.cn. Shi Xuan, female, master's student; research interest: machine learning. E-mail: shixuan0220@stu.xjtu.edu.cn. Li Yaochen, male, lecturer; research interest: computer vision. E-mail: yaochenli@mail.xjtu.edu.cn. CLC number: TP391. Document code: A. Article ID: 1006-8961(2019)03-0389-11


Hashing method for image retrieval based on product quantization with Huffman coding
Luan Tingting1,2, Zhu Jihua2, Xu Siyu2, Wang Jiaxing2, Shi Xuan2, Li Yaochen2
1. The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, China;
2. School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Supported by: National Natural Science Foundation of China(61573273, 61603289)

# Abstract

Objective Hashing is one of the most popular approaches for content-based image retrieval. Its main idea is to learn binary codes of the same size for each image and then use the Hamming distance to measure image similarity. Effective Hashing methods should have at least three properties. First, the learned codes should be short so that large numbers of images can be stored in a small memory. Second, the learned codes should map perceptually or semantically similar images to binary strings with a small Hamming distance. Third, the method should be efficient in learning the parameters of the binary code and in encoding a new test image. Most Hashing approaches achieve binary coding in two important steps, namely, projection and quantization. For projection, most Hashing approaches perform principal component analysis (PCA) to reduce the dimensionality of raw data. For quantization, different Hashing approaches may design different strategies. In the quantization stage, most traditional Hashing methods allocate the same number of bits to each data subspace for image retrieval. However, the information quantities of the subspaces differ. Accordingly, uniform quantization may result in inefficient codes and high quantization distortion, especially when the data have unbalanced information quantities. To address this problem, this study proposes an effective coding method that combines product quantization with Huffman coding (HPQ). Method Similar to most Hashing approaches, the proposed method utilizes PCA to reduce the dimensionality of raw data in the projection stage. A vector quantization scheme is then carefully designed for the quantization stage. The proposed approach first utilizes product quantization to quantize the dimensionality-reduced data so as to preserve the data distribution of the original space.
For each subspace, the variance can be directly calculated as the measure of its information quantity. For effectiveness, a subspace with high information quantity should be allocated a large number of bits for binary coding and vice versa. To achieve this goal, the reciprocal of each subspace's variance proportion is used to build a Huffman tree, which is then applied to generate Huffman codes. Accordingly, different numbers of bits and binary code values can be assigned to each subspace. In other words, many bits are allocated to encode subspaces with large variance and few to subspaces with small variance. The variance is easy to calculate; therefore, the proposed approach is simple and efficient for binary coding. Experimental results illustrate that the Huffman coding method is effective for image retrieval. Result In the experiments, the proposed approach is tested on three public datasets, namely, MNIST, NUS-WIDE, and 22K LabelMe. For each image, a 512D GIST descriptor is extracted as the input of the Hashing approach. To verify its performance, the proposed approach is compared with four related approaches: the original product quantization (PQ) method, the PCA-based product quantization (PCA-PQ) method, the iterative quantization (ITQ) method, and the transform coding (TC) method. The experimental results are reported in terms of quantization distortion, mean average precision, recall, and training time. Results show that the average quantization distortion of the proposed approach is decreased by approximately 49%, and the mean average precision of the retrieval results is increased by approximately 19%, compared with the existing method based on product quantization. The training time of the proposed approach is also compared with that of TC from 32 bits to 256 bits on MNIST; the proposed approach reduces the training time by 22.5 s on average. Conclusion This study proposes Huffman coding for image retrieval in the product quantization stage.
According to the information quantities, the Huffman-based product quantization scheme can allocate different numbers of bits to each data subspace, which effectively increases coding efficiency and quantization accuracy. The proposed approach is tested on three public datasets and compared with four related approaches. Experimental results demonstrate that the proposed approach is superior to some state-of-the-art algorithms for image retrieval in terms of mean average precision and recall. The proposed approach does not belong to the class of precise coding methods; thus, our future work will focus on precise Hashing methods for effective image retrieval.

# Key words

Hashing; image retrieval; approximate nearest neighbor search; product quantization; bit allocation; coding efficiency

# 1 Related work

1) Vector quantization method

$q(\boldsymbol{x}_i) \in \{c_i; i \in \boldsymbol{I}\}$ (1)
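As a minimal illustration of Eq. (1), a vector quantizer maps each input vector to the index of its nearest codeword. The sketch below is illustrative only (the function name `quantize` and the toy codebook are not from the paper) and assumes a precomputed codebook and Euclidean distance:

```python
import numpy as np

def quantize(x, codebook):
    """Eq. (1): map x to the index of its nearest codeword c_i."""
    # codebook: (k, d) array of centroids; x: (d,) vector
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(quantize(np.array([0.9, 1.2]), codebook))  # -> 1
```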

2) Product quantization method

$\underbrace{x_1, \cdots, x_p}_{\boldsymbol{x}_1^s}, \cdots, \underbrace{x_{d-p+1}, \cdots, x_d}_{\boldsymbol{x}_m^s}$ (2)

$q(\boldsymbol{x}^s) = \left(q(\boldsymbol{x}_1^s), \cdots, q(\boldsymbol{x}_m^s)\right)$ (3)

$\boldsymbol{C} = \boldsymbol{C}_1^s \times \cdots \times \boldsymbol{C}_m^s$ (4)
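Eqs. (2)-(4) can be sketched in a few lines: the vector is split into m subvectors, each subvector is quantized against its own sub-codebook, and the overall codebook is the Cartesian product of the sub-codebooks. This is a hedged sketch, not the paper's implementation; `product_quantize` and the toy sub-codebooks are illustrative:

```python
import numpy as np

def product_quantize(x, sub_codebooks):
    """Eqs. (2)-(4): split x into m subvectors and quantize each independently."""
    m = len(sub_codebooks)
    p = x.size // m  # dimension of each subvector (Eq. (2))
    codes = []
    for i, cb in enumerate(sub_codebooks):
        sub = x[i * p:(i + 1) * p]  # i-th subvector
        codes.append(int(np.argmin(np.linalg.norm(cb - sub, axis=1))))
    return codes  # one sub-codeword index per subspace (Eq. (3))

# Two 2-D subspaces, each with its own 2-word sub-codebook.
sub_codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]),
                 np.array([[0.0, 0.0], [2.0, 2.0]])]
print(product_quantize(np.array([0.9, 1.1, 1.9, 2.1]), sub_codebooks))  # -> [1, 1]
```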

# 2.1 Variance distribution of the subspace data

$V(\boldsymbol{X}) = \sum\limits_{i=1}^{M} V(\boldsymbol{X}^i)$ (6)
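The per-subspace variance in Eq. (6) serves as the information measure of each subspace. Assuming the PCA-projected data matrix is split into M equal-width subspaces, it can be computed directly; `subspace_variances` is an illustrative name, not from the paper:

```python
import numpy as np

def subspace_variances(X, m):
    """Variance V(X^i) of each of the m equal-width subspaces of X (rows = samples)."""
    return [float(sub.var()) for sub in np.split(X, m, axis=1)]

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [3.0, 2.0, 5.0, 4.0]])
print(subspace_variances(X, 2))  # -> [0.5, 0.5]
```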

# 2.2 Bit allocation strategy based on the Huffman tree

$\begin{array}{c} B_i \propto V^i \\ {\rm s.t.}\;\; B = \sum\limits_{i=1}^{M} B_i \end{array}$ (7)

$B_i = r\left(\frac{H_i}{H} \times B\right), \quad 1 \le i \le M$ (10)
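Following the strategy above, the allocation can be sketched as: treat the reciprocal of each subspace's variance proportion as a Huffman weight, read off each symbol's code length H_i from the tree, and round H_i/H × B per Eq. (10). This is a hedged sketch assuming Python's `heapq` for tree construction and simple rounding; because of rounding, the resulting bits may deviate slightly from the total budget B, so a final adjustment step may be needed in practice:

```python
import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Depth (code length) of each symbol in a Huffman tree built from the weights."""
    tiebreak = count()
    # Each heap item: (subtree weight, tiebreaker, [(symbol, depth), ...]).
    heap = [(w, next(tiebreak), [(i, 0)]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        # Merging two subtrees pushes all of their leaves one level deeper.
        merged = [(s, d + 1) for s, d in leaves1 + leaves2]
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return [depth for _, depth in sorted(heap[0][2])]

def allocate_bits(variances, total_bits):
    """Eq. (10): B_i = r(H_i / H * B), with Huffman weights 1 / (variance proportion)."""
    total_var = sum(variances)
    weights = [total_var / v for v in variances]  # high variance -> small weight -> long code
    lengths = huffman_code_lengths(weights)       # H_i
    h_sum = sum(lengths)                          # H
    return [round(h / h_sum * total_bits) for h in lengths]

# Toy example: the highest-variance subspace receives the most bits.
print(allocate_bits([4.0, 2.0, 1.0, 1.0], 8))  # -> [3, 3, 2, 1]
```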

# 3.1 Symmetric distance method (SD)

$d_{\rm SD}(\boldsymbol{x}_1, \boldsymbol{x}_2) = d\left(q(\boldsymbol{x}_1), q(\boldsymbol{x}_2)\right) = \sqrt{(\boldsymbol{c}_{I_1} - \boldsymbol{c}_{I_2})(\boldsymbol{c}_{I_1} - \boldsymbol{c}_{I_2})^{\rm T}}$ (11)

# 3.2 Asymmetric distance method (AD)

$d_{\rm AD}(\boldsymbol{x}_1, \boldsymbol{x}_2) = d\left(\boldsymbol{x}_1, q(\boldsymbol{x}_2)\right) = \sqrt{(\boldsymbol{x}_1 - \boldsymbol{c}_{I_2})(\boldsymbol{x}_1 - \boldsymbol{c}_{I_2})^{\rm T}}$ (12)
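Eqs. (11) and (12) differ only in whether the query is quantized: SD compares two reconstructed centroids, while AD keeps the query uncompressed. A minimal sketch (the names `d_sd`/`d_ad` and the toy codebook are illustrative):

```python
import numpy as np

def d_sd(code1, code2, codebook):
    """Eq. (11): symmetric distance between two quantized vectors."""
    return float(np.linalg.norm(codebook[code1] - codebook[code2]))

def d_ad(x1, code2, codebook):
    """Eq. (12): asymmetric distance; the query x1 stays uncompressed."""
    return float(np.linalg.norm(x1 - codebook[code2]))

codebook = np.array([[0.0, 0.0], [3.0, 4.0]])
print(d_sd(0, 1, codebook))                     # -> 5.0
print(d_ad(np.array([0.0, 0.0]), 1, codebook))  # -> 5.0
```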

# 4.1 Datasets and parameter settings

Table 1 Bit allocation of HPQ on MNIST

| Bits | Bit allocation per subspace |
| :---: | :--- |
| 32 | {7, 6, 5, 4, 4, 4, 1, 1} |
| 64 | {8, 7, 7, 5, 5, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1} |
| 128 | {8, 8, 7, 6, 6, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2} |
| 256 | {9, 8, 8, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2} |

Table 2 Bit allocation of HPQ on NUS-WIDE

| Bits | Bit allocation per subspace |
| :---: | :--- |
| 32 | {7, 6, 5, 4, 4, 4, 1, 1} |
| 64 | {7, 6, 6, 6, 5, 5, 4, 4, 4, 3, 3, 3, 2, 2, 2, 2} |
| 128 | {8, 7, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3} |
| 256 | {10, 9, 8, 8, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2} |

Table 3 Bit allocation of HPQ on 22K LabelMe

| Bits | Bit allocation per subspace |
| :---: | :--- |
| 32 | {7, 6, 5, 4, 4, 4, 1, 1} |
| 64 | {9, 8, 6, 6, 5, 5, 5, 4, 2, 2, 2, 2, 2, 2, 2, 2} |
| 128 | {7, 7, 7, 6, 6, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2} |
| 256 | {9, 8, 8, 7, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2} |

# 4.2 Analysis of experimental results

Table 4 Comparison of quantization distortion on MNIST

| Bits | PQ | TC | HPQ |
| :---: | :---: | :---: | :---: |
| 32 | 0.7325 | 0.0959 | **0.0772** |
| 64 | 0.6606 | 0.1064 | **0.0761** |
| 128 | 0.5201 | 0.0971 | **0.0840** |
| 256 | 0.3607 | 0.0825 | **0.0797** |

Note: bold font indicates the best result.

Table 5 Comparison of quantization distortion on NUS-WIDE

| Bits | PQ | TC | HPQ |
| :---: | :---: | :---: | :---: |
| 32 | 0.7304 | 0.0958 | **0.0775** |
| 64 | 0.6614 | 0.1064 | **0.0762** |
| 128 | 0.5213 | 0.0971 | **0.0842** |
| 256 | 0.3610 | 0.0808 | **0.0792** |

Note: bold font indicates the best result.

Table 6 Comparison of quantization distortion on 22K LabelMe

| Bits | PQ | TC | HPQ |
| :---: | :---: | :---: | :---: |
| 32 | 0.3137 | 0.0893 | **0.0804** |
| 64 | 0.3581 | 0.0760 | **0.0733** |
| 128 | 0.3878 | 0.0565 | **0.0532** |
| 256 | 0.4031 | 0.0383 | **0.0311** |

Note: bold font indicates the best result.

Table 7 Comparison of mean average precision on MNIST

| Bits | PQ | PCA-PQ | ITQ | TC | HPQ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 32 | 0.2673 | 0.5814 | 0.6352 | 0.6254 | **0.7166** |
| 64 | 0.5577 | 0.5895 | 0.7580 | 0.8151 | **0.8649** |
| 128 | 0.7610 | 0.6319 | 0.7934 | 0.9366 | **0.9475** |
| 256 | 0.8937 | 0.6853 | 0.8134 | 0.9779 | **0.9781** |

Note: bold font indicates the best result.

Table 8 Comparison of mean average precision on NUS-WIDE

| Bits | PQ | PCA-PQ | ITQ | TC | HPQ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 32 | 0.4304 | 0.3685 | 0.1851 | 0.4386 | **0.5211** |
| 64 | 0.5329 | 0.4159 | 0.2028 | 0.5999 | **0.6771** |
| 128 | 0.6723 | 0.4677 | 0.2170 | 0.7869 | **0.8341** |
| 256 | 0.8047 | 0.5187 | 0.2185 | 0.9099 | **0.9244** |

Note: bold font indicates the best result.

Table 9 Comparison of mean average precision on 22K LabelMe

| Bits | PQ | PCA-PQ | ITQ | TC | HPQ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 32 | 0.2209 | 0.1918 | 0.2822 | 0.2517 | **0.3092** |
| 64 | 0.3195 | 0.2442 | 0.3355 | 0.4547 | **0.4799** |
| 128 | 0.3746 | 0.2656 | 0.3640 | 0.7121 | **0.7421** |
| 256 | 0.4947 | 0.2679 | 0.3883 | 0.8807 | **0.8866** |

Note: bold font indicates the best result.

Table 10 Training time on MNIST dataset

| Bits | PQ/s | TC/s | HPQ/s |
| :---: | :---: | :---: | :---: |
| 32 | 16.9111 | 14.4676 | **7.4460** |
| 64 | 20.5586 | 25.0268 | **16.3011** |
| 128 | 27.2293 | 50.4337 | **26.8363** |
| 256 | **41.9377** | 105.5473 | 55.9161 |

Note: bold font indicates the best result.

# References

• [1] Shen F M, Zhou X, Yang Y, et al. A fast optimization method for general binary code learning[J]. IEEE Transactions on Image Processing, 2016, 25(12): 5610–5621. [DOI:10.1109/TIP.2016.2612883]
• [2] Gui J, Liu T L, Sun Z N, et al. Fast supervised discrete hashing[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(2): 490–496. [DOI:10.1109/TPAMI.2017.2678475]
• [3] Li W J, Zhou Z H. Learning to hash for big data: current status and future trends[J]. Chinese Science Bulletin, 2015, 60(5-6): 485–490. [DOI:10.1360/N972014-00841]
• [4] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing[C]//Proceedings of the 25th International Conference on Very Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999: 518-529.
• [5] Kulis B, Grauman K. Kernelized locality-sensitive hashing[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(6): 1092–1104. [DOI:10.1109/TPAMI.2011.219]
• [6] Zhang X Y, Wang M L, Cui J T. Efficient indexing of binary LSH for high dimensional nearest neighbor[J]. Neurocomputing, 2016, 213: 24–33. [DOI:10.1016/j.neucom.2016.05.095]
• [7] Datar M, Immorlica N, Indyk P, et al. Locality-sensitive hashing scheme based on p-stable distributions[C]//Proceedings of the 20th Annual Symposium on Computational Geometry. Brooklyn, NY, USA: ACM, 2004: 253-262.[DOI: 10.1145/997817.997857]
• [8] Liu W, Wang J, Kumar S, et al. Hashing with graphs[C]//Proceedings of the 28th International Conference on International Conference on Machine Learning. Bellevue, Washington, USA: Omnipress, 2011: 1-8.
• [9] Gong Y C, Lazebnik S, Gordo A, et al. Iterative quantization:a procrustean approach to learning binary codes for large-scale image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2916–2929. [DOI:10.1109/TPAMI.2012.193]
• [10] Shen F M, Shen C H, Liu W, et al. Supervised discrete hashing[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 37-45.[DOI: 10.1109/CVPR.2015.7298598]
• [11] Wang J, Kumar S, Chang S F. Semi-supervised hashing for scalable image retrieval[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 3424-3431.[DOI: 10.1109/CVPR.2010.5539994]
• [12] Kong W H, Li W J. Double-bit quantization for hashing[C]//Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Ontario, Canada: AAAI, 2012: 634-640
• [13] Kong W H, Li W J, Guo M Y. Manhattan hashing for large-scale image retrieval[C]//Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. Portland, Oregon, USA: ACM, 2012: 45-54.[DOI: 10.1145/2348283.2348293]
• [14] Jegou H, Douze M, Schmid C. Product quantization for nearest neighbor search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 117–128. [DOI:10.1109/TPAMI.2010.57]
• [15] Ge T Z, He K M, Ke Q F, et al. Optimized product quantization for approximate nearest neighbor search[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 2946-2953.[DOI: 10.1109/CVPR.2013.379]
• [16] Norouzi M, Fleet D J. Cartesian k-means[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 3017-3024.[DOI: 10.1109/CVPR.2013.388]
• [17] Brandt J. Transform coding for fast approximate nearest neighbor search in high dimensions[C]//Proceedings of 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010: 1815-1822.[DOI: 10.1109/CVPR.2010.5539852]
• [18] Kong W H, Li W J. Isotropic hashing[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: ACM, 2012: 1646-1654.
• [19] Huffman D A. A method for the construction of minimum-redundancy codes[J]. Proceedings of the IRE, 1952, 40(9): 1098–1101. [DOI:10.1109/JRPROC.1952.273898]