Published: 2021-11-16  DOI: 10.11834/jig.200212  2021 | Volume 26 | Number 11  Image Analysis and Recognition

Received: 2020-06-17; revised: 2020-11-20; preprint: 2020-11-27. Supported by: Fundamental Research Funds for the Central Universities (2019IB010); Guiding Project of Hubei Provincial Department of Education (B2019049). About the authors: Wan Fang, male, lecturer, master's supervisor; his research interests are computer vision and image retrieval (E-mail: 20021026@hbut.edu.cn). Qiang Haopeng, male, master's student; his research interests are deep learning and cross-modal retrieval (E-mail: qianghaopengshuxuejd@whut.edu.cn). Lei Guangbo, female, lecturer; her research interests are visual SLAM and deep learning (E-mail: 64694786@qq.com). *Corresponding author: Wan Fang, 20021026@hbut.edu.cn. CLC number: P23; Document code: A; Article ID: 1006-8961(2021)11-2659-11


Self-supervised deep discrete hashing for image retrieval
Wan Fang1, Qiang Haopeng2, Lei Guangbo1
1. School of Computing, Hubei University of Technology, Wuhan 430068, China;
2. College of Science, Wuhan University of Technology, Wuhan 430070, China
Supported by: Fundamental Research Funds for the Central Universities (2019IB010); Guiding Project of Hubei Provincial Department of Education (B2019049)

Abstract

Objective Hashing techniques have attracted much attention and are widely applied to nearest neighbor search for image retrieval on large-scale datasets due to their low storage cost and fast retrieval speed. With the great development of deep learning, deep neural networks have been widely incorporated into image hashing retrieval, and existing deep learning-based hashing methods demonstrate the effectiveness of the end-to-end architecture for hash learning. However, these methods have two problems. First, existing deep hashing methods ignore the guiding role of deep image feature information in training the deep hash function. Second, most deep hashing methods first solve a relaxed problem to simplify the optimization involved in binary code learning and then quantize the continuous solution to obtain an approximate binary solution. This optimization strategy leads to a large binary quantization error, resulting in suboptimal hash codes. To solve these two problems, a self-supervised deep discrete hashing method (SSDDH) is proposed in this study. Method The proposed SSDDH consists of two steps. First, using matrix decomposition, the binary hash codes are obtained by solving a self-supervised loss function composed of the deep feature matrix extracted by the convolutional neural network and the image label matrix. The obtained binary hash codes are used as supervision information to guide the training of the deep hash function. Second, a pair-wise loss function is constructed to maintain the similarity between the hash codes generated by the deep hash function while maintaining the similarity between these hash codes and the binary hash codes. A discrete optimization algorithm is used to solve for the optimal solution of the objective function, thus effectively reducing the binary quantization error.
Result Several experiments are conducted on three public datasets to validate the performance of the proposed algorithm. The first experiment compares the mean average precision (mAP) values of existing hashing methods, including unsupervised methods, supervised shallow methods, and supervised deep methods, at different hash code lengths. The results show that the mAP of our method, SSDDH, is the best in all cases. On the CIFAR-10 and NUS-WIDE (web image dataset from the National University of Singapore) datasets, the mAP of SSDDH is 3% higher than that of the second-best method, DPSH (deep pairwise-supervised hashing). On the Flickr dataset, SSDDH is 1% higher than DPSH. The second experiment involves the CIFAR-10 dataset, on which the precision-recall (PR) curves of DPSH and SSDDH with 48-bit hash codes are plotted; SSDDH remarkably outperforms its competitor. SSDDH and DPSH are also compared in terms of the accuracy of the top 20 returned images at a hash code length of 48 bits, and the results are visualized for easy observation; the retrieval performance of SSDDH is considerably higher than that of DPSH. Experiment 3 is a parameter sensitivity analysis of SSDDH, in which one parameter is varied while the others are fixed. Our method is insensitive to the parameters, which demonstrates its robustness and effectiveness. Experiment 4 is conducted on CIFAR-10 with 48-bit hash codes to explore the difference between DPSH and SSDDH in time complexity. At the later stage of model training, SSDDH outperforms DPSH at the same time cost.
Conclusion Considering that the existing deep hashing methods ignore the guiding role of deep image feature information in the training of the deep hash function and suffer from large binary quantization errors, this study proposes a self-supervised deep discrete hashing method named SSDDH. The deep feature matrix extracted by the convolutional neural network and the image label matrix are used to obtain the binary hash codes, which then serve as supervised information to guide the training of the deep hash function. The similarity between the hash codes generated by the deep hash function, as well as the similarity between these hash codes and the binary hash codes, is maintained by constructing a pair-wise loss function. The binary quantization error is effectively reduced by discrete cyclic coordinate descent. Comparisons with several existing methods on three commonly used public datasets show that our method outperforms existing hash retrieval methods. Future work lies in three aspects. First, the focus will be on learning better fine-grained representations more effectively. Second, semi-supervised regularization will be applied to our framework to make full use of unlabeled data; both directions aim to further boost image retrieval accuracy. Third, the current approach will be extended to cross-modal retrieval, e.g., given a text query, retrieving all semantically relevant images from the database.

Key words

deep learning; image retrieval; hash learning; self-supervised; discrete optimization

1 Deep pairwise-supervised hashing

 $p\left(s_{i j} \mid \boldsymbol{B}\right)= \begin{cases}\sigma\left(\varOmega_{i j}\right) & s_{i j}=1 \\ 1-\sigma\left(\varOmega_{i j}\right) & s_{i j}=0\end{cases}$ (1)

 $\begin{aligned} \min \limits_{\boldsymbol{B}} J_{1}=&-\log p(\boldsymbol{S} \mid \boldsymbol{B})=-\sum\limits_{s_{i j} \in \boldsymbol{S}} \log p\left(s_{i j} \mid \boldsymbol{B}\right)=\\ &-\sum\limits_{s_{i j} \in \boldsymbol{S}}\left(s_{i j} {\varOmega}_{i j}-\log \left(1+\mathrm{e}^{\varOmega_{i j}}\right)\right) \end{aligned}$ (2)
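As a concrete illustration of the negative log-likelihood in Eq. (2), the following NumPy sketch evaluates it for toy binary codes and a toy similarity matrix. It assumes the common DPSH-style convention $\varOmega_{ij}=\frac{1}{2}\boldsymbol{b}_i^{\mathrm{T}}\boldsymbol{b}_j$, since the definition of $\varOmega$ lies outside this excerpt.

```python
import numpy as np

def pairwise_nll(B, S):
    """Negative log-likelihood of the similarity matrix S given codes B (Eq. (2)).

    Assumption: Omega_ij = 0.5 * b_i . b_j (the usual DPSH-style inner
    product); this definition is not part of the excerpt.
    B : (n, k) matrix with entries in {-1, +1}
    S : (n, n) matrix with entries in {0, 1}
    """
    Omega = 0.5 * B @ B.T                        # pairwise inner products
    # -sum_ij [ s_ij * Omega_ij - log(1 + exp(Omega_ij)) ], computed stably
    return float(-np.sum(S * Omega - np.logaddexp(0.0, Omega)))

# toy check: codes consistent with S give a lower loss than inconsistent ones
B = np.array([[ 1,  1, -1,  1],
              [ 1,  1, -1,  1],
              [-1, -1,  1, -1]], dtype=float)
S = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)
print(pairwise_nll(B, S))       # small: B agrees with S
print(pairwise_nll(B, 1 - S))   # larger: similarity labels flipped
```

Minimizing this loss therefore pushes the inner products of similar pairs up and those of dissimilar pairs down, which is exactly the similarity-preserving behavior the paper builds on.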

 $\begin{gathered} \min \limits_{\boldsymbol{B}, \boldsymbol{H}} J_{2}=-\sum\limits_{s_{i j} \in \boldsymbol{S}}\left(s_{i j} \boldsymbol{\varTheta}_{i j}-\log \left(1+\mathrm{e}^{\boldsymbol{\varTheta}_{i j}}\right)\right)+ \\ \eta \sum\limits_{i=1}^{n}\left\|\boldsymbol{b}_{i}-\boldsymbol{h}_{i}\right\|_{\mathrm{F}}^{2} \end{gathered}$ (3)

 $\begin{gathered} \min \limits_{\boldsymbol{B}, \boldsymbol{H}, \boldsymbol{W}} J_{3}=-\sum\limits_{s_{i j} \in \boldsymbol{S}}\left(s_{i j} \boldsymbol{\varTheta}_{i j}-\log \left(1+\mathrm{e}^{\boldsymbol{\varTheta}_{i j}}\right)\right)+ \\ \mu\left\|\boldsymbol{L}-\boldsymbol{B} \boldsymbol{W}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}+v\|\boldsymbol{W}\|_{\mathrm{F}}^{2} \end{gathered}$ (4)

 $\begin{gathered} \min \limits_{\boldsymbol{B}, \boldsymbol{H}} J_{5}=-\sum\limits_{s_{i j} \in \boldsymbol{S}}\left(s_{i j} \boldsymbol{\varTheta}_{i j}-\log \left(1+\mathrm{e}^{\boldsymbol{\varTheta}_{i j}}\right)\right)+ \\ \sum\limits_{s_{i j} \in \boldsymbol{S}^{\prime}}\left(\boldsymbol{h}_{i} \boldsymbol{b}_{j}^{\mathrm{T}}-k s_{i j}^{\prime}\right)^{2}+\alpha\|\boldsymbol{B}-\boldsymbol{H}\|_{\mathrm{F}}^{2}+ \\ \beta\|\boldsymbol{H} {\bf{1}}\|_{\mathrm{F}}^{2}+\gamma\|\hat{\boldsymbol{L}}-\boldsymbol{L}\|_{\mathrm{F}}^{2} \\ \text { s. t. } \quad \boldsymbol{B} \in\{-1,+1\}^{n \times k} \\ \qquad\ \ \ \boldsymbol{H} \in[-1,+1]^{n \times k} \end{gathered}$ (6)

2.3 Objective loss function and algorithm solution

 $\begin{gathered} \min \limits_{\theta, \boldsymbol{B}, \boldsymbol{U}, \boldsymbol{W}} J=J_{4}+J_{5}= \\ \left\|\boldsymbol{F}-\boldsymbol{B} \boldsymbol{U}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}+\left\|\boldsymbol{L}-\boldsymbol{B} \boldsymbol{W}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}- \\ \sum\limits_{s_{i j} \in \boldsymbol{S}}\left(s_{i j} \boldsymbol{\varTheta}_{i j}-\log \left(1+\mathrm{e}^{\boldsymbol{\varTheta}_{i j}}\right)\right)+ \\ \left\|\boldsymbol{H} \boldsymbol{B}^{\mathrm{T}}-k \boldsymbol{S}^{\prime}\right\|_{\mathrm{F}}^{2}+\alpha\|\boldsymbol{B}-\boldsymbol{H}\|_{\mathrm{F}}^{2}+ \\ \beta\|\boldsymbol{H} {\bf{1}}\|_{\mathrm{F}}^{2}+\gamma\|\hat{\boldsymbol{L}}-\boldsymbol{L}\|_{\mathrm{F}}^{2} \\ \text { s. t. } \quad \boldsymbol{B} \in\{-1,+1\}^{n \times k} \end{gathered}$ (7)

1) Update the parameter $\theta$ while keeping the other parameters fixed; Eq. (7) can then be rewritten as

 $\begin{gathered} \min _{\theta} J=-\sum\limits_{s_{i j} \in \boldsymbol{S}}\left(s_{i j} \boldsymbol{\varTheta}_{i j}-\log \left(1+\mathrm{e}^{\boldsymbol{\varTheta}_{i j}}\right)\right)+ \\ \left\|\boldsymbol{H B}^{\mathrm{T}}-k \boldsymbol{S}^{\prime}\right\|_{\mathrm{F}}^{2}+\alpha\|\boldsymbol{B}-\boldsymbol{H}\|_{\mathrm{F}}^{2}+ \\ \beta\|\boldsymbol{H} {\bf{1}}\|_{\mathrm{F}}^{2}+\gamma\|\hat{\boldsymbol{L}}-\boldsymbol{L}\|_{\mathrm{F}}^{2} \end{gathered}$ (8)

2) Update the parameter $\boldsymbol{U}$ while keeping the other parameters fixed; Eq. (7) can then be rewritten as

 $\min \limits_{\boldsymbol{U}} J=\left\|\boldsymbol{F}-\boldsymbol{B} \boldsymbol{U}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}$ (9)

 $\boldsymbol{U}=\boldsymbol{F}^{\mathrm{T}} \boldsymbol{B}\left(\boldsymbol{B}^{\mathrm{T}} \boldsymbol{B}\right)^{-1}$ (10)
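The updates in Eqs. (10) and (12) are standard least-squares closed forms. A minimal NumPy check, with toy stand-ins for the deep feature matrix $\boldsymbol{F}$ and the code matrix $\boldsymbol{B}$ (the real matrices come from the network and the discrete optimization):

```python
import numpy as np

# toy (n=6, k=4) code matrix; the first four rows form a Hadamard block,
# so B has full column rank and B^T B is invertible
B = np.array([[ 1,  1,  1,  1],
              [ 1, -1,  1, -1],
              [ 1,  1, -1, -1],
              [ 1, -1, -1,  1],
              [ 1,  1,  1, -1],
              [-1,  1,  1,  1]], dtype=float)
rng = np.random.default_rng(0)
F = rng.normal(size=(6, 5))          # toy (n=6, d=5) feature matrix

# closed-form solution of  min_U ||F - B U^T||_F^2   (Eq. (10))
U = F.T @ B @ np.linalg.inv(B.T @ B)

# least-squares optimality: the residual F - B U^T satisfies the normal
# equations, i.e. it is orthogonal to the columns of B
print(np.allclose(B.T @ (F - B @ U.T), 0.0, atol=1e-8))
```

The $\boldsymbol{W}$ update in Eq. (12) has exactly the same shape with $\boldsymbol{L}$ in place of $\boldsymbol{F}$.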

3) Update the parameter $\boldsymbol{W}$ while keeping the other parameters fixed; Eq. (7) can then be rewritten as

 $\min \limits_{\boldsymbol{W}} J=\left\|\boldsymbol{L}-\boldsymbol{B} \boldsymbol{W}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}$ (11)

 $\boldsymbol{W}=\boldsymbol{L}^{\mathrm{T}} \boldsymbol{B}\left(\boldsymbol{B}^{\mathrm{T}} \boldsymbol{B}\right)^{-1}$ (12)

4) Update the parameter $\boldsymbol{B}$ while keeping $\theta$, $\boldsymbol{U}$ and $\boldsymbol{W}$ fixed; Eq. (7) can then be rewritten as

 $\begin{gathered} \min \limits_{\boldsymbol{B}} J=\left\|\boldsymbol{F}-\boldsymbol{B} \boldsymbol{U}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}+\left\|\boldsymbol{L}-\boldsymbol{B} \boldsymbol{W}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}+ \\ \left\|\boldsymbol{H} \boldsymbol{B}^{\mathrm{T}}-k \boldsymbol{S}^{\prime}\right\|_{\mathrm{F}}^{2}+\alpha\|\boldsymbol{B}-\boldsymbol{H}\|_{\mathrm{F}}^{2} \\ \text { s. t. } \quad \boldsymbol{B} \in\{-1,+1\}^{n \times k} \end{gathered}$ (13)

 $\left\|\boldsymbol{F}-\boldsymbol{B} \boldsymbol{U}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}=\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{U}^{\mathrm{T}} \boldsymbol{U} \boldsymbol{B}^{\mathrm{T}}\right)+\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{1}^{\mathrm{T}}\right)+c$

 $\left\|\boldsymbol{L}-\boldsymbol{B} \boldsymbol{W}^{\mathrm{T}}\right\|_{\mathrm{F}}^{2}=\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{W}^{\mathrm{T}} \boldsymbol{W} \boldsymbol{B}^{\mathrm{T}}\right)+\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{2}^{\mathrm{T}}\right)+c$

 $\left\|\boldsymbol{H} \boldsymbol{B}^{\mathrm{T}}-k \boldsymbol{S}^{\prime}\right\|_{\mathrm{F}}^{2}=\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{H}^{\mathrm{T}} \boldsymbol{H} \boldsymbol{B}^{\mathrm{T}}\right)+\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{3}^{\mathrm{T}}\right)+c$

 $\alpha\|\boldsymbol{B}-\boldsymbol{H}\|_{\mathrm{F}}^{2}=\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{4}^{\mathrm{T}}\right)+c$

 $\begin{gathered} \min \limits_{\boldsymbol{B}} J= \operatorname{tr}\left(\boldsymbol{B} \boldsymbol{U}^{\mathrm{T}} \boldsymbol{U} \boldsymbol{B}^{\mathrm{T}}\right)+\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{W}^{\mathrm{T}} \boldsymbol{W} \boldsymbol{B}^{\mathrm{T}}\right)+\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{H}^{\mathrm{T}} \boldsymbol{H} \boldsymbol{B}^{\mathrm{T}}\right)+\\ \operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{1}^{\mathrm{T}}\right)+\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{2}^{\mathrm{T}}\right)+ \operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{3}^{\mathrm{T}}\right)+\operatorname{tr}\left(\boldsymbol{B} \boldsymbol{Q}_{4}^{\mathrm{T}}\right)+c \\ \text { s. t. } \quad \boldsymbol{B} \in\{-1,+1\}^{n \times k} \end{gathered}$ (14)

 $\begin{gathered} \min \limits_{\boldsymbol{B}_{*{r}}} J\left(\boldsymbol{B}_{*{r}}\right)= \operatorname{tr}\left(\boldsymbol { B } _ { * { r } } \left[2 \boldsymbol{U}_{*{r}}^{\mathrm{T}} \hat{\boldsymbol{U}}_{r} \hat{\boldsymbol{B}}_{r}^{\mathrm{T}}+2 \boldsymbol{W}_{*{r}}^{\mathrm{T}} \hat{\boldsymbol{W}}_{r} \hat{\boldsymbol{B}}_{r}^{\mathrm{T}}+\right.\right.\\ 2 \boldsymbol{H}_{*{r}}^{\mathrm{T}} \hat{\boldsymbol{H}}_{r} \hat{\boldsymbol{B}}_{r}^{\mathrm{T}}+\left(\boldsymbol{Q}_{1}\right)_{*{r}}^{\mathrm{T}}+\left(\boldsymbol{Q}_{2}\right)_{*{r}}^{\mathrm{T}}+ \\ \left.\left.\left(\boldsymbol{Q}_{3}\right)_{*{r}}^{\mathrm{T}}+\left(\boldsymbol{Q}_{4}\right)_{*{r}}^{\mathrm{T}}\right]\right) \\ \text { s.t. } \quad \boldsymbol{B}_{*{r}} \in\{-1,+1\}^{n} \end{gathered}$ (15)

 $\begin{aligned} \boldsymbol{B}_{*{r}} &=-\operatorname{sign}\left(2 \hat{\boldsymbol{B}}_{r}\left[\hat{\boldsymbol{U}}_{r}^{\mathrm{T}} \boldsymbol{U}_{*{r}}+\hat{\boldsymbol{W}}_{r}^{\mathrm{T}} \boldsymbol{W}_{*{r}}+\hat{\boldsymbol{H}}_{r}^{\mathrm{T}} \boldsymbol{H}_{*{r}}\right]+\right.\\ &\left.\left(\boldsymbol{Q}_{1}\right)_{*{r}}+\left(\boldsymbol{Q}_{2}\right)_{*{r}}+\left(\boldsymbol{Q}_{3}\right)_{*{r}}+\left(\boldsymbol{Q}_{4}\right)_{*{r}}\right) \end{aligned}$ (16)
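The column-wise update of Eq. (16) can be sketched in NumPy on a simplified surrogate objective $\operatorname{tr}(\boldsymbol{B}\boldsymbol{P}\boldsymbol{B}^{\mathrm{T}})+\operatorname{tr}(\boldsymbol{B}\boldsymbol{Q}^{\mathrm{T}})$, where the hypothetical $\boldsymbol{P}$ stands in for the combined $\boldsymbol{U}^{\mathrm{T}}\boldsymbol{U}+\boldsymbol{W}^{\mathrm{T}}\boldsymbol{W}+\boldsymbol{H}^{\mathrm{T}}\boldsymbol{H}$ terms and $\boldsymbol{Q}$ for $\boldsymbol{Q}_1+\cdots+\boldsymbol{Q}_4$ (the $\boldsymbol{Q}_i$ are defined outside this excerpt); the sign-flip structure is the same.

```python
import numpy as np

def dcc_sweep(B, P, Q):
    """One sweep of discrete cyclic coordinate descent in the style of
    Eq. (16) on the simplified surrogate
        min_B  tr(B P B^T) + tr(B Q^T),   B in {-1,+1}^{n x k},
    with P (k x k) symmetric PSD and Q (n x k); both are hypothetical
    stand-ins for the terms of Eqs. (14)-(16)."""
    B = B.copy()
    n, k = B.shape
    for r in range(k):
        B_hat = np.delete(B, r, axis=1)   # B without its r-th column
        P_hat = np.delete(P[:, r], r)     # r-th column of P, entry r removed
        # linear coefficient of the column B_{*r} in the objective
        grad = 2.0 * B_hat @ P_hat + Q[:, r]
        col = -np.sign(grad)              # exact minimizer over {-1,+1}^n
        col[col == 0] = 1                 # break ties toward +1
        B[:, r] = col
    return B

def objective(B, P, Q):
    return float(np.trace(B @ P @ B.T) + np.trace(B @ Q.T))

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
P = A @ A.T                               # symmetric positive semidefinite
Q = rng.normal(size=(10, 4))
B0 = rng.choice([-1.0, 1.0], size=(10, 4))
B1 = dcc_sweep(B0, P, Q)
print(objective(B1, P, Q) <= objective(B0, P, Q))  # each step never increases J
```

Because each column update is the exact minimizer with the other columns fixed, the objective is monotonically non-increasing, which is what makes the discrete optimization avoid the relaxation-then-quantization error discussed in the abstract.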

for iter = 1, 2, …, T do
    for i = 1, 2, …, t do
        sample a mini-batch and update the network parameters θ by back-propagation on Eq. (8)
    end for
    update U by Eq. (10); update W by Eq. (12)
    for r = 1 → k do
        update the r-th column B_*r of B by Eq. (16)
    end for
end for

2.4 Out-of-sample extension

 $\boldsymbol{b}_{q}=\operatorname{sign}\left(\boldsymbol{F}\left(\boldsymbol{x}_{q} ; \theta\right)\right)$ (17)
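Eq. (17) is a simple thresholding of the trained network's output; a small sketch, where `net_out` is a hypothetical stand-in for the real-valued output $F(\boldsymbol{x}_q;\theta)$ of the deep hash function on a query image:

```python
import numpy as np

def query_code(net_out):
    """Out-of-sample extension (Eq. (17)): binarize the real-valued output
    of the trained deep hash function for a query image."""
    code = np.sign(net_out)
    code[code == 0] = 1            # map the boundary case 0 to +1
    return code

# hypothetical 8-bit network output for one query
net_out = np.array([0.3, -1.2, 0.0, 2.4, -0.1, 0.7, -0.9, 1.5])
print(query_code(net_out))         # entries are all -1 or +1
```

Because hashing happens only at query time, any image outside the training set gets a code with a single forward pass.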

3.1 Datasets and evaluation metrics

CIFAR-10 contains 60 000 32×32-pixel color images in 10 classes, and each image belongs to exactly one class. NUS-WIDE contains nearly 270 000 color images in 81 classes, and each image belongs to at least one class; following the experimental setup of Lai et al. (2015), only the images associated with the 21 most frequent classes are used. Flickr contains 20 015 images collected from the Flickr website in 24 classes, and each image carries at least one semantic label.

 $AP(\boldsymbol{q})=\frac{1}{Q} \sum\limits_{r=1}^{N} P_{q}(r)\, \mathrm{rel}(r)$ (18)
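A small NumPy sketch of Eq. (18), reading the normalizer $Q$ as the number of relevant results among the top $N$ (the usual convention; the surrounding definitions are outside this excerpt):

```python
import numpy as np

def average_precision(rel):
    """Average precision of one query (Eq. (18)).
    rel : binary relevance of the ranked results; rel[r-1] = 1 if the
          result at rank r is relevant. Q is the number of relevant results."""
    rel = np.asarray(rel, dtype=float)
    Q = rel.sum()
    if Q == 0:
        return 0.0
    ranks = np.arange(1, len(rel) + 1)
    precision_at_r = np.cumsum(rel) / ranks   # P_q(r), precision at rank r
    return float(np.sum(precision_at_r * rel) / Q)

# relevant results at ranks 1, 3 and 4 of the top five
print(average_precision([1, 0, 1, 1, 0]))     # (1 + 2/3 + 3/4) / 3 ≈ 0.806
```

The mAP reported in Table 1 is the mean of this quantity over all queries.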

3.3 Comparative experiments and analysis

Table 1 mAP results of different hashing methods on three datasets

Method                        CIFAR-10 (12/24/32/48 bit)   NUS-WIDE (12/24/32/48 bit)   Flickr (12/24/32/48 bit)
DPSH (Li et al., 2016)        0.713/0.727/0.744/0.757      0.752/0.790/0.794/0.812      0.837/0.844/0.856/0.863
HashGAN (Cao et al., 2018)    0.660/0.722/0.731/0.735      0.642/0.699/0.737/0.751      0.790/0.826/0.841/0.848
DHN (Zhu et al., 2016)        0.555/0.594/0.603/0.621      0.708/0.735/0.748/0.758      0.823/0.839/0.844/0.847
DSH (Jin et al., 2014)        0.616/0.651/0.660/0.675      0.548/0.551/0.558/0.562      0.805/0.835/0.842/0.845
SDH (Shen et al., 2015)       0.285/0.329/0.341/0.356      0.568/0.600/0.608/0.637      N/A
ITQ (Gong et al., 2013)       0.162/0.169/0.172/0.175      0.452/0.468/0.472/0.477      0.684/0.695/0.697/0.697
SH (Weiss et al., 2008)       0.127/0.128/0.126/0.129      0.454/0.406/0.405/0.400      0.645/0.651/0.650/0.646
Ours (SSDDH)                  0.749/0.762/0.774/0.785      0.787/0.811/0.816/0.823      0.840/0.852/0.862/0.871
Note: the best result in each column is achieved by our method (SSDDH); N/A indicates that the method has no reported result on that dataset.

References

• Andoni A and Indyk P. 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions//The 47th Annual IEEE Symposium on Foundations of Computer Science. Berkeley, USA: IEEE: 459-468[DOI: 10.1109/FOCS.2006.49]
• Cao Y, Liu B, Long M S and Wang J M. 2018. HashGAN: deep learning to hash with pair conditional Wasserstein GAN//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1287-1296[DOI: 10.1109/CVPR.2018.00140]
• Chatfield K, Simonyan K, Vedaldi A and Zisserman A. 2014. Return of the devil in the details: delving deep into convolutional nets//Proceedings of the British Machine Vision Conference. Nottingham, UK: BMVA Press: 1-12[DOI: 10.5244/C.28.6]
• Chua T S, Tang J H, Hong R C, Li H J, Luo Z P and Zheng Y T. 2009. NUS-WIDE: a real-world web image database from national university of Singapore//Proceedings of the ACM International Conference on Image and Video Retrieval. Santorini, Greece: ACM: 1-9[DOI: 10.1145/1646396.1646452]
• Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255[DOI: 10.1109/cvpr.2009.5206848]
• Gong Y C, Lazebnik S, Gordo A, Perronnin F. 2013. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12): 2916-2929 [DOI:10.1109/TPAMI.2012.193]
• Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
• He K M, Gkioxari G, Dollár P, Girshick R. 2020. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2): 386-397 [DOI:10.1109/TPAMI.2018.2844175]
• Huiskes M J and Lew M S. 2008. The MIR Flickr retrieval evaluation//Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. Vancouver, Canada: ACM: 39-43[DOI: 10.1145/1460096]
• Jiang Q Y and Li W J. 2017. Deep cross-modal hashing//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3270-3278[DOI: 10.1109/CVPR.2017.348]
• Jiang Q Y, Cui X, Li W J. 2018. Deep discrete supervised hashing. IEEE Transactions on Image Processing, 27(12): 5996-6009 [DOI:10.1109/TIP.2018.2864894]
• Jin Z M, Li C, Lin Y, Cai D. 2014. Density sensitive hashing. IEEE Transactions on Cybernetics, 44(8): 1362-1371 [DOI:10.1109/TCYB.2013.2283497]
• Krizhevsky A. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report TR-2009. University of Toronto
• Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 26th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 1097-1105
• Kulis B and Grauman K. 2009. Kernelized locality-sensitive hashing for scalable image search//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE: 2130-2137[DOI: 10.1109/ICCV.2009.5459466]
• Lai H J, Pan Y, Liu Y and Yan S C. 2015. Simultaneous feature learning and hash coding with deep neural networks//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3270-3278[DOI: 10.1109/CVPR.2015.7298947]
• Li Q, Sun Z N, He R and Tan T N. 2017. Deep supervised discrete hashing//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 2482-2491
• Li W J, Wang S and Kang W C. 2016. Feature learning based deep supervised hashing with pairwise labels//Proceedings of the 25th International Joint Conference on Artificial Intelligence. New York, USA: AAAI: 1711-1717
• Liu H M, Wang R P, Shan S G and Chen X L. 2016a. Deep supervised hashing for fast image retrieval//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2064-2072[DOI: 10.1109/CVPR.2016.227]
• Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016b. SSD: single shot MultiBox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 398-413[DOI: 10.1007/978-3-319-46448-0_2]
• Liu W, Mu C, Kumar S and Chang S F. 2014. Discrete graph hashing//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 3419-3427
• Liu W, Wang J, Ji R R, Jiang Y G and Chang S F. 2012. Supervised hashing with kernels//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 2074-2081[DOI: 10.1109/CVPR.2012.6247912]
• Liu Y, Pan Y, Xia R K, Liu D, Yin J. 2016. FP-CNNH: a fast image hashing algorithm based on deep convolutional neural network. Computer Science, 43(9): 39-46, 51 (刘冶, 潘炎, 夏榕楷, 刘荻, 印鉴. 2016. FP-CNNH: 一种基于深度卷积神经网络的快速图像哈希算法. 计算机科学, 43(9): 39-46, 51) [DOI:10.11896/j.issn.1002-137X.2016.9.007]
• Lu X Q, Chen Y X, Li X L. 2020. Siamese dilated inception hashing with intra-group correlation enhancement for image retrieval. IEEE Transactions on Neural Networks and Learning Systems, 31(8): 3032-3046 [DOI:10.1109/TNNLS.2019.2935118]
• Noh H, Hong S and Han B. 2015. Learning deconvolution network for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1520-1528[DOI: 10.1109/ICCV.2015.178]
• Ren S Q, He K M, Girshick R, Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI:10.1109/TPAMI.2016.2577031]
• Shen F M, Shen C H, Liu W and Shen H T. 2015. Supervised discrete hashing//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 37-45[DOI: 10.1109/CVPR.2015.7298598]
• Shen Y M, Feng Y, Fang B, Zhou M L, Kwong S, Qiang B H. 2020. DSRPH: deep semantic-aware ranking preserving hashing for efficient multi-label image retrieval. Information Sciences, 539: 145-156 [DOI:10.1016/j.ins.2020.05.114]
• Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2020-05-17]. http://arxiv.org/pdf/1409.1556.pdf
• Wang J, Kumar S and Chang S F. 2010. Sequential projection learning for hashing with compact codes//Proceedings of the 27th International Conference on International Conference on Machine Learning. Haifa, Israel: Omnipress: 1127-1134
• Wang Z M, Zhang H. 2019. A fast image retrieval method based on multi-layer CNN features. Journal of Computer-Aided Design and Computer Graphics, 31(8): 1410-1416 (王志明, 张航. 2019. 融合多层卷积神经网络特征的快速图像检索方法. 计算机辅助设计与图形学学报, 31(8): 1410-1416) [DOI:10.3724/SP.J.1089.2019.17845]
• Weiss Y, Torralba A and Fergus R. 2008. Spectral hashing//Proceedings of the 21st International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc. : 1753-1760
• Xia R K, Pan Y, Lai H J, Liu C and Yan S C. 2014. Supervised hashing for image retrieval via image representation learning//Proceedings of the 28th AAAI Conference on Artificial Intelligence. Québec City, Canada: AAAI: 2156-2162
• Yan C G, Xie H T, Yang D B, Yin J, Zhang Y D, Dai Q H. 2018. Supervised hash coding with deep neural network for environment perception of intelligent vehicles. IEEE Transactions on Intelligent Transportation Systems, 19(1): 284-295 [DOI:10.1109/TITS.2017.2749965]
• Zhang P C, Zhang W, Li W J and Guo M Y. 2014. Supervised hashing with latent factor models//Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. Gold Coast, Australia: ACM: 173-182[DOI: 10.1145/2600428.2609600]
• Zhu H, Long M S, Wang J M and Cao Y. 2016. Deep hashing network for efficient similarity retrieval//Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI: 2415-2421