Self-supervised deep discrete hashing for image retrieval

Fang Wan; Haopeng Qiang; Guangbo Lei

doi:10.11834/jig.200212

Image Analysis and Recognition

2021, Vol. 26, No. 11, pp. 2659-2669
Received: 2020-06-17; Revised: 2020-11-20; Accepted: 2020-11-27; Published in print: 2021-11-16

Fang Wan, Haopeng Qiang, Guangbo Lei. Self-supervised deep discrete hashing for image retrieval[J]. Journal of Image and Graphics, 2021, 26(11): 2659-2669. DOI: 10.11834/jig.200212.

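Summary

Objective

Image hashing retrieval based on deep learning is an active research topic in image retrieval. Existing deep hashing methods ignore the guiding role that deep image features can play in training the deep hash function, and because they rely on relaxed optimization, they cannot effectively handle the large binary quantization error that leads to suboptimal hash codes. To address these problems, a self-supervised deep discrete hashing method (SSDDH) is proposed.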

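Method

Binary hash codes are computed from the deep feature matrix extracted by a convolutional neural network and the image label matrix, and are used as self-supervised information to guide the training of the deep hash function. A pairwise loss function is constructed that preserves both the similarity among continuous hash codes and the similarity between continuous hash codes and binary hash codes, and a discrete optimization algorithm is used to solve for the hash codes, effectively reducing the binary quantization error.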

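Result

The proposed method is tested on three public datasets and compared experimentally with other hashing algorithms. On the CIFAR-10, NUS-WIDE (web image dataset from National University of Singapore), and Flickr datasets, the proposed method achieves the highest retrieval precision; its accuracy exceeds that of the second-best algorithm, DPSH (deep pairwise-supervised hashing), by 3%, 3%, and 1%, respectively.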

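Conclusion

The proposed image retrieval method based on self-supervised deep discrete hashing makes effective use of deep feature information and image label information to guide the training of the deep hash function, and effectively reduces the binary quantization error. Experimental results show that SSDDH outperforms comparable algorithms in average precision and can effectively perform image retrieval tasks.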

Abstract

Objective

Hashing techniques have attracted much attention and are widely applied to nearest neighbor search for image retrieval on large-scale datasets because of their low storage cost and fast retrieval speed. With the rapid development of deep learning, deep neural networks have been widely incorporated into image hashing retrieval, and existing deep learning-based hashing methods demonstrate the effectiveness of end-to-end deep architectures for hash learning. However, these methods have two problems. First, they ignore the guiding role of deep image feature information in training deep hash functions. Second, most deep hashing methods first solve a relaxed problem to simplify the optimization involved in binary code learning and then quantize the continuous solution to obtain an approximate binary solution. This optimization strategy leads to a large binary quantization error, resulting in suboptimal hash codes. To solve these two problems, a self-supervised deep discrete hashing method (SSDDH) is proposed in this study.
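The quantization error mentioned above can be illustrated with a minimal sketch. It assumes that the relaxed network outputs are real-valued codes in (-1, 1) and that binarization is an element-wise sign; the function name `quantization_error` and the random inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def quantization_error(u):
    """Return sign-binarized codes and the mean squared gap to them."""
    b = np.where(u >= 0, 1.0, -1.0)   # map every entry to {-1, +1}
    return b, float(np.mean((u - b) ** 2))

rng = np.random.default_rng(0)
# Hypothetical relaxed outputs: 5 images, 8-bit codes, values in (-1, 1).
u_loose = np.tanh(0.3 * rng.standard_normal((5, 8)))   # entries far from +/-1
u_tight = np.tanh(5.0 * rng.standard_normal((5, 8)))   # entries mostly near +/-1

_, err_loose = quantization_error(u_loose)
_, err_tight = quantization_error(u_tight)
print(err_loose, err_tight)
```

Codes that stay far from -1 and +1 lose more information when they are rounded, which is the gap that discrete optimization tries to avoid.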

Method

The proposed SSDDH consists of two steps. First, using matrix decomposition, the binary hash codes are obtained by solving a self-supervised loss function composed of the deep feature matrix extracted by the convolutional neural network and the image label matrix. The obtained binary hash codes are used as supervision information to guide the training of the deep hash function. Second, a pairwise loss function is constructed to maintain the similarity among the hash codes generated by the deep hash function while also maintaining the similarity between these hash codes and the binary hash codes. A discrete optimization algorithm is used to solve the objective function, thus effectively reducing the binary quantization error.
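The abstract does not give the exact form of the pairwise loss, so the following is only a sketch of one plausible shape: a DPSH-style pairwise negative log-likelihood over the continuous codes plus a squared penalty that pulls them toward the supervising binary codes. The function name, the weight `eta`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def pairwise_hash_loss(u, b, s, eta=0.1):
    """DPSH-style pairwise loss plus a code-matching penalty (illustrative).

    u: (n, r) continuous codes produced by the hash network
    b: (n, r) binary codes in {-1, +1} used as supervision
    s: (n, n) similarity matrix, 1 if two images share a label, else 0
    eta: hypothetical weight of the code-matching term
    """
    theta = 0.5 * u @ u.T                        # scaled pairwise inner products
    # Negative log-likelihood log(1 + e^theta) - s * theta, computed stably.
    nll = np.logaddexp(0.0, theta) - s * theta
    match = np.sum((u - b) ** 2)                 # distance to the binary codes
    return float(nll.sum() + eta * match)

labels = np.array([0, 0, 1, 1])
s = (labels[:, None] == labels[None, :]).astype(float)
c = np.array([1.0, 1.0, -1.0, -1.0])
b = np.stack([c, c, -c, -c])                     # same-label images share a code

u_good = 0.9 * b                                 # respects the similarity structure
u_bad = 0.9 * b[[0, 2, 1, 3]]                    # swaps codes across labels
loss_good = pairwise_hash_loss(u_good, b, s)
loss_bad = pairwise_hash_loss(u_bad, b, s)
print(loss_good, loss_bad)
```

Continuous codes that agree with both the label similarity and the binary codes receive the lower loss, which is the behavior the two similarity terms are meant to encourage.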

Result

Several experiments are conducted on three public datasets to validate the performance of the proposed algorithm. The first experiment compares the mean average precision (mAP) of existing hashing methods at different hash code lengths, including unsupervised methods, supervised shallow methods, and supervised deep methods. The results show that SSDDH achieves the best mAP at every code length. On the CIFAR-10 and NUS-WIDE (web image dataset from National University of Singapore) datasets, the mAP of SSDDH is 3% higher than that of the next best method, DPSH (deep pairwise-supervised hashing). On the Flickr dataset, SSDDH is 1% higher than DPSH, which is again the next best method. The second experiment uses the CIFAR-10 dataset. The precision-recall (PR) curves of DPSH and SSDDH with 48-bit hash codes are plotted, and SSDDH remarkably outperforms its competitor. SSDDH and DPSH are also compared in terms of the accuracy of the top 20 returned images when the hash code length is 48 bits, and the results are visualized for easy observation; the retrieval performance of SSDDH is considerably higher than that of DPSH. The third experiment is a parameter sensitivity analysis of SSDDH, in which one parameter is varied while the others are held fixed. The method is insensitive to its parameters, which indicates its robustness and effectiveness. The fourth experiment is conducted on CIFAR-10 with 48-bit hash codes to explore the difference between DPSH and SSDDH in time complexity. In the later stage of model training, SSDDH performs better than DPSH for the same time consumption.
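For readers unfamiliar with the metric, the sketch below shows how mAP over a Hamming ranking is commonly computed. It assumes single-label data and codes in {-1, +1}; the function name and toy data are hypothetical, and the paper's exact evaluation protocol (for example, how multi-label images in NUS-WIDE and Flickr are treated) is not specified in the abstract.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP over a Hamming ranking, for single-label data and +/-1 codes."""
    r = db_codes.shape[1]
    aps = []
    for q, y in zip(query_codes, query_labels):
        hamming = 0.5 * (r - db_codes @ q)           # distance from inner product
        order = np.argsort(hamming, kind="stable")   # nearest database items first
        relevant = (db_labels[order] == y).astype(float)
        if relevant.sum() == 0:
            continue                                 # skip queries with no match
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        aps.append(float((precision_at_k * relevant).sum() / relevant.sum()))
    return float(np.mean(aps)) if aps else 0.0

db_codes = np.array([[1, 1, 1, 1], [1, 1, 1, -1],
                     [-1, -1, -1, -1], [-1, -1, -1, 1]], dtype=float)
db_labels = np.array([0, 0, 1, 1])
query_codes = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]], dtype=float)
query_labels = np.array([0, 0])                      # second query has a poor code

score = mean_average_precision(query_codes, db_codes, query_labels, db_labels)
print(score)
```

The first query ranks both relevant items at the top, while the second ranks them last, so the average falls between the two extremes.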

Conclusion

Considering that existing deep hashing methods ignore the guiding role of deep image feature information in training the deep hash function and suffer from a large binary quantization error, this study proposes a self-supervised deep discrete hashing method named SSDDH. The deep feature matrix extracted by the convolutional neural network and the image label matrix are used to obtain binary hash codes, which serve as supervision information to guide the training of the deep hash function. The similarity among the hash codes generated by the deep hash function and the similarity between these hash codes and the binary hash codes are maintained by constructing a pairwise loss function. The binary quantization error is effectively reduced using discrete cyclic coordinate descent. Comparison with several existing methods on three commonly used public datasets shows that this method outperforms existing hashing retrieval methods. Future work lies in three aspects. First, the focus will be on learning fine-grained representations more effectively. Second, semi-supervised regularization will be applied to the framework to make full use of unlabeled data. Both will be employed to further boost image retrieval accuracy. Third, the current approach will be extended to cross-modal retrieval, for example, retrieving all semantically relevant images from the database given a text query.
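Discrete cyclic coordinate descent updates one bit column at a time while holding the others fixed. The sketch below applies it to a simplified objective of the kind used in supervised discrete hashing, minimizing ||Y - BW||² over B in {-1, +1}; this objective, the fixed projection `w`, and all names are assumptions for illustration, since the abstract does not state the SSDDH objective.

```python
import numpy as np

def dcc_update(b, y, w, sweeps=3):
    """Bitwise update of b in {-1,+1}^(n x r) to reduce ||y - b @ w||_F^2."""
    b = b.copy()
    q = y @ w.T                                   # (n, r) linear term
    r = b.shape[1]
    for _ in range(sweeps):
        for l in range(r):
            rest = [k for k in range(r) if k != l]
            # Contribution of all other bits to the quadratic term.
            coupling = b[:, rest] @ (w[rest] @ w[l])
            # Closed-form optimum for column l with the other columns fixed.
            b[:, l] = np.where(q[:, l] - coupling >= 0, 1.0, -1.0)
    return b

rng = np.random.default_rng(0)
y = np.eye(3)[rng.integers(0, 3, size=20)]        # hypothetical one-hot labels
w = rng.standard_normal((8, 3))                   # hypothetical fixed projection
b0 = np.where(rng.standard_normal((20, 8)) >= 0, 1.0, -1.0)

b1 = dcc_update(b0, y, w)
before = np.linalg.norm(y - b0 @ w) ** 2
after = np.linalg.norm(y - b1 @ w) ** 2
print(before, after)
```

Each column update is the exact minimizer given the other columns, so the objective never increases and the codes stay binary throughout, with no rounding step at the end.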
