Image retrieval based on transformer and asymmetric learning strategy
Vol. 28, Issue 2, Pages: 535-544 (2023)
Published: 16 February 2023
Accepted: 21 February 2022
DOI: 10.11834/jig.210842
Chao He, Hongxi Wei. Image retrieval based on transformer and asymmetric learning strategy [J]. Journal of Image and Graphics, 28(2): 535-544 (2023)
Objective
Image retrieval is a fundamental task in computer vision. Most existing methods rely on convolutional neural networks and a symmetric learning strategy, which demands a large amount of training data, lengthens model training, and makes insufficient use of the supervised information. To address these problems, this paper proposes an image retrieval method that combines a Transformer with an asymmetric learning strategy.
Method
For query images, a Transformer generates the hash representation, and a hash loss is used to learn the hash function so that the representation approaches the true hash values. For the images to be retrieved, an asymmetric learning strategy obtains their hash representations directly; the hash loss is combined with a classification loss to exploit the supervised information fully and to speed up training. Similar images are then retrieved rapidly by computing Hamming distances in the hash space.
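The retrieval step above ranks images by Hamming distance between binary codes. As a minimal numpy sketch of that ranking (illustrative only, not the paper's code; codes are taken as {-1, +1} vectors):

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query code.

    Codes are vectors in {-1, +1}; the Hamming distance is the number of
    differing entries, computed here by direct comparison.
    """
    dists = (db_codes != query_code).sum(axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

q = np.array([1, -1, 1, 1])
db = np.array([[1, -1, 1, 1],    # distance 0
               [1, 1, 1, -1],    # distance 2
               [-1, -1, 1, 1]])  # distance 1
order, dists = hamming_rank(q, db)
print(order.tolist(), dists.tolist())  # [0, 2, 1] [0, 1, 2]
```

In practice the codes would be packed into machine words and compared with XOR plus popcount for speed; the direct comparison above just keeps the idea visible.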
Result
On the CIFAR-10 and NUS-WIDE datasets, the proposed method is compared with five mainstream symmetric methods and the two best-performing asymmetric methods. Its mAP (mean average precision) exceeds that of the current best method by 5.06% and 4.17%, respectively.
Conclusion
The proposed method extracts image features with a Transformer and combines the hash loss with a classification loss, reducing model training time without enlarging the training set. It outperforms comparable methods and completes the image retrieval task effectively.
Objective
Image retrieval is one of the fundamental tasks in computer vision. Most deep learning-based retrieval methods adopt a symmetric learning strategy: training images are grouped into pairs and fed into a convolutional neural network (CNN) for feature extraction, and a similarity loss is used to learn hash codes, which yields acceptable performance. In recent years, CNNs have been widened or deepened to improve this performance further, but the resulting structures are complicated and time-consuming on large-scale image datasets. More recently, the Transformer has been introduced into computer vision, where it has substantially advanced image classification. Because the Transformer scales well to large datasets such as ImageNet-21k and JFT-300M, we bring it into large-scale image retrieval. Symmetric methods must involve the whole dataset in the training phase, and query images have to be paired for training, which makes training time-consuming. Moreover, the hash function is learned from similarity computations between training and query images, so the supervised information is used only through the similarity matrix and is exploited insufficiently. An asymmetric learning scheme instead trains on only a subset of the images, learning the hash function from a hash loss, while the hash codes of the remaining images are obtained directly. In addition, classification constraints can be imposed, and the corresponding classification loss can be optimized by alternating learning. To overcome the long training time and the insufficient use of supervised information, we develop a deep supervised hashing method for image retrieval that integrates a Transformer with an asymmetric learning strategy.
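The asymmetric, alternating scheme sketched above can be illustrated with a deliberately tiny toy: a linear map stands in for the network, the database codes are free binary variables, and the two are updated in turn. All sizes, the learning rate, and the exact loss form are invented for this sketch and are not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: W plays the role of the network (here just a linear map),
# B holds the free binary codes of the to-be-retrieved (database) images.
n_train, n_db, dim, bits = 8, 32, 16, 12
X = rng.standard_normal((n_train, dim))           # training image features
S = rng.integers(0, 2, (n_train, n_db)) * 2 - 1   # +1 similar / -1 dissimilar
W = 0.1 * rng.standard_normal((dim, bits))
B = np.sign(rng.standard_normal((n_db, bits)))    # database codes in {-1, +1}

for _ in range(20):
    # Step 1: with B fixed, take a gradient step on the network parameters
    # so that tanh(XW) B^T approaches bits * S (a squared hash loss).
    U = np.tanh(X @ W)
    G = ((U @ B.T - bits * S) @ B) * (1.0 - U ** 2)   # chain rule through tanh
    W -= 1e-3 * (X.T @ G) / n_train
    # Step 2: with W fixed, update the database codes directly.
    U = np.tanh(X @ W)
    B = np.sign(S.T @ U)
    B[B == 0] = 1.0
```

Only the small training subset ever passes through the "network"; the database codes are optimized as variables of their own, which is the source of the efficiency gain the asymmetric scheme claims.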
Method
For the training images, the designed Transformer generates hash representations, and a hash loss drives these representations toward the true hash values. The original Transformer takes one-dimensional data as input, so each image is first divided into multiple blocks, each block is mapped to a one-dimensional vector, and the vectors of all blocks of an image are concatenated to form the input sequence. The designed Transformer consists of 1) two normalization layers, 2) a multi-head attention module, 3) a fully connected module, and 4) a hash layer. The input vector first passes through a normalization layer, whose output is fed into the multi-head attention layer with 16 heads, so that multiple local features of the image can be captured. A residual connection then merges the initial vector with the output of the multi-head attention layer, which better preserves the global features of the image. Finally, the representation vector of each image is obtained through the fully connected module and the hash layer. In this study, the block that generates these representation vectors is stacked 24 times. For the remaining (to-be-retrieved) images, the classification loss serves as a constraint under which their hash representations are learned asymmetrically; the supervised information is thus used effectively, and training efficiency improves because these images need not pass through the network during training. The model is trained by alternating optimization: the hash codes of the to-be-retrieved images and the classification weights are initialized randomly, the network parameters are optimized by stochastic gradient descent, and after each epoch the classification weights are updated with the trained model while the hash representations of the remaining images improve gradually. In this way, the hash codes of the remaining images are obtained directly from the well-trained model, which improves training efficiency. Finally, our method retrieves similar images rapidly by computing Hamming distances in the hash space.
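The image-to-sequence step described above (split into blocks, flatten each block, concatenate) can be sketched as follows. The 16×16 patch size and 224×224 input are assumptions borrowed from standard ViT settings, not necessarily the paper's configuration.

```python
import numpy as np

def image_to_patch_vectors(img, patch=16):
    """Split an (H, W, C) image into non-overlapping patch x patch blocks
    and flatten each block into a one-dimensional vector (ViT-style input)."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    rows, cols = h // patch, w // patch
    x = img.reshape(rows, patch, cols, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)          # group by patch position first
    return x.reshape(rows * cols, patch * patch * c)

seq = image_to_patch_vectors(np.zeros((224, 224, 3)))
print(seq.shape)  # (196, 768): 14 x 14 patches, each holding 16*16*3 values
```

Each of the 196 vectors is then linearly projected and fed to the Transformer as one token of the input sequence.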
Result
In our experiments, the proposed method is compared with five symmetric methods and two asymmetric methods on two image retrieval datasets, CIFAR-10 and NUS-WIDE. Measured by mean average precision (mAP), it outperforms the best existing method by 5.06% and 4.17% on the two datasets, respectively. Ablation experiments validate that the classification loss pushes the images closer to their true hash representations. The hyper-parameters of the classification loss are also examined, and appropriate values are identified.
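mAP, the metric used above, averages per-query average precision over all queries. A compact reference implementation (the toy relevance flags below are invented for illustration):

```python
def average_precision(relevant):
    """AP for one query: `relevant` lists 1/0 flags of the ranked results."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at each relevant hit
    return total / hits if hits else 0.0

def mean_average_precision(per_query_flags):
    return sum(average_precision(f) for f in per_query_flags) / len(per_query_flags)

# Two toy queries: mAP = (AP([1,0,1]) + AP([0,1])) / 2 = (5/6 + 1/2) / 2
print(mean_average_precision([[1, 0, 1], [0, 1]]))  # ~ 0.667
```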
Conclusion
Our Transformer-based method extracts effective image features for large-scale retrieval, and combining the hash loss with the classification loss further benefits model training. The proposed method completes the image retrieval task effectively.
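The combination of hash loss and classification loss amounts to a weighted sum. The sketch below uses a squared-error hash loss, a cross-entropy classification loss, and a weight `mu`, all of which are illustrative stand-ins rather than the paper's exact definitions.

```python
import numpy as np

def combined_loss(u, b, logits, labels, mu=0.5):
    """Weighted sum of a hash loss and a classification loss (sketch only).

    u      : relaxed network outputs, shape (n, bits)
    b      : target binary codes in {-1, +1}, shape (n, bits)
    logits : classifier outputs, shape (n, n_classes)
    labels : integer class labels, shape (n,)
    """
    hash_loss = np.mean((u - b) ** 2)                 # pull outputs to codes
    z = logits - logits.max(axis=1, keepdims=True)    # numerically stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -np.mean(log_p[np.arange(len(labels)), labels])
    return hash_loss + mu * ce
```

When the network outputs already match the codes and the classifier is confident and correct, both terms vanish, so the combined objective rewards exactly the behavior the method wants.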
Keywords: image retrieval; Transformer; hash function; asymmetric learning; hash loss; classification loss