Feature selection method for image retrieval based on connected graphs and bag of words
2021, Vol. 26, No. 10, pp. 2533-2544
Print publication date: 2021-09-16
Accepted: 2020-11-04
DOI: 10.11834/jig.200371
Guoxiang Li, Jijun Wang, Wenbin Ma. Feature selection method for image retrieval based on connected graphs and bag of words[J]. Journal of Image and Graphics, 2021,26(10):2533-2544.
Objective
As the features used in image retrieval become increasingly fine-grained, retrieval accuracy improves, but a large number of irrelevant and redundant features is inevitably produced as well. To address the time and space challenges that high-dimensional features pose in large-scale image retrieval and classification, we start from the simple idea of reducing the number of features and propose an effective connected-graph feature point selection method that explores the balance between image retrieval accuracy and feature selection.
Method
Based on the bag-of-words (BOW) image retrieval framework, we combine attributes such as the nearest-neighbor word cross kernel, feature distance, and feature scale to construct a pixel-level feature separation graph that contains several connected branches and trivial graphs. The inverse document frequency of the feature points in each subgraph is used to correct the edge weights, and feature selection is carried out from two aspects: the number of nodes in each connected component and the nearest-neighbor word correlation of isolated points. The problem is thereby transformed into minimizing the order of the feature separation graph while preserving image matching accuracy.
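As an illustration of this grouping criterion, the following minimal Python sketch shows one way the pairwise edge test could look; `edge_weight`, its arguments, and the thresholds `d_max` and `s_max` are hypothetical names for illustration, not the authors' code.

```python
# Minimal sketch of the pairwise edge test (illustrative, not the authors'
# implementation). Two feature points are linked in the feature separation
# graph when their top-D nearest visual words intersect and their spatial
# distance and scale ratio stay within preset bounds.
import numpy as np

def edge_weight(words_a, words_b, pos_a, pos_b, scale_a, scale_b,
                idf, d_max=20.0, s_max=1.5):
    """Return an IDF-corrected edge weight, or None if no edge is created.

    words_a / words_b : top-D nearest-word ids of the two feature points
    pos_a / pos_b     : (x, y) pixel coordinates
    scale_a / scale_b : feature scales
    idf               : dict mapping word id -> inverse document frequency
    """
    shared = set(words_a) & set(words_b)            # nearest-word cross set
    if not shared:
        return None                                 # no word correlation
    if np.linalg.norm(np.subtract(pos_a, pos_b)) > d_max:
        return None                                 # too far apart spatially
    if max(scale_a, scale_b) / min(scale_a, scale_b) > s_max:
        return None                                 # scales too different
    # correct the edge weight by the IDF of the shared words
    return sum(idf[w] for w in shared)
```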
Result
Experiments on the public Oxford and Paris datasets evaluate feature storage capacity, time complexity, and retrieval accuracy, and compare different feature extraction and selection methods. The results show that the number of selected features and their storage are effectively reduced by more than 50%; the KD-Tree query time for a 100 k dictionary drops by nearly 58%; on the Oxford dataset, retrieval accuracy improves by nearly 7.5% on average over other encoding methods and fully connected layer features; on the Paris dataset, retrieval accuracy is on average 4% higher than that of other encoding methods, although it does not match fully connected layer features. Extensive experiments demonstrate the redundancy of large connected components and the selectability of isolated points.
Conclusion
By constructing the feature separation graph, discarding the redundant feature points of large connected components, and retaining isolated feature points with nearest-neighbor word correlation, a refined feature point set for each image is finally formed. The overall retrieval performance is stable: accuracy is essentially on par with that of the original feature point set, and for some categories it outperforms the original features and other methods. Moreover, the selected features are highly reusable and convenient for further aggregation and integration.
Objective
Features have to be increasingly refined to improve the accuracy of image retrieval. As a result, a large number of irrelevant and redundant features is inevitably produced as well, which leads to high memory and computation requirements, especially in large-scale image retrieval. Thus, feature selection plays a critical role in image retrieval. Based on the principle of reducing the number of features, we propose a novel and effective connected-component feature selection method and explore the tradeoff between image retrieval accuracy and feature selection in this paper.
Method
First, we construct a pixel-level feature separation graph that contains several connected branches and trivial graphs based on the bag-of-words (BOW) principle, combining characteristics such as the nearest-word cross kernel, feature distance, and feature scale. Specifically, we calculate the cross kernel among the first D nearest neighbor words of each feature point; if the crossing set is not empty and the distance and scale between two feature points satisfy the established conditions, we assume that the two feature points belong to the same group.
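The construction step can then be pictured as follows; this sketch assumes the hypothetical `edge_weight` test from the earlier snippet and a placeholder per-point attribute layout.

```python
# Sketch of the feature separation graph construction (illustrative only).
# Nodes are the image's feature points; edges follow the cross-kernel /
# distance / scale test, so the graph splits into connected branches plus
# trivial graphs (isolated points).
import itertools
import networkx as nx

def build_separation_graph(points, idf):
    """points: list of dicts with 'words', 'pos', 'scale' (hypothetical layout)."""
    g = nx.Graph()
    g.add_nodes_from(range(len(points)))
    for i, j in itertools.combinations(range(len(points)), 2):
        w = edge_weight(points[i]['words'], points[j]['words'],
                        points[i]['pos'], points[j]['pos'],
                        points[i]['scale'], points[j]['scale'], idf)
        if w is not None:
            g.add_edge(i, j, weight=w)
    return g

# connected branches and isolated points of the separation graph:
# components = list(nx.connected_components(g))
```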
Then, we select features according to the number of nodes in each connected component and the nearest-word correlation of the isolated points. In this process, we use the inverse document frequency of the first D nearest neighbor words as edge weights to measure their contribution. Finally, we transform the problem into minimizing the order of the feature separation graph while guaranteeing image matching accuracy, and we select feature points from the isolated points and the connected branches. If the maximum cross kernel between an isolated point and the other points is greater than the threshold $n$, we retain it as a valid feature point. If a connected component of the graph has fewer nodes than the preset threshold $\gamma$, we retain the points of that connected branch as valid feature points.
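The selection rule can be summarized in a short sketch; the thresholds $\gamma$ and $n$ follow the paper's symbols, while the function and its inputs are illustrative assumptions rather than the published implementation.

```python
# Sketch of the selection rule (threshold values are placeholders). Small
# connected branches are kept, large ones are discarded as redundant, and an
# isolated point is kept only if its maximum cross kernel with the other
# points is large enough.
import networkx as nx

def select_features(g, cross_kernel_max, gamma=5, n=2):
    """g: feature separation graph; cross_kernel_max[i]: precomputed maximum
    cross-kernel size of point i with all other points."""
    keep = set()
    for comp in nx.connected_components(g):
        if len(comp) == 1:                     # trivial graph: isolated point
            i = next(iter(comp))
            if cross_kernel_max[i] > n:        # nearest-word correlation
                keep.add(i)
        elif len(comp) < gamma:                # small branch: keep all points
            keep |= comp
        # components with >= gamma nodes are discarded as redundant
    return keep
```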
Result
We adopt the public Oxford and Paris datasets and evaluate the proposed method in terms of feature storage requirements, time complexity, and retrieval accuracy, using the Kronecker product as the matching kernel. We also compare the proposed method with different feature extraction and selection methods, such as VLAD, spectral regression-locality preserving projection (SR-LPP), and deep learning features. Experimental results demonstrate that the number of features and their storage are reduced by more than 50% while the original retrieval accuracy is preserved. Compared with other methods, the KD-Tree retrieval time for the 100 k dictionary is reduced by nearly 58%. The retrieval method is stable, and the selected features have excellent reusability for further clustering and assembling. When using Oxford as the test set, the retrieval accuracies of the selected features are similar to those of the original features for each type of building, and the retrieval accuracy is better for several categories such as Allsouls, Ashmolean, and Keble. Compared with other coding methods and features from fully connected layers, retrieval accuracy is improved by nearly 7.5% on average. When tested on the Paris dataset, our method improves retrieval accuracy by approximately 4% on average over other coding methods, although it does not match the features from fully connected layers.
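The KD-Tree speedup follows directly from querying fewer descriptors. The snippet below is a synthetic illustration of that effect with SciPy's cKDTree over a 100 k-entry dictionary; it is not the paper's benchmark code, and the array sizes are placeholders.

```python
# Illustrative timing of nearest-word assignment with a KD-Tree over a
# 100 k-word dictionary (synthetic data). Halving the number of descriptors
# roughly halves the query time, which is where the ~58% reduction for the
# selected feature set comes from.
import time
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((100_000, 128)).astype(np.float32)
tree = cKDTree(dictionary)

for n_desc in (2000, 1000):                    # before / after selection
    descriptors = rng.standard_normal((n_desc, 128)).astype(np.float32)
    t0 = time.perf_counter()
    tree.query(descriptors, k=1)               # assign each descriptor a word
    print(n_desc, 'descriptors:', time.perf_counter() - t0, 's')
```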
Conclusion
Extensive experiments demonstrate the redundancy of large connected areas and the selectability of isolated points. By constructing the feature separation graph and discarding the redundant feature points of the large connected areas, we retain the isolated feature points with nearest-word correlation and finally form a refined feature point set for each image. The retrieval accuracy is comparable with that of the original feature point set and outperforms the original features and other encoding methods in several categories. Moreover, the selected feature points maintain their original independence, reduce the dimensionality, and can be clustered further; they are convenient to transplant and encode with different dictionaries. We also attempt to integrate the method with principal component analysis (PCA), which reduces the dimensionality again at only the cost of feature projection while retaining an outstanding retrieval effect.
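A minimal sketch of such a PCA integration, using scikit-learn with placeholder dimensions rather than the paper's settings:

```python
# Sketch of projecting the selected descriptors to a lower dimension with
# PCA; only the projection cost is added at query time. Dimensions are
# placeholders, not the paper's configuration.
import numpy as np
from sklearn.decomposition import PCA

selected = np.random.rand(1000, 128).astype(np.float32)  # selected local features
pca = PCA(n_components=64, whiten=True)
reduced = pca.fit_transform(selected)          # fit on a training set in practice
print(reduced.shape)                           # (1000, 64)
```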
Keywords: bag of words (BOW); feature selection; image retrieval; connected component; aggregated descriptors