融合词袋连通图的图像检索特征选择

李国祥; 王继军; 马文斌

发布时间： 2021-10-20
摘要点击次数： 938
全文下载次数： 661
DOI: 10.11834/jig.200371
2021 | Volume 26 | Number 10

融合词袋连通图的图像检索特征选择

李国祥^1,2, 王继军^2,3, 马文斌^1,2(1.广西财经学院教务处, 南宁 530003;2.广西师范大学广西多源信息挖掘与安全重点实验室, 桂林 541004;3.广西财经学院信息与统计学院, 南宁 530003)

摘要

目的随着图像检索所依赖的特征愈发精细化，在提高检索精度的同时，也不可避免地产生众多非相关和冗余的特征。针对在大规模图像检索和分类中高维度特征所带来的时间和空间挑战，从减少特征数量这一简单思路出发，提出了一种有效的连通图特征点选择方法，探寻图像检索精度和特征选择间的平衡。方法基于词袋模型（bag of words，BOW）的图像检索机制，结合最近邻单词交叉核、特征距离和特征尺度等属性，构建包含若干个连通分支和平凡图的像素级特征分离图，利用子图特征点的逆文本频率修正边权值，从各连通分量的节点数量和孤立点最近邻单词相关性两个方面开展特征选择，将问题转化为在保证图像匹配精度情况下，最小化特征分离图的阶。结果实验采用Oxford和Paris公开数据集，在特征存储容量、时间复杂度集和检索精度等方面进行评估，并对不同特征抽取和选择方法进行了对比。实验结果表明选择后的特征数量和存储容量有效约简50%以上；100 k词典的KD-Tree查询时间减少近58%；相对于其他编码方法和全连接层特征，Oxford数据集检索精度平均提升近7.5%；Paris数据集中检索精度平均高于其他编码方法4%，但检索效果不如全连接层特征。大量实验表明了大连通域的冗余性和孤立点的可选择性。结论通过构建特征分离图，摒弃大连通域的冗余特征点，保留具有最近邻单词相关性的孤立特征点，最终形成图像的精简特征点集。整体检索效果稳定，其检索精度基本与原始特征点集持平，且部分类别效果优于原始特征和其他方法。同时，选择后特征的重用性好，方便进一步聚合集成。

关键词

词袋模型(BOW) 特征选择图像检索连通分量聚合特征

Feature selection method for image retrieval based on connected graphs and bag of words

Li Guoxiang^1,2, Wang Jijun^2,3, Ma Wenbin^1,2(1.Department of Academic Affairs, Guangxi University of Finance and Economics, Nanning 530003, China;2.Guangxi Key Laboratory of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China;3.Department of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China)

Abstract

Objective Features have to be more refined to improve the accuracy of image retrieval. As a result, a large amount of irrelevant and redundant features is also produced inevitably, which leads to a high requirement of memory and computation, especially for large-scale image retrieval.Thus, feature selection plays a critical role in image retrieval.Based on the principle of feature number reduction, we propose a novel, effectively connected component feature selection method and explore a tradeoff between image retrieval accuracy and feature selection in this paper.Method First, we construct a pixel-level feature separate graph that contains several connected branches and trivial graphs based on the bag of words(BOW) principle by combining different characteristics such as nearest word cross kernel, feature distance, and feature scale. Then, we calculate the cross kernel among the first D nearest neighbor words of each feature point. If the crossing set is empty and the distance and scale between feature points satisfy the established conditions, we assume that these two feature points belong to the same group. Then, we select features according to the node number of each connected component and the correlation of nearest words of isolated points. In this process, we use inverse document frequency as the weight of the first D nearest neighbor words to measure the contribution. Finally, we transform the problem to minimize the network order of the feature separated graph in guaranteeing the accuracy of image matching and select feature points from isolated point and connected branches. If the maximum cross kernel of the isolated point with other points is greater than the threshold n, then we retain it as a valid feature point. If the connected component of the graph is less than the preset threshold γ, then we retain these points in the connected branch as valid feature points.Result We adopt the public Oxford datasets and Paris datasets, and evaluate the proposed method on the aspects of feature restore requirement, time complex set, and retrieval accuracy using the Kronecker product as the matching kernel. We also compare the proposed method with different feature extraction and selection methods,such as Vlad, spetral regression-locality preserving projection(SR-LPP), and deep learning features. Experimental results demonstrate that the number of features and storage are reduced to more than 50% in guaranteeing the original retrieve accuracy. Compared with other methods, the retrieval time of KD-Tree for the 100 k dictionary is reduced by nearly 58%. The retrieval method is stable, and the selected features have excellent reusability for further clustering and assembling. When using Oxford as a test set, theretrieval accuracies of selected features are similar to the original features in each type of building,and the retrieval accuracy is better for several categories such as Allsouls, Ashmolean, and Keble. Compared with other coding methods and features from fully connected layers, the retrieval accuracy is improved by nearly 7.5% on average.When tested on the Paris dataset,our algorithm improves retrieval accuracy by approximately 2%5% overall.Conclusion Extensive experiments demonstrate the redundancy of largely connected areas and the selectivity of isolated points. We can retain the isolated feature points with the nearest word correlation and finally form a refined feature point set of image by constructing the feature separated graph and discarding the redundant feature points of the large connected area. The retrieval accuracy is comparable with the original feature point set and outperforms the original feature and other encoding methods in several categories. Moreover, the obtained feature points maintain original independence, reduce dimension reduction, and can further achieve clustering. It is convenient to be transplanted and coded in different dictionaries.We attempt to integrate it with principal component analysis(PCA), which can reduce the dimension again only with the time of feature projection, but has an outstanding retrieval effect.

Keywords

bag of words(BOW) features selection image retrieval connected component aggregated descriptors

在线采编平台

在线出版

年度会议

下载中心

年度信息