Review of the bag-of-visual-words models in image scene classification

Zhao Lijun; Tang Ping; Huo Lianzhi; Zheng Ke

doi:10.11834/jig.20140301

Review


Vol. 19, No. 3, 2014, pp. 333-343
Published online: 2014-03-03

Print publication: 2014


Zhao Lijun, Tang Ping, Huo Lianzhi, Zheng Ke. Review of the bag-of-visual-words models in image scene classification[J]. Journal of Image and Graphics, 2014, 19(3): 333-343. DOI: 10.11834/jig.20140301.

Summary

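Review articles on bag-of-visual-words (BOVW) model methods in image scene classification are still rare in journals in China and abroad. To give researchers a comprehensive understanding of these methods, this paper systematically summarizes the related research. Drawing on a large body of Chinese and international literature, it summarizes and compares the BOVW methods used in existing image scene classification (mainly the classification of single-scene images) along several dimensions: selection of low-level features and generation of local image patch features, construction of the visual vocabulary, histogram representation of BOVW features, and optimization of visual words. The paper reviews the development of the BOVW model, categorizes the many existing BOVW models, and compares the advantages and disadvantages of common methods. It also summarizes the methods used to evaluate BOVW performance and collects the commonly used standard scene databases, together with the highest accuracy reported on each. Research on BOVW methods for image scene classification is a rising topic in computer vision and has made considerable progress in China and abroad. Work in the field is no longer limited to applying the model directly to describe image content; it increasingly takes the differences between images and texts into account. Many urgent problems remain in applying the BOVW model to image scene classification, but they do not diminish the importance of this research.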
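The basic BOVW pipeline that the survey organizes its comparison around has three steps: local features, visual vocabulary construction, and histogram representation. A minimal sketch of it follows. It assumes local descriptors have already been extracted; here random 128-dimensional placeholders stand in for SIFT-like vectors. It builds the vocabulary with plain k-means and uses hard assignment to the nearest word, which is one common configuration, not the only one the survey compares. The helper names `build_vocabulary`, `assign_words` and `bovw_histogram`, and the vocabulary size of 50, are hypothetical.

```python
import numpy as np

# Minimal BOVW sketch (hypothetical helper names; k-means vocabulary and
# hard assignment are assumed, other choices exist).


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors with plain k-means (Lloyd's algorithm).

    descriptors: (N, D) array of local features pooled from training images.
    Returns a (k, D) array of cluster centres, i.e. the visual words.
    """
    rng = np.random.default_rng(seed)
    # Initialise centres with k distinct randomly chosen descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[idx].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centres)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def assign_words(descriptors, centres):
    """Hard-assign each descriptor to its nearest visual word (Euclidean)."""
    # Squared distances via broadcasting: (N, 1, D) - (1, k, D) -> (N, k).
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def bovw_histogram(descriptors, centres):
    """L1-normalised histogram of visual-word occurrences for one image."""
    labels = assign_words(descriptors, centres)
    hist = np.bincount(labels, minlength=len(centres)).astype(float)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Placeholder descriptors; real ones would come from a feature extractor.
    train = rng.random((500, 128))
    vocab = build_vocabulary(train, k=50)
    image_desc = rng.random((80, 128))
    h = bovw_histogram(image_desc, vocab)
    print(h.shape, round(h.sum(), 6))
```

Running the script prints `(50,) 1.0`: one normalised bin per visual word. The resulting fixed-length vector is what a classifier (for example an SVM) would consume, regardless of how many local features each image has.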

Abstract

With the rapid development of computer multimedia technology, database technology, and computer network technology, more and more images need to be classified and labeled. Instead of traditional manual labeling, computer-aided automatic image scene classification has become a hot research field. Among the numerous image scene classification methods, the bag-of-visual-words (BOVW) model has been widely adopted; as a mid-level feature, it can narrow the gap between low-level visual features and high-level semantic features. However, reviews of the BOVW model in image scene classification are rarely seen in journals in China or abroad. Therefore, to give researchers in this field a comprehensive understanding of the method, this paper systematically summarizes these studies.

Based on numerous references on the BOVW model in image scene classification from roughly the past ten years, we divide the development of the BOVW model into five stages: the direct application of the early bag-of-words model to images; the study of latent semantic information in the BOVW model; the study of spatial layout or structure information in the BOVW model; the study of context information in the BOVW model; and the optimization of visual word semantics together with the introduction of new methods into the BOVW model. Furthermore, we summarize and compare the existing BOVW models in image scene classification in terms of local feature selection, feature generation for local image patches, visual vocabulary construction, histogram representation of BOVW features, optimization of visual words, and so on. The development history of the BOVW model and the research status of BOVW-based image scene classification are reviewed, giving a clear trail of the model's development. The numerous existing BOVW models are categorized according to their working mechanisms, and the advantages and disadvantages of commonly used methods are compared. The performance evaluation methods for the BOVW model are described, and the commonly used standard scene databases are collected, each with the best classification accuracy reported on it.

As a rising research field, studies of BOVW methods in image scene classification have made considerable progress. Research in computer vision is no longer limited to directly applying the original BOVW model to describe image content, and the differences between images and texts are increasingly taken into account. The urgent problems to be solved are as follows:
- the performance of a BOVW model degrades greatly when the visual vocabulary is applied to samples that differ substantially from the training samples, while training a new vocabulary on new samples is very time- and labor-consuming;
- there is still no theoretical guidance for determining the size of the visual vocabulary;
- the relationship between visual words and semantics is still not fully exploited;
- the application of the BOVW model in special fields, such as land-use scene classification in high-resolution remote sensing images, is far from satisfactory.

In addition, these problems suggest some interesting research directions:
- constructing a universal, self-adaptive visual vocabulary for different sample sets;
- automatically selecting the optimal vocabulary size;
- adding more spatial layout and context information to the BOVW model and exploring the latent semantic information in visual words;
- studying visual grammars of images for image understanding;
- studying scene classification in images from special fields, such as high-resolution remote sensing images;
- investigating new, well-characterized low-level feature extraction algorithms to construct high-level visual vocabularies.

In conclusion, although a number of urgent problems remain in BOVW-based image scene classification, they do not diminish the importance of research on the BOVW model.
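The third development stage, adding spatial layout information to BOVW, is commonly illustrated by the spatial pyramid matching of Lazebnik et al. That technique is used here only as one representative example, not as the survey's own method. The sketch below assumes each local feature already has a visual-word index and an image position normalised to [0, 1). Level l of the pyramid splits the image into 2^l x 2^l cells, and the sketch concatenates the per-cell word histograms for levels 0 to L. It applies the usual level weights: 1/2^L for level 0 and 1/2^(L-l+1) for level l ≥ 1. Two images are then compared with the histogram intersection kernel. The function names `spatial_pyramid` and `intersection_kernel` and the demo vocabulary size are hypothetical.

```python
import numpy as np

# Spatial-pyramid sketch (hypothetical helper names; spatial pyramid matching
# is one representative way to add layout information to BOVW).


def spatial_pyramid(xy, words, vocab_size, levels=2):
    """Weighted, concatenated per-cell BOVW histograms over a spatial pyramid.

    xy: (N, 2) feature positions normalised to [0, 1).
    words: (N,) visual-word index of each feature, in [0, vocab_size).
    levels: finest level L; level l splits the image into 2^l x 2^l cells.
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words)
    parts = []
    for l in range(levels + 1):
        cells = 2 ** l
        # Level weights from spatial pyramid matching: coarse levels count less.
        weight = 1.0 / 2 ** levels if l == 0 else 1.0 / 2 ** (levels - l + 1)
        # Cell index of every feature along x and y at this level.
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell_id = cy * cells + cx
        # Joint (cell, word) index, so one bincount yields every cell histogram.
        joint = cell_id * vocab_size + words
        hist = np.bincount(joint, minlength=cells * cells * vocab_size)
        parts.append(weight * hist.astype(float))
    return np.concatenate(parts)


def intersection_kernel(h1, h2):
    """Histogram intersection: sum of element-wise minima."""
    return float(np.minimum(h1, h2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 10  # hypothetical vocabulary size
    xy_a, w_a = rng.random((60, 2)), rng.integers(0, K, 60)
    xy_b, w_b = rng.random((60, 2)), rng.integers(0, K, 60)
    pa = spatial_pyramid(xy_a, w_a, K)
    pb = spatial_pyramid(xy_b, w_b, K)
    print(pa.shape, intersection_kernel(pa, pb))
```

With 10 words and levels 0 to 2 the pyramid vector has (1 + 4 + 16) x 10 = 210 entries. The counts here are not normalised, so images with very different numbers of local features would usually be normalised before comparison.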

