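Zhao Lijun, Tang Ping, Huo Lianzhi, Zheng Ke (Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101; University of Chinese Academy of Sciences, Beijing 100049)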
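Objective Review articles on bag-of-visual-words (BOVW) methods for image scene classification are still rare in journals in China and abroad. To give researchers a reasonably comprehensive understanding of these methods, this paper systematically summarizes the existing work. Method Drawing on a large body of literature from China and abroad, we summarize and compare the BOVW methods that have appeared in image scene classification (mainly the classification of single-image scenes) in terms of low-level feature selection and local image patch feature generation, visual vocabulary construction, histogram representation of BOVW features, visual word optimization, and other aspects. Result We review the development history of the BOVW model, categorize the many existing BOVW models, compare the advantages and disadvantages of commonly used methods, summarize the performance evaluation methods for BOVW models, and compile the commonly used standard scene databases together with the highest accuracy reported on each. Conclusion As a flourishing hot topic in computer vision, research on BOVW methods for image scene classification has made considerable progress in China and abroad. Research in computer vision is no longer limited to directly applying the model to describe image content, and increasingly takes the differences between images and text into account. Although many problems in applying the BOVW model to image scene classification still urgently need to be solved, this in no way diminishes the significance of the research.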
Review of the bag-of-visual-words models in image scene classification
Zhao Lijun, Tang Ping, Huo Lianzhi, Zheng Ke (Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Objective With the rapid development of multimedia, database, and computer network technologies, the number of images that need to be classified and labeled keeps growing. Replacing traditional manual labeling with computer-aided automatic image scene classification has therefore become an active research field. Among the many image scene classification methods, the bag-of-visual-words (BOVW) model has been widely adopted: as a mid-level feature, it can narrow the gap between low-level visual features and high-level semantic features. However, reviews of the BOVW model in image scene classification are rarely seen in journals in China or abroad. To give researchers in this field a comprehensive understanding of the method, this paper systematically summarizes the existing studies. Method Based on numerous references on the BOVW model in image scene classification from roughly the past ten years, we divide the development of the BOVW model into five stages: direct application of the early bag-of-words model to images; the study of latent semantic information in the BOVW model; the study of spatial layout or structure information in the BOVW model; the study of context information in the BOVW model; and the optimization of visual word semantics together with the introduction of new methods into the BOVW model. Furthermore, we summarize and compare the existing BOVW models in image scene classification in terms of local feature selection, feature generation for local image patches, visual vocabulary construction, histogram representation of the BOVW feature, optimization of visual words, and so on.
Result The development history of the BOVW model and the research status of BOVW-based image scene classification are reviewed, giving a clear picture of how the model has evolved; the numerous existing BOVW models are categorized according to their working mechanisms; the advantages and disadvantages of commonly used methods are compared; the performance evaluation method for the BOVW model is described; and the commonly used standard scene databases are collected, with the best classification accuracy reported on each. Conclusion As a research field that is still on the rise, the study of BOVW methods in image scene classification has made considerable progress. Research in computer vision is no longer limited to directly applying the original BOVW model to describe image content, and increasingly takes the differences between images and text into account. The urgent problems that remain are as follows: the performance of the BOVW model degrades considerably when a visual vocabulary is applied to samples that differ substantially from the training samples, while training a new vocabulary on new training samples is very time-consuming and labor-intensive; there is still no theoretical guidance for determining the size of the visual vocabulary; the relationship between visual words and semantics has not yet been fully exploited; and the application of the BOVW model in special fields, such as land-use scene classification of high-resolution remote sensing images, is still far from satisfactory.
Based on these problems, several research directions appear promising: constructing universal, self-adaptive visual vocabularies for different sample sets; automatically selecting the optimal vocabulary size; adding more spatial layout and context information to the BOVW model and exploring latent semantic information in visual words; studying visual grammars for image understanding; studying scene classification in images from special fields, such as high-resolution remote sensing images; and investigating new, well-characterized low-level feature extraction algorithms for constructing high-level visual vocabularies. To conclude, although a number of urgent problems remain in BOVW-based image scene classification, the significance of research on the BOVW model should not be overlooked.
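For readers unfamiliar with the pipeline summarized above (local patch features, visual vocabulary construction, histogram representation), the following is a minimal sketch and not a method taken from any of the reviewed papers. It assumes that the vocabulary is built with plain k-means, a common but not the only choice, and it uses toy 2-D points in place of real local descriptors. All function names (build_vocabulary, bovw_histogram, toy_descriptors) are hypothetical and introduced only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def build_vocabulary(descriptors, num_words, iterations=20, seed=0):
    """Cluster training descriptors with plain k-means (an assumption of
    this sketch); the resulting centers act as the visual words."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[k] = [sum(m[i] for m in members) / len(members)
                              for i in range(dim)]
    return centers


def bovw_histogram(image_descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for d in image_descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Toy data: 2-D points stand in for real local descriptors such as SIFT.
rng = random.Random(1)


def toy_descriptors(center, n):
    return [[center[0] + rng.uniform(-1, 1), center[1] + rng.uniform(-1, 1)]
            for _ in range(n)]


training = toy_descriptors((0, 0), 50) + toy_descriptors((10, 10), 50)
vocabulary = build_vocabulary(training, num_words=2)

# One "image" with 30 patches of the first kind and 10 of the second.
image = toy_descriptors((0, 0), 30) + toy_descriptors((10, 10), 10)
hist = bovw_histogram(image, vocabulary)
print(sorted(hist))
```

In a real system the histogram would be passed to a classifier, and the choice of num_words corresponds to the vocabulary-size problem that the abstract lists as still lacking theoretical guidance.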