Image classification with non-negative and local Laplacian sparse coding and context information
2017, Vol. 22, No. 6, pp. 731-740
Published online: 2017-06-08
Published in print: 2017
DOI: 10.11834/jig.160583
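Sparse coding is an effective method for image feature representation, but it suffers from coding instability: similar features may be encoded into different codewords. Moreover, in existing image classification methods, image feature representation and image classification are independent processes, so the extracted features do not effectively preserve the semantic relations among image features. To address these two problems, an image classification algorithm based on non-negative and local Laplacian sparse coding and context information is proposed. Image feature representation consists of two stages. In the first stage, local features are encoded with non-negative and local Laplacian sparse coding, and the original image representation is obtained by max pooling, which effectively reduces coding instability. In the second stage, a subset of images is randomly selected from all image representations to generate joint spaces based on context information, and the images are mapped into these spaces by classifiers; the mapped feature representation serves as the final image representation, so that more of the context information among image features is preserved. Simulation experiments were conducted on four public image datasets, Corel-10, Scene-15, Caltech-101, and Caltech-256, and the results were compared with those of current algorithms related to sparse coding; classification accuracy improved by about 3% to 18%. The proposed algorithm reduces coding instability and preserves the interdependence among features. The experimental results show that it classifies better than existing algorithms. The method is also applicable to other computer vision tasks such as image segmentation, annotation, and retrieval.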
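The pooling step of the first stage can be illustrated with a short sketch. It assumes that every local feature has already been encoded as a non-negative code vector and that feature positions are normalised to [0, 1). The extended abstract states that spatial pyramid division is combined with max pooling, and the sketch follows that description; the function names, the pyramid levels, and the toy values are illustrative assumptions and are not taken from the paper.

```python
def max_pool(codes):
    """Element-wise maximum over a list of equally long code vectors."""
    return [max(column) for column in zip(*codes)]


def spatial_pyramid_max_pool(codes, positions, num_atoms, levels=(1, 2, 4)):
    """Concatenate max-pooled codes over all cells of a spatial pyramid.

    codes:     non-negative code vectors, one per local feature
    positions: (x, y) per feature, normalised to [0, 1)
    num_atoms: codebook size, i.e. the length of every code vector
    levels:    grid size per pyramid level (1x1, 2x2, 4x4 is an assumption)
    """
    representation = []
    for grid in levels:
        for row in range(grid):
            for col in range(grid):
                in_cell = [
                    code
                    for code, (x, y) in zip(codes, positions)
                    if int(x * grid) == col and int(y * grid) == row
                ]
                # An empty cell contributes zeros, the neutral element of
                # max pooling when all codes are non-negative.
                pooled = max_pool(in_cell) if in_cell else [0.0] * num_atoms
                representation.extend(pooled)
    return representation


# Toy example: three local features, a codebook of three atoms, two levels.
codes = [[0.2, 0.0, 0.7], [0.5, 0.1, 0.0], [0.0, 0.9, 0.3]]
positions = [(0.1, 0.1), (0.8, 0.2), (0.6, 0.7)]
image_vector = spatial_pyramid_max_pool(codes, positions, num_atoms=3, levels=(1, 2))
```

With two levels the toy image is described by 3 x (1 + 4) = 15 values. Because the codes are non-negative, the maximum never has to choose between a strong positive and a strong negative response, which is one reason max pooling pairs naturally with non-negative coding.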
Image classification is an important issue in computer vision and a hot research topic. The traditional sparse coding (SC) method is effective for image representation and has achieved good results in image classification. However, the SC method has two drawbacks. First, the method ignores the local relationship between image features, thus losing local information. Second, because the combinatorial optimization problems of SC involve addition and subtraction, the subtraction operation might cause features to be cancelled. These two drawbacks result in coding instability, which means that similar features are encoded into different codes. Meanwhile, representation and classification are usually independent of each other during image classification, so the semantic relations between image features are not well preserved. In other words, image representation is not task-driven and may be unable to perform the final classification task well. Furthermore, the local feature quantization method disregards the underlying semantic information of the local region, which influences the classification performance. To deal with such problems, a two-stage method of image classification with non-negative and local Laplacian SC and context information (NLLSC-CI) is proposed in this study. NLLSC-CI aims to improve the efficiency of image representation and the accuracy of image classification.

The representation of an image involves two stages. In the first stage, non-negative and locality-constrained Laplacian SC (NLLSC) is introduced to the encoding of the local features of the image to overcome coding instability. First, non-negativity is introduced into Laplacian SC (LSC) through non-negative matrix factorization (NMF), which constrains the codebook and the code coefficients to be non-negative and thus avoids cancellation between features. Second, bases that are near the local features are selected to constrain the codes because locality is more important than sparseness; thus, the local information between features is preserved. Then, the original image representation is obtained by using spatial pyramid division (SPD) and max pooling (MP) in the pooling step. In the second stage, several original image representations are selected and connected to generate joint context spaces. All images are then mapped into these spaces by the SVM classifier. The mapped features in these joint context spaces are regarded as the final representations of images. In this manner, image representation and classification tasks are considered jointly to achieve improved performance. This two-stage representation method preserves the context relationship between the features of images to a certain extent.

To validate the performance of the proposed method, experiments are conducted on four public image datasets, namely, Corel-10, Scene-15, Caltech-101, and Caltech-256. Results suggest that the classification accuracy of NLLSC-CI increases by about 3% to 18% compared with that of state-of-the-art SC algorithms. The accuracy rate of NLLSC-CI increases by 3% to 12% in the Corel-10 dataset. For the Scene-15 dataset, classification accuracy increases by 4% to 15%. The classification performance in the Caltech-101 and Caltech-256 datasets increases by 3% to 14% and 4% to 18%, respectively. These findings show that the classification accuracy of the proposed method is better than that of state-of-the-art SC algorithms in the four benchmark image datasets. In addition, Tables 2 to 5 show that classification accuracy is the lowest in the Caltech-256 dataset. The reason could be the size of this dataset. The dataset contains too many categories and images, and the difference between and within classes is too large. As a result, the corresponding category of images cannot be identified correctly during classification. Thus, the accuracy of the proposed method is relatively low for datasets with large numbers and multiple classes of images. In general, however, NLLSC-CI demonstrates improved classification accuracy.

This study proposes an algorithm called NLLSC-CI to solve coding instability and the independence between image representation and classification. The proposed method overcomes coding instability and preserves the mutual context dependency between the local features of images. Specifically, due to the incorporation of non-negativity, locality, and graph Laplacian regularization, this new method improves the consistency of sparse codes and their mutual dependency, thus preserving more features and local information between them and making the local features more discriminating. The new optimization problem in NLLSC-CI is solved by defining a diagonal matrix to obtain the analytical solution. Furthermore, the consistency of sparse codes is maintained by introducing a Laplacian matrix. This two-stage method of image representation jointly considers two independent tasks: image representation and classification. The construction of a joint space based on context information preserves the context between image features, and the image representation obtained by context information and image classification are mutually dependent. Therefore, NLLSC-CI can model images adequately and represent the original images through mutual dependency and context information among features, thus improving the classification accuracy. Several benchmark image datasets are studied, and the final experimental results show that the proposed algorithm performs better than previous algorithms. In addition, this novel method can be applied to other computer vision issues, such as image segmentation, image annotation, and image retrieval. Meanwhile, because the experimental image data used in this study come only from several standard image datasets, the method still needs to be evaluated on more extensive image data. Moreover, although the context information of this method can effectively convey the information expressed by images, it cannot fully reflect the way humans think. Therefore, other methods and models of image semantic content that are closer to humans' perception and thinking need to be investigated.
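The second-stage mapping into joint context spaces can also be sketched. The abstract does not specify how these spaces are constructed, so the following is only one plausible reading: each context space is a random subset of the training images, a one-vs-rest linear SVM is trained per class within each subset, and the final representation of an image is the concatenation of its SVM decision values over all spaces. The small sub-gradient SVM trainer, all function names, the hyperparameters, and the toy data are assumptions introduced for illustration; a library SVM would normally take its place.

```python
import random


def decision_value(weights, bias, x):
    """Linear decision value w . x + b."""
    return sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias


def train_linear_svm(samples, labels, epochs=200, lam=0.01, lr=0.1):
    """Binary linear SVM trained by sub-gradient descent on the hinge loss.

    labels must be +1 or -1. Returns (weights, bias).
    """
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * decision_value(weights, bias, x)
            # L2 shrinkage is applied at every step; the hinge term only
            # contributes when the sample lies inside the margin.
            weights = [w_i * (1.0 - lr * lam) for w_i in weights]
            if margin < 1.0:
                weights = [w_i + lr * y * x_i for w_i, x_i in zip(weights, x)]
                bias += lr * y
    return weights, bias


def build_context_spaces(features, labels, num_spaces, subset_size, seed=0):
    """Train one-vs-rest SVMs on random image subsets (one subset per space)."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    spaces = []
    for _ in range(num_spaces):
        chosen = rng.sample(range(len(features)), subset_size)
        subset = [features[i] for i in chosen]
        models = []
        for cls in classes:
            targets = [1 if labels[i] == cls else -1 for i in chosen]
            models.append(train_linear_svm(subset, targets))
        spaces.append(models)
    return spaces


def map_to_context_spaces(x, spaces):
    """Concatenate the decision values of x over every space and class."""
    return [decision_value(w, b, x) for models in spaces for (w, b) in models]


# Toy data: six "original image representations" from two classes.
features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.0], [0.0, 1.0], [0.1, 0.9], [0.0, 0.8]]
labels = [0, 0, 0, 1, 1, 1]
spaces = build_context_spaces(features, labels, num_spaces=3, subset_size=4)
final_representation = map_to_context_spaces([1.0, 0.0], spaces)
```

In this reading the final representation has one entry per (space, class) pair, here 3 x 2 = 6 values, and a final classifier would then be trained on these mapped vectors. Drawing four of the six toy images guarantees that both classes appear in every subset; with real data a subset that misses a class would need separate handling.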