
Published: 2017-06-16
DOI: 10.11834/jig.160583
2017 | Volume 22 | Number 6




Wan Yuan, Shi Ying, Chen Xiaoli
School of Science, Wuhan University of Technology, Wuhan 430070


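Objective Sparse coding is an effective method for image feature representation, but its coding is unstable: similar features may be encoded into different codewords. Moreover, in existing image classification methods, image feature representation and image classification are independent processes, and the extracted features do not effectively preserve the semantic relations between image features. To address these two problems, an image classification algorithm based on non-negative and local Laplacian sparse coding and context information is proposed. Method Image feature representation consists of two stages. In the first stage, local features are encoded with non-negative and local Laplacian sparse coding, and the original image representation is obtained through max pooling, which effectively alleviates coding instability. In the second stage, some images are randomly selected from all image representations to generate joint spaces based on context information; classifiers then map the images into these spaces, and the mapped features serve as the final image representation, so that more of the context information between image features is preserved. Result Simulation experiments are conducted on four public image datasets (Corel-10, Scene-15, Caltech-101, and Caltech-256) and compared with current sparse-coding-related algorithms; classification accuracy improves by about 3% to 18%. Conclusion The proposed image classification algorithm based on non-negative and local Laplacian sparse coding and context information alleviates coding instability and preserves the interdependence between features. Experimental results show that it classifies better than existing algorithms. The method is also applicable to other computer vision tasks such as image segmentation, annotation, and retrieval.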


sparse coding; non-negative and local Laplacian sparse coding; max pooling; image representation; context information; joint space

Image classification with non-negative and local Laplacian sparse coding and context information
Wan Yuan, Shi Ying, Chen Xiaoli
School of Science, Wuhan University of Technology, Wuhan 430070, China
Supported by: National Natural Science Foundation of China(91324201, 81271513)


Objective Image classification is an important and active research topic in computer vision. The traditional sparse coding (SC) method is effective for image representation and has achieved good results in image classification. However, SC has two drawbacks. First, it ignores the local relationships between image features and thus loses local information. Second, because the optimization problem of SC involves both addition and subtraction, the subtraction may cause features to cancel each other out. These two drawbacks lead to coding instability, meaning that similar features may be encoded into different codes. Meanwhile, image representation and classification are usually carried out independently, so the semantic relations between image features are not well preserved. In other words, the image representation is not task-driven and may not serve the final classification task well. Furthermore, local feature quantization disregards the underlying semantic information of local regions, which degrades classification performance. To deal with these problems, this study proposes a two-stage image classification method based on non-negative and local Laplacian SC and context information (NLLSC-CI), which aims to improve both the efficiency of image representation and the accuracy of image classification. Method The representation of an image involves two stages. In the first stage, non-negative and locality-constrained Laplacian SC (NLLSC) is used to encode the local features of the image and overcome coding instability. First, non-negativity is introduced into Laplacian SC (LSC) through non-negative matrix factorization (NMF): both the codebook and the code coefficients are constrained to be non-negative, which prevents features from offsetting one another. 
Second, because locality is more important than sparsity, the bases near each local feature are selected to constrain its code, so the local information between features is preserved. The original image representation is then obtained with spatial pyramid division (SPD) and max pooling (MP) in the pooling step. In the second stage, several original image representations are selected and combined to generate joint context spaces, and all images are mapped into these spaces by SVM classifiers. The mapped features in these joint context spaces serve as the final image representations. In this manner, image representation and classification are considered jointly to improve performance, and the two-stage representation preserves, to a certain extent, the context relationships between image features. Results To validate the proposed method, experiments are conducted on four public image datasets: Corel-10, Scene-15, Caltech-101, and Caltech-256. The classification accuracy of NLLSC-CI is about 3% to 18% higher than that of state-of-the-art SC algorithms: 3% to 12% higher on Corel-10, 4% to 15% on Scene-15, 3% to 14% on Caltech-101, and 4% to 18% on Caltech-256. These findings show that the proposed method outperforms state-of-the-art SC algorithms on all four benchmark datasets. Tables 2 to 5 also show that classification accuracy is lowest on Caltech-256. A likely reason is the scale of this dataset: it contains many categories and images, and both the inter-class and intra-class variations are large, so some images cannot be assigned to the correct category. 
Thus, the accuracy of the proposed method is relatively low on datasets with many images and many classes. Overall, however, NLLSC-CI improves classification accuracy. Conclusion This study proposes NLLSC-CI to address coding instability and the independence between image representation and classification. The proposed method overcomes coding instability and preserves the mutual context dependency between the local features of images. Specifically, by incorporating non-negativity, locality, and graph Laplacian regularization, the method improves the consistency of sparse codes and their mutual dependency, preserving more features and more local information between them and making the local features more discriminative. The new optimization problem in NLLSC-CI is solved analytically by defining a diagonal matrix, and the consistency of sparse codes is maintained by introducing a Laplacian matrix. The two-stage representation jointly considers two tasks that are usually treated independently: image representation and classification. Constructing joint spaces based on context information preserves the context between image features and makes the resulting image representation and the classification mutually dependent. Therefore, NLLSC-CI models images adequately and represents them through the mutual dependency and context information among features, which improves classification accuracy. Experiments on several benchmark image datasets show that the proposed algorithm outperforms the compared algorithms. In addition, the method can be applied to other computer vision problems, such as image segmentation, image annotation, and image retrieval. 
However, because the experiments in this study use only a few standard image datasets, the method still needs to be evaluated on more extensive image data. Moreover, although the context information used here effectively conveys what images express, it cannot fully reflect human reasoning. Other methods and models of image semantic content that are closer to human perception and thinking therefore need to be investigated.

Key words

sparse coding (SC); non-negativity and locality constrained Laplacian sparse coding (NLLSC); max pooling (MP); image representation; context information; joint spaces

0 Introduction




1 Related work

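The most classical coding method in image classification is the bag-of-words (BoW) model [3]; combined with SIFT features, it characterizes images well. To account for the spatial information between local image features, the spatial pyramid matching (SPM) model [4] was applied to image classification. However, the quantization used in the BoW and SPM models easily introduces quantization errors. Yang et al. [5] combined sparse coding with the SPM model and proposed the ScSPM image classification algorithm. Its core problem is to learn an overcomplete dictionary $\mathit{\boldsymbol{U}}$ of $M$ bases ($M \gg D$) and to represent each original feature vector with as few of these bases as possible, which leads to the optimization problem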

$\left\{ {\begin{array}{*{20}{l}} {\mathop {{\rm{min}}}\limits_{\mathit{\boldsymbol{U}},\mathit{\boldsymbol{V}}} \sum\limits_{i = 1}^N {\left( {\left\| {{\mathit{\boldsymbol{x}}_i} - \mathit{\boldsymbol{U}}{\mathit{\boldsymbol{v}}_i}} \right\|_2^2 + \lambda {{\left\| {{\mathit{\boldsymbol{v}}_i}} \right\|}_1}} \right)} }\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\left\| {{\mathit{\boldsymbol{u}}_j}} \right\|_2^2 \le 1,\forall j = 1,2, \cdots ,M} \end{array}} \right.$ (1)

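where $\mathit{\boldsymbol{X}} \in {{\bf{R}}^{D \times N}}$ is the feature matrix, $\mathit{\boldsymbol{U}} \in {{\bf{R}}^{D \times M}}$ is the dictionary, and $\mathit{\boldsymbol{V}} \in {{\bf{R}}^{M \times N}}$ is the corresponding sparse code matrix. Traditional SC is, however, highly unstable: similar local features may be encoded into different codewords. Gao et al. [6] introduced a Laplacian matrix to keep the codes of similar local features consistent, which alleviates coding instability and means that features are no longer encoded independently. The corresponding optimization problem is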

$\left\{ {\begin{array}{*{20}{l}} {\mathop {{\rm{min}}}\limits_{\mathit{\boldsymbol{U}},\mathit{\boldsymbol{V}}} \left\| {\mathit{\boldsymbol{X}} - \mathit{\boldsymbol{UV}}} \right\|_{\rm{F}}^2 + \lambda \sum\limits_i {{{\left\| {{\mathit{\boldsymbol{v}}_i}} \right\|}_1}} + \beta {\rm{tr}}\left( {\mathit{\boldsymbol{VL}}{\mathit{\boldsymbol{V}}^{\rm{T}}}} \right)}\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\left\| {{\mathit{\boldsymbol{u}}_j}} \right\|_2^2 \le 1,\forall j = 1,2, \cdots ,M} \end{array}} \right.$ (2)

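where the third term of the objective, ${\beta {\rm{tr}}\left( {\mathit{\boldsymbol{VL}}{\mathit{\boldsymbol{V}}^{\rm{T}}}} \right)}$, captures the spatial geometric information of the image and reduces quantization error, and $\mathit{\boldsymbol{L}}$ is the Laplacian matrix. However, the optimization problems of both SC and LSC involve interleaved addition and subtraction, and the subtraction may cause features to cancel each other out. Lee et al. [7] therefore used NMF to learn parts-based representations of objects and proposed an algorithm for solving the NMF problem. Han et al. [8] combined NMF with the Laplacian operator and proposed an SC method with non-negativity and dependency constraints, which preserves both the features and the similarities between them. These methods, however, ignore the locality information between features, so Wang et al. [9] introduced locality and proposed LLC for image classification. The strength of LLC lies in its $K$-nearest-neighbour coding, but as $K$ increases, the absolute difference between some positive and negative code entries also increases. Liu Peina et al. [10] therefore added a non-negativity constraint to LLC and proposed a non-negative LLC image classification method. Considering that similar features may be encoded into different codewords during SC, Min et al. [11] proposed a Laplacian-regularized LLC image classification algorithm that keeps the codes of similar features consistent.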
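The effect of the Laplacian term can be seen from a standard identity. Assuming $\mathit{\boldsymbol{W}}$ is a symmetric similarity matrix over features and $\mathit{\boldsymbol{L}} = \mathit{\boldsymbol{D}} - \mathit{\boldsymbol{W}}$ with $\mathit{\boldsymbol{D}}$ its diagonal degree matrix (this construction is an assumption here, not restated in this section), ${\rm{tr}}(\mathit{\boldsymbol{VL}}\mathit{\boldsymbol{V}}^{\rm{T}}) = \frac{1}{2}\sum_{ij} w_{ij}\|\mathit{\boldsymbol{v}}_i - \mathit{\boldsymbol{v}}_j\|_2^2$, so the penalty grows whenever strongly similar features receive different codes. The NumPy sketch below checks this numerically; the function names are illustrative.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_penalty(V, W):
    """tr(V L V^T) for codes V (M x N, one code per column) and weights W (N x N)."""
    return float(np.trace(V @ laplacian(W) @ V.T))

def pairwise_penalty(V, W):
    """Equivalent form 0.5 * sum_ij w_ij * ||v_i - v_j||^2."""
    diff = V[:, :, None] - V[:, None, :]   # M x N x N, diff[:, i, j] = v_i - v_j
    sq_dist = (diff ** 2).sum(axis=0)       # N x N squared code distances
    return 0.5 * float((W * sq_dist).sum())
```

For example, if only features 0 and 1 are linked in $\mathit{\boldsymbol{W}}$ and they share the same code, the penalty is exactly zero regardless of the other codes.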


2 Image classification with NLLSC and context information

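All of the above coding methods reduce reconstruction error to some extent, but they share two shortcomings: 1) the codes lack non-negativity and locality, so information about the features and the relations between them is lost in the image representation; 2) image feature representation and classifier training are relatively independent, and local feature quantization ignores the latent contextual semantics of local regions, which hampers how efficiently image features express and convey information. To overcome the first shortcoming, this paper adds non-negativity and locality to LSC to build NLLSC, which exploits the local information and dependencies between features, alleviates coding instability, and keeps the codes of similar features consistent. To overcome the second, building on NLLSC, this paper jointly considers image representation and classification and proposes the NLLSC-CI image classification method. Image representation has two stages. In the first stage, the local features of the image are encoded with NLLSC, and MP yields the original image representation. In the second stage, some images are randomly selected from all image representations to generate joint spaces based on context information, and the trained classifiers then project all training images into these spaces. This two-stage representation effectively fuses the visual feature information and the contextual semantic information of images, representing them more appropriately and completely and thereby improving classification. Finally, images are classified with a support vector machine (SVM) classifier [17].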

The method can be summarized in three parts, and its overall framework is shown in Fig. 1.

Fig. 1 The overall framework of NLLSC-CI method

2.1 Original image representation based on NLLSC


2.1.1 Learning the locality-constrained non-negative dictionary and codes


$\left\{ {\begin{array}{*{20}{l}} {\mathop {{\rm{min}}}\limits_{\mathit{\boldsymbol{U}},\mathit{\boldsymbol{V}}} \sum\limits_{i = 1}^N {\left( {\left\| {{\mathit{\boldsymbol{x}}_i} - \mathit{\boldsymbol{U}}{\mathit{\boldsymbol{v}}_i}} \right\|_2^2 + \lambda \left\| {{\mathit{\boldsymbol{d}}_i} \odot {\mathit{\boldsymbol{v}}_i}} \right\|_2^2} \right) + \beta {\rm{tr}}\left( {\mathit{\boldsymbol{VL}}{\mathit{\boldsymbol{V}}^{\rm{T}}}} \right)} }\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\left\| {{\mathit{\boldsymbol{u}}_j}} \right\|_2^2 \le 1,\mathit{\boldsymbol{U}} \ge 0,\mathit{\boldsymbol{V}} \ge 0,\forall j} \end{array}} \right.$ (3)

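where $\mathit{\boldsymbol{X}} = \left[ {{\mathit{\boldsymbol{x}}_1},{\mathit{\boldsymbol{x}}_2}, \cdots ,{\mathit{\boldsymbol{x}}_N}} \right] \in {{\bf{R}}^{D \times N}}$ is the non-negative feature matrix, $\mathit{\boldsymbol{U}}$ and $\mathit{\boldsymbol{V}}$ are the corresponding non-negative dictionary and non-negative codes, $\lambda ,\beta $ are given constants, ⊙ denotes the element-wise product of two column vectors, and ${\mathit{\boldsymbol{d}}_i} \in {{\bf{R}}^M}$ is a locality adaptor defined as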

$\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{d}}_i} = {\rm{exp}}\left( {dist\left( {{\mathit{\boldsymbol{x}}_i},\mathit{\boldsymbol{U}}} \right)/\sigma } \right)}\\ {dist\left( {{\mathit{\boldsymbol{x}}_i},\mathit{\boldsymbol{U}}} \right) = }\\ {{{\left[ {dist\left( {{\mathit{\boldsymbol{x}}_i},{\mathit{\boldsymbol{u}}_1}} \right), \cdots ,dist\left( {{\mathit{\boldsymbol{x}}_i},{\mathit{\boldsymbol{u}}_M}} \right)} \right]}^{\rm{T}}}} \end{array}$ (4)

where $dist\left( {{\mathit{\boldsymbol{x}}_i},{\mathit{\boldsymbol{u}}_j}} \right)$ is the Euclidean distance between ${{\mathit{\boldsymbol{x}}_i}}$ and the basis ${{\mathit{\boldsymbol{u}}_j}}$, and $\sigma $ is a parameter that controls how fast the weights decay.
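Eq. (4) gives every basis a penalty weight that grows exponentially with its distance to the feature, so the term $\lambda\|\mathit{\boldsymbol{d}}_i \odot \mathit{\boldsymbol{v}}_i\|_2^2$ in Eq. (3) makes distant bases expensive and pushes each code onto nearby bases. A minimal NumPy sketch, assuming the bases are stored as the columns of $\mathit{\boldsymbol{U}}$ (consistent with $\mathit{\boldsymbol{U}} \in {\bf{R}}^{D \times M}$); the function names are illustrative.

```python
import numpy as np

def locality_adaptor(x, U, sigma):
    """d_i = exp(dist(x_i, U) / sigma) for one feature x (length D), Eq. (4).

    U is D x M with one basis per column; returns a length-M vector.
    """
    dist = np.linalg.norm(U - x[:, None], axis=0)  # Euclidean distance to each basis u_j
    return np.exp(dist / sigma)

def locality_adaptors(X, U, sigma):
    """Stack d_1, ..., d_N column-wise into the M x N matrix d used in Eq. (5)."""
    diff = U[:, :, None] - X[:, None, :]           # D x M x N, diff[:, j, i] = u_j - x_i
    dist = np.sqrt((diff ** 2).sum(axis=0))        # M x N, entry (j, i) = dist(x_i, u_j)
    return np.exp(dist / sigma)
```

A basis lying on the feature gets weight 1, the smallest possible, while farther bases get weights above 1; a larger $\sigma$ flattens this preference.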


$\left\{ {\begin{array}{*{20}{l}} {\mathop {\min }\limits_\mathit{\boldsymbol{V}} \left\| {\mathit{\boldsymbol{X}} - \mathit{\boldsymbol{UV}}} \right\|_{\rm{F}}^2 + \lambda \left\| {\mathit{\boldsymbol{d}} \odot \mathit{\boldsymbol{V}}} \right\|_{\rm{F}}^2 + \beta {\rm{tr}}\left( {\mathit{\boldsymbol{VL}}{\mathit{\boldsymbol{V}}^{\rm{T}}}} \right)}\\ {{\rm{s}}{\rm{.t}}{\rm{. }}\mathit{\boldsymbol{V}} \ge 0} \end{array}} \right.$ (5)

where $\mathit{\boldsymbol{d}} = \left[ {{\mathit{\boldsymbol{d}}_1},{\mathit{\boldsymbol{d}}_2}, \cdots ,{\mathit{\boldsymbol{d}}_N}} \right] \in {{\bf{R}}^{M \times N}}$.


$\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{v}}_{ij}} = {\mathit{\boldsymbol{v}}_{ij}}\frac{{{{\left( {{\mathit{\boldsymbol{U}}^{\rm{T}}}\mathit{\boldsymbol{X}} + \beta \mathit{\boldsymbol{VW}}} \right)}_{ij}}}}{{{{\left( {{\mathit{\boldsymbol{U}}^{\rm{T}}}\mathit{\boldsymbol{UV}} + \beta \mathit{\boldsymbol{VD}} + \lambda {\rm{diag}}\left( {{\mathit{\boldsymbol{b}}_i}} \right)\mathit{\boldsymbol{V}}} \right)}_{ij}}}}}\\ {\forall i = 1,2, \cdots ,M,j = 1,2, \cdots ,N} \end{array}$ (6)

where ${\mathit{\boldsymbol{b}}_i} = \left[ {d_{i1}^2,d_{i2}^2, \cdots ,d_{iN}^2} \right]$ is an $N$-dimensional row vector, with $d_{ij}^2 = \exp \left( {2{\rm{dist}}\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{u}}_i}} \right)/\sigma } \right)$.
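Eq. (6) is a multiplicative update in the style of NMF: when $\mathit{\boldsymbol{X}}$, $\mathit{\boldsymbol{U}}$, $\mathit{\boldsymbol{V}}$, and $\mathit{\boldsymbol{W}}$ are non-negative, the numerator and denominator contain only non-negative terms, so $\mathit{\boldsymbol{V}}$ stays non-negative without any projection. The sketch below assumes $\mathit{\boldsymbol{L}} = \mathit{\boldsymbol{D}} - \mathit{\boldsymbol{W}}$ with $\mathit{\boldsymbol{D}}$ the diagonal degree matrix of the similarity matrix $\mathit{\boldsymbol{W}}$ (these matrices are not defined in this section), and reads $\lambda\,{\rm{diag}}(\mathit{\boldsymbol{b}}_i)\mathit{\boldsymbol{V}}$ element-wise as $\lambda\,\mathit{\boldsymbol{d}} \odot \mathit{\boldsymbol{d}} \odot \mathit{\boldsymbol{V}}$, which is half the gradient of $\lambda\|\mathit{\boldsymbol{d}} \odot \mathit{\boldsymbol{V}}\|_{\rm{F}}^2$. Function names are hypothetical.

```python
import numpy as np

def update_codes(X, U, V, W, dsq, lam, beta, eps=1e-12):
    """One multiplicative update of the non-negative codes V (Eq. (6)).

    X: D x N features, U: D x M dictionary, V: M x N codes (all non-negative),
    W: N x N symmetric similarity matrix, dsq: M x N matrix of d_ij^2.
    Assumes L = Dw - W, where Dw is the diagonal degree matrix of W.
    """
    Dw = np.diag(W.sum(axis=1))
    numer = U.T @ X + beta * (V @ W)
    # lam * dsq * V is the element-wise reading of lam * diag(b_i) V
    denom = U.T @ U @ V + beta * (V @ Dw) + lam * dsq * V
    return V * numer / (denom + eps)   # eps guards against division by zero

def objective(X, U, V, W, dsq, lam, beta):
    """Objective of Eq. (5) under the same assumptions."""
    L = np.diag(W.sum(axis=1)) - W
    return (np.linalg.norm(X - U @ V) ** 2
            + lam * np.sum(dsq * V ** 2)
            + beta * np.trace(V @ L @ V.T))
```

In practice the update is iterated a fixed number of times or until the objective stops decreasing noticeably.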


$\left\{ {\begin{array}{*{20}{l}} {\mathop {\min }\limits_\mathit{\boldsymbol{U}} \left\| {\mathit{\boldsymbol{X}} - \mathit{\boldsymbol{UV}}} \right\|_{\rm{F}}^2}\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\;\mathit{\boldsymbol{U}} \ge 0,\left\| {{\mathit{\boldsymbol{u}}_j}} \right\|_2^2 \le 1,\forall j = 1,2, \cdots ,M} \end{array}} \right.$ (7)

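Problem (7) is converted into its Lagrange dual, the dual matrix $\mathit{\boldsymbol{ \boldsymbol{\varLambda} }}$ is found with the conjugate gradient method, and the dictionary $\mathit{\boldsymbol{U}}$ is then obtained from Eq. (8):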

$\mathit{\boldsymbol{U}} = \left( {\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{V}}^{\rm{T}}}} \right){\left( {\mathit{\boldsymbol{V}}{\mathit{\boldsymbol{V}}^{\rm{T}}}\mathit{\boldsymbol{ + \boldsymbol{\varLambda} }}} \right)^{ - 1}}$ (8)
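Once the dual matrix is known, Eq. (8) gives the dictionary in closed form. The sketch below covers only this closed-form step: it assumes, as in the usual Lagrange dual of this kind of problem, that $\mathit{\boldsymbol{\varLambda}}$ is diagonal with one non-negative multiplier per norm constraint $\|\mathit{\boldsymbol{u}}_j\|_2^2 \le 1$, and it takes those multipliers as given rather than reproducing the conjugate-gradient search. Eq. (8) on its own does not enforce $\mathit{\boldsymbol{U}} \ge 0$. The function name is illustrative.

```python
import numpy as np

def dictionary_from_dual(X, V, dual):
    """Eq. (8): U = (X V^T)(V V^T + Lambda)^{-1}, with Lambda = diag(dual).

    X: D x N features, V: M x N codes, dual: length-M non-negative multipliers.
    """
    Lam = np.diag(dual)
    # Solve (V V^T + Lam) U^T = V X^T rather than forming an explicit inverse;
    # this is valid because V V^T + Lam is symmetric.
    return np.linalg.solve(V @ V.T + Lam, V @ X.T).T
```

With all multipliers at zero this reduces to the least-squares dictionary; larger multipliers shrink the bases, which is how the dual enforces the norm constraints.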

2.1.2 NLLSC for new features


$\left\{ {\begin{array}{*{20}{l}} {\mathop {\min }\limits_\mathit{\boldsymbol{S}} \left\| {\mathit{\boldsymbol{Y}} - \mathit{\boldsymbol{US}}} \right\|_{\rm{F}}^2 + \lambda \left\| {\mathit{\boldsymbol{d}} \odot \mathit{\boldsymbol{S}}} \right\|_{\rm{F}}^2 + \frac{\beta }{2}\sum\limits_{ji} {\left\| {{\mathit{\boldsymbol{s}}_j} - {\mathit{\boldsymbol{v}}_i}} \right\|_{\rm{2}}^2{w_{ji}}} }\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\quad {s_{ij}} \ge 0,\forall i,j} \end{array}} \right.$ (9)



${s_{ij}} = {s_{ij}}\frac{{{{\left( {{\mathit{\boldsymbol{U}}^{\rm{T}}}\mathit{\boldsymbol{Y}} + \frac{1}{2}\beta \mathit{\boldsymbol{V}}{\mathit{\boldsymbol{W}}^{\rm{T}}}} \right)}_{ij}}}}{{{{\left( {{\mathit{\boldsymbol{U}}^{\rm{T}}}\mathit{\boldsymbol{US}} + \frac{1}{2}\beta \mathit{\boldsymbol{SA}} + \lambda {\rm{diag}}\left( {{\mathit{\boldsymbol{b}}_i}} \right)\mathit{\boldsymbol{S}}} \right)}_{ij}}}}$ (10)

where $\mathit{\boldsymbol{A}}$ is a diagonal weight matrix whose diagonal elements are ${a_{jj}} = \sum\limits_i {{w_{ji}}} $.

Section 2.1.1 yields the dictionary and the sparse codes of the template features; Eq. (10) then gives the NLLSC codes of the new features.
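Setting the gradient of Eq. (9) with respect to $\mathit{\boldsymbol{S}}$ to zero and splitting positive and negative terms gives a multiplicative update of the same form as Eq. (6). Because the graph term carries the factor $\beta/2$, its derivative contributes $\frac{\beta}{2}(\mathit{\boldsymbol{SA}} - \mathit{\boldsymbol{V}}\mathit{\boldsymbol{W}}^{\rm{T}})$, so the sketch below uses $\beta/2$ on both the numerator and the denominator term. With the template codes $\mathit{\boldsymbol{V}}$ fixed, Eq. (9) is a non-negative quadratic problem, and each such update does not increase the objective. The sketch assumes $\mathit{\boldsymbol{W}}$ is a $P \times N$ matrix whose entry $(j, i)$ is $w_{ji}$, and again reads ${\rm{diag}}(\mathit{\boldsymbol{b}}_i)\mathit{\boldsymbol{S}}$ element-wise; how $\mathit{\boldsymbol{W}}$ is built is not specified in this section, and the function names are hypothetical.

```python
import numpy as np

def update_new_codes(Y, U, S, V, W, dsq, lam, beta, eps=1e-12):
    """One multiplicative update of the codes S of new features (Eq. (9)).

    Y: D x P new features, U: D x M dictionary, S: M x P codes,
    V: M x N fixed template codes, W: P x N weights (W[j, i] = w_ji),
    dsq: M x P matrix of squared locality weights. All non-negative.
    """
    A = np.diag(W.sum(axis=1))                    # a_jj = sum_i w_ji
    numer = U.T @ Y + 0.5 * beta * (V @ W.T)
    denom = U.T @ U @ S + 0.5 * beta * (S @ A) + lam * dsq * S
    return S * numer / (denom + eps)

def objective9(Y, U, S, V, W, dsq, lam, beta):
    """Objective of Eq. (9)."""
    diff = S[:, :, None] - V[:, None, :]          # M x P x N, diff[:, j, i] = s_j - v_i
    graph = 0.5 * beta * np.sum(W * (diff ** 2).sum(axis=0))
    return (np.linalg.norm(Y - U @ S) ** 2
            + lam * np.sum(dsq * S ** 2) + graph)
```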

2.1.3 Max pooling with spatial pyramid division

In the pooling stage, this paper follows [5-6, 8, 16] and adopts max pooling:

$\begin{array}{*{20}{c}} {{r_l} = \max \left\{ {\left| {{s_{1l}}} \right|,\left| {{s_{2l}}} \right|, \cdots ,\left| {{s_{Nl}}} \right|} \right\}}\\ {l = 1,2, \cdots ,M} \end{array}$ (11)

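where ${{s_{Nl}}}$ is the $l$-th element of the sparse code ${{\mathit{\boldsymbol{s}}_N}}$, and ${{r_l}}$ is the $l$-th element of the vector $\mathit{\boldsymbol{r}}$. The image region covered by a single spatial pyramid cell is thus represented by the $M$-dimensional column vector $\mathit{\boldsymbol{r}} = {\left[ {{r_1},{r_2}, \cdots ,{r_M}} \right]^{\rm{T}}}$.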
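Eq. (11) applied over a three-level pyramid (1×1, 2×2, 4×4, as in the experimental settings) can be sketched as follows. The sketch assumes that each local descriptor's code is one row of `codes`, that descriptors are assigned to pyramid cells by their centre coordinates, that an empty cell pools to a zero vector, and that the 1 + 4 + 16 = 21 cell vectors are concatenated with equal weights, giving a 21$M$-dimensional vector (21 504 dimensions for $M$ = 1 024). The function name is hypothetical.

```python
import numpy as np

def spm_max_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Max pooling (Eq. (11)) over a spatial pyramid.

    codes: P x M array, one sparse code per local descriptor (rows).
    xy: P x 2 array of descriptor centres (x, y) in pixels.
    Returns the concatenation of one M-dim vector r per pyramid cell.
    """
    pooled = []
    for g in levels:                                   # g x g grid at this level
        cx = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        cy = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        for i in range(g):
            for j in range(g):
                mask = (cx == i) & (cy == j)
                if mask.any():
                    r = np.abs(codes[mask]).max(axis=0)   # r_l = max_n |s_nl|
                else:
                    r = np.zeros(codes.shape[1])          # empty cell (assumption)
                pooled.append(r)
    return np.concatenate(pooled)
```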


2.2 Joint image representation and classification based on context information


2.2.1 Constructing joint spaces based on context information


Suppose that NLLSC yields the original representations ${\mathit{\boldsymbol{r}}^1},{\mathit{\boldsymbol{r}}^2}, \cdots ,{\mathit{\boldsymbol{r}}^Q}$ of $Q$ training images from $C$ classes, with class labels ${\mathit{\boldsymbol{y}}^1},{\mathit{\boldsymbol{y}}^2}, \cdots ,{\mathit{\boldsymbol{y}}^Q}$. $L\left( {L \le Q} \right)$ images are randomly selected from them to construct a joint space based on context information, and the selection is repeated $T$ times. The selections are denoted by

$\left\{ {\left( {{\mathit{\boldsymbol{r}}^{1,1}},{\mathit{\boldsymbol{y}}^{1,1}}} \right), \cdots ,\left( {{\mathit{\boldsymbol{r}}^{L,1}},{\mathit{\boldsymbol{y}}^{L,1}}} \right)} \right\}, \cdots ,\left\{ {\left( {{\mathit{\boldsymbol{r}}^{1,T}},{\mathit{\boldsymbol{y}}^{1,T}}} \right), \cdots ,\left( {{\mathit{\boldsymbol{r}}^{L,T}},{\mathit{\boldsymbol{y}}^{L,T}}} \right)} \right\}$. For the images selected in the $t$-th round, $\left\{ {\left( {{\mathit{\boldsymbol{r}}^{1,t}},{\mathit{\boldsymbol{y}}^{1,t}}} \right), \cdots ,\left( {{\mathit{\boldsymbol{r}}^{L,t}},{\mathit{\boldsymbol{y}}^{L,t}}} \right)} \right\}$, an SVM classifier is used to construct the corresponding joint space:

$\begin{array}{l} \mathit{\boldsymbol{f}}_c^t\left( {{\mathit{\boldsymbol{r}}^{l,t}}} \right) = \mathit{\boldsymbol{\bar y}}_c^{l,t} = \mathit{\boldsymbol{w}}_c^t{\mathit{\boldsymbol{r}}^{l,t}} + \mathit{\boldsymbol{b}}_c^t\\ l = 1,2, \cdots ,L,c = 1,2, \cdots ,C \end{array}$ (12)

where $\mathit{\boldsymbol{w}}_c^t$ and $\mathit{\boldsymbol{b}}_c^t$ are the weight vector and the bias, respectively. Using the hinge loss [19] $l\left( {\mathit{\boldsymbol{\bar y}}_c^{l,t},{\mathit{\boldsymbol{y}}^{l,t}}} \right) = \max \left( {0,1 - \mathit{\boldsymbol{\bar y}}_c^{l,t} \times {\mathit{\boldsymbol{y}}^{l,t}}} \right)$, the corresponding optimization problem is

$\mathop {\min }\limits_{\mathit{\boldsymbol{w}}_c^t,\mathit{\boldsymbol{b}}_c^t} {\left\| {\mathit{\boldsymbol{w}}_c^t} \right\|^2} + \alpha \sum\limits_{l = 1}^L {l\left( {\mathit{\boldsymbol{\bar y}}_c^{l,t},{\mathit{\boldsymbol{y}}^{l,t}}} \right)} $ (13)

Solving Eq. (13) yields the corresponding $\mathit{\boldsymbol{w}}_c^t$ and $\mathit{\boldsymbol{b}}_c^t$.
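Eqs. (12) and (13) train, for each random subset $t$ and each class $c$, a linear scoring function with an L2-regularized hinge loss. The sketch below assumes one-vs-rest targets $+1/-1$ for class $c$ (the binary encoding is not spelled out above) and replaces a dedicated SVM solver with plain full-batch subgradient descent; the step size, epoch count, and function names are illustrative assumptions. With the settings of Section 3.2, `n_select` would be 30% of $Q$ and `n_rounds` would be 30.

```python
import numpy as np

def train_ovr_hinge(R, y, n_classes, alpha=1.0, lr=0.01, epochs=500):
    """Approximately solve Eq. (13) for every class c (one-vs-rest).

    R: L x M original representations, y: length-L integer labels in [0, C).
    Returns W (C x M) and b (C,), so that f_c(r) = W[c] @ r + b[c] (Eq. (12)).
    """
    M = R.shape[1]
    W = np.zeros((n_classes, M))
    b = np.zeros(n_classes)
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)                 # one-vs-rest targets
        for _ in range(epochs):
            margins = t * (R @ W[c] + b[c])
            active = margins < 1                        # samples with non-zero hinge loss
            grad_w = 2 * W[c] - alpha * (t[active][:, None] * R[active]).sum(axis=0)
            grad_b = -alpha * t[active].sum()
            W[c] -= lr * grad_w
            b[c] -= lr * grad_b
    return W, b

def build_joint_spaces(R, y, n_classes, n_select, n_rounds, seed=0, **kw):
    """Repeat the random selection n_rounds (T) times, one classifier set per round."""
    rng = np.random.default_rng(seed)
    spaces = []
    for _ in range(n_rounds):
        idx = rng.choice(len(R), size=n_select, replace=False)
        spaces.append(train_ovr_hinge(R[idx], y[idx], n_classes, **kw))
    return spaces
```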


2.2.2 Projecting all training images into the joint spaces


$\begin{array}{*{20}{c}} {\mathit{\boldsymbol{r}}_{t,q}^{{\rm{js}}} = \left( {\mathit{\boldsymbol{f}}_1^t\left( {{\mathit{\boldsymbol{r}}^q}} \right),\mathit{\boldsymbol{f}}_2^t\left( {{\mathit{\boldsymbol{r}}^q}} \right), \cdots ,\mathit{\boldsymbol{f}}_C^t\left( {{\mathit{\boldsymbol{r}}^q}} \right)} \right)}\\ {t = 1,2, \cdots ,T;q = 1,2, \cdots ,Q} \end{array}$ (14)

2.2.3 Concatenating all joint spaces into the final image representation


$\mathit{\boldsymbol{r}}_q^{{\rm{js}}} = \left( {\mathit{\boldsymbol{r}}_{1,q}^{{\rm{js}}};\mathit{\boldsymbol{r}}_{2,q}^{{\rm{js}}}; \cdots ;\mathit{\boldsymbol{r}}_{T,q}^{{\rm{js}}}} \right),q = 1,2, \cdots ,Q$ (15)

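Once the final image features are obtained from Eq. (15), they are used directly for classification. Because linear SVMs are fast and have low complexity, this paper adopts a multi-class linear SVM.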
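Eqs. (14) and (15) amount to evaluating the $C$ decision functions of every round on each image and concatenating the $T$ outputs, so each image becomes a $T \times C$-dimensional vector (for a hypothetical 30-class problem with $T$ = 30, that is 900 dimensions). A minimal sketch, assuming each joint space is stored as a pair `(W, b)` with `W` of shape $C \times M$ and `b` of length $C$; the function names are illustrative.

```python
import numpy as np

def project(R, W, b):
    """Eq. (14): map each original representation r^q to (f_1^t(r^q), ..., f_C^t(r^q)).

    R: Q x M original representations, W: C x M, b: (C,). Returns Q x C.
    """
    return R @ W.T + b

def final_representation(R, spaces):
    """Eq. (15): concatenate the T projections of each image into one T*C vector."""
    return np.hstack([project(R, W, b) for W, b in spaces])
```

Row $q$ of the result is the final representation $\mathit{\boldsymbol{r}}_q^{\rm{js}}$ that the multi-class linear SVM receives.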

2.3 Algorithm


1) Preprocess the initial non-negative feature matrix $\mathit{\boldsymbol{X}}$ and the sparse codes $\mathit{\boldsymbol{V}}$: $\mathit{\boldsymbol{X}} = \mathit{\boldsymbol{X}}/\max \left( {\mathit{\boldsymbol{X}}\left( : \right)} \right),\mathit{\boldsymbol{V}} = \mathit{\boldsymbol{V}}/{\left\| \mathit{\boldsymbol{V}} \right\|_1}$.

2) Update the sparse codes $\mathit{\boldsymbol{V}}$ according to Eq. (6).

3) Normalize $\mathit{\boldsymbol{U}}$ and $\mathit{\boldsymbol{V}}$: ${v_{ij}} = {v_{ij}}/\sqrt {\sum\limits_i {{v_{ij}}} } ,{u_{ij}} = {u_{ij}}/\sqrt {\sum\limits_i {{u_{ij}}} } $.

4) Update the Lagrange dual matrix $\mathit{\boldsymbol{ \boldsymbol{\varLambda} }}$ with the conjugate gradient method, and obtain the optimal dictionary $\mathit{\boldsymbol{U}}$ from Eq. (8).

5) With $\mathit{\boldsymbol{U}}$ and $\mathit{\boldsymbol{V}}$ of the template features obtained, compute the sparse codes $\mathit{\boldsymbol{S}}$ of the new features using Eq. (10).

6) Apply MP to the codes according to Eq. (11), and obtain the original image representation via spatial pyramid division.

7) Construct the joint spaces based on context information using Eq. (12), and obtain $\mathit{\boldsymbol{w}}_c^t$ and $\mathit{\boldsymbol{b}}_c^t$ from Eq. (13).

8) Project all training images into the joint spaces according to Eq. (14).

9) Concatenate all joint spaces via Eq. (15) to generate the final image representation.

10) Classify the image representations in the joint space with a multi-class linear SVM.
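The preprocessing in step 1 and the normalization in step 3 are simple element-wise rescalings; the sketch below implements them literally. It assumes that $\|\mathit{\boldsymbol{V}}\|_1$ in step 1 means the entry-wise sum of absolute values (the induced matrix 1-norm, the largest column sum, is another possible reading), and that every column in step 3 has a positive sum, which holds for non-negative, non-zero columns. Function names are illustrative.

```python
import numpy as np

def preprocess(X, V):
    """Step 1: X <- X / max(X(:)) and V <- V / ||V||_1.

    ||V||_1 is read here as the entry-wise sum of |v_ij| (an assumption).
    """
    return X / X.max(), V / np.abs(V).sum()

def normalize_columns_sqrt(U, V):
    """Step 3 as written: u_ij <- u_ij / sqrt(sum_i u_ij), and likewise for V.

    Each entry is divided by the square root of its column sum.
    """
    return (U / np.sqrt(U.sum(axis=0, keepdims=True)),
            V / np.sqrt(V.sum(axis=0, keepdims=True)))
```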

3 Experiments

3.1 Datasets

Four standard image datasets are used: Corel-10 [20], Scene-15 [21], Caltech-101 [22], and Caltech-256 [23]. Their details are listed in Table 1, and some images from Caltech-256 are shown in Fig. 2.

Table 1 Four standard image datasets

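| Dataset | Classes | Images per class | Total images |
| --- | --- | --- | --- |
| Corel-10 | 10 | 100 | 1 000 |
| Scene-15 | 15 | 200-400 | 4 485 |
| Caltech-101 | 101 | 31-800 | 9 144 |
| Caltech-256 | 256 | ≥80 | 29 780 |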
Fig. 2 Some pictures of the Caltech-256 dataset


3.2 Experimental settings

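First, in the feature extraction stage, SIFT features are extracted from 16×16 patches with a step of 8 pixels. For spatial pyramid division, the top three levels proposed in [4] are used, namely 1×1, 2×2, and 4×4, with equal weights for all levels. In dictionary learning, the dictionary size is fixed at $M$=1 024. When the similarity matrix $\mathit{\boldsymbol{W}}$ is constructed with $K$ nearest neighbours, $K$=5.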
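The $K$-nearest-neighbour similarity matrix can be sketched as follows. The paper fixes only $K$; the binary 0/1 weights and the symmetrization by taking the element-wise maximum of $\mathit{\boldsymbol{W}}$ and $\mathit{\boldsymbol{W}}^{\rm{T}}$ are assumptions, as are the function names. The Laplacian is then formed as $\mathit{\boldsymbol{L}} = \mathit{\boldsymbol{D}} - \mathit{\boldsymbol{W}}$.

```python
import numpy as np

def knn_similarity(X, k=5):
    """Symmetric 0/1 K-nearest-neighbour graph over the columns (features) of X."""
    N = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * X.T @ X   # N x N squared distances
    np.fill_diagonal(dist2, np.inf)                   # exclude self-matches
    nn = np.argsort(dist2, axis=1)[:, :k]             # k nearest neighbours of each feature
    W = np.zeros((N, N))
    W[np.repeat(np.arange(N), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize

def graph_laplacian(W):
    """L = D - W, with D the diagonal degree matrix of W."""
    return np.diag(W.sum(axis=1)) - W
```

Every row of the resulting Laplacian sums to zero, which is what makes ${\rm{tr}}(\mathit{\boldsymbol{VL}}\mathit{\boldsymbol{V}}^{\rm{T}})$ vanish when linked features share identical codes.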

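Second, for the three parameters $\lambda $, $\beta $, and $\sigma $ in the optimization problem, [5-6] use different values for different image datasets. For example, in the LScSPM algorithm [6], $\beta $=0.2 and $\lambda $=0.4 for Corel-10 and Scene-15, and $\beta $=0.1 and $\lambda $=0.3 for Caltech-101 and Caltech-256. Gao et al. [24] fixed $\beta $=0.1 while varying $\lambda $∈[0.1,0.4], and fixed $\lambda $=0.4 while varying $\beta $∈[0.1,0.4]. The search ranges were therefore set to $\lambda $∈[0.1,0.4] and $\beta $∈[0.1,0.4]. After comparing several values, this paper finally sets $\lambda $=0.4 and $\beta $=0.2. The weight decay parameter is set to $\sigma $=100.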

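Finally, in generating the joint spaces based on context information, the number of randomly selected images $L$ and the number of repetitions $T$ are two important parameters that affect the results. In general, larger values of $T$ and $L$ represent images better and make the corresponding joint spaces more discriminative, but they also increase the computational cost. To balance computational complexity and classification accuracy, this paper follows the settings of [16]: $L$=30 % $Q$ and $T$=30.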

3.3 Results and analysis

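Each of the four public datasets is randomly divided into 10 subsets, and the mean classification accuracy and standard deviation of the proposed method are recorded using 10-fold cross-validation. The proposed method is then compared with several classical methods, and the results are analyzed. Tables 2 to 5 compare the classification performance of the proposed method with that of several sparse coding methods, namely ScSPM [5], LScSPM [6], Lap-NMF-SPM [8], and RSS [16], on the four standard image datasets.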

Table 2 The classification performance on the Corel-10 dataset (%)

Table 3 The classification performance on the Scene-15 dataset (%)

Table 4 The classification performance on the Caltech-101 dataset (%)

Table 5 The classification performance on the Caltech-256 dataset (%)



4 Conclusion



