Published: 2018-09-16 | DOI: 10.11834/jig.170570 | 2018, Volume 23, Number 9 | Image Analysis and Recognition

Received: 2017-11-10; Revised: 2018-02-02. Supported by: National Natural Science Foundation of China (61573012). First author: Wan Yuan, born in 1976, female, professor, Ph.D.; research interests: machine learning, image processing, and pattern recognition. E-mail: wanyuan@whut.edu.cn. Zhang Jinghui, female, master's student; research interests: image processing and pattern recognition. E-mail: Jingzhang@whut.edu.cn. Ou Zhuoling, female, master's student; research interests: machine learning and pattern recognition. E-mail: Jingzhang@whut.edu.cn. CLC number: TP391.4; Document code: A; Article ID: 1006-8961(2018)09-1316-10


Semi-supervised feature selection based on low-rank sparse graph embedding
Wan Yuan, Chen Xiaoli, Zhang Jinghui, Ou Zhuoling
College of Science, Wuhan University of Technology, Wuhan 430070, China
Supported by: National Natural Science Foundation of China (61573012)

# Abstract

Objective With the widespread use of high-dimensional data, dimensionality reduction has become an important research direction in both machine learning and data mining. High-dimensional data require additional storage space, incur high computational complexity, and are time consuming to process. Because a large number of redundant features contribute little to data analysis, the sparsity preserving projections (SPP) method was proposed; it maintains the sparse reconstruction relations of the data by minimizing an objective function with an $l_1$-norm regularization term and can therefore capture the intrinsic geometric structure of the data without any parameter setting. However, traditional SPP computes the sparse representation of each sample separately and thus neglects the potential global structure, which makes it sensitive to noise. Another dimensionality reduction method, low-rank sparse preserving projections (LRSPP), preserves both the global structure and the local linear structure of the data by constructing a low-rank sparse graph. However, LRSPP ignores the information loss between the original high-dimensional data and the reduced low-dimensional data, and it cannot handle the situation in which only a small number of labeled samples and a large number of unlabeled samples are available. To solve these two problems, a semi-supervised feature selection method based on low-rank sparse graph embedding (LRSE) is proposed in this paper. Method The proposed LRSE consists of two steps. The first is to learn the low-rank sparse representation of the original data from a small amount of labeled data and a large amount of unlabeled data. The second considers the information difference between the high- and low-dimensional data and the preservation of structural information in a unified model. The useful information in the data is preserved as much as possible by minimizing an information loss function, and the structural information of the original data space is preserved by embedding the low-rank sparse graph, which captures the global structure and the internal geometric structure of the data, into the low-dimensional space. The method can therefore select more discriminative features. The solution of the objective function is presented in detail in Section 4, where the alternating optimization method is used to convert the non-convex problem into two convex subproblems, which are then solved with the Lagrange multiplier method. Result To validate the performance of the proposed algorithm, five experiments are conducted on six public datasets. The first two experiments perform dimensionality reduction on the Wine dataset and the Orl64 face dataset and visualize the obtained features; as Figures 1 and 2 show, the features selected by the proposed method have strong discriminative ability and carry little redundant information. The next two experiments perform feature selection on the six datasets, and a KNN classifier is applied to the reduced data to verify the effectiveness of the proposed method. Table 2 shows that, compared with four other feature selection methods and the all-features baseline, the classification accuracy (ACC) of LRSE is the highest on five of the six datasets, the exception being WarpPIE10P; in particular, the ACC increases by 11.19% on the Wine dataset. The classification accuracy is better than that of the original high-dimensional data in most cases. A reasonable explanation is that the proposed method eliminates redundant features during dimensionality reduction; these results also support the conclusions of experiments 1 and 2. The last experiment is a parameter sensitivity analysis, whose results show that the classification accuracy is more stable on most datasets when $\alpha = 1$. Conclusion This study proposes a semi-supervised feature selection method based on low-rank sparse graph embedding; the method learns the low-rank sparse representation of the labeled and unlabeled data and embeds it into the low-dimensional space, so that the information contained in the original data is preserved as much as possible, while the information loss caused by dimensionality reduction is minimized by adding the data reconstruction error to the objective function. The combination of these two strategies improves the effectiveness of the proposed dimensionality reduction method. Comparative experiments with several existing methods on multiple datasets show that the proposed method outperforms existing dimensionality reduction methods. The method assumes that all features lie on linear manifolds and is therefore not suitable for all types of data; future research will focus on extending its applicability to a wider range of data with the kernel trick.

# Key words

feature selection; semi-supervised learning; low rank representation; sparse representation; structural embedding; image classification

# 1 Low-rank sparse representation

 $\begin{array}{*{20}{c}} {\mathop {\min }\limits_{{\mathit{\boldsymbol{s}}_i}} {{\left\| {{\mathit{\boldsymbol{s}}_i}} \right\|}_1}}\\ {{\rm{s}}.\;{\rm{t}}.\;\;\;{\mathit{\boldsymbol{x}}_i} = \mathit{\boldsymbol{X}}{\mathit{\boldsymbol{s}}_i},\;\;{\bf{1}} = {{\bf{1}}^{\rm{T}}}{\mathit{\boldsymbol{s}}_i}} \end{array}$ (1)

 $\mathop {\min }\limits_\mathit{\boldsymbol{W}} \sum\limits_{i = 1}^n {\left\| {{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{x}}_i} - {\mathit{\boldsymbol{W}}^{\rm{T}}}\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{s}}_i}} \right\|_2^2}$ (2)
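For intuition, the sparse coefficients of Eq. (1) can be approximated sample by sample with an off-the-shelf $l_1$ solver. The sketch below is only an illustration under stated assumptions: it relaxes the equality constraints of Eq. (1) into an unconstrained Lasso problem, and the regularization weight, the row-wise data layout, and the use of scikit-learn are not part of the original formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def spp_sparse_weights(X, alpha=0.01):
    """Approximate the sparse reconstruction coefficients of Eq. (1) for every
    sample.  X has one sample per row; the exact equality constraints are
    relaxed into a Lasso problem (least squares + l1 penalty), which is a
    common practical substitute for the constrained l1 minimization in SPP.
    """
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]          # exclude x_i itself
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[idx].T, X[i])                      # x_i ≈ X s_i
        S[i, idx] = lasso.coef_
    return S
```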

 $\begin{array}{l} \mathop {\min }\limits_\mathit{\boldsymbol{Z}} {\rm{rank}}\left( \mathit{\boldsymbol{Z}} \right)\\ {\rm{s}}.\;{\rm{t}}.\;\;\mathit{\boldsymbol{X}} = \mathit{\boldsymbol{XZ}} \end{array}$ (3)

 $\begin{array}{l} \mathop {\min }\limits_{\mathit{\boldsymbol{Z}},\mathit{\boldsymbol{E}}} {\left\| \mathit{\boldsymbol{Z}} \right\|_ * } + \lambda {\left\| \mathit{\boldsymbol{E}} \right\|_{2,1}}\\ {\rm{s}}.\;{\rm{t}}.\;\;\mathit{\boldsymbol{X}} = \mathit{\boldsymbol{XZ}} + \mathit{\boldsymbol{E}} \end{array}$ (4)

 $\begin{array}{l} \mathop {\min }\limits_{\mathit{\boldsymbol{Z}},\mathit{\boldsymbol{E}}} {\left\| \mathit{\boldsymbol{Z}} \right\|_ * } + \lambda {\left\| \mathit{\boldsymbol{E}} \right\|_{2,1}} + \gamma {\left\| \mathit{\boldsymbol{Z}} \right\|_1}\\ {\rm{s}}.\;{\rm{t}}.\;\;\mathit{\boldsymbol{X}} = \mathit{\boldsymbol{XZ}} + \mathit{\boldsymbol{E}},{\rm{diag}}\left( \mathit{\boldsymbol{Z}} \right) = 0 \end{array}$ (5)

 $\begin{array}{*{20}{c}} {\mathop {\min }\limits_{{\mathit{\boldsymbol{U}}_i},{\mathit{\boldsymbol{E}}_i}} {{\left\| {{\mathit{\boldsymbol{U}}_i}} \right\|}_ * } + \lambda {{\left\| {{\mathit{\boldsymbol{E}}_i}} \right\|}_{2,1}} + \mu {{\left\| {{\mathit{\boldsymbol{U}}_i}} \right\|}_1}}\\ {{\rm{s}}.\;{\rm{t}}.\;\;\;{\mathit{\boldsymbol{X}}_i} = {\mathit{\boldsymbol{X}}_i}{\mathit{\boldsymbol{U}}_i} + {\mathit{\boldsymbol{E}}_i},\;\;\;\;{\rm{diag}}\left( {{\mathit{\boldsymbol{U}}_i}} \right) = 0} \end{array}$ (8)

 $\mathit{\boldsymbol{U}} = {\rm{diag}}\left( {{\mathit{\boldsymbol{U}}_1}, \cdots ,{\mathit{\boldsymbol{U}}_j}, \cdots ,{\mathit{\boldsymbol{U}}_c},{\mathit{\boldsymbol{U}}_N}} \right)$ (9)
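Assembling the class-wise coefficient matrices of Eq. (8) into the block-diagonal matrix $\mathit{\boldsymbol{U}}$ of Eq. (9) is mechanical; a minimal sketch (the function name is illustrative) is as follows.

```python
import numpy as np
from scipy.linalg import block_diag

def assemble_block_graph(U_blocks):
    """Stack the class-wise coefficient matrices U_1, ..., U_c and the block
    for the unlabeled samples U_N into the block-diagonal matrix of Eq. (9)."""
    U = block_diag(*U_blocks)
    np.fill_diagonal(U, 0.0)   # diag(U) = 0, already guaranteed block-wise by Eq. (8)
    return U
```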

# 3 Model solution

 $\begin{array}{*{20}{c}} {F = \frac{1}{2}{\rm{tr}}\left( {\left( {\mathit{\boldsymbol{X}} - \mathit{\boldsymbol{XWP}}} \right){{\left( {\mathit{\boldsymbol{X}} - \mathit{\boldsymbol{XWP}}} \right)}^{\rm{T}}}} \right) + }\\ {\frac{\alpha }{2}{\rm{tr}}\left( {\left( {\mathit{\boldsymbol{XW}} - \mathit{\boldsymbol{XWU}}} \right){{\left( {\mathit{\boldsymbol{XW}} - \mathit{\boldsymbol{XWU}}} \right)}^{\rm{T}}}} \right) = }\\ {\frac{1}{2}{\rm{tr}}\left( {\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{X}}^{\rm{T}}} - 2\mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{X}}^{\rm{T}}} + \mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{P}}^{\rm{T}}}{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}} \right) + }\\ {\frac{\alpha }{2}{\rm{tr}}\left( {\mathit{\boldsymbol{XWL}}{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}} \right)} \end{array}$ (10)

 $\begin{array}{*{20}{c}} {f = \frac{1}{2}{\rm{tr}}\left( {\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{X}}^{\rm{T}}} - 2\mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{X}}^{\rm{T}}} + \mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{P}}^{\rm{T}}}{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}} \right) + }\\ {\frac{\alpha }{2}{\rm{tr}}\left( {\mathit{\boldsymbol{XWL}}{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}} \right) + {\rm{tr}}\left( {{\mathit{\boldsymbol{\omega }}^{\rm{T}}}\mathit{\boldsymbol{W}}} \right)} \end{array}$ (11)

 $\begin{array}{*{20}{c}} {\frac{{\partial f}}{{\partial \mathit{\boldsymbol{W}}}} = - {\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{P}}^{\rm{T}}} + {\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{P}}^{\rm{T}}} + }\\ {\alpha {\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWL}} + \mathit{\boldsymbol{\omega }} = 0} \end{array}$ (12)

 $\begin{array}{*{20}{c}} { - {{\left( {{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{P}}^{\rm{T}}}} \right)}_{ij}}{\mathit{\boldsymbol{W}}_{ij}} + {{\left( {{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{P}}^{\rm{T}}}} \right)}_{ij}}{\mathit{\boldsymbol{W}}_{ij}} + }\\ {{{\left( {\alpha {\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWL}}} \right)}_{ij}}{\mathit{\boldsymbol{W}}_{ij}} = 0} \end{array}$ (13)

 ${\mathit{\boldsymbol{W}}_{ij}} = {\mathit{\boldsymbol{W}}_{ij}}\frac{{{{\left( {{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{P}}^{\rm{T}}}} \right)}_{ij}}}}{{{{\left( {{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{P}}^{\rm{T}}} + \alpha {\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWL}}} \right)}_{ij}}}}$ (14)

 $\begin{array}{*{20}{c}} {f = \frac{1}{2}{\rm{tr}}\left( {\mathit{\boldsymbol{X}}{\mathit{\boldsymbol{X}}^{\rm{T}}} - 2\mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{X}}^{\rm{T}}} + } \right.}\\ {\left. {\mathit{\boldsymbol{XWP}}{\mathit{\boldsymbol{P}}^{\rm{T}}}{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}} \right) + {\rm{tr}}\left( {{\mathit{\boldsymbol{\sigma }}^{\rm{T}}}\mathit{\boldsymbol{P}}} \right)} \end{array}$ (15)

 $\frac{{\partial f}}{{\partial \mathit{\boldsymbol{P}}}} = - {\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{X}} + {\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWP}} + \mathit{\boldsymbol{\sigma }} = 0$ (16)

 $- {\left( {{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{X}}} \right)_{ij}}{\mathit{\boldsymbol{P}}_{ij}} + {\left( {{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWP}}} \right)_{ij}}{\mathit{\boldsymbol{P}}_{ij}} = 0$ (17)

 ${\mathit{\boldsymbol{P}}_{ij}} = {\mathit{\boldsymbol{P}}_{ij}}\frac{{{{\left( {{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{X}}} \right)}_{ij}}}}{{{{\left( {{\mathit{\boldsymbol{W}}^{\rm{T}}}{\mathit{\boldsymbol{X}}^{\rm{T}}}\mathit{\boldsymbol{XWP}}} \right)}_{ij}}}}$ (18)

1) Compute $\mathit{\boldsymbol{U}}$ following the method of Ref. [22];

2) Compute $\mathit{\boldsymbol{L}} = \left( {\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{U}}} \right){\left( {\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{U}}} \right)^{\rm{T}}}$;

3) Update the projection matrix $\mathit{\boldsymbol{W}}$ according to Eq. (14);

4) Update the coefficient matrix $\mathit{\boldsymbol{P}}$ according to Eq. (18);

5) If the stopping criterion ${\left\| {{\mathit{\boldsymbol{W}}_{t + 1}}-{\mathit{\boldsymbol{W}}_t}} \right\|_F} \le \varepsilon {\left\| {{\mathit{\boldsymbol{W}}_1}-{\mathit{\boldsymbol{W}}_0}} \right\|_F}$ is satisfied, proceed to the next step; otherwise, return to step 3);

6) Normalize each column of $\mathit{\boldsymbol{W}}$;

7) Compute the 2-norm of each row vector of $\mathit{\boldsymbol{W}}$, ${\left\| {{\mathit{\boldsymbol{w}}^i}} \right\|_2}, i = 1, \cdots, d$, sort them, and return the indices $\mathit{\boldsymbol{I}}$ of the top $k$ features (a minimal sketch of steps 1)-7) is given below).
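The sketch below is an illustration of steps 1)-7) rather than the authors' implementation: the random initialization, the small constant guarding against division by zero, and in particular the placement of $\mathit{\boldsymbol{L}}$ are assumptions. Since $\mathit{\boldsymbol{L}} = (\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{U}})(\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{U}})^{\rm T}$ is an $n \times n$ matrix over the samples, the structure term is implemented here as ${\rm tr}(\mathit{\boldsymbol{W}}^{\rm T}\mathit{\boldsymbol{X}}^{\rm T}\mathit{\boldsymbol{LXW}})$, so $\mathit{\boldsymbol{L}}$ enters the update of $\mathit{\boldsymbol{W}}$ as $\mathit{\boldsymbol{X}}^{\rm T}\mathit{\boldsymbol{LXW}}$; apart from this shape-consistent reading, the multiplicative updates follow Eqs. (14) and (18).

```python
import numpy as np

def lrse_feature_selection(X, U, k, r, alpha=1.0, eps=1e-4, max_iter=200):
    """Illustrative sketch of the iterative solution in steps 1)-7).

    X : (n, d) data matrix, one sample per row.
    U : (n, n) low-rank sparse graph of Eq. (9), computed as in Ref. [22].
    k : number of features to select.
    r : dimensionality of the reduced space (number of columns of W).
    """
    n, d = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((d, r)) + 1e-3             # nonnegative initialization (assumption)
    P = rng.random((r, d)) + 1e-3
    tiny = 1e-12                               # avoids division by zero

    # step 2): L = (I - U)(I - U)^T
    I_U = np.eye(n) - U
    L = I_U @ I_U.T

    XtX = X.T @ X
    XtLX = X.T @ L @ X
    diff0 = None
    for _ in range(max_iter):
        W_old = W.copy()
        # step 3): multiplicative update of W (cf. Eq. (14), shape-consistent reading)
        W = W * (XtX @ P.T) / (XtX @ W @ P @ P.T + alpha * XtLX @ W + tiny)
        # step 4): multiplicative update of P (cf. Eq. (18))
        P = P * (W.T @ XtX) / (W.T @ XtX @ W @ P + tiny)
        # step 5): stop when ||W_{t+1} - W_t||_F <= eps * ||W_1 - W_0||_F
        diff = np.linalg.norm(W - W_old, "fro")
        if diff0 is None:
            diff0 = diff
        elif diff <= eps * diff0:
            break

    # step 6): normalize every column of W
    W = W / (np.linalg.norm(W, axis=0, keepdims=True) + tiny)
    # step 7): rank features by the l2 norm of the rows of W
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]        # indices of the k selected features
```

With $\alpha = 1$, the setting used in Section 4.1, the returned indices select the columns of $\mathit{\boldsymbol{X}}$ that are fed to the classifier.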

# 4.1 Experimental data and settings

Table 1 Experimental datasets

| Dataset | Samples | Features | Classes |
|:---|---:|---:|---:|
| Wine | 178 | 13 | 3 |
| Breast | 699 | 10 | 2 |
| WarpPIE10P | 210 | 2 420 | 10 |
| Orlraws10P | 100 | 10 304 | 10 |
| Coil20 | 1 440 | 1 024 | 20 |
| Orl64 | 400 | 4 096 | 40 |

1) Fisher score (FS): a classical scoring method that evaluates the importance of each feature and selects features according to their importance (a minimal sketch of this baseline is given after this list);

2) Multi-cluster feature selection (MCFS): MCFS first performs cluster analysis via spectral embedding and then selects features with an $l_1$-norm regularized regression model, so that the selected features preserve the multi-cluster structure of the data as much as possible;

3) Laplacian score (LS): based on the assumption that samples of the same class are close to each other, LS scores each feature according to its locality preserving power;

4) Robust unsupervised feature selection (RUFS): RUFS obtains pseudo cluster labels through local-learning-regularized robust non-negative matrix factorization and simultaneously performs feature selection with a robust $l_{2,1}$-norm minimization model;

5) Sparsity preserving projections (SPP): SPP preserves the sparse reconstruction relations of the data by minimizing an objective function with an $l_1$-norm regularization term;

6) Low-rank sparse preserving projections (LRSPP): LRSPP combines sparse representation and low-rank representation to construct a low-rank sparse graph and embeds it into the low-dimensional space for feature selection;

7) All features: classification with all features, used as the baseline classification accuracy.
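To make the scoring baseline in 1) concrete, the following is a minimal sketch of the Fisher score: each feature is scored by the ratio of its between-class scatter to its within-class scatter, which is the standard definition; the function name and the small smoothing constant are illustrative assumptions.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score of every feature: ratio of between-class scatter to
    within-class scatter, computed independently for each feature column."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)          # higher score = more important
```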

The number of nearest neighbors $k$ in MCFS, LS, and RUFS is set to 5. For SPP and LRSPP, the data are preprocessed with a PCA projection before dimensionality reduction, and the structure-preservation coefficient $\alpha$ of the proposed LRSE method is set to 1. Because the sample sizes of the datasets are not large, 50% of the samples of each dataset are used as labeled data and the remaining 50% as unlabeled data; for the supervised methods FS and MCFS, the 50% labeled samples are used for training, whereas for the unsupervised methods LS and RUFS, all samples are used for training. A KNN classifier is applied to the reduced data with the number of neighbors set to 3, and the final accuracy is obtained by 5-fold cross-validation.
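As a concrete reading of this evaluation protocol, the sketch below scores a set of selected feature indices with a 3-nearest-neighbor classifier under 5-fold cross-validation; the helper name and the use of scikit-learn are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def evaluate_selected_features(X, y, idx):
    """Accuracy of a 3-NN classifier restricted to the selected feature
    columns, averaged over 5-fold cross-validation (the setting of Section 4.1)."""
    clf = KNeighborsClassifier(n_neighbors=3)
    scores = cross_val_score(clf, X[:, idx], y, cv=5, scoring="accuracy")
    return scores.mean()
```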

# 4.2 Experimental design and result analysis

1) Experiment 1. Wine is a real-world dataset from the UCI repository with 13 features, 3 classes, and 178 samples. A common dimensionality-reduction experiment [13] selects 2 of the 13 features as the horizontal and vertical axes and draws a scatter plot of the 178 samples.

2) Experiment 2. Every method selects 100 features on the Orl64 face dataset, and the 100 features selected by each method are visualized on the face images.

3) Experiment 3. Because both SPP and LRSPP are suited to face recognition, the two image datasets Orl64 and Coil20 are used for comparison; on each dataset, {20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120} features are selected and the classification accuracy of each method is computed.

4) Experiment 4. On the Wine and Breast datasets, each method selects {2, 3, 4, 5, 6, 7, 8} features; on the other four image datasets, {20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120} features are selected. The highest classification accuracy of each method over the different feature numbers is reported.

Table 2 The classification accuracy of different methods on six datasets

| Datasets | All features | FS | MCFS | LS | RUFS | LRSE |
|:---|---:|---:|---:|---:|---:|---:|
| Wine | 0.713 3 | 0.702 7 | 0.724 8 | 0.702 7 | *0.753 3* | **0.865 2** |
| Breast | 0.619 4 | 0.915 6 | 0.616 5 | 0.632 3 | *0.917 0* | **0.922 7** |
| WarpPIE10P | **0.990 5** | 0.923 8 | *0.966 7* | 0.690 5 | 0.733 3 | 0.866 7 |
| Orlraws10P | 0.930 0 | 0.870 0 | *0.940 0* | 0.800 0 | 0.820 0 | **0.950 0** |
| Coil20 | 0.965 4 | 0.951 8 | *0.980 3* | 0.890 3 | 0.949 3 | **0.991 0** |
| Orl64 | 0.837 5 | 0.735 0 | *0.862 5* | 0.585 0 | 0.785 0 | **0.887 5** |

Note: bold indicates the highest accuracy and italics the second highest for each dataset.

5) Experiment 5. Parameter sensitivity of the proposed LRSE method is analyzed on the Breast and Coil20 datasets, where the structure-term coefficient $\alpha$ is varied over {0.000 1, 0.001, 0.01, 0.1, 1, 10}; the classification accuracy is computed for the different parameter values and feature numbers.

# References

• [1] Ritchie M D, Hahn L W, Moore J H. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity[J]. Genetic Epidemiology, 2003, 24(2): 150–157. [DOI:10.1002/gepi.10218]
• [2] Wang X S, Gao Y, Cheng Y H. A non-negative sparse semi-supervised dimensionality reduction algorithm for hyperspectral data[J]. Neurocomputing, 2016, 188: 275–283. [DOI:10.1016/j.neucom.2014.12.127]
• [3] Wu J, Xie Y F, Yang C H, et al. An unsupervised reduction method for the selection of flotation froth image characters and its application[J]. Information and Control, 2014, 43(3): 314–317, 333. [吴佳, 谢永芳, 阳春华, 等. 一种无监督约简的浮选泡沫图像特征选择方法及应用[J]. 信息与控制, 2014, 43(3): 314–317, 333. ] [DOI:10.3724/SP.J.1219.2014.00314]
• [4] Lipovetsky S. PCA and SVD with nonnegative loadings[J]. Pattern Recognition, 2009, 42(1): 68–76. [DOI:10.1016/j.patcog.2008.06.025]
• [5] Bishop C M. Neural Networks for Pattern Recognition[M]. New York: Oxford University Press, 1996: 1235-1242.
• [6] Li K T, Peng H, Zhou X F, et al. Feature selection algorithm based on multiple correlation measures[J]. Journal of Chinese Computer Systems, 2017, 38(4): 696–700. [李开拓, 彭慧, 周晓锋, 等. 基于多种相关性度量的特征选择方法研究[J]. 小型微型计算机系统, 2017, 38(4): 696–700. ] [DOI:10.3969/j.issn.1000-1220.2017.04.007]
• [7] Yang Y M, Pedersen J O. A comparative study on feature selection in text categorization[C]//Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc., 1997: 412-420.
• [8] Cover T M, Thomas J A. Elements of Information Theory[M]. 2nd ed. New Jersey: Wiley-Blackwell, 2006: 1600-1601.
• [9] He X F, Cai D, Niyogi P. Laplacian score for feature selection[C]//Proceedings of the 18th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2005: 507-514.
• [10] Cai D, Zhang C Y, He X F. Unsupervised feature selection for multi-cluster data[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC: ACM, 2010: 333-342. [DOI:10.1145/1835804.1835848]
• [11] Qian M J, Zhai C X. Robust unsupervised feature selection[C]//Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing: AAAI Press, 2013: 1621-1627.
• [12] Zhu X J. Semi-supervised learning literature survey[D]. Madison: University of Wisconsin, 2005: 63-77.
• [13] Cheng B, Yang J C, Yan S C, et al. Learning with $\ell_1$-graph for image analysis[J]. IEEE Transactions on Image Processing, 2010, 19(4): 858–866. [DOI:10.1109/TIP.2009.2038764]
• [14] Qiao L S, Chen S C, Tan X Y. Sparsity preserving projections with applications to face recognition[J]. Pattern Recognition, 2010, 43(1): 331–341. [DOI:10.1016/j.patcog.2009.05.005]
• [15] Liu G C, Lin Z C, Yu Y. Robust subspace segmentation by low-rank representation[C]//Proceedings of the 27th International Conference on International Conference on Machine Learning. Haifa: ACM, 2010: 663-670.
• [16] Du S Q, Wang W L, Ma Y D. Low rank sparse preserve projection for face recognition[C]//Proceedings of 2016 Chinese Control and Decision Conference. Yinchuan: IEEE, 2016: 3822-3826. [DOI:10.1109/CCDC.2016.7531651]
• [17] He X F, Niyogi P. Locality preserving projections[J]. Advances in Neural Information Processing Systems, 2004, 16: 153–160.
• [18] He X F, Cai D, Yan S C, et al. Neighborhood preserving embedding[C]//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing: IEEE, 2005: 1208-1213. [DOI:10.1109/ICCV.2005.167]
• [19] Chung F R K. Spectral Graph Theory[M]. Providence: American Mathematical Society, 1997: 212.
• [20] Baraniuk R G. A lecture on compressive sensing[J]. IEEE Signal Processing Magazine, 2007, 24(4): 118–121. [DOI:10.1109/MSP.2007.4286571]
• [21] Candès E J, Recht B. Exact matrix completion via convex optimization[J]. Foundations of Computational Mathematics, 2009, 9(6): 717–772. [DOI:10.1007/s10208-009-9045-5]
• [22] Zhuang L S, Gao H Y, Huang J J, et al. Semi-supervised classification via low rank graph[C]//Proceedings of the 6th International Conference on Image and Graphics. Hefei: IEEE, 2011: 511-516. [DOI:10.1109/ICIG.2011.86]
• [23] Lee D D, Seung H S. Algorithms for non-negative matrix factorization[C]//Proceedings of 13th International Conference on Neural Information Processing Systems. Denver: MIT Press, 2000: 535-541.
• [24] Hoyer P O. Non-negative sparse coding[C]//Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing. Martigny: IEEE, 2002: 557-565. [DOI:10.1109/NNSP.2002.1030067]
• [25] Dua D, Karra T E. UCI machine learning repository[DB/OL]. [2018-03-29]. http://archive.ics.uci.edu/ml/datasets.html.
• [26] Cai D. Four face databases in matlab format[DB/OL]. [2018-03-29]. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.htm.