Semi-supervised feature selection based on low-rank sparse graph embedding
2018, Vol. 23, No. 9, pp. 1316-1325
Received: 2017-11-10; Revised: 2018-02-02; Published in print: 2018-09-16
DOI: 10.11834/jig.170570
Objective

Feature dimensionality reduction is a popular research topic in machine learning. Existing low-rank sparse preserving projection methods ignore the information loss between the original data space and the reduced low-dimensional space, and existing methods cannot effectively handle the case of a small amount of labeled data together with a large amount of unlabeled data. To address these two problems, a semi-supervised feature selection method based on low-rank sparse graph embedding (LRSE) is proposed.
Method

LRSE consists of two steps. The first step makes full use of the labeled and unlabeled data to learn their low-rank sparse representations. The second step considers, in a single objective function, both the information difference between the data before and after dimensionality reduction and the preservation of structural information during the reduction: minimizing an information loss function retains as much of the useful information in the data as possible, and embedding the low-rank sparse graph, which captures the global structure and intrinsic geometric structure of the data, into the low-dimensional space preserves the structural information of the original data space, so that more discriminative features can be selected.
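The abstract does not spell out this objective function; purely as an illustration of how such terms are typically combined (the notation below is assumed here, not taken from the paper: $$\boldsymbol{X} \in \mathbb{R}^{d \times n}$$ is the data matrix, $$\boldsymbol{W} \in \mathbb{R}^{d \times k}$$ the projection, $$\boldsymbol{L}_S$$ the Laplacian of the learned low-rank sparse graph $$\boldsymbol{S}$$, and $$\alpha, \beta$$ trade-off weights), a model of this kind might read:

```latex
\min_{\boldsymbol{W}}
\underbrace{\|\boldsymbol{X}-\boldsymbol{W}\boldsymbol{W}^{\top}\boldsymbol{X}\|_{F}^{2}}_{\text{information loss}}
+\alpha\,\underbrace{\operatorname{tr}\!\left(\boldsymbol{W}^{\top}\boldsymbol{X}\boldsymbol{L}_{S}\boldsymbol{X}^{\top}\boldsymbol{W}\right)}_{\text{low-rank sparse graph embedding}}
+\beta\,\underbrace{\|\boldsymbol{W}\|_{2,1}}_{\text{row sparsity for feature selection}}
```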
Result

The proposed method is tested on six public datasets; KNN classification on the reduced data is used to measure classification accuracy, and the method is compared experimentally with existing dimensionality reduction algorithms. The proposed method improves classification accuracy throughout and achieves the highest accuracy on five of the datasets: on Wine it exceeds the second-best algorithm, robust unsupervised feature selection (RUFS), by 11.19%; on Breast it exceeds RUFS by 0.57%; on Orlraws10P it exceeds the second-best algorithm, multi-cluster feature selection (MCFS), by 1%; on Coil20 it exceeds MCFS by 1.07%; and on Orl64 it exceeds MCFS by 2.5%.
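This KNN-on-reduced-data protocol is straightforward to reproduce. A minimal sketch with scikit-learn follows; `selected_idx`, the indices of the features kept by the selector, stands in for the output of the proposed method, and the 70/30 hold-out split is our assumption rather than the paper's stated setup:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(X, y, selected_idx, k=1, seed=0):
    """KNN classification accuracy using only the selected feature columns."""
    X_red = X[:, selected_idx]                       # keep selected features
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_red, y, test_size=0.3, random_state=seed)  # hold-out split (assumed)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# With a feature-weight matrix W (d x k) from a method like LRSE, features are
# commonly ranked by the l2-norm of the corresponding row of W, e.g.:
# selected_idx = np.argsort(np.linalg.norm(W, axis=1))[::-1][:m]
```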
Conclusion

The proposed semi-supervised feature selection algorithm based on low-rank sparse graph embedding enables the reduced data to retain as much of the information contained in the original data as possible and effectively handles the case of few labeled samples and many unlabeled samples. Experimental results show that the method classifies better than existing algorithms. Because the method assumes that all features lie on a linear manifold, however, it is applicable only to data on linear manifolds.
Objective

With the widespread use of high-dimensional data, dimensionality reduction has become an important research direction in both machine learning and data mining. High-dimensional data require additional storage space, incur high computational complexity, and are time-consuming to process. Given that a large number of redundant features have minimal effect on data analysis, the sparsity preserving projection (SPP) method was proposed; it maintains the sparse reconstruction relations of the data by minimizing an objective function with an $$\ell_1$$-norm regularization term and can therefore capture the intrinsic geometric structure of the data without any parameter setting. However, traditional SPP neglects the potential global structure because it computes the sparse representation of each sample separately; thus, SPP is not robust to noise. Another dimensionality reduction method, low-rank sparse preserving projection, preserves the global structure and local linear structure of the data by constructing a low-rank sparse graph. However, it ignores the loss of information between the original high-dimensional data and the reduced low-dimensional data, and it fails to address the situation that involves a small number of labeled samples and a large number of unlabeled samples. To solve these two problems, a semi-supervised feature selection method based on low-rank sparse graph embedding (LRSE) is proposed in this paper.
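As background, the per-sample sparse reconstruction weights that SPP preserves can be computed with any $$\ell_1$$ solver; the sketch below uses scikit-learn's Lasso, which is our choice for illustration and not necessarily the solver used in the cited work:

```python
import numpy as np
from sklearn.linear_model import Lasso

def spp_style_graph(X, alpha=0.05):
    """l1 reconstruction weights computed one sample at a time (SPP-style).
    X: (n_samples, n_features). Returns an (n, n) weight matrix, zero diagonal.
    Solving per sample is exactly what loses the global structure that
    low-rank sparse graphs are designed to recover."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)   # dictionary = all other samples
        model = Lasso(alpha=alpha, max_iter=5000)
        model.fit(X[others].T, X[i])          # x_i ~ sum_j s_ij x_j, s sparse
        S[i, others] = model.coef_
    return S
```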
Method

The proposed LRSE consists of two steps. The first step learns the low-rank sparse representation of the original data from a small amount of labeled data and a large amount of unlabeled data. The second step considers the information difference between the high- and low-dimensional data and the preservation of structural information in a unified model: the useful information in the data is preserved as much as possible by minimizing an information loss function, and the structural information of the original data space is preserved by embedding the low-rank sparse graph, which captures the global structure and intrinsic geometric structure of the data, into the low-dimensional space. The method can therefore select more discriminative features. The solution of the objective function is presented in detail in Section 4, where the alternating optimization method converts the non-convex problem into two convex optimization subproblems, which are then solved with the Lagrange multiplier method.
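The concrete update rules are given in Section 4 of the paper; structurally, an alternating scheme of this kind reduces to the loop sketched below, with the two convex subproblem solvers and the objective left abstract (all three callables are placeholders, not the paper's actual updates):

```python
import numpy as np

def alternating_optimize(X, solve_S, solve_W, objective, k=10,
                         max_iter=100, tol=1e-6):
    """Fix one block of variables, solve the convex subproblem in the other,
    and repeat until the objective stops decreasing."""
    d, n = X.shape
    W = np.linalg.qr(np.random.randn(d, k))[0]   # arbitrary orthonormal init
    S = np.eye(n)                                # arbitrary init for coefficients
    prev = np.inf
    for _ in range(max_iter):
        S = solve_S(X, W)            # subproblem 1: convex in S, W fixed
        W = solve_W(X, S)            # subproblem 2: convex in W, S fixed
        val = objective(X, W, S)
        if prev - val < tol:         # monotone decrease -> stop when stalled
            break
        prev = val
    return W, S
```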
Result

To validate the performance of the proposed algorithm, we conduct five experiments on six public datasets. The first two experiments perform dimensionality reduction on the Wine dataset and the face dataset Orl64 and visualize the obtained features; Figures 1 and 2 show that the features selected by the proposed method have strong discriminative ability and carry little redundant information. The next two experiments conduct feature selection on the six datasets, and KNN classification on the reduced data is used to verify the effectiveness of the proposed method. Table 2 shows that, compared with four other feature selection methods and the baseline, the average classification accuracy (ACC) of LRSE is the highest on five of the datasets, the exception being WarpPIE10P; in particular, the ACC increases by 11.19% on the Wine dataset. The classification accuracy is better than that of the original high-dimensional data in most cases; a reasonable explanation is that our method eliminates redundant features during dimensionality reduction, which also corroborates the conclusions of experiments 1 and 2. The last experiment is a parameter sensitivity analysis, and the results show that the classification accuracy is most stable for the majority of the datasets when $$\alpha = 1$$.
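A sensitivity analysis of this kind is easy to script; in the sketch below, `run_lrse_knn` is a hypothetical helper that runs the method with a given trade-off weight and returns the KNN accuracy:

```python
def sensitivity_curve(X, y, run_lrse_knn,
                      alphas=(1e-3, 1e-2, 1e-1, 1, 10, 100)):
    """Sweep the trade-off parameter alpha and record accuracy at each value.
    A flat curve around alpha = 1 would match the reported stability."""
    return {a: run_lrse_knn(X, y, alpha=a) for a in alphas}
```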
Conclusion

This study proposes a semi-supervised feature selection method based on low-rank sparse graph embedding. The method learns the low-rank sparse representation of the labeled and unlabeled data and embeds it in the low-dimensional space, so the information contained in the original data is preserved as much as possible; in addition, the information loss caused by dimensionality reduction is minimized by adding the reconstruction error of the data to the objective function. The combination of these two strategies improves the effectiveness of the proposed dimensionality reduction method. A series of comparative experiments with several existing methods on multiple datasets shows that the proposed method outperforms existing dimensionality reduction methods. The method is, however, based on the assumption that all features lie on linear manifolds, so it is not suitable for all types of data. Future research will focus on extending its applicability to a wider range of data through the kernel trick.
Ritchie M D, Hahn L W, Moore J H. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity[J]. Genetic Epidemiology, 2003, 24(2):150-157. [DOI:10.1002/gepi.10218]
Wang X S, Gao Y, Cheng Y H. A non-negative sparse semi-supervised dimensionality reduction algorithm for hyperspectral data[J]. Neurocomputing, 2016, 188:275-283. [DOI:10.1016/j.neucom.2014.12.127]
Wu J, Xie Y F, Yang C H, et al. An unsupervised reduction method for the selection of flotation froth image characters and its application[J]. Information and Control, 2014, 43(3):314-317, 333. [DOI:10.3724/SP.J.1219.2014.00314]
Lipovetsky S. PCA and SVD with nonnegative loadings[J]. Pattern Recognition, 2009, 42(1):68-76. [DOI:10.1016/j.patcog.2008.06.025]
Bishop C M. Neural Networks for Pattern Recognition[M]. New York:Oxford University Press, 1996:1235-1242.
Li K T, Peng H, Zhou X F, et al. Feature selection algorithm based on multiple correlation measures[J]. Journal of Chinese Computer Systems, 2017, 38(4):696-700.
Yang Y M, Pedersen J O. A comparative study on feature selection in text categorization[C]//Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc., 1997: 412-420.
Cover T M, Thomas J A. Elements of Information Theory[M]. 2nd ed. Hoboken:Wiley-Interscience, 2006:1600-1601.
He X F, Cai D, Niyogi P. Laplacian score for feature selection[C]//Proceedings of the 18th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2005: 507-514.
Cai D, Zhang C Y, He X F. Unsupervised feature selection for multi-cluster data[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC: ACM, 2010: 333-342. [DOI:10.1145/1835804.1835848]
Qian M J, Zhai C X. Robust unsupervised feature selection[C]//Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing: AAAI Press, 2013: 1621-1627.
Zhu X J. Semi-supervised learning literature survey[D]. Madison: University of Wisconsin, 2005: 63-77.
Cheng B, Yang J C, Yan S C, et al. Learning with $$\ell_1$$-graph for image analysis[J]. IEEE Transactions on Image Processing, 2010, 19(4):858-866. [DOI:10.1109/TIP.2009.2038764]
Qiao L S, Chen S C, Tan X Y. Sparsity preserving projections with applications to face recognition[J]. Pattern Recognition, 2010, 43(1):331-341. [DOI:10.1016/j.patcog.2009.05.005]
Liu G C, Lin Z C, Yu Y. Robust subspace segmentation by low-rank representation[C]//Proceedings of the 27th International Conference on International Conference on Machine Learning. Haifa: ACM, 2010: 663-670.
Du S Q, Wang W L, Ma Y D. Low rank sparse preserve projection for face recognition[C]//Proceedings of 2016 Chinese Control and Decision Conference. Yinchuan: IEEE, 2016: 3822-3826. [DOI:10.1109/CCDC.2016.7531651]
He X F, Niyogi P. Locality preserving projections[J]. Advances in Neural Information Processing Systems, 2004, 16:153-160.
He X F, Cai D, Yan S C, et al. Neighborhood preserving embedding[C]//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing: IEEE, 2005: 1208-1213. [DOI:10.1109/ICCV.2005.167]
Chung F R K. Spectral Graph Theory[M]. Providence:American Mathematical Society, 1997:212.
Baraniuk R G. A lecture on compressive sensing[J]. IEEE Signal Processing Magazine, 2007, 24(4):118-121.
Candès E J, Recht B. Exact matrix completion via convex optimization[J]. Foundations of Computational Mathematics, 2009, 9(6):717-772. [DOI:10.1007/s10208-009-9045-5]
Zhuang L S, Gao H Y, Huang J J, et al. Semi-supervised classification via low rank graph[C]//Proceedings of the 6th International Conference on Image and Graphics. Hefei: IEEE, 2011: 511-516. [DOI:10.1109/ICIG.2011.86]
Lee D D, Seung H S. Algorithms for non-negative matrix factorization[C]//Proceedings of the 13th International Conference on Neural Information Processing Systems. Denver: MIT Press, 2000: 535-541.
Hoyer P O. Non-negative sparse coding[C]//Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing. Martigny: IEEE, 2002: 557-565. [DOI:10.1109/NNSP.2002.1030067]
Dua D, Karra T E. UCI machine learning repository[DB/OL]. [2018-03-29]. http://archive.ics.uci.edu/ml/datasets.html
Cai D. Four face databases in matlab format[DB/OL]. [2018-03-29]. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.htm