Improved reconstruction weight-based locally linear embedding algorithm
2018, Vol. 23, No. 1, pp. 52-60
Received: 2017-06-27
Revised: 2017-09-26
Published in print: 2018-01-16
DOI: 10.11834/jig.170301
Objective
Locally linear embedding (LLE) is a classical manifold learning algorithm in machine learning, data mining, and related fields. To overcome the difficulty that LLE has in effectively handling noisy, large-curvature, and sparsely sampled data, an improved reconstruction weight-based locally linear embedding algorithm (IRWLLE) is proposed.
Method
Geodesic distance is used to describe the manifold structure, and the reconstruction weights of LLE are reconstructed and redefined: within the neighborhood of a sample, the ratio of the geodesic distance to the Euclidean distance is defined as the structural weight, and the ratio of the geodesic distance to the median geodesic distance is defined as the distance weight. The product of the structural weight and the distance weight is then taken as the reconstruction weight, so that the structure and distance information of the manifold are organically combined. A symbolic sketch of this construction is given below.
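In symbols (a sketch; the notation $d_G$, $d_E$, $s_{ij}$, $t_{ij}$, $w_{ij}$, and $N(i)$ is not from the abstract), for a center sample $x_i$ and a neighbor $x_j$ in its neighborhood $N(i)$:

$$s_{ij} = \frac{d_G(x_i, x_j)}{d_E(x_i, x_j)}, \qquad t_{ij} = \frac{d_G(x_i, x_j)}{\operatorname{median}_{x_l \in N(i)} d_G(x_i, x_l)}, \qquad w_{ij} = s_{ij}\, t_{ij},$$

where $d_G$ and $d_E$ denote the geodesic and Euclidean distances, respectively, and $w_{ij}$ plays the role of the reconstruction weight of standard LLE (any normalization of $w_{ij}$ is not specified in the abstract).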
Result
Experiments are conducted on the classical synthetic data sets Swiss roll, S-curve, and Helix, with noise added to the data and sparse sampling used to generate the data sets, and the proposed algorithm is compared with the original LLE algorithm and the Hessian locally linear embedding (HLLE) algorithm. The results show that, compared with LLE and HLLE, IRWLLE better preserves the neighborhood relations of the manifold and unfolds it more completely; in particular, IRWLLE exhibits very strong robustness on the noisy, large-curvature Helix data set. Face recognition experiments are conducted on the ORL and Yale face databases with a nearest-neighbor classifier, and the recognition results of IRWLLE are compared with those of LLE. On the ORL data set, IRWLLE achieves a recognition rate of 90%, versus 85.5% for the original LLE; on the Yale data set, IRWLLE achieves 88%, versus 75% for the original LLE. IRWLLE thus also greatly improves the face recognition rate.
Conclusion
Compared with the original LLE algorithm, the proposed IRWLLE algorithm introduces not only manifold distance information but also structural information into the reconstruction weights, effectively reducing the interference from noise and from points off the manifold. It is therefore more robust to noisy data, handles sparsely sampled and large-curvature data better, and also achieves a considerably higher face recognition rate.
Objective
The development of science and technology has reduced the cost of data collection and caused data volumes to grow geometrically, making dimensionality reduction an important part of machine learning. Manifold learning is a nonlinear dimensionality reduction technique widely used for visualization, feature extraction, and reducing the computational complexity of high-dimensional data. The locally linear embedding (LLE) algorithm is a classical manifold learning algorithm in machine learning and data mining. Its basic idea is that any sampling point and its nearest neighbors lie on a locally linear patch; each nearest neighbor is assigned a weight, and the sampling point is linearly represented by its neighbors under the principle of minimizing the reconstruction error. An improved reconstruction weight-based LLE (IRWLLE) algorithm is proposed to overcome the problems of the LLE algorithm with noise, large curvature, and sparsely sampled data.
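For context, the reconstruction-weight step of standard LLE can be sketched as follows. This is a minimal illustration, not the authors' code; the neighborhood size k, the Tikhonov regularizer reg, and the function name lle_weights are illustrative choices.

    import numpy as np

    def lle_weights(X, k=10, reg=1e-3):
        """Standard LLE step: reconstruct each point from its k nearest neighbors.

        X   : (n, d) data matrix, one sample per row
        k   : neighborhood size (illustrative default)
        reg : regularization for ill-conditioned local Gram matrices
        """
        n = X.shape[0]
        W = np.zeros((n, n))
        # pairwise Euclidean distances
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        for i in range(n):
            idx = np.argsort(D[i])[1:k + 1]         # k nearest neighbors, excluding i
            Z = X[idx] - X[i]                       # center the neighbors on x_i
            G = Z @ Z.T                             # local Gram matrix
            G = G + reg * np.trace(G) * np.eye(k)   # regularize for numerical stability
            w = np.linalg.solve(G, np.ones(k))      # solve G w = 1
            W[i, idx] = w / w.sum()                 # weights constrained to sum to one
        return W

IRWLLE replaces these least-squares weights with the structure- and distance-aware weights described in the Method section.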
Method
Geodesic distance is used to describe the structure, and the reconstruction weights of LLE are reconstructed and redefined to overcome a shortcoming of the original LLE algorithm, which considers only distance factors and ignores structural factors. Two factors, a structural weight and a distance weight, are introduced. Any sample is selected as a center point, and its nearest neighbors under the Euclidean distance form its local neighborhood. In this neighborhood, the ratio of the geodesic distance to the Euclidean distance between the center and a neighboring point is defined as the structural weight, and the ratio of the geodesic distance to the median geodesic distance within the neighborhood is defined as the distance weight. The product of the structural and distance weights is defined as the reconstruction weight; thus, the structure and distance information of the manifold are organically combined. The geodesic distances are computed with the classic Dijkstra shortest-path algorithm, as commonly used in the Isomap algorithm. For the distance weight, the median geodesic distance within a local neighborhood is fixed: the farther a neighboring point is from the center sample, the smaller its distance weight, in line with the idea that a greater distance from the neighborhood center means a smaller contribution to reconstructing the center. Dividing the geodesic distance by this median also reduces the effect of noise on the weights to a certain extent. For the structural weight, the ratio of the geodesic distance to the Euclidean distance measures the linearity of the local neighborhood: the farther a neighboring point lies from the local linear patch, the smaller its contribution to reconstructing the center point and the smaller its structural weight. This further emphasizes the importance of structure in the weights and enhances robustness to noise.
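A minimal sketch of this weight construction, under stated assumptions: the k-nearest-neighbor graph and the scipy Dijkstra routine stand in for the geodesic computation, the final normalization of the weights is an assumption (the abstract does not specify one), and disconnected neighborhood graphs are not handled. The function name irwlle_weights is illustrative, not the published implementation.

    import numpy as np
    from scipy.sparse.csgraph import dijkstra
    from scipy.spatial.distance import cdist

    def irwlle_weights(X, k=10):
        """Sketch of the structural-times-distance reconstruction weights."""
        n = X.shape[0]
        D_euc = cdist(X, X)                               # pairwise Euclidean distances
        # kNN graph for the Isomap-style geodesic approximation (zeros = no edge)
        graph = np.zeros((n, n))
        for i in range(n):
            idx = np.argsort(D_euc[i])[1:k + 1]
            graph[i, idx] = D_euc[i, idx]
        D_geo = dijkstra(graph, directed=False)           # shortest-path (geodesic) distances

        W = np.zeros((n, n))
        for i in range(n):
            idx = np.argsort(D_euc[i])[1:k + 1]           # Euclidean k-neighborhood
            s = D_geo[i, idx] / D_euc[i, idx]             # structural weight: geodesic / Euclidean
            t = D_geo[i, idx] / np.median(D_geo[i, idx])  # distance weight: geodesic / median geodesic
            w = s * t                                     # reconstruction weight: the product
            W[i, idx] = w / w.sum()                       # normalization (assumed)
        return W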
Result
Experiments are conducted on classical synthetic data sets, such as Swiss roll, S-curve, and Helix, with noise added to the data; sparse sampling is also used to generate data sets. The proposed algorithm is compared with the original LLE algorithm and the Hessian LLE (HLLE) algorithm. The results show that the IRWLLE algorithm outperforms the LLE and HLLE algorithms in maintaining the neighborhood relations of the manifold and unfolds the manifold more completely. In particular, IRWLLE exhibits strong robustness on the noisy, large-curvature Helix data set. A face recognition experiment on the ORL and Yale face databases is conducted using a nearest-neighbor classifier, and the recognition results of the IRWLLE algorithm are compared with those of the LLE algorithm. For the ORL data set, the recognition rate of IRWLLE is 90%, whereas that of the original LLE is 85.5%. For the Yale data set, the recognition rate of IRWLLE is 88%, whereas that of the original LLE is 75%. The face recognition rate of IRWLLE is therefore greatly improved.
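The recognition experiment follows the usual embed-then-classify protocol: faces are mapped to a low-dimensional embedding and a nearest-neighbor classifier labels the test samples. A hedged sketch of such an evaluation; the 50/50 stratified split, the scikit-learn 1-NN classifier, and the name recognition_rate are assumptions, as the abstract does not give the exact protocol.

    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    def recognition_rate(Y, labels, test_size=0.5, seed=0):
        """Nearest-neighbor recognition rate on embedded features Y.

        Y      : (n, d) low-dimensional embeddings (e.g., from LLE or IRWLLE)
        labels : (n,) subject identities
        """
        Y_tr, Y_te, l_tr, l_te = train_test_split(
            Y, labels, test_size=test_size, stratify=labels, random_state=seed)
        clf = KNeighborsClassifier(n_neighbors=1).fit(Y_tr, l_tr)  # 1-NN classifier
        return clf.score(Y_te, l_te)                               # fraction correctly recognized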
Conclusion
Compared with the original LLE algorithm, the proposed IRWLLE algorithm not only introduces manifold distance information into the reconstruction weights but also adds structural information, thereby effectively reducing the interference from noise and from data points off the manifold. The IRWLLE algorithm is highly robust to noisy data, handles sparsely sampled and large-curvature data well, and also achieves a higher face recognition rate.