A Nongreedy Robust Metric Learning Algorithm

Zeng Fanxia1,2, Zhang Wensheng1,2 (1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract
Objective Metric learning is a task-dependent, fundamental research problem in machine learning and image processing. Because real application backgrounds are complex, the performance of metric learning methods degrades under the large amount of unavoidable noise. To reduce the influence of noise, existing methods commonly replace the L2 distance with the L1 distance; this shrinks the loss scale of similar and dissimilar pairs simultaneously but ignores the different effects of noise on intraclass and interclass samples. This paper therefore proposes a nongreedy robust metric learning algorithm, marginal Fisher analysis based on L2/L1 loss (MFA-L2/L1), which adopts a more discriminative loss and improves recognition performance in noisy environments. Method Building on the marginal Fisher analysis (MFA) method, the proposed model measures the loss of similar pairs with the L2 distance and the loss of dissimilar pairs with the L1 distance, increasing the penalty on both kinds of pairs to improve discriminability. First, to overcome the difficulty of solving the nonconvex model, the objective function is transformed at each iteration into a difference of two convex functions that is easy to solve. Then, inspired by the idea of DCA (difference of convex functions algorithm), a nongreedy iterative algorithm is derived to obtain the final metric matrix. Finally, a theoretical proof guarantees the convergence of the iterative algorithm. Result Comparative experiments are conducted on five UCI (University of California, Irvine) datasets and seven face datasets. 1) On the five UCI datasets with different noise levels, MFA-L2/L1 performs best and shows good noise resistance; in particular, on the Seeds and Wine datasets with 30% noise, its accuracy is 9% higher than that of the second-best method, LDA-NgL1 (nongreedy L1-norm linear discriminant analysis). 2) Experiments on the AR and FEI face datasets with varying projection dimensions verify that adopting the L1 loss and the L2 loss improves the discriminability of the model. 3) In simulation experiments on the Senthil, Yale, ORL, Caltech, and UMIST face datasets, MFA-L2/L1 exhibits strong robustness and ranks first in performance. Conclusion This paper proposes a robust metric learning model based on an L2/L1 loss, derives a convenient and effective nongreedy solution algorithm, and provides a theoretical analysis of its convergence. Experimental results on different datasets under different noise conditions show that the proposed algorithm achieves good recognition accuracy and robustness.
Robust distance metric learning algorithm based on nongreedy solution

Zeng Fanxia1,2, Zhang Wensheng1,2(1.Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;2.University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract
Objective Distance metric learning, which is task dependent, is an essential research issue in machine learning and image processing. As a common preprocessing step for classification and recognition, distance metric learning enhances the performance of machine learning methods such as clustering and k-nearest neighbor classification. Methods of this kind aim to learn the underlying structure of the data distribution from the semantic similarity or labels of samples. The optimization goal of distance metric learning is to shrink the distances between similar pairs and expand the distances between dissimilar pairs. With the development of machine learning, many distance metric learning methods have been proposed and applied successfully. These methods use the L2 distance to measure loss. However, because of the unavoidable noise arising from complex application backgrounds, the L2 distance amplifies the influence of noise through its square operation. Noise may thus make observed dissimilar pairs appear farther apart, and observed similar pairs appear closer, than they truly are, which misleads the learning. The presence of noise therefore degrades the performance of distance metric learning in machine learning and pattern recognition. Existing robust distance metric learning methods usually adopt the L1 distance rather than the L2 distance, aiming to diminish the distance scale of both kinds of pairs simultaneously and thus reduce the influence of noise. However, by adopting a single type of loss for both similar and dissimilar pairs, these methods neglect the different optimization directions of the intraclass and interclass scatters. Therefore, a robust distance metric learning algorithm based on an L2/L1 loss with a nongreedy solution is proposed. This method adopts a more discriminative loss and improves recognition performance in the presence of feature noise.
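The optimization goal stated above can be illustrated with a minimal sketch (our own illustration, not the authors' code; the projection size and the sample vectors below are hypothetical): a linear projection W defines a learned distance that metric learning shapes to be small for similar pairs and large for dissimilar ones.

```python
import numpy as np

# Minimal illustration of the metric learning goal: a linear projection W
# defines a learned (Mahalanobis-type) distance
#   d_W(x, y) = ||W^T (x - y)||_2,
# which should be small for similar pairs and large for dissimilar pairs.

def learned_distance(W, x, y):
    """Distance between x and y under the learned projection W."""
    return float(np.linalg.norm(W.T @ (x - y), 2))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))  # hypothetical 4-D -> 2-D projection
x, y = rng.standard_normal(4), rng.standard_normal(4)
print(learned_distance(W, x, y))  # a nonnegative scalar
```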
Method Based on the marginal Fisher analysis (MFA) method, the proposed model adopts the L2 and L1 distances to measure the losses of similar pairs and dissimilar pairs, respectively. In detail, it considers the influence of noise on the intraclass and interclass scatters separately. Affected by noise, observed similar pairs may be closer, and observed dissimilar pairs farther apart, than the truth, so the penalties imposed on these pairs may be insufficient. Under the L2/L1 loss, the L2 distance penalizes similar pairs more heavily than the L1 distance does, and the L1 distance penalizes dissimilar pairs more heavily than the L2 distance does. However, the model, which involves both the minimization of an L2 norm and the maximization of an L1 norm, is nonconvex. Hence, existing solutions for trace ratio problems are not suitable for the objective function, which is therefore difficult to solve, and a nongreedy solution is derived. First, the objective function, which is a ratio, is transformed into a difference of two convex functions during each iteration. This transformation makes the objective easy to solve. Then, inspired by the idea of the difference of convex functions algorithm (DCA), an iterative algorithm with a nongreedy solution is derived to solve the objective, and a projection matrix is learned. In effect, the iteration generates an auxiliary function of the objective, which ensures that the value of the objective function decreases. Lastly, a theoretical analysis of the iterative algorithm guarantees the convergence of the proposed method. Result Synthetic experiments are conducted on five UCI (University of California, Irvine) datasets and seven face datasets with noise. A nearest neighbor classifier based on the learned matrix is used to compare the performance of the related methods. The proposed method outperforms the other methods in terms of accuracy.
First, when 5%, 15%, 25%, and 30% feature noise is added to the five UCI datasets, MFA-L2/L1 achieves the best performance and exhibits robustness against noise; in particular, on the Seeds and Wine datasets with 30% noise, its accuracy is 9% higher than that of the second-best method. Second, in the accuracy comparison with varying projection dimensions on the AR dataset, the proposed method outperforms MFA, which validates that using the L1 loss for dissimilar pairs makes it more discriminative. In the corresponding comparison on the FEI dataset, the proposed method outperforms LDA-NgL1 (nongreedy L1-norm linear discriminant analysis), which validates that using the L2 loss for similar pairs makes it more discriminative. Lastly, on five face datasets with noise, MFA-L2/L1 is more robust than the other methods. Conclusion This study proposes a robust distance metric learning method based on an L2/L1 loss, which adopts different losses for similar pairs and dissimilar pairs. To handle the nonconvex objective, the objective is transformed into a difference of two convex functions during each iteration. By virtue of the idea of DCA, an iterative algorithm is derived that generates an auxiliary function decreasing the objective. The convergence of the algorithm is guaranteed by a theoretical proof. Several series of experiments with different levels and types of synthetic noise are conducted on public datasets. The results show that the proposed method performs well and is robust to noise. Future research will focus on local distance metric learning in the presence of label noise, because label noise makes the observed distribution deviate from the true distribution more than feature noise does.
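As a rough illustration of the Method section, the following sketch (a simplified reconstruction under stated assumptions, not the authors' implementation; the pair sets, step size, and plain subgradient descent step are our own choices) shows an L2 loss on similar pairs, an L1 loss on dissimilar pairs, and a DCA/Dinkelbach-style iteration that replaces the ratio objective with a difference of two convex terms at each step.

```python
import numpy as np

# Sketch of the MFA-L2/L1 idea: minimize the ratio J(W) = f(W) / g(W), where
#   f(W) = sum over similar pairs    ||W^T (x_i - x_j)||_2^2  (L2 loss)
#   g(W) = sum over dissimilar pairs ||W^T (x_i - x_j)||_1    (L1 loss).
# Each step fixes lam = f(W) / g(W) and takes one descent step on the
# difference of convex functions h(W) = f(W) - lam * g(W).

def f_similar(W, sim_diffs):
    """L2 loss over similar-pair difference vectors."""
    return sum(float(np.sum((W.T @ d) ** 2)) for d in sim_diffs)

def g_dissimilar(W, dis_diffs):
    """L1 loss over dissimilar-pair difference vectors."""
    return sum(float(np.sum(np.abs(W.T @ d))) for d in dis_diffs)

def dca_step(W, sim_diffs, dis_diffs, lr=1e-3):
    """One DCA/Dinkelbach-style update of the projection matrix W."""
    lam = f_similar(W, sim_diffs) / g_dissimilar(W, dis_diffs)
    grad_f = 2.0 * sum(np.outer(d, d) @ W for d in sim_diffs)      # gradient of f
    sub_g = sum(np.outer(d, np.sign(W.T @ d)) for d in dis_diffs)  # subgradient of g
    return W - lr * (grad_f - lam * sub_g)

rng = np.random.default_rng(0)
sim_diffs = [0.1 * rng.standard_normal(4) for _ in range(5)]  # small intraclass gaps
dis_diffs = [rng.standard_normal(4) for _ in range(5)]        # large interclass gaps
W = rng.standard_normal((4, 2))

ratio_before = f_similar(W, sim_diffs) / g_dissimilar(W, dis_diffs)
for _ in range(50):
    W = dca_step(W, sim_diffs, dis_diffs)
ratio_after = f_similar(W, sim_diffs) / g_dissimilar(W, dis_diffs)
print(ratio_before, ratio_after)  # the ratio objective should shrink
```

The per-step subproblem is the "difference of two convex functions" described in the abstract: with lam fixed, f(W) - lam * g(W) equals zero at the current W, so any descent on it also decreases the ratio.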
