Co-history: A Robust Learning Method for Noisy Labels That Incorporates Historical Information into Co-teaching

Dong Yongfeng, Li Jiawei, Wang Zhen, Jia Wenyu (School of Artificial Intelligence and Data Science, Hebei University of Technology)

Abstract
Objective Deep neural networks perform excellently on computer-vision classification tasks, but under label noise deep learning models face a severe challenge. Learning algorithms based on co-teaching (Co-teaching) can effectively alleviate the problem of neural networks learning from noisily labeled data, yet they still have many shortcomings. We therefore propose Co-history, a robust learning method for noisy labels that incorporates historical information into co-teaching. Methods First, to address the overfitting caused by the cross-entropy (CE) loss under label noise, we analyze the historical pattern of each sample's loss and propose a correction loss that weakens the effect of the overfitting induced by the CE loss during training. Second, to address the premature convergence of the two networks in the Co-teaching algorithm, we propose a difference loss that keeps the two networks distinct throughout training. Finally, following the small-loss selection strategy and combining it with each sample's historical losses, we propose a new sample selection method that selects clean samples more accurately. Results Comparative experiments were conducted on four simulated-noise datasets (F-MNIST, SVHN, CIFAR-10 and CIFAR-100) and one real-world dataset (Clothing1M). On F-MNIST, SVHN, CIFAR-10 and CIFAR-100 with 40% symmetric noise, the proposed method improves accuracy over Co-teaching by 3.52%, 4.77%, 6.16% and 6.96%, respectively; on the real-world Clothing1M dataset, the best accuracy and the final accuracy improve over Co-teaching by 0.94% and 1.2%, respectively. Conclusion Extensive experiments show that the proposed robust classification algorithm, which considers historical losses within co-teaching, effectively reduces the impact of noisy labels and improves classification accuracy.
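The two losses described above can be illustrated with a minimal PyTorch-style sketch. The reweighting rule, the margin hinge, the function names, and the (batch, T) loss-history layout are assumptions made for illustration; they are not the paper's exact formulations.

```python
import torch.nn.functional as F

def correction_loss(logits, targets, loss_history, eps=1e-8):
    """History-weighted cross-entropy (illustrative): samples whose current CE
    loss has drifted far above their historical average get a smaller weight,
    dampening the overfitting that plain CE causes on noisy labels.
    loss_history: (batch, T) tensor of each sample's CE loss over past epochs."""
    ce = F.cross_entropy(logits, targets, reduction="none")    # per-sample CE loss
    hist_mean = loss_history.mean(dim=1)                       # historical average loss
    weight = (hist_mean / (ce.detach() + eps)).clamp(max=1.0)  # down-weight spiking samples
    return (weight * ce).mean()

def difference_loss(feat_a, feat_b, margin=1.0):
    """Contrastive-style regularizer (illustrative): pushes the two peer
    networks' feature representations of the same sample at least `margin`
    apart so the networks do not collapse into a single model."""
    dist = F.pairwise_distance(feat_a, feat_b)                 # per-sample L2 distance
    return F.relu(margin - dist).mean()
```

In a Co-teaching-style loop, each network would add the difference loss computed between its own features and its peer's features to its training objective.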
Keywords
Co-history: Learning with Noisy Labels by Co-teaching with History Losses

Dong Yongfeng, Li Jiawei, Wang Zhen, Jia Wenyu (School of Artificial Intelligence and Data Science, Hebei University of Technology)

Abstract
Objective Deep neural networks (DNNs) have achieved great success in many fields, especially in computer vision, and this success relies heavily on large-scale labeled datasets. In practice, however, collecting large-scale datasets with accurate labels is difficult; data in specialized fields in particular must be labeled by domain experts, which costs considerable manpower and money. To cut costs, researchers therefore turn to datasets built through crowdsourced annotation, search-engine queries, web crawling, and similar means. Datasets built in this way inevitably contain noisy labels, which severely harm the generalization of DNNs, because DNNs memorize the noisy labels during training. Learning algorithms based on Co-teaching, such as Co-teaching+, JoCoR, and CoDis, can effectively alleviate learning on noisily labeled data; each of them uses two networks to combat label noise from a different perspective. However, several problems remain. Under label noise, a deep model trained with the cross-entropy (CE) loss is very sensitive to noisy labels, so it easily fits the mislabeled samples and fails to learn the true patterns in the data. As training progresses, Co-teaching gradually drives the parameters of the two networks toward each other, so they prematurely converge to the same network and learning effectively stops. Moreover, as the iterations proceed, the networks inevitably memorize some noisy samples, so noisy and clean samples can no longer be distinguished correctly from the CE loss value alone, and a small-loss selection strategy that relies solely on CE loss becomes unreliable. To address these problems, we propose Co-history, a method for learning with noisy labels by co-teaching with history losses, which incorporates historical information into collaborative learning. Methods First, to address the overfitting of the CE loss under label noise, we analyze the history of each sample's loss and propose a correction loss. The correction loss adjusts the weight of the CE loss in the current iteration so that a sample's CE loss stays stable across historical iterations, which is the behavior a classifier should exhibit once noisy and clean samples have been separated; this reduces the overfitting induced by the CE loss. Second, we propose a difference loss to address the premature convergence of the two networks in the Co-teaching algorithm. Inspired by the contrastive loss, the difference loss keeps the two networks' feature representations of the same sample a certain distance apart, which preserves the difference between the two networks during training and prevents them from degenerating into a single network. Because the two networks have different parameters, they produce different decision boundaries and filter out different types of errors, so maintaining this difference lets collaborative learning perform at its best. Finally, because of overfitting, samples with noisy labels tend to show larger loss fluctuations than samples with clean labels. By combining each sample's historical loss information with the small-loss selection strategy, we propose a new sample selection method that selects clean samples more accurately: we select samples with both a small classification loss and small fluctuation of their historical losses as clean samples for training.
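The selection rule described in Methods can be sketched as follows, assuming a per-sample loss-history buffer of shape (batch, T); the scoring function, the alpha trade-off, and the function name are illustrative assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def select_clean(logits, targets, loss_history, keep_ratio, alpha=0.5):
    """Pick presumed-clean samples by combining the small-loss rule with the
    fluctuation of each sample's historical losses: clean samples tend to have
    both a small current loss and a stable loss history.
    loss_history: (batch, T) tensor of past per-sample CE losses."""
    ce = F.cross_entropy(logits, targets, reduction="none")       # current per-sample loss
    fluctuation = loss_history.var(dim=1, unbiased=False)         # variance of past losses
    score = alpha * ce + (1.0 - alpha) * fluctuation              # lower score = more likely clean
    num_keep = max(1, int(keep_ratio * score.numel()))
    clean_idx = torch.topk(score, num_keep, largest=False).indices
    return clean_idx                                              # indices of presumed-clean samples
```

Following the co-teaching paradigm, each network would pass the indices it selects to its peer, which then updates only on those presumed-clean samples.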
Results We conducted extensive experiments to demonstrate the effectiveness of the algorithm, including comparison experiments on four standard datasets (F-MNIST, SVHN, CIFAR-10 and CIFAR-100) and one real-world dataset (Clothing1M). Four types of artificially simulated noise were added to the standard datasets, namely symmetric, asymmetric, pairflip, and tridiagonal noise, each at 20% and 40% noise rates. In the real-world dataset, labels are generated from the text surrounding each image and are therefore already noisy, so no additional label noise was added. Under symmetric noise with a 20% noise rate, Co-history improves accuracy over Co-teaching by 2.05%, 2.19%, 3.06% and 2.58% on F-MNIST, SVHN, CIFAR-10 and CIFAR-100, respectively; at a 40% noise rate, the improvements are 3.52%, 4.77%, 6.16% and 6.96%, respectively. On the real-world Clothing1M dataset, the best accuracy and the last-epoch accuracy of Co-history are 0.94% and 1.2% higher than those of Co-teaching, respectively. Ablation experiments confirm the effectiveness of each proposed loss. Conclusion In this paper, motivated by the overfitting caused by CE-loss training and the historical regularity of sample losses, we propose a correction loss. To address the premature convergence of the two networks in Co-teaching, we further propose a difference loss. Finally, building on the traditional small-loss selection strategy, we fully exploit the historical pattern of sample losses and put forward a more accurate sample selection strategy. Unlike previous Co-teaching-style learning strategies, the proposed algorithm outperforms the baseline algorithms in extensive experiments, shows stronger robustness on datasets with noisy labels, and is better suited to noisy-label scenarios. In addition, the ablation experiments clearly demonstrate the effectiveness of each improvement. Because the algorithm analyzes the historical loss of every sample, it must store each sample's past loss values; as the number of training samples grows, the memory footprint grows with it, increasing computation and storage costs. Moreover, when the number of classes is large, the algorithm is not optimal under some noise settings (for example, asymmetric noise at 40% and 20% noise rates on CIFAR-100). In future work, we will therefore seek more efficient solutions that preserve accuracy and continue to explore stronger robust classification algorithms for learning with noisy labels.
Keywords
