Xu Meixiang, Sun Fuming, Li Haojie. Online multi-label image classification with active learning[J]. Journal of Image and Graphics, 2015, 20(2): 237-244. DOI: 10.11834/jig.20150210.
Supervised machine learning methods have been applied to multi-label image classification with tremendous success. Despite their different learning mechanisms, the performance of these methods relies heavily on the number of labeled training examples. However, acquiring labeled examples requires significant effort from annotators, which hinders the application of supervised learning methods to large-scale problems. In this paper, we extend an active querying method driven by informativeness and representativeness from single-label learning to multi-label image classification. Because the training set grows during active learning, the classifier must be updated accordingly, and retraining a new classifier on all training samples each time increases the training burden significantly. A highly effective online learning algorithm is therefore explored to improve learning efficiency. To deal with massive multi-label classification problems, a novel algorithm named active learning with informativeness and representativeness for online multi-label learning (AIR-OML), which addresses both the sample selection strategy and the classifier update problem, is presented. The sample selection strategy in active learning is based on min-max theory, querying the most informative and most representative examples to retrain the multi-label learner. The Kullback-Leibler divergence and the Karush-Kuhn-Tucker conditions are utilized to update the multi-label classifier online in real time. Combining active learning with online learning, we propose the AIR-OML algorithm. Experiments are conducted on four real-world datasets with four evaluation criteria to evaluate the proposed algorithm. The results demonstrate that the explored sample selection strategy has a significant advantage over two existing sample selection strategies, and that the classifier achieves the best performance with fewer new samples by querying the most informative and representative examples. The AIR-OML algorithm reduces the cost of human annotation and updates the classifier in a timely manner as newly labeled examples arrive.
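The abstract does not give the exact min-max formulation of AIR-OML, but the general idea of scoring unlabeled candidates by both informativeness and representativeness can be illustrated with a minimal sketch. The margin-based uncertainty, the RBF pool similarity, and the trade-off weight beta below are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: a generic informativeness + representativeness
# query score for pool-based active learning. This is NOT the paper's
# min-max formulation; the scoring functions, the trade-off weight `beta`,
# and the RBF similarity are assumptions made for illustration.
import numpy as np


def informativeness(decision_values):
    """Margin-based uncertainty: small |margin| means more informative.

    decision_values: array of shape (n_unlabeled, n_labels) holding the
    real-valued outputs of the current multi-label classifier.
    """
    # Average distance to the decision boundary over labels; negate so that
    # smaller margins yield larger informativeness scores.
    return -np.mean(np.abs(decision_values), axis=1)


def representativeness(X_unlabeled, gamma=1.0):
    """Average RBF similarity of each candidate to the rest of the pool."""
    sq_norms = np.sum(X_unlabeled**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X_unlabeled @ X_unlabeled.T
    K = np.exp(-gamma * sq_dists)
    # Exclude each point's similarity to itself (always 1.0).
    return (K.sum(axis=1) - 1.0) / (len(X_unlabeled) - 1)


def select_query(decision_values, X_unlabeled, beta=0.5):
    """Return the index of the unlabeled example with the best combined score."""
    score = (beta * informativeness(decision_values)
             + (1.0 - beta) * representativeness(X_unlabeled))
    return int(np.argmax(score))
```

In an active learning loop of this kind, the selected example would be sent to an annotator, added to the training set, and the classifier then updated online rather than retrained from scratch, which is the role the KL-divergence- and KKT-based update plays in AIR-OML.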