Xu Meixiang, Sun Fuming, Li Haojie. Online multi-label image classification with active learning[J]. Journal of Image and Graphics, 2015, 20(2): 237-244. DOI: 10.11834/jig.20150210.
Supervised machine learning methods have been applied to multi-label image classification with tremendous success. Despite their different learning mechanisms, the performance of these methods relies heavily on the number of labeled training examples. However, acquiring labeled examples requires significant effort from annotators, which hinders the application of supervised learning methods to large-scale problems. In this paper, we extend an active querying method driven by informativeness and representativeness from single-label learning to multi-label image classification. Because the training set grows during active learning, the classifier must be updated accordingly, and retraining a new classifier on all training samples each time increases the training burden significantly. A highly effective online learning algorithm is therefore explored to improve learning efficiency. To deal with massive multi-label classification problems, a novel algorithm named active learning with informativeness and representativeness for online multi-label learning (AIR-OML), which addresses both the sample selection strategy and the classifier update problem, is presented. The sample selection strategy in active learning is based on min-max theory, querying the most informative and most representative examples to retrain the multi-label learner. The Kullback-Leibler divergence and the Karush-Kuhn-Tucker conditions are utilized to update the multi-label classifier online in real time. Combining active learning with online learning, we propose the AIR-OML algorithm. Experiments are conducted on four real-world datasets with four evaluation criteria to evaluate the proposed algorithm. The results demonstrate that the explored sample selection strategy has a significant advantage over two existing sample selection strategies, and that the classifier achieves the best performance with fewer new samples by querying the most informative and representative examples. The AIR-OML algorithm reduces the cost of human annotation and updates the classifier in a timely manner as newly labeled examples arrive.
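The abstract does not give the exact min-max formulation of AIR-OML, but the general idea of scoring unlabeled candidates by both informativeness and representativeness can be illustrated with a minimal sketch. The margin-based uncertainty, the RBF pool similarity, and the trade-off weight beta below are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: a generic informativeness + representativeness
# query score for pool-based active learning. This is NOT the paper's
# min-max formulation; the scoring functions, the trade-off weight `beta`,
# and the RBF similarity are assumptions made for illustration.
import numpy as np


def informativeness(decision_values):
    """Margin-based uncertainty: small |margin| means more informative.

    decision_values: array of shape (n_unlabeled, n_labels) holding the
    real-valued outputs of the current multi-label classifier.
    """
    # Average distance to the decision boundary over labels; negate so that
    # smaller margins yield larger informativeness scores.
    return -np.mean(np.abs(decision_values), axis=1)


def representativeness(X_unlabeled, gamma=1.0):
    """Average RBF similarity of each candidate to the rest of the pool."""
    sq_norms = np.sum(X_unlabeled**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X_unlabeled @ X_unlabeled.T
    K = np.exp(-gamma * sq_dists)
    # Exclude each point's similarity to itself (always 1.0).
    return (K.sum(axis=1) - 1.0) / (len(X_unlabeled) - 1)


def select_query(decision_values, X_unlabeled, beta=0.5):
    """Return the index of the unlabeled example with the best combined score."""
    score = (beta * informativeness(decision_values)
             + (1.0 - beta) * representativeness(X_unlabeled))
    return int(np.argmax(score))
```

In an active learning loop of this kind, the selected example would be sent to an annotator, added to the training set, and the classifier then updated online rather than retrained from scratch, which is the role the KL-divergence- and KKT-based update plays in AIR-OML.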