Li Zhixin, Shi Zhiping, Zhang Canlong, Wang Jinyan. Hybrid generative/discriminative model for automatic image annotation[J]. Journal of Image and Graphics, 2015, 20(5): 687-699. DOI: 10.11834/jig.20150511.
Given the notorious semantic gap between low-level features and high-level concepts in image retrieval, automatic image annotation has become a crucial issue. To bridge this gap, this paper proposes a hybrid generative/discriminative approach to annotating images automatically. In the generative learning stage, images are modeled with a continuous probabilistic latent semantic analysis (PLSA) model, which yields the model parameters and the topic distribution of each image. Taking this topic distribution as an intermediate representation of each image transforms the image auto-annotation problem into a multi-label classification problem. In the discriminative learning stage, we construct ensembles of classifier chains by learning from these intermediate representations; at the same time, the contextual information of the annotation words is integrated into the classifier chains. The approach can therefore achieve higher annotation accuracy and better retrieval performance. Experiments on two benchmark datasets show that the average precision and recall of our approach reach 0.28 and 0.32, respectively, on the Corel5k dataset, and 0.29 and 0.18, respectively, on the IAPR-TC12 dataset. The experimental results show that our approach outperforms most state-of-the-art approaches on many evaluation measures. Furthermore, the precision-recall curve shows the superior performance of our approach over several typical and representative approaches. Building on a hybrid learning strategy, this paper presents an image auto-annotation approach that integrates the advantages of generative and discriminative models. As a result, the approach delivers better, more effective, and more robust semantic image retrieval. The methods and techniques of this paper are usable not only in image retrieval and recognition; after appropriate adaptation, they can also play an important role in cross-media retrieval and data mining.
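
The discriminative stage described in the abstract (per-image topic distributions fed to ensembles of classifier chains) can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base learner, the toy topic vectors, the three-word vocabulary, and all names (`CentroidClassifier`, `ClassifierChain`, `fit_ensemble`, `annotate`) are assumptions made for illustration, and the continuous PLSA stage that would produce the topic vectors is not shown.

```python
import random

def _centroid(rows):
    # Column-wise mean of a list of equal-length feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class CentroidClassifier:
    """Stand-in binary base learner (assumption): nearer class centroid wins."""
    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        # If only one class is present, always predict that class.
        self.constant = None if (pos and neg) else (1 if pos else 0)
        if self.constant is None:
            self.pos, self.neg = _centroid(pos), _centroid(neg)
        return self

    def predict_one(self, x):
        if self.constant is not None:
            return self.constant
        return 1 if _sqdist(x, self.pos) < _sqdist(x, self.neg) else 0

class ClassifierChain:
    """One binary model per word; each model also sees earlier words in the chain."""
    def __init__(self, order):
        self.order = list(order)

    def fit(self, X, Y):
        self.models = []
        for k, label in enumerate(self.order):
            prev = self.order[:k]
            # Training uses the true values of the earlier labels as extra features.
            Xk = [list(x) + [y[p] for p in prev] for x, y in zip(X, Y)]
            self.models.append(CentroidClassifier().fit(Xk, [y[label] for y in Y]))
        return self

    def predict_one(self, x):
        feats, out = list(x), {}
        for label, model in zip(self.order, self.models):
            out[label] = model.predict_one(feats)
            feats.append(out[label])  # prediction uses predicted earlier labels
        return [out[j] for j in range(len(self.order))]

def fit_ensemble(X, Y, n_chains=5, seed=0):
    # Each chain gets a different random label order.
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        chains.append(ClassifierChain(order).fit(X, Y))
    return chains

def annotate(chains, x, vocab, threshold=0.5):
    # Keep each word whose share of chain votes reaches the threshold.
    votes = [0] * len(vocab)
    for chain in chains:
        for j, v in enumerate(chain.predict_one(x)):
            votes[j] += v
    return [w for w, v in zip(vocab, votes) if v / len(chains) >= threshold]

# Toy data (hypothetical): 2-topic distributions and 0/1 word labels per image.
vocab = ["sky", "plane", "grass"]
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
chains = fit_ensemble(X, Y)
print(annotate(chains, [0.85, 0.15], vocab))
```

Because each model in a chain receives the earlier words as extra features, co-occurrence between annotation words (here "sky" and "plane") can influence later predictions, which is one way to read the abstract's remark about integrating contextual information. Averaging over several random label orders reduces the dependence on any single ordering.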