Chen Ying, Huo Zhonghua. Person re-identification based on multi-directional saliency metric learning[J]. Journal of Image and Graphics, 2015, 20(12): 1674-1683. DOI: 10.11834/jig.20151212.
Person re-identification is important in video surveillance systems because it reduces the human effort required to search for a target among a large number of video sequences. However, this task is difficult because of variations in lighting conditions, background clutter, changes in viewpoint, and differences in pose. To tackle this problem, most studies have concentrated on designing a feature representation, a metric learning method, or a discriminative learning method. Visual saliency has recently been exploited in discriminative learning methods because salient regions help humans distinguish targets efficiently. Given the problem of inconsistent salience between matched patches in person re-identification,
this study proposes a multi-directional salience similarity evaluation approach for person re-identification based on metric learning. The proposed method is robust to viewpoint and background variations. First, the salience of image patches is obtained by fusing inter-salience and intra-salience, both of which are estimated by manifold ranking. The visual similarity between matched patches is then established by the multi-directional weighted fusion of salience, according to the distribution of the four saliency types of the matched patches. The weight of saliency in each direction is obtained by metric learning based on structural SVM ranking. Finally, a comprehensive similarity measure for image pairs is formed. The proposed method is evaluated on two public benchmark datasets (VIPeR and ETHZ),
and experimental results show that it achieves excellent re-identification rates with the comprehensive similarity measure compared with similar algorithms. Moreover, the proposed method is invariant to the effects of background variations. The re-identification results on the VIPeR dataset, with half of the dataset sampled for training, are quantitatively analyzed: the proposed method achieves a matching rate of 30% at rank 1 (i.e., the correct match is ranked first) and 72% at rank 15 (the expected matching rate within the top 15 candidates), outperforming existing learning-based methods. The proposed method still achieves state-of-the-art performance even when the number of training pairs is small. For generalization verification, experiments are also conducted on the ETHZ dataset; the results show that the proposed method outperforms existing feature-design-based and supervised-learning-based methods on all three sequences. Thus,
the proposed method shows practical significance. The multi-directional weighted fusion of salience yields a comprehensive description of the saliency distribution of image pairs and thus a comprehensive similarity measure. The proposed method can realize person re-identification across large-scale, non-overlapping, multi-camera views. Furthermore, it improves the discriminative power and accuracy of re-identification and is strongly robust to background changes.
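The multi-directional weighted fusion described above can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so everything below is a hypothetical reading: `patch_similarity` is an invented helper, the four "directions" are modeled as the four joint salience configurations of a matched patch pair (both salient, one side salient, neither), and the weight vector `w` stands in for the weights that the paper learns via structural SVM ranking.

```python
import numpy as np

def patch_similarity(feat_p, feat_q, sal_p, sal_q, w):
    """Hypothetical multi-directional salience-weighted patch similarity.

    feat_p, feat_q : feature vectors of two matched patches
    sal_p, sal_q   : fused salience scores in [0, 1]
                     (inter-salience + intra-salience, per the abstract)
    w              : length-4 weight vector for the four salience
                     configurations -- an assumed stand-in for the
                     weights learned by structural SVM ranking
    """
    # Appearance similarity: Gaussian kernel on feature distance.
    d = np.linalg.norm(np.asarray(feat_p) - np.asarray(feat_q))
    appearance = np.exp(-d ** 2 / 2.0)

    # Four salience "directions" for the patch pair:
    # both salient, p-only salient, q-only salient, neither salient.
    directions = np.array([
        sal_p * sal_q,
        sal_p * (1.0 - sal_q),
        (1.0 - sal_p) * sal_q,
        (1.0 - sal_p) * (1.0 - sal_q),
    ])

    # Weighted fusion of the salience configurations modulates the
    # appearance similarity of the matched patches.
    return float(np.dot(w, directions) * appearance)
```

An image-pair similarity would then sum this score over all matched patch pairs; with uniform weights the expression degenerates to plain appearance similarity, which is why learning non-uniform direction weights is the point of the method.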