
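Ding Zhengyan1, Shang Yanfeng1, Zhang Chongyang2 (1. Internet of Things Technology R&D Center, Third Research Institute of the Ministry of Public Security, Shanghai 201204; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240)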

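Abstract
Objective The main problem in current pedestrian attribute recognition is that the sample distribution of some attribute categories is severely unbalanced. To address this problem, a pedestrian attribute recognition method based on progressive iterative optimization is proposed. Method First, for the unbalanced categories, a masked autoencoder is used for data augmentation to build a balanced attributes-data generation model (BA-DGM), which achieves transfer learning and knowledge enhancement from a general large model to a specialized small task. Then, for the newly generated sample data, a discrimination model is used for consistency filtering, and a heuristic attention mechanism is realized through the adversarial interplay with the generation model, thereby building an attention features-data discrimination model (AF-DDM). Finally, through cyclic iteration that alternates between data generation and data discrimination, the pedestrian attribute recognition model and the data are progressively optimized; for the balanced sample data, a knowledge distillation framework is used to fuse the discrimination models from different rounds, yielding a progressive iterations-distillation fusion model (PI-DFM) that further improves the generalization ability of the model. Result Experimental results show that the proposed progressive optimization method effectively improves model accuracy on four current mainstream evaluation datasets. On the RAPv2 (richly annotated pedestrian v2) dataset, with model complexity unchanged, the mean accuracy (mA) and the average F1 score are improved by about 5.0% and about 1.7%, respectively, compared with the best published pedestrian attribute recognition model; meanwhile, after multiple rounds of cyclic iteration, the number of unbalanced categories in the original data is reduced to 0, thereby achieving progressive optimization of the dataset. Conclusion The progressive iterative optimization strategy proposed in this paper complements existing improvement methods well and helps to further improve model accuracy.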
Pedestrian attribute recognition method based on the progressive iterative optimization

Ding Zhengyan1, Shang Yanfeng1, Zhang Chongyang2 (1. Research Center on Internet of Things, the Third Research Institute of the Ministry of Public Security, Shanghai 201204, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)

Objective Pedestrian attribute recognition is currently hampered by the severely unbalanced sample distribution of some attribute categories. To resolve this problem, we develop a progressive iterative optimization method for pedestrian attribute recognition. Method First, a data generation model based on the masked autoencoder is used to augment the data of the unbalanced categories, so that knowledge from a general large model can be transferred to the specialized small task. The balanced attributes-data generation model (BA-DGM) uses the masked autoencoder to mask the original pedestrian images at a random masking ratio, and newly generated images are thereby obtained for the categories with few samples. Potential information, such as the topological relationship of the visible area, can be fully mined, and the latent features of the pedestrian images become more robust. Furthermore, this shows that the autoencoder model can effectively learn a universal feature representation of the pedestrian target, including common features such as the relationships among the key components of the pedestrian. Second, a discrimination model is used to filter the newly generated sample data for consistency, and a heuristic attention mechanism is realized through the adversarial interplay between the discrimination model and the generation model, in the manner of generative adversarial networks (GANs). With the resulting attention features-data discrimination model (AF-DDM), diversified samples are obtained while the key features of the attributes are preserved, which enhances the interpretability of the recognition model. At the same time, the filtered generated data are used to train the model so that it learns effective attribute-related features. In the training process of the discrimination model, a 50-layer residual network is adopted as the backbone and trained on the original attribute recognition dataset using a multi-label classification framework.
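The random masking step described above can be illustrated with a minimal sketch. It only shows how patches of a pedestrian image might be hidden at a randomly drawn masking ratio; the reconstruction by the masked autoencoder is omitted. The function name `random_patch_mask`, the patch size of 16, and the ratio range are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def random_patch_mask(image, patch=16, ratio_range=(0.5, 0.9), rng=None):
    """Zero out a random subset of non-overlapping square patches.

    Assumed illustration only: patch size and ratio range are hypothetical.
    Returns the masked image, the boolean patch grid (True = masked),
    and the masking ratio that was drawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch           # size of the patch grid
    ratio = rng.uniform(*ratio_range)         # random masking ratio
    n_mask = int(round(ratio * gh * gw))      # number of patches to hide
    chosen = rng.permutation(gh * gw)[:n_mask]
    grid = np.zeros(gh * gw, dtype=bool)
    grid[chosen] = True
    grid = grid.reshape(gh, gw)
    # Expand the patch grid to pixel resolution.
    pixel_mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    masked = image.copy()                     # leave the input untouched
    masked[:gh * patch, :gw * patch][pixel_mask] = 0
    return masked, grid, ratio

if __name__ == "__main__":
    img = np.ones((256, 128, 3))
    out, grid, ratio = random_patch_mask(img, rng=np.random.default_rng(0))
    print(f"masked {grid.sum()} of {grid.size} patches (ratio {ratio:.2f})")
```

Drawing a fresh ratio for every call means that repeated calls on the same positive sample hide different amounts of the image, which is one plausible way to obtain varied inputs for generation.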
In the inference process of the discrimination model, the attribute labels are divided into two categories: key attribute labels and other related attribute labels. For the key attribute labels, a newly generated sample is retained only if the labels predicted by the discrimination model are consistent with the original labels and have high confidence; otherwise, it is discarded. Finally, the pedestrian attribute recognition model and the data are progressively optimized through cyclic iteration that alternates between data generation and data discrimination. To improve the generalization ability of the model, a knowledge distillation framework is also used to fuse the discrimination models on the balanced sample data. In the progressive iterations-distillation fusion model (PI-DFM), the attribute discrimination models obtained after multiple iterations serve as the teacher models, and the attribute recognition dataset after category balancing serves as the training data. These teacher models, trained on datasets with different sample proportions, complement one another. The network structure of the student model is the same as that of the teacher model, and the Kullback-Leibler (KL) divergence between the student output and the teacher output is used as the distillation loss function. In large-scale practical application scenarios, the sample proportions of the test data and the training data may differ. To improve the generalization ability of the model in open, uncertain scenarios, teacher models trained on data with different sample proportions are integrated through the knowledge distillation framework. Result Experimental results demonstrate that the proposed optimization method effectively improves model accuracy on four popular evaluation datasets.
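The distillation loss mentioned above can be sketched as follows. The abstract states only that the KL divergence between the student output and the teacher output is used; the remaining details here are assumptions made for illustration: each attribute is treated as an independent Bernoulli variable with a sigmoid output (in line with the multi-label setting), the teachers are fused by averaging their probabilities, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli_kl(p, q, eps=1e-7):
    """Element-wise KL(p || q) between Bernoulli distributions."""
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))

def distillation_loss(student_logits, teacher_logits_list):
    """Mean KL(teacher || student) over samples and attributes.

    Assumption: teachers from different iteration rounds are fused
    by averaging their per-attribute probabilities.
    """
    teacher_prob = np.mean([sigmoid(t) for t in teacher_logits_list], axis=0)
    student_prob = sigmoid(student_logits)
    return float(np.mean(bernoulli_kl(teacher_prob, student_prob)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teachers = [rng.normal(size=(4, 5)) for _ in range(3)]  # 3 rounds
    student = rng.normal(size=(4, 5))                       # 4 samples, 5 attributes
    print(distillation_loss(student, teachers))
```

The loss is zero when the student reproduces the fused teacher probabilities exactly and grows as the two outputs diverge, so minimizing it pulls the student toward the combined behavior of the teachers.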
Attribute-based and sample-based metrics are calculated, including 1) the mean accuracy (mA) over all attributes and 2) the F1 score over all samples, which is the harmonic mean of the mean precision and the mean recall. For example, on the richly annotated pedestrian v2 (RAPv2) dataset, with model complexity unchanged, the mean accuracy is increased by about 5.0% and the average F1 score by about 1.7% compared with the best published pedestrian attribute recognition model. After several rounds of cyclic iteration, the number of unbalanced categories in the original data is reduced to zero, so the dataset is progressively optimized as well. In the ablation studies, new samples are randomly generated for each positive sample image, and the discrimination model is then used to filter out inconsistent samples. The spatial probability distribution of the preserved details is analyzed experimentally from the masked regions of the filtered samples. With the heuristic attention mechanism, the data discrimination model better retains the features relevant to the key attributes of the pedestrian target, which demonstrates that the interpretability of the discrimination model can be further improved by deeply mining the distribution of the features related to different attributes. Conclusion The progressive iterative optimization strategy proposed in this paper complements existing improvement methods well and helps to further improve the accuracy of the recognition model. To better model the relationships among multiple pedestrian attributes and further improve the interpretability of the recognition model, future research can focus on the masked autoencoder (MAE) model, which is based on universal feature representation, combined with prior knowledge such as the human skeleton structure.
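The two evaluation metrics can be illustrated with a short sketch. It assumes the definitions commonly used in pedestrian attribute recognition, which the abstract does not spell out: mA averages, over attributes, the mean of the positive and negative recall, and the sample-based F1 is the harmonic mean of the per-sample mean precision and mean recall. The function names and the toy labels are hypothetical.

```python
import numpy as np

def mean_accuracy(y_true, y_pred, eps=1e-12):
    """Attribute-based mA: mean over attributes of (TPR + TNR) / 2."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0)
    tn = (~y_true & ~y_pred).sum(axis=0)
    pos = y_true.sum(axis=0)
    neg = (~y_true).sum(axis=0)
    return float(np.mean((tp / (pos + eps) + tn / (neg + eps)) / 2.0))

def sample_f1(y_true, y_pred, eps=1e-12):
    """Sample-based F1: harmonic mean of mean precision and mean recall."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    inter = (y_true & y_pred).sum(axis=1)
    precision = np.mean(inter / (y_pred.sum(axis=1) + eps))
    recall = np.mean(inter / (y_true.sum(axis=1) + eps))
    return float(2.0 * precision * recall / (precision + recall + eps))

if __name__ == "__main__":
    # Rows are samples, columns are attributes (toy data).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
    print(mean_accuracy(y_true, y_pred), sample_f1(y_true, y_pred))
```

Because mA weights the positive and negative recall of each attribute equally, it is sensitive to rare attributes that plain accuracy would hide, which makes it a natural measure for the unbalanced categories discussed here.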