Towards universal person re-identification: a survey on the applications of large-scale self-supervised pre-trained models for person re-identification
Pages: 1-22 (2024)
Published Online: 21 August 2024
DOI: 10.11834/jig.240426
Feng Zhanxiang, Lai Jianhuang, Yuan Zang, et al. 2024. Towards universal person re-identification: a survey on the applications of large-scale self-supervised pre-trained models for person re-identification [J]. Journal of Image and Graphics: 1-22
Person re-identification aims to match the identities of pedestrian targets captured by cameras with non-overlapping fields of view. It is a research hotspot in computer vision, with important research significance and broad application prospects in security surveillance. Constrained by prohibitive annotation costs, person re-identification datasets remain small, the performance of current models falls short of practical deployment, and universal person re-identification is still a long way off. In recent years, large-scale pre-trained models have attracted wide attention and developed rapidly, and their core techniques are increasingly applied to person re-identification. This paper provides a comprehensive review of the applications of large-scale pre-trained models in person re-identification. We first introduce the research background: starting from the current state and difficulties of person re-identification, we briefly describe pre-training techniques and large-scale pre-trained models, and analyze the research significance and application prospects of large-scale pre-trained models for person re-identification. On this basis, we review person re-identification research based on large-scale pre-trained models in detail, dividing existing work into three categories: large-scale self-supervised pre-training for person re-identification, person re-identification guided by large-scale pre-trained models, and prompt-learning-based person re-identification, and we compare the effectiveness and performance of state-of-the-art algorithms on multiple datasets. Finally, we summarize the task, analyze the limitations of current research, and discuss future research directions. Overall, large-scale pre-trained models are indispensable for achieving universal person re-identification. Current research is still exploratory, the combination of person re-identification and large-scale pre-trained models is not yet tight, and how to combine pedestrian priors with large-scale pre-trained models to achieve universal person re-identification requires joint thinking and effort from academia and industry.
Person re-identification (re-id) aims to recognize target pedestrians across non-overlapping camera views. Re-id is a research hotspot in computer vision, with significant research value and broad application prospects in security surveillance. Re-id performance has improved rapidly in recent years, and state-of-the-art (SOTA) methods outperform human beings on standard benchmarks. Researchers also pay increasing attention to re-id in challenging uncontrolled environments, including visible-infrared, occluded, cloth-changing, low-resolution, and aerial person re-id. However, the performance of re-id models is still far from satisfactory and does not meet the requirements of practical applications, for two major reasons. First, existing re-id models are trained on closed datasets with single scenarios and sufficient labeled pedestrians, whereas real deployments involve many varying scenarios, conditions that differ greatly across cameras, and labeled pedestrians that are expensive to collect; the performance, robustness, and generalization ability of existing methods are therefore insufficient for realistic applications. Second, because of the high annotation cost, re-id datasets are small: the number of training samples is much lower than in other vision tasks such as face recognition, object recognition, and segmentation. As a result, re-id models tend to overfit the training images and generalize poorly. Consequently, there is still a long way to go toward universal person re-id.

Recently, large-scale pre-trained models have attracted significant attention and developed rapidly, and their key techniques matter for the development of re-id. In this paper, we survey the applications of large-scale pre-training techniques to person re-id. First, we introduce the background of large-scale pre-trained models. Self-supervised pre-training has achieved great success in natural language processing (NLP), where the Transformer architecture extracts robust language features. GPT and BERT are pioneering Transformer-based large-scale pre-trained models that have proven effective for downstream tasks, and GPT-3 demonstrates that large-scale pre-trained models are competitive with SOTA supervised models without task-specific annotations. Following the success of GPT-3, many researchers have applied self-supervised pre-training to vision tasks, with pioneering work on vision-language cross-modal learning: ViLBERT is an early attempt to learn the relationships between vision and language, the CLIP model shows strong generalization on zero-shot vision tasks, and MAE adopts masked image modeling to obtain a pre-trained model with good generalization ability. Large-scale pre-training can thus improve the performance and generalization ability of baseline models using large-scale unlabeled data (a minimal sketch of the contrastive objective underlying many of these pre-training methods is given below). This property is particularly valuable for re-id, where collecting numerous labeled pedestrians is expensive, and the knowledge contained in large-scale pre-trained models can further be exploited to improve re-id models.

Because self-supervised pre-training can promote re-id models, researchers have made pioneering efforts, which we review next.
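Most of the self-supervised pre-training methods surveyed below share a contrastive core: two augmented views of the same unlabeled person crop are pulled together in feature space while the other crops in the batch are pushed apart. The following PyTorch sketch illustrates this InfoNCE objective under stated assumptions; the backbone choice, augmentation recipe, and commented training step are illustrative placeholders rather than code from any surveyed paper.

```python
# Minimal sketch of SimCLR/MoCo-style contrastive pre-training on
# unlabeled person crops. Illustrative only: the backbone, augmentations,
# and training loop are assumptions, not a surveyed method's code.
import torch
import torch.nn.functional as F
from torchvision import models, transforms

# Person crops are typically tall rectangles (e.g., 256x128).
augment = transforms.Compose([
    transforms.RandomResizedCrop((256, 128), scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

encoder = models.resnet50(weights=None)   # train from scratch on unlabeled data
encoder.fc = torch.nn.Linear(2048, 128)   # small projection head

def info_nce(z1, z2, tau=0.07):
    """Two views of the same crop are positives; every other crop in the
    batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                        # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# One training step over a batch of raw PIL person crops:
#   v1 = torch.stack([augment(img) for img in batch])
#   v2 = torch.stack([augment(img) for img in batch])
#   loss = info_nce(encoder(v1), encoder(v2))
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```

The pedestrian-specific pre-training methods reviewed next build on objectives of this kind, adding momentum queues (MoCo), part-level views, or other pedestrian priors.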
The current literature is categorized into three types: self-supervised pre-training re-id methods, re-id methods based on large-scale pre-trained models, and prompt-learning-based re-id methods. We describe each category in detail and compare the effects and performance of SOTA methods on various benchmarks.

Self-supervised pre-training re-id methods employ self-supervised techniques and large-scale unlabeled pedestrian benchmarks to train robust pre-trained models. Because labeled pedestrians are scarce and expensive, several weakly supervised or unsupervised benchmarks have been constructed for this purpose. SYSU-30K is the first large-scale weakly supervised re-id dataset, built from over 30 million images and 30,000 identities extracted from 1,000 downloaded videos; its challenges include low resolution, view changes, occlusion, and changing illumination. LUPerson is the first large-scale unsupervised pedestrian benchmark, containing more than 4.2 million unlabeled pedestrian images from 46,000 scenes and covering illumination variation, changing resolution, and occlusion. By applying tracking to LUPerson, the authors further constructed the weakly supervised LUPerson-NL dataset, which contains more than 10 million pedestrian images and 430,000 noisy identities. With these datasets available, researchers have studied how to apply self-supervised techniques to re-id: some works use the contrastive learning framework to learn robust re-id models from unlabeled pedestrians, adopting the MoCo framework and a catastrophic-forgetting score to improve generalization ability, while others exploit pedestrian priors such as local structure, view information, and color information to inject domain knowledge into re-id pre-training.

Re-id methods based on large-scale pre-trained models exploit the knowledge of multi-modal large-scale models and use the interaction between vision and language to improve re-id performance. Because CLIP shows superior performance on zero-shot vision tasks, most related studies use CLIP to learn discriminative and robust re-id models; Llama2 has also been adopted to promote re-id.

Prompt-learning-based re-id methods introduce prompt learning to train robust re-id models. They use the relationships between text descriptions and visual features to learn more discriminative and robust representations, and they employ prompts to adapt the model to different environments, aiming at a universal re-id model that copes with changing conditions (a minimal sketch of this prompt-learning recipe follows below). Experimental results show that self-supervised techniques, large-scale pre-trained models, and prompt learning significantly improve the performance and generalization ability of re-id models, yielding more universal models for unseen scenarios.
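To make the CLIP-and-prompt-learning idea concrete, here is a hedged sketch in the spirit of CoOp and CLIP-ReID: identity-specific context tokens are optimized against a frozen CLIP model so that each identity acquires a pseudo text description. It assumes the open-source openai/CLIP package; the placeholder-token positions and variable names are illustrative, not the released code of any surveyed method.

```python
# Hedged sketch of prompt learning for CLIP-based re-id (CoOp/CLIP-ReID
# spirit): learn per-identity context tokens against a frozen CLIP model.
import torch
import clip  # assumes the openai/CLIP package (github.com/openai/CLIP)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)
for p in model.parameters():              # freeze both CLIP towers
    p.requires_grad_(False)

num_ids, n_ctx = 751, 4                   # e.g., Market-1501 training identities
dim = model.token_embedding.embedding_dim
ctx = torch.nn.Parameter(0.02 * torch.randn(num_ids, n_ctx, dim, device=device))

# "X" placeholders will be replaced by learned embeddings; with this
# template each "X" sits at token positions 5..5+n_ctx (an assumption
# worth verifying against the tokenizer output).
template = clip.tokenize("a photo of a " + " ".join(["X"] * n_ctx) + " person").to(device)

def id_text_features(ids):
    """Splice each identity's learned context into the template and run
    CLIP's frozen text tower; only `ctx` receives gradients."""
    tok = template.repeat(len(ids), 1)
    x = model.token_embedding(tok).type(model.dtype)
    x[:, 5:5 + n_ctx] = ctx[ids].type(model.dtype)
    x = x + model.positional_embedding.type(model.dtype)
    x = model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
    x = model.ln_final(x).type(model.dtype)
    eot = tok.argmax(dim=-1)              # EOT token has the largest id
    return x[torch.arange(x.size(0)), eot] @ model.text_projection

# CLIP-ReID-style stage 1: align frozen image features with the learned
# per-identity text features via a contrastive loss, e.g.:
#   img = model.encode_image(images)      # frozen
#   txt = id_text_features(labels)        # gradients flow to ctx only
#   optimizer = torch.optim.Adam([ctx], lr=3e-4)
```

In a second stage, the text features produced this way can supervise fine-tuning of the image encoder, which is how CLIP-ReID transfers CLIP's language knowledge to re-id without concrete text labels.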
Finally, we summarize the surveyed literature, analyze the limitations of existing research, and discuss potential directions for future work. In conclusion, large-scale pre-training techniques are essential for universal re-id, but existing studies are still pioneering and immature, and the connection between re-id and large-scale pre-trained models remains loose. How to combine pedestrian priors with the knowledge of large-scale models to achieve universal re-id still requires joint thinking and effort from academia and industry.
Keywords: person re-identification; deep learning; self-supervised pre-training; large-scale model; prompt learning
Ye M, Shen J, Lin G, Xiang T, Shao L, and Hoi S C. 2021. Deep learning for person re-identification: A survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6): 2872-2893 [DOI: 10.1109/TPAMI.2021.3054775]
Luo H, Jiang W, Fan X, and Zhang S P. 2020. A survey on deep learning based person re-identification. Acta Automatica Sinica, 45(11): 2032-2049 [DOI: 10.16383/j.aas.c180154]
An X, Deng J, Guo J, Feng Z, Zhu X, Yang J, and Liu T. 2022. Killing two birds with one stone: Efficient and robust training of face recognition CNNs by partial FC. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4042-4051 [DOI: 10.1109/CVPR52688.2022.00401]
Zhu Z, Huang G, Deng J, Ye Y, Huang J, Chen X, and Zhou J. 2021. WebFace260M: A benchmark unveiling the power of million-scale deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE: 10492-10502 [DOI: 10.1109/CVPR46437.2021.01035]
Deng J, Dong W, Socher R, Li L J, Li K, and Li F. 2009. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]
Wei L, Zhang S, Gao W, and Tian Q. 2018. Person transfer GAN to bridge domain gap for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 79-88 [DOI: 10.1109/CVPR.2018.00016]
Yi D, Lei Z, Liao S, and Li S Z. 2014. Learning face representation from scratch. [EB/OL]. [2014-10-28]. https://arxiv.org/pdf/1411.7923.pdf
Shao S, Li Z, Zhang T, Peng C, Yu G, Zhang X, and Sun J. 2019. Objects365: A large-scale, high-quality dataset for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE: 8430-8439 [DOI: 10.1109/ICCV.2019.00852]
Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, and Girshick R. 2023. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 4015-4026 [DOI: 10.1109/ICCV51070.2023.00371]
Karanam S, Gou M, Wu Z, Rates-Borras A, and Radke R J. 2018. A systematic evaluation and benchmark for person re-identification: Features, metrics, and datasets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3): 523-536 [DOI: 10.1109/TPAMI.2018.2807450]
Li W, Zhao R, Xiao T, and Wang X. 2014. DeepReID: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 152-159 [DOI: 10.1109/CVPR.2014.27]
Zheng L, Shen L, Tian L, Wang S, Wang J, and Tian Q. 2015. Scalable person re-identification: A benchmark. In Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1116-1124 [DOI: 10.1109/ICCV.2015.133]
Lin J A, Bao C Z, Dong J F, Yang X, and Wang X. 2024. Multilingual text-video cross-modal retrieval model for multilingual-visual common space learning. Chinese Journal of Computers: 1-17 [https://link.cnki.net/urlid/11.1826.tp.20240613.1121.002]
Radford A, Narasimhan K, Salimans T, and Sutskever I. 2018. Improving language understanding by generative pre-training. [https://www.mikecaptain.com/resources/pdf/GPT-1.pdf]
Devlin J, Chang M W, Lee K, and Toutanova K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. [EB/OL]. [2018-10-11]. https://arxiv.org/pdf/1810.04805.pdf
Liu X, Song M, and Tao D. 2014. Semi-supervised coupled dictionary learning for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 3550-3557 [DOI: 10.1109/CVPR.2014.454]
Zheng W, Gong S, and Xiang T. 2013. Re-identification by relative distance comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3): 653-668 [DOI: 10.1109/TPAMI.2012.138]
Chen Y, Zhu X, and Zheng W. 2017. Person re-identification by camera correlation-aware feature augmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(2): 392-408 [DOI: 10.1109/TPAMI.2017.2666805]
Feng Z, Lai J, and Xie X. 2019. Learning modality-specific representations for visible-infrared person re-identification. IEEE Transactions on Image Processing, 29: 579-590 [DOI: 10.1109/TIP.2019.2928126]
Qian X, Fu Y, and Jiang Y. 2017. Multi-scale deep learning architectures for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5399-5408 [DOI: 10.1109/ICCV.2017.577]
Li M, Li C, and Guo J. 2022. Cluster-guided asymmetric contrastive learning for unsupervised person re-identification. IEEE Transactions on Image Processing, 31: 3606-3617 [DOI: 10.1109/TIP.2022.3173163]
Ni H, Li Y, and Gao L. 2023. Part-aware Transformer for generalizable person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 11246-11255 [DOI: 10.1109/ICCV51070.2023.01036]
Zahra A, Perwaiz N, and Shahzad M. 2023. Person re-identification: A retrospective on domain specific open challenges and future trends. Pattern Recognition, 109669: 1-26 [DOI: 10.1016/j.patcog.2023.109669]
Zhang X, Luo H, Fan X, Xiang W, Sun Y, Xiao Q, and Sun J. 2017. AlignedReID: Surpassing human-level performance in person re-identification. [EB/OL]. [2017-11-22]. https://arxiv.org/pdf/1711.08184.pdf
Feng Z X, Zhu R, Wang Y J, and Lai J H. 2020. Overview of person re-identification in unconstrained environments. Acta Scientiarum Naturalium Universitatis Sunyatseni, 59(3): 1-11 [CNKI: SUN:ZSDZ.0.2020-03-001]
Wu A C, Lin C Z, and Zheng W S. 2022. Single-modality self-supervised information mining for cross-modality person re-identification. Journal of Image and Graphics, 27(10): 2843-2859 [DOI: 10.11834/jig.211050]
Wu A, Zheng W, and Yu H. 2017. RGB-infrared cross-modality person re-identification. In Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5380-5389 [DOI: 10.1109/ICCV.2017.575]
Yang M, Huang Z, Hu P, Li T, Lv J, and Peng X. 2022. Learning with twin noisy labels for visible-infrared person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 14308-14317 [DOI: 10.1109/CVPR52688.2022.01391]
Zhang Q, Lai C, and Liu J. 2022. FMCNet: Feature-level modality compensation for visible-infrared person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 7349-7358 [DOI: 10.1109/CVPR52688.2022.00720]
Fang X, Yang Y, and Fu Y. 2023. Visible-infrared person re-identification via semantic alignment and affinity inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 11270-11279 [DOI: 10.1109/ICCV51070.2023.01035]
Yu H, Cheng X, and Peng W. 2023. Modality unifying network for visible-infrared person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 11185-11195 [DOI: 10.1109/ICCV51070.2023.01027]
Zheng W, Li X, and Xiang T. 2015. Partial person re-identification. In Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 4678-4686 [DOI: 10.1109/ICCV.2015.531]
He L, Liang J, and Li H. 2018. Deep spatial feature reconstruction for partial person re-identification: Alignment-free approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7073-7082 [DOI: 10.1109/CVPR.2018.00739]
Zhuo J, Chen Z, and Lai J. 2018. Occluded person re-identification. In Proceedings of the IEEE International Conference on Multimedia and Expo. San Diego, USA: IEEE: 1-6 [DOI: 10.1109/ICME.2018.8486568]
Hou R, Ma B, and Chang H. 2019. VRSTC: Occlusion-free video person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7183-7192 [DOI: 10.1109/CVPR.2019.00735]
Wang G, Yang S, Liu H, Wang Z, Yang Y, Wang S, and Sun J. 2020. High-order information matters: Learning relation and topology for occluded person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE: 6449-6458 [DOI: 10.1109/CVPR42600.2020.00648]
Wang Z, Zhu F, Tang S, Zhao R, He L, and Song J. 2022. Feature erasing and diffusion network for occluded person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4754-4763 [DOI: 10.1109/CVPR52688.2022.00471]
Xu B, He L, Liang J, and Sun Z. 2022. Learning feature recovery transformer for occluded person re-identification. IEEE Transactions on Image Processing, 31: 4651-4662 [DOI: 10.1109/TIP.2022.3186759]
Huang M, Hou C, Yang Q, and Wang Z. 2023. Reasoning and tuning: Graph attention network for occluded person re-identification. IEEE Transactions on Image Processing, 32: 1568-1582 [DOI: 10.1109/TIP.2023.3247159]
Wang T, Liu M, Liu H, Li W, Ban M, Guo T, and Li Y. 2024. Feature completion transformer for occluded person re-identification. IEEE Transactions on Multimedia: 1-14 [DOI: 10.1109/TMM.2024.3379908]
Hou R, Ma B, Chang H, Gu X, Shan S, and Chen X. 2022. Feature completion for occluded person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9): 4894-4912 [DOI: 10.1109/TPAMI.2021.3079910]
Yang L L, Lan L, Sun D T, Teng X, Ben X Y, and Shen X B. 2023. Low-resolution pedestrian re-identification-relevant dataset and its benched method. Journal of Image and Graphics, 28(5): 1346-1359 [DOI: 10.11834/jig.221082]
Wang Z, Ye M, and Yang F. 2018. Cascaded SR-GAN for scale-adaptive low resolution person re-identification. In Proceedings of the International Joint Conference on Artificial Intelligence. Stockholm, Sweden: Curran Associates: 3891-3897 [DOI: 10.24963/ijcai.2018/541]
Li K, Ding Z, and Li S. 2018. Discriminative semi-coupled projective dictionary learning for low-resolution person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI: 2331-2338 [DOI: 10.1609/aaai.v32i1.11908]
Cheng Z, Dong Q, and Gong S. 2020. Inter-task association critic for cross-resolution person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE: 2605-2615 [DOI: 10.1109/CVPR42600.2020.00268]
Zhang G, Ge Y, and Dong Z. 2021. Deep high-resolution representation learning for cross-resolution person re-identification. IEEE Transactions on Image Processing, 30: 8913-8925 [DOI: 10.1109/TIP.2021.3120054]
Wu L, Liu L, and Wang Y. 2023. Learning resolution-adaptive representations for cross-resolution person re-identification. IEEE Transactions on Image Processing, 32: 4800-4811 [DOI: 10.1109/TIP.2023.3305817]
Zhang S, Zhang Q, Yang Y, Wei X, Wang P, Jiao B, and Zhang Y. 2020. Person re-identification in aerial imagery. IEEE Transactions on Multimedia, 23: 281-291 [DOI: 10.1109/TMM.2020.2977528]
Li T, Liu J, Zhang W, Ni Y, Wang W, and Li Z. 2021. UAV-Human: A large benchmark for human behavior understanding with unmanned aerial vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE: 16266-16275 [DOI: 10.1109/CVPR46437.2021.01600]
Zhang S, Yang Q, Cheng D, and Zhang Y. 2023. Ground-to-aerial person search: Benchmark dataset and approach. In Proceedings of the 31st ACM International Conference on Multimedia. Ottawa, Canada: ACM: 789-799 [DOI: 10.1145/3581783.3612105]
Nguyen H, Nguyen K, and Sridharan S. 2024. AG-ReID.v2: Bridging aerial and ground views for person re-identification. IEEE Transactions on Information Forensics and Security, 19: 2896-2908 [DOI: 10.1109/TIFS.2024.3353078]
Chen S, Ye M, and Du B. 2022. Rotation invariant transformer for recognizing object in UAVs. In Proceedings of the 30th ACM International Conference on Multimedia. Lisbon, Portugal: ACM: 2565-2574 [DOI: 10.1145/3503161.3547799]
Huang M, Hou C, and Zheng X. 2024. Multi-resolution feature perception network for UAV person re-identification. Multimedia Tools and Applications: 1-22 [DOI: 10.1007/s11042-023-17937-8]
Yang Q, Wu A, and Zheng W. 2019. Person re-identification by contour sketch under moderate clothing change. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(6): 2029-2046 [DOI: 10.1109/TPAMI.2019.2960509]
Chen W, Xu X, Jia J, Luo H, Wang Y, Wang F, and Sun X. 2023. Beyond appearance: A semantic controllable self-supervised learning framework for human-centric visual tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 15050-15061 [DOI: 10.1109/CVPR52729.2023.01445]
Dong N, Zhang L, Yan S, Tang H, and Tang J. 2023. Erasing, transforming, and noising defense network for occluded person re-identification. IEEE Transactions on Circuits and Systems for Video Technology, 34(6): 4458-4472 [DOI: 10.1109/TCSVT.2023.3339167]
Ren K, and Zhang L. 2024. Implicit discriminative knowledge learning for visible-infrared person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 393-402 [https://arxiv.org/pdf/2403.11708]
Yang Z, Lin M, Zhong X, Wu Y, and Wang Z. 2023. Good is bad: Causality inspired cloth-debiasing for cloth-changing person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 1472-1481 [DOI: 10.1109/CVPR52729.2023.00148]
He K, Zhang X, Ren S, and Sun J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, and Polosukhin I. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30. Long Beach, USA: NIPS: 1-11 [https://arxiv.org/pdf/1706.03762.pdf]
Brown T, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, and Amodei D. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems. Vancouver, Canada: NIPS: 1877-1901 [https://arxiv.org/pdf/2005.14165v4]
Tian Y L, Wang Y T, Wang J G, Wang X, and Wang F Y. 2022. Key problems and progress of vision Transformers: The state of the art and prospects. Acta Automatica Sinica: 957-979 [DOI: 10.16383/j.aas.c220027]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, and Houlsby N. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. [EB/OL]. [2020-10-22]. https://arxiv.org/pdf/2010.11929.pdf
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, and Guo B. 2021. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 10012-10022 [DOI: 10.1109/ICCV48922.2021.00986]
He S, Luo H, Wang P, Wang F, Li H, and Jiang W. 2021. TransReID: Transformer-based object re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 15013-15022 [DOI: 10.1109/ICCV48922.2021.00986]
Lu J, Batra D, Parikh D, and Lee S. 2019. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Advances in Neural Information Processing Systems. Vancouver, Canada: NIPS: 1-11 [DOI: 10.5555/3454287.3454289]
Li L, Yatskar M, Yin D, Hsieh C, and Chang K. 2019. VisualBERT: A simple and performant baseline for vision and language. [EB/OL]. [2019-08-09]. https://arxiv.org/pdf/1908.03557.pdf
Chen Y, Li L, Yu L, Ahmed F, Gan Z, and Liu J. 2020. UNITER: Universal image-text representation learning. In European Conference on Computer Vision. Online: Springer: 104-120 [DOI: 10.1007/978-3-030-58577-8_7]
Ramesh A, Pavlov M, Goh G, Gray S, Voss C, Radford A, and Sutskever I. 2021. Zero-shot text-to-image generation. In International Conference on Machine Learning. Online: PMLR: 8821-8831 [DOI: 10.48550/arXiv.2102.12092]
Ramesh A, Dhariwal P, Nichol A, Chu C, and Chen M. 2022. Hierarchical text-conditional image generation with CLIP latents. [EB/OL]. [2022-04-13]. https://arxiv.org/pdf/2204.06125.pdf
Radford A, Kim J, Hallacy C, Ramesh A, Goh G, Agarwal S, and Sutskever I. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. Online: PMLR: 8748-8763 [https://arxiv.org/pdf/2103.00020v1]
Bao H, Dong L, Piao S, and Wei F. 2021. BEiT: BERT pre-training of image Transformers. [EB/OL]. [2021-06-15]. https://arxiv.org/pdf/2106.08254.pdf
He K, Chen X, Xie S, Li Y, Dollár P, and Girshick R. 2022. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16000-16009 [DOI: 10.1109/CVPR52688.2022.01553]
Xie Z, Zhang Z, Cao Y, Lin Y, Bao J, Yao Z, and Hu H. 2022. SimMIM: A simple framework for masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 9653-9663 [DOI: 10.1109/CVPR52688.2022.00943]
Zhao X, Ding W, An Y, Du Y, Yu T, Li M, and Wang J. 2023. Fast segment anything. [EB/OL]. [2023-06-21]. https://arxiv.org/pdf/2306.12156.pdf
He K, Fan H, Wu Y, and Girshick R. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE: 9729-9738 [DOI: 10.1109/CVPR42600.2020.00975]
Chen T, Kornblith S, Norouzi M, and Hinton G. 2020. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning. Online: PMLR: 1597-1607 [https://arxiv.org/pdf/2002.05709v2]
Gray D, Brennan S, and Tao H. 2007. Evaluating appearance models for recognition, reacquisition, and tracking. In IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, 3(5): 1-7 [DOI: 10.1109/ICDSC.2007.4357519]
Wang G, Zhang X, Lai J, Yu Z, and Lin L. 2020. Weakly supervised person re-ID: Differentiable graphical learning and a new benchmark. IEEE Transactions on Neural Networks and Learning Systems, 32(5): 2142-2156 [DOI: 10.1109/TNNLS.2020.2999517]
Fu D, Chen D, Bao J, Yang H, Yuan L, Zhang L, and Chen D. 2021. Unsupervised pre-training for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE: 14750-14759 [DOI: 10.1109/CVPR46437.2021.01451]
Fu D, Yang H, Bao J, Yuan L, and Chen D. 2022. Large-scale pre-training for person re-identification with noisy labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 2476-2486 [DOI: 10.1109/CVPR52688.2022.00251]
Luo H, Wang P, Xu Y, Ding F, Zhou Y, Wang F, and Jin R. 2021. Self-supervised pre-training for transformer-based person re-identification. [EB/OL]. [2021-11-23]. https://arxiv.org/pdf/2111.12084.pdf
Wan L, Jing Q, Sun Z, Zhang C, Li Z, and Chen Y. 2023. Self-supervised modality-aware multiple granularity pre-training for RGB-infrared person re-identification. IEEE Transactions on Information Forensics and Security, 18: 3044-3057 [DOI: 10.1109/TIFS.2023.3273911]
Zhang X, Han R, and Feng W. 2024. From synthetic to real: Unveiling the power of synthetic data for video person re-ID. [EB/OL]. [2024-02-03]. https://arxiv.org/pdf/2402.02108.pdf
Ye Z, Hong C, Zeng Z, and Zhuang W. 2022. Self-supervised person re-identification with channel-wise transformer. In Proceedings of the IEEE International Conference on Big Data. Fuzhou, China: IEEE: 4210-4217 [DOI: 10.1109/BigData55660.2022.10020632]
Zhu K, Guo H, Yan T, Zhu Y, Wang J, and Tang M. 2022. PASS: Part-aware self-supervised pre-training for person re-identification. In European Conference on Computer Vision. Tel Aviv, Israel: Springer: 198-214 [DOI: 10.1007/978-3-031-19781-9_12]
Yang Z, Jin X, Zheng K, and Zhao F. 2022. Unleashing potential of unsupervised pre-training with intra-identity regularization for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 14298-14307 [DOI: 10.1109/CVPR52688.2022.01390]
Yang E, Li C, Liu S, Liu Y, Zhao S, and Huang N. 2022. Self-supervised pre-training with learnable tokenizers for person re-identification in railway stations. In Proceedings of the IEEE International Conference on Signal Processing. Xi'an, China: IEEE: 325-330 [DOI: 10.1109/ICSP56322.2022.9965305]
Huang S, Zhou Y, Kathirvel R, Chellappa R, and Lau C. 2023. Self-supervised learning of whole and component-based semantic representations for person re-identification. [EB/OL]. [2023-11-27]. https://arxiv.org/pdf/2311.17074.pdf
Zhou K, Yang J, Loy C, and Liu Z. 2022. Learning to prompt for vision-language models. International Journal of Computer Vision, 130: 2337-2348 [DOI: 10.1007/s11263-022-01653-1]
Li S, Sun L, and Li Q. 2023. CLIP-ReID: Exploiting vision-language model for image re-identification without concrete text labels. In Proceedings of the AAAI Conference on Artificial Intelligence. Washington, USA: AAAI: 1405-1413 [DOI: 10.1609/aaai.v37i1.25225]
Yan S, Dong N, Zhang L, and Tang J. 2023. CLIP-driven fine-grained text-image person re-identification. IEEE Transactions on Image Processing, 32: 6032-6046 [DOI: 10.1109/TIP.2023.3327924]
Yu X, Dong N, Zhu L, Peng H, and Tao D. 2024. CLIP-driven semantic discovery network for visible-infrared person re-identification. [EB/OL]. [2024-01-11]. https://arxiv.org/pdf/2401.05806.pdf
Li W, Tan L, Dai P, and Zhang Y. 2024. Prompt decoupling for text-to-image person re-identification. [EB/OL]. [2024-01-04]. https://arxiv.org/pdf/2401.02173.pdf
Yu C, Liu X, Wang Y, Zhang P, and Lu H. 2024. TF-CLIP: Learning text-free CLIP for video-based person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI: 6764-6772 [DOI: 10.1609/aaai.v38i7.28500]
Yang S, and Zhang Y. 2024. MLLMReID: Multimodal large language model-based person re-identification. [EB/OL]. [2024-01-24]. https://arxiv.org/pdf/2401.13201.pdf
Zhai Y, Zeng Y, Huang Z, Qin Z, and Cao D. 2024. Multi-prompts learning with cross-modal alignment for attribute-based person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI: 6979-6987 [DOI: 10.1609/aaai.v38i7.28524]
Chen Z, Zhang Z, Tan X, Qu Y, and Xie Y. 2023. Unveiling the power of CLIP in unsupervised visible-infrared person re-identification. In Proceedings of the 31st ACM International Conference on Multimedia. Ottawa, Canada: ACM: 3667-3675 [DOI: 10.1145/3581783.3612050]
He W, Deng Y, Tang S, Chen Q, Wang Y, and Yan Y. 2024. Instruct-ReID: A multi-purpose person re-identification task with instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 17521-17531 [https://openaccess.thecvf.com/content/CVPR2024/papers/He_Instruct-ReID_A_Multi-purpose_Person_Re-identification_Task_with_Instructions_CVPR_2024_paper.pdf]
Zheng W, Yan J, and Peng Y. 2024. A versatile framework for multi-scene person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence: 1-18 [DOI: 10.1109/TPAMI.2024.3381184]
Li H, Ye M, Zhang M, and Du B. 2024. All in one framework for multimodal re-identification in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 17459-17469 [https://openaccess.thecvf.com/content/CVPR2024/papers/Li_All_in_One_Framework_for_Multimodal_Re-identification_in_the_Wild_CVPR_2024_paper.pdf]