Research progress in cloth-changing person re-identification

Zhang Peng; Zhang Xiaolin; Bao Yongtang; Ben Xianye; Shan Caifeng

Published: 2023-05-16
DOI: 10.11834/jig.220702
2023 | Volume 28 | Number 5

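Zhang Peng¹, Zhang Xiaolin², Bao Yongtang¹, Ben Xianye³, Shan Caifeng⁴ (1. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China; 2. SenseTime Group Inc., Shenzhen 518067, China; 3. School of Information Science and Engineering, Shandong University, Qingdao 266237, China; 4. College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China)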

Abstract

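Person re-identification aims to establish identity associations for a target pedestrian across multiple non-overlapping surveillance areas, and has important application value in fields such as smart cities, judicial investigation, and surveillance security. Conventional person re-identification methods target short-term scenarios, rely on the stability of pedestrians' appearance features, and aim to overcome challenges such as illumination differences, viewpoint changes, and object occlusion. By contrast, cloth-changing person re-identification targets long-term scenarios. Besides the above challenges, it also faces the appearance changes caused by changes of clothes, and it has become a difficult and active research topic in recent years. Focusing on cloth-changing person re-identification, this paper reviews research progress at home and abroad from two perspectives, datasets and methods, and discusses the challenges and difficulties involved. First, current datasets for cloth-changing person re-identification are reviewed and compared, and their difficulty and limitations are analyzed in terms of collection method and the numbers of pedestrians and samples. Then, after a brief review of the history of cloth-changing person re-identification, existing methods are grouped into two categories: methods based on non-visual sensors and methods based on visual cameras. For methods based on non-visual sensors, the use of depth sensors, radio frequency signals, and similar devices in cloth-changing person re-identification is introduced. For methods based on visual cameras, methods based on explicit feature design and extraction, methods based on feature decoupling, and methods based on implicit data-driven adaptive learning are described in detail. On this basis, the problems currently facing cloth-changing person re-identification are discussed and future trends are outlined, with the aim of providing a reference for related research.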

Keywords

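video surveillance; cloth-changing person re-identification; deep learning; feature learning and representation; biometrics; feature decoupling; data-driven learning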

Cloth-changing person re-identification: a summary

Zhang Peng¹, Zhang Xiaolin², Bao Yongtang¹, Ben Xianye³, Shan Caifeng⁴(1.College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China;2.SenseTime Group Inc., Shenzhen 518067, China;3.School of Information Science and Engineering, Shandong University, Qingdao 266237, China;4.College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China)

Abstract

Person re-identification (Re-ID) aims to establish identity correspondence for a target pedestrian across multiple non-overlapping monitored areas, and has significant application value in fields such as smart cities, criminal investigation and forensics, and surveillance security. Conventional Re-ID methods focus on short-term scenarios and aim to tackle challenges related to illumination differences, view-angle changes, and occlusion. These methods assume that the target pedestrian of interest (TPoI) wears the same clothes when re-appearing under surveillance. They are therefore restricted by their reliance on consistent appearance across different cameras, such as the same color and texture of a pedestrian's clothes. In contrast, cloth-changing person Re-ID targets long-term scenarios, in which the TPoI re-appears after a long time gap, such as one week or more. In addition to the above challenges of classical person Re-ID, cloth-changing person Re-ID suffers from appearance differences caused by changes of clothes, which has made it a challenging research problem in recent years. Focusing on cloth-changing person Re-ID, this paper discusses its challenges and difficulties and provides an in-depth review of recent progress in terms of datasets and methods. Based on this analysis, some potential research trends and solutions are proposed.
First, we summarize and compare the existing cloth-changing person Re-ID datasets: 1) the RGBD-based pattern analysis and computer vision (PAVIS), BIWI, and IAS-Lab datasets; 2) the radio frequency-based radio frequency re-identification dataset-campus (RRD-Campus) and RRD-Home; 3) the RGB image-based Celeb-ReID, person Re-ID under moderate clothing change (PRCC), long-term cloth-changing (LTCC), and DeepChange; and 4) the video-based train station dataset (TSD), Motion-ReID, and cloth-varying video Re-ID (CVID-ReID). Their difficulties and limitations are analyzed in terms of collection method and the numbers of identities and images. Additionally, popular person Re-ID evaluation metrics are summarized, including cumulative match characteristics (CMC), mean average precision (mAP), and mean inverse negative penalty (mINP). Second, we summarize the existing cloth-changing person Re-ID methods and divide them into two major categories according to how the data are collected: 1) non-visual sensor-based methods and 2) visual camera-based methods. Non-visual sensor-based methods alleviate the influence of clothes at the data collection stage. In this paper, non-visual sensors are grouped into two types, i.e., RGBD sensors and radio frequency (RF) devices. An RGBD sensor produces depth information, which strengthens human shape information and eliminates the effect of clothing color. However, depth information is still influenced by the contour of the clothes. RF-based methods can further overcome this weakness: the RF signals emitted by wireless devices can penetrate clothes and reflect the shape of the human body. Unfortunately, non-visual sensor-based methods rely heavily on expensive sensors and are hard to apply to existing surveillance systems. In contrast, visual camera-based methods can be applied to RGB surveillance cameras directly, and they tackle the problem by learning and representing cloth-invariant features from RGB images or videos.
Visual camera-based methods can be divided into three categories: 1) explicit feature learning and extraction (EFLE), 2) feature decoupling (FD), and 3) implicit data adaption (IDA). EFLE methods explicitly extract cloth-invariant, identity-relevant biometric features, such as face, gait, and body shape. They are either hand-crafted or learning-based. Hand-crafted methods design feature representations manually, e.g., through body measurement and analysis. Learning-based methods guide deep neural network models to learn biometric features using localization or regularization modules. FD methods decouple identity information from cloth-related appearance features to produce pure identity information, e.g., CESD, DG-Net, IS-GAN, and AFD-Net. In contrast, IDA methods are data-driven and adapt to intra-class diversity automatically by using large volumes of data with abundant intra-class variance, e.g., ReIDCaps and RCSANet. On this basis, the shortcomings of current cloth-changing person Re-ID methods are analyzed, e.g., the lack of large-scale and multi-view datasets, the feature alignment problem, occlusion, weak feature discriminability, and the generalization problem. To address these drawbacks, this paper further looks ahead to six promising research directions: 1) constructing large-scale video-based datasets and exploring spatio-temporal features from video clips or contexts, since video footage is expected to contain rich gait information and provide multi-view body characteristics for 3D human reconstruction; 2) utilizing 3D human reconstruction to learn view-invariant human geometric features in 3D space, since the 3D body is assumed to be robust to shape deformation and to highlight body structure information; 3) weakening the effect of clothes-related attributes with the help of pedestrian attribute analysis, which benefits the extraction of semantic-level cues; 4) mining and integrating multiple features, such as gait, face, and shape, through multi-feature co-learning, since these multi-modality features lead Re-ID models to attend to different views of a walking human and thus help learn more discriminative representations; 5) overcoming the shortage of labelled data with unsupervised learning, where, notably, the combination of generative models and contrastive learning can supervise feature learning by minimizing the difference between the raw image and the synthesized image; and 6) adopting a multi-task learning pipeline that combines multiple correlated tasks, such as pedestrian attribute analysis, action analysis, and body reconstruction. This resembles the idea of the recently popular universal models, which regularize the stem model to learn more generalized representations.
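
The three evaluation metrics named in the abstract (CMC, mAP, and mINP) can be illustrated with a minimal sketch. It is not the evaluation code of any dataset covered by the review. The function name `evaluate_rankings` and its inputs are hypothetical, and the sketch assumes a plain query-by-gallery distance matrix. It omits the same-camera and same-clothes gallery filtering that benchmark protocols commonly apply. For one query, the inverse negative penalty is taken here as the number of true matches divided by the rank of the last-ranked true match.

```python
def evaluate_rankings(dist, query_ids, gallery_ids, max_rank=5):
    """Return (cmc, mAP, mINP) for a query-by-gallery distance matrix.

    dist[q][g] is the distance between query q and gallery sample g
    (smaller means more similar). This is an illustrative helper only.
    """
    cmc_hits = [0] * max_rank
    ap_sum = 0.0
    inp_sum = 0.0
    valid = 0
    for q, row in enumerate(dist):
        # Rank gallery samples from most to least similar.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        num_correct = sum(matches)
        if num_correct == 0:
            continue  # no true match in the gallery: skip this query
        valid += 1
        # CMC: a query counts at rank k if its first true match is within top k.
        first_hit = matches.index(True)
        for k in range(first_hit, max_rank):
            cmc_hits[k] += 1
        # AP: mean of the precision values at each true-match position.
        hits = 0
        precision_sum = 0.0
        last_hit_rank = 0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank
                last_hit_rank = rank
        ap_sum += precision_sum / num_correct
        # INP: true matches divided by the rank of the hardest true match.
        inp_sum += num_correct / last_hit_rank
    if valid == 0:
        raise ValueError("no query has a true match in the gallery")
    cmc = [h / valid for h in cmc_hits]
    return cmc, ap_sum / valid, inp_sum / valid


# Toy example: two queries, three gallery samples.
toy_dist = [[0.2, 0.1, 0.3], [0.5, 0.2, 0.9]]
cmc, m_ap, m_inp = evaluate_rankings(toy_dist, ["a", "b"], ["a", "b", "a"], max_rank=3)
print(cmc, round(m_ap, 3), round(m_inp, 3))
```

In this toy case the first query ranks a wrong identity first, so rank-1 accuracy is 0.5 while rank-2 is 1.0. mAP is about 0.792 and mINP about 0.833, which shows how mINP penalizes a true match that is retrieved late.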

Keywords