Cloth-changing person re-identification: a summary
Zhang Peng1, Zhang Xiaolin2, Bao Yongtang1, Ben Xianye3, Shan Caifeng4(1.College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China;2.SenseTime Group Inc., Shenzhen 518067, China;3.School of Information Science and Engineering, Shandong University, Qingdao 266237, China;4.College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China)
Person re-identification (Re-ID) aims to establish the identity correspondence of a target pedestrian across multiple non-overlapping surveillance areas. It has significant application value in fields such as smart cities, criminal investigation and forensics, and surveillance security. Conventional Re-ID methods mostly focus on short-term scenarios and address challenges related to illumination differences, viewpoint changes, and occlusion. These methods assume that the target pedestrian of interest (TPoI) keeps the same clothing when he or she re-appears in the surveillance network. They therefore depend on the consistency of appearance across cameras, such as the color and texture of the pedestrian's clothes. In contrast, cloth-changing person Re-ID targets long-term scenarios, in which the TPoI re-appears after a long time gap, such as a week or more. Besides the challenges of classical person Re-ID, cloth-changing person Re-ID must also cope with the appearance differences caused by changing clothes, which has made it a difficult research topic in recent years. This paper discusses the challenges and difficulties of cloth-changing person Re-ID, provides an in-depth review of recent progress in terms of datasets and methods, and, based on this analysis, proposes potential research trends and solutions.
First, we summarize and compare the existing cloth-changing person Re-ID datasets, including 1) the RGBD-based pattern analysis and computer vision (PAVIS), BIWI, and IAS-Lab datasets; 2) the radio frequency-based radio frequency re-identification dataset-campus (RRD-Campus) and RRD-Home; 3) the RGB image-based Celeb-ReID, person Re-ID under moderate clothing change (PRCC), long-term cloth-changing (LTCC), and DeepChange; and 4) the video-based train station dataset (TSD), Motion-ReID, and cloth-varying video Re-ID (CVID-ReID). Their difficulties and limitations are analyzed in terms of collection method and the numbers of identities and images. We also summarize popular person Re-ID evaluation metrics, namely the cumulative match characteristic (CMC), mean average precision (mAP), and mean inverse negative penalty (mINP). Second, we review the existing cloth-changing person Re-ID methods and divide them into two major categories according to how the data are collected: 1) non-visual sensor-based methods and 2) visual camera-based methods. Non-visual sensor-based methods alleviate the influence of clothes at the data collection stage. In this paper, non-visual sensors fall into two types, i.e., RGBD sensors and radio frequency (RF). An RGBD sensor provides depth information, which strengthens human shape information and eliminates the effect of clothing color; however, depth information is still affected by the contour of the clothes. RF-based methods can further overcome this weakness: RF signals emitted by wireless devices penetrate clothes, and their reflections capture the shape of the human body. Unfortunately, non-visual sensor-based methods rely heavily on expensive sensors and are hard to deploy in existing surveillance systems. In contrast, visual camera-based methods can be applied directly to RGB surveillance cameras and tackle the problem by learning cloth-invariant feature representations from RGB images or videos.
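The three evaluation metrics mentioned above (CMC, mAP, and mINP) can be made concrete with a short computation. The following is a minimal Python sketch, not code from the surveyed works. It assumes that, for each query, the gallery has already been ranked by similarity and that invalid gallery samples (e.g., same-camera images or, under some cloth-changing protocols, same-clothes images of the query identity) have already been removed, so each query is reduced to a list of booleans marking which ranks hold a correct match. The function names `evaluate_query` and `evaluate` are illustrative. INP follows the definition INP = |G| / R_hard, where |G| is the number of correct matches and R_hard is the rank of the hardest (last) correct match.

```python
from typing import List, Sequence, Tuple


def evaluate_query(ranked_matches: Sequence[bool]) -> Tuple[int, float, float]:
    """Return (rank of first correct match, AP, INP) for one query.

    ranked_matches[i] is True if the gallery item at rank i + 1 shares the
    query's identity. At least one correct match is required.
    """
    # 1-based ranks at which correct matches occur
    hit_ranks = [i + 1 for i, is_match in enumerate(ranked_matches) if is_match]
    if not hit_ranks:
        raise ValueError("query has no correct match in the gallery")
    # AP: mean of precision@k over every rank k that holds a correct match
    ap = sum((n + 1) / rank for n, rank in enumerate(hit_ranks)) / len(hit_ranks)
    # INP = |G| / R_hard, with R_hard the rank of the last (hardest) correct match
    inp = len(hit_ranks) / hit_ranks[-1]
    return hit_ranks[0], ap, inp


def evaluate(queries: List[Sequence[bool]], max_rank: int = 10) -> Tuple[List[float], float, float]:
    """Return (CMC curve for ranks 1..max_rank, mAP, mINP) over all queries."""
    if not queries:
        raise ValueError("at least one query is required")
    results = [evaluate_query(q) for q in queries]
    n = len(results)
    first_hits = [r[0] for r in results]
    # CMC@k: fraction of queries whose first correct match lies within the top k
    cmc = [sum(1 for f in first_hits if f <= k) / n for k in range(1, max_rank + 1)]
    m_ap = sum(r[1] for r in results) / n
    m_inp = sum(r[2] for r in results) / n
    return cmc, m_ap, m_inp


if __name__ == "__main__":
    # Two toy queries: correct matches at ranks {1, 3} and {2}
    cmc, m_ap, m_inp = evaluate([[True, False, True, False], [False, True, False, False]], max_rank=3)
    print(cmc, round(m_ap, 3), round(m_inp, 3))  # [0.5, 1.0, 1.0] 0.667 0.583
```

CMC only looks at the first correct match, mAP rewards ranking all correct matches early, and mINP penalizes the position of the hardest correct match. The last is informative in cloth-changing Re-ID, where true matches wearing different clothes tend to be ranked low.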
Visual camera-based methods can be further divided into three categories: 1) explicit feature learning and extraction (EFLE), 2) feature decoupling (FD), and 3) implicit data adaptation (IDA). EFLE methods explicitly extract cloth-invariant, identity-relevant biometric features such as the face, gait, and body shape. They include hand-crafted and learning-based methods: hand-crafted methods design feature representations manually, e.g., through body measurement and analysis, whereas learning-based methods guide deep neural networks to learn biometric features with localization or regularization modules. FD methods decouple identity information from cloth-related appearance features to obtain pure identity information, e.g., CESD, DG-Net, IS-GAN, and AFD-Net. In contrast, IDA methods are data-driven and adapt to intra-class diversity automatically by training on large volumes of data with abundant intra-class variance, e.g., ReIDCaps and RCSANet. On this basis, we analyze the shortcomings of current cloth-changing person Re-ID methods, e.g., the lack of large-scale, multi-view datasets, feature misalignment, occlusion, weak feature discriminability, and poor generalization. To address these drawbacks, this paper looks forward to six promising research directions: 1) constructing large-scale video-based datasets and exploring spatio-temporal features from video clips or contexts, since video footage contains rich gait information and provides multi-view body characteristics for 3D human reconstruction; 2) using 3D human reconstruction to learn view-invariant human geometric features in 3D space, since the 3D body is assumed to be robust to shape deformation and highlights body structure information; 3) weakening the effect of clothing-related attributes with the help of pedestrian attribute analysis, which benefits the extraction of semantic-level cues; 4) mining and integrating multiple features, such as gait, face, and shape, through multi-feature co-learning, so that these multi-modal features lead Re-ID models to attend to different aspects of a walking person and thus learn more discriminative representations; 5) overcoming the shortage of labelled data with unsupervised learning; notably, generative models can be integrated with contrastive learning to supervise feature learning by minimizing the difference between a raw image and its synthesized counterpart; and 6) adopting multi-task learning pipelines that combine multiple correlated tasks, such as pedestrian attribute analysis, action analysis, and body reconstruction. This resembles the idea of the recently popular universal models, which regularize the shared stem model to learn more generalized representations.
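Direction 5) mentions combining generative models with contrastive learning. The abstract does not specify a loss, so the following Python sketch assumes one common choice, an InfoNCE-style contrastive loss. The embedding of each raw image is pulled toward the embedding of its own synthesized (e.g., clothing-altered) image and pushed away from the synthesized images of the other samples in the batch. The feature extractor and the generator are outside the scope of the sketch: embeddings are plain lists of floats, and the names `info_nce` and `temperature` are illustrative assumptions rather than part of any surveyed method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(raw: List[Vector], synth: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where raw[i] and synth[i] form the positive pair.

    Every other synth[j] (j != i) in the batch serves as a negative for raw[i].
    """
    if not raw or len(raw) != len(synth):
        raise ValueError("raw and synth must be non-empty and equally long")
    total = 0.0
    for i, anchor in enumerate(raw):
        logits = [cosine(anchor, s) / temperature for s in synth]
        # log-sum-exp with the maximum subtracted for numerical stability
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # negative log softmax probability assigned to the positive pair
        total += log_denominator - logits[i]
    return total / len(raw)


if __name__ == "__main__":
    raw = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]  # synthesized features close to their own raw image
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # synthesized features paired with the wrong image
    print(info_nce(raw, aligned) < info_nce(raw, swapped))  # True
```

The temperature controls how sharply the loss concentrates on the hardest negatives; 0.1 is only a placeholder value, not a recommendation from the surveyed works.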