张权,赖剑煌,谢晓华,陈泓栩(中山大学计算机学院, 广州 510006;中山大学计算机学院, 广州 510006;广州新华学院, 广州 510520;广东省信息安全技术重点实验室, 广州 510006;视频图像智能分析与应用技术公安部重点实验室, 广州 510006;中山大学计算机学院, 广州 510006;广东省信息安全技术重点实验室, 广州 510006)

Abstract
A summary on group re-identification

Zhang Quan, Lai Jianhuang, Xie Xiaohua, Chen Hongxu (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Guangzhou Xinhua University, Guangzhou 510520, China; Guangdong Province Key Laboratory of Information Security Technology, Guangzhou 510006, China; Key Laboratory of Video and Image Intelligent Analysis and Application Technology, Ministry of Public Security, Guangzhou 510006, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Province Key Laboratory of Information Security Technology, Guangzhou 510006, China)

Group re-identification (GReID) aims to match small groups of pedestrians across non-overlapping camera views. The main challenge is to model the changes over time in a group's membership and layout so that stable, robust feature representations can be extracted. This paper critically reviews the development of GReID. First, we review the research area in terms of its basic concepts, technologies, and related datasets. In public security surveillance, GReID can help monitor and prevent group-based crimes, such as the kidnapping and trafficking of women and children. When a target pedestrian is severely occluded or has disappeared from view, GReID can use the appearance features of the pedestrian's companions as additional prior information for recognition. Specifically, a group in GReID consists of 2 to 8 members, and two group images are considered to show the same group when the intersection-over-union (IoU) of their members is greater than 60%. Then, a variety of GReID algorithms are introduced and examined in detail. Existing works can be categorized from three perspectives: 1) data, 2) method, and 3) label. By data type, existing methods can be divided into real image-based, synthetic image-based, and real video-based methods. Real image-based methods mainly rely on datasets collected from real surveillance scenarios, such as CUHK-SYSU Group (CSG), RoadGroup, and iLIDS-MCTS. These datasets contain group images of different groups captured from different camera views and provide the location and identity information of each member. This supervision information can be used to design discriminative group feature representations.
However, real group datasets are still harder to collect and label than traditional pedestrian re-identification datasets, because annotators must judge whether the group identity is consistent between group images despite variations in members and layout. Datasets based on 3D synthetic images have also been proposed. Such datasets can efficiently and effectively generate large numbers of group images with high-quality labels, and the massive synthetic data can be used to improve model performance on real datasets. Video-based datasets provide several consecutive frames per group from surveillance videos, so researchers can extract group features from the underlying spatio-temporal or intra-group relationships. By method, existing approaches can be mainly divided into traditional methods and deep learning methods. Traditional methods design group descriptors and extract group features based on human experience. However, because they depend heavily on expert prior knowledge, they cannot describe and generalize to all possible situations in group images. Deep learning-based methods, which benefit from large numbers of data samples, construct representations of group images automatically, and the discrimination and robustness of deep models have improved significantly. Deep learning-based methods can be divided into 1) feature learning-based, 2) metric learning-based, and 3) generative adversarial network (GAN)-based methods. Deep feature learning-based methods aim to design a discriminative network structure or a discriminative feature learning strategy, so that the extracted features accurately reflect the group identity of the input images and are robust to occlusion, illumination changes, and variations in the number and layout of group members. Metric learning-based methods focus on designing a similarity criterion between two group images.
Under the designed similarity criterion, two images of the same group should obtain a high similarity even when they differ greatly. To address the small size of existing datasets, GAN-based methods expand the scale of GReID datasets by transferring the style of samples from related pedestrian re-identification datasets. By label, existing methods can be categorized into supervised and unsupervised methods. Supervised learning-based methods tend to be more competitive because group labels or member labels are involved in the entire training process. Because no labels are provided in unsupervised learning, such methods can often learn similarity only for local regions of group images, and clustering methods can be designed to extract feature representations of the same group class. To sum up, 1) GReID for specific scenarios needs further development in terms of both data collection and method design; 2) GReID is still not well connected to other related visual tasks. Therefore, collaboration among multiple tasks is called for to meet more industry needs, and academia and industry need to accelerate practical deployment. Furthermore, ethical issues related to data privacy policy need to be addressed for both virtual and real data in the future.
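The group-matching rule stated in the abstract (2 to 8 members, same group when the member IoU is greater than 60%) can be illustrated with a minimal Python sketch. It assumes that the IoU is computed over the sets of member identity labels in two group images; the abstract does not spell out this detail, and the function names and identity labels below are hypothetical.

```python
def member_iou(group_a, group_b):
    """Intersection-over-union of two collections of member identity labels.

    Assumption: IoU is taken over identity sets, not bounding boxes.
    """
    a, b = set(group_a), set(group_b)
    union = a | b
    if not union:
        return 0.0  # two empty groups share no members
    return len(a & b) / len(union)


def is_same_group(group_a, group_b, threshold=0.6):
    """Apply the 'IoU greater than 60%' rule; the comparison is strict."""
    return member_iou(group_a, group_b) > threshold


# Hypothetical identity labels for one group seen in two camera views;
# one member has left the group in the second view.
view_1 = ["p01", "p02", "p03", "p04"]
view_2 = ["p01", "p02", "p03"]

print(member_iou(view_1, view_2))     # 3 shared members out of 4 in total
print(is_same_group(view_1, view_2))  # True, since 0.75 > 0.6
```

Because the threshold is strict, two images that share exactly 60% of their members would not be matched under this reading of the rule.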