A summary on group re-identification
Vol. 28, Issue 5, Pages: 1225-1241 (2023)
Published: 16 May 2023
DOI: 10.11834/jig.220697
Zhang Quan, Lai Jianhuang, Xie Xiaohua, Chen Hongxu. 2023. A summary on group re-identification. Journal of Image and Graphics, 28(05):1225-1241
Group re-identification (GReID) aims to correctly associate group images that contain the same members across a camera network with non-overlapping fields of view. It is an important extension of the conventional person re-identification task and has significant research value and application prospects in security surveillance. The unique challenge of GReID lies in modeling changes in the number and layout of group members while extracting stable and robust feature representations. In recent years, GReID has attracted extensive attention from researchers and has developed rapidly. This paper presents a comprehensive review of research progress in GReID. We first briefly introduce the research background of the field and summarize its basic concepts, datasets, and related techniques. On this basis, a variety of GReID algorithms are described in detail, and state-of-the-art algorithms are compared on multiple datasets. Finally, we discuss the outlook for this task. Overall, compared with person re-identification, existing GReID methods still perform poorly on the specific challenges of concrete scenarios, and further exploration is needed in both data collection and method design. In addition, current GReID research is not closely connected with other vision tasks; how to coordinate multiple tasks to meet more industrial demands and accelerate deployment requires joint efforts from academia and industry.
Group re-identification (GReID) aims to associate images of the same pedestrian group captured by non-overlapping, multi-view cameras. Its central challenge is to model variations in group membership and in the spatial layout of members, and to extract stable and robust feature representations. This paper critically reviews the development of GReID. We first review the research background of the field, covering its basic concepts, related techniques, and datasets. In public-security surveillance, GReID can help monitor and prevent group-based crimes such as the kidnapping and trafficking of women and children. When a target pedestrian is severely occluded or even absent from view, the appearance features of his or her companions can serve as additional prior information for recognition. In the GReID setting, a group is composed of 2 to 8 members, and two group images are regarded as showing the same group when the intersection-over-union (IoU) ratio of their member sets exceeds 60%.

A variety of GReID algorithms are then introduced and evaluated in detail. Existing works can be categorized from three perspectives: 1) data, 2) method, and 3) label. In terms of data type, existing methods can be divided into real-image-based, synthetic-image-based, and real-video-based methods. Real-image-based methods build on datasets collected from real surveillance scenarios, such as CUHK-SYSU Group (CSG), RoadGroup, and iLIDS-MCTS. These datasets collect several images of each group from different camera views and provide the location and identity annotations of each member, and this supervision can be used to design discriminative group feature representations. However, collecting and labeling real group datasets is more challenging than building traditional person re-identification datasets, because a consistent group identity must be judged across group images despite membership and layout variations. More recent datasets are built from 3D synthetic images; they can generate massive numbers of group images with high-quality labels efficiently and effectively, and the resulting synthetic data can be used to improve model performance on real datasets. Video-based datasets provide several consecutive frames of each group extracted from surveillance videos, from which group features can be derived by exploiting the latent spatio-temporal and intra-group relationships.

In terms of method, existing works can be divided into traditional methods and deep learning methods. Traditional methods design group descriptors and extract group features based on human experience; because they depend heavily on expert prior knowledge, they cannot describe or generalize to all possible configurations of group images. Deep learning based methods benefit from large numbers of training samples and construct representations of group images automatically, which has significantly improved the discrimination and robustness of the models. Deep learning based methods can be further divided into 1) feature-learning-based, 2) metric-learning-based, and 3) generative adversarial network (GAN) based methods. Feature-learning-based methods aim to design a discriminative network structure or feature learning strategy so that the extracted features accurately reflect the group identity of the input images while remaining robust to occlusion, illumination changes, and variations in the number and layout of intra-group members. Metric-learning-based methods focus on designing a similarity criterion between two group images, so that images of the same group obtain a high similarity under the designed metric even when they differ greatly in appearance. To cope with the small scale of existing datasets, GAN-based methods attempt to enlarge GReID datasets through style transfer of samples from related person re-identification datasets. In terms of label, existing methods can be categorized as supervised or unsupervised. Supervised methods tend to be more competitive because group or member labels participate in the entire training process. Unsupervised methods, for which no labels are provided, often learn similarity only over local areas of the group images and rely on clustering to extract feature representations of the same group class.

In summary, 1) GReID for specific scenarios still needs further development in terms of both data collection and method design; and 2) GReID is not yet closely connected with other related visual tasks, so multi-task collaboration is needed to address more industrial demands and accelerate industrial deployment, which calls for joint efforts from academia and industry. Furthermore, ethical issues related to data privacy for both virtual and real data need to be addressed in the future.
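As a minimal illustration of the group-identity criterion summarized above (two group images are treated as the same group when the IoU of their member sets exceeds 60%), the following Python sketch compares member-identity sets. The function names and the set-based representation are illustrative assumptions, not code from any of the surveyed works.

```python
def member_overlap_ratio(members_a: set, members_b: set) -> float:
    """Intersection-over-union of the member-identity sets of two group images."""
    if not members_a and not members_b:
        return 0.0
    return len(members_a & members_b) / len(members_a | members_b)


def same_group(members_a: set, members_b: set, threshold: float = 0.6) -> bool:
    """Treat two group images as the same group when the member IoU exceeds the threshold."""
    return member_overlap_ratio(members_a, members_b) > threshold


# Example: groups of 2 to 8 members, represented by member IDs.
view_a = {"p1", "p2", "p3"}          # members visible in camera A
view_b = {"p1", "p2", "p3", "p4"}    # one new member has joined in camera B
print(member_overlap_ratio(view_a, view_b))  # 3 / 4 = 0.75
print(same_group(view_a, view_b))            # True: 0.75 > 0.6
```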
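To make the feature- and metric-learning discussion more concrete, the sketch below shows one simple, order-invariant baseline for comparing two group images: average-pool per-member appearance features into a single group descriptor and score the pair with cosine similarity. The pooling choice, feature dimension, and synthetic features are assumptions for illustration only; the surveyed methods rely on considerably more sophisticated graph-, attention-, or Transformer-based models.

```python
import numpy as np


def group_descriptor(member_features: np.ndarray) -> np.ndarray:
    """Average-pool (N, D) per-member features into one D-dim group descriptor.

    Mean pooling is invariant to member ordering and tolerant of membership
    changes; it is only a simple baseline, not a method from the survey.
    """
    return member_features.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


# Two observations of the same group with different member counts and layouts.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(4, 256))                       # 4 members in camera A
view_b = view_a[:3] + 0.1 * rng.normal(size=(3, 256))    # 3 of them re-detected in camera B
score = cosine_similarity(group_descriptor(view_a), group_descriptor(view_b))
print(f"group similarity: {score:.3f}")  # remains high despite the missing member
```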
Keywords: group re-identification (GReID); pedestrian re-identification; synthetic data; deep learning; feature learning; metric learning; Transformer