Non-local attention dual-branch network based cross-modal barefoot footprint retrieval
2022, Vol. 27, No. 7, pp. 2199-2213
Received: 2020-12-31; Revised: 2021-04-14; Accepted: 2021-04-21; Published in print: 2022-07-16
DOI: 10.11834/jig.200806
Objective
To address problems in current footprint retrieval such as the diversity of acquisition devices and the difficulty of extracting effective footprint features, this paper takes barefoot footprint images as the research object and proposes a cross-modal barefoot footprint retrieval algorithm based on a non-local attention dual-branch network.
Method
The network consists of a feature extraction module, a feature embedding module, and a dual-constraint loss module. The feature extraction module adopts a dual-branch structure in which each branch uses ResNet50 as the base network to extract effective features from optical and pressure barefoot images, respectively. In the feature embedding module, a multi-modal shared space is learned through parameter sharing, and a non-local attention mechanism is introduced to quickly capture long-range dependencies, obtain a larger receptive field, and focus on the overall pressure distribution of the footprint image, enhancing the useful features of each modality while highlighting the common features across modalities. To enlarge the inter-class feature differences and reduce the intra-class feature differences of barefoot footprint images, the cross-entropy loss L_CE and the triplet loss L_TRI jointly constrain the whole network, so that cross-modal shared features are learned better and the differences between modalities are reduced.
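The dual-constraint objective L_CE + L_TRI described above can be sketched as follows. This is a minimal numpy illustration, assuming a standard softmax cross-entropy over identity logits and a margin-based triplet loss over embedding distances; the function names and the margin value are illustrative, not taken from the paper:

```python
import numpy as np

def cross_entropy_loss(logits, label):
    # Softmax cross-entropy L_CE for one sample: -log p(label).
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    # L_TRI pulls same-identity embeddings together and pushes
    # different identities apart by at least `margin`.
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(d_ap - d_an + margin, 0.0)

def dual_constraint_loss(logits, label, anchor, positive, negative):
    # Total objective used to constrain the whole network: L = L_CE + L_TRI.
    return cross_entropy_loss(logits, label) + triplet_loss(anchor, positive, negative)
```

In training, the anchor and positive would be embeddings of the same person from different modalities (e.g., an optical and a pressure footprint), so minimizing L_TRI directly shrinks the cross-modal gap within each identity.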
Result
The optical and pressure barefoot images collected from 138 subjects serve as the experimental dataset, and the proposed algorithm is compared with the fine-grained cross-modal retrieval method FGC (fine-grained cross-modal) and the cross-modal person re-identification method HC (hetero-center). The proposed algorithm achieves an mAP (mean average precision) of 83.63% and a rank-1 of 98.29% in the optical-to-pressure retrieval mode, and an mAP of 84.27% and a rank-1 of 94.71% in the pressure-to-optical retrieval mode. The mean mAP and mean rank-1 over the two retrieval modes are 83.95% and 96.5%, which are 40.01% and 36.50% higher than FGC and 26.07% and 19.32% higher than HC, respectively. Comparative analyses of the non-local attention mechanism, the loss functions, and the pooling method applied after the feature embedding module further confirm the effectiveness of the proposed algorithm.
Conclusion
The proposed cross-modal barefoot footprint retrieval algorithm achieves high accuracy and provides a research basis for applications such as crime-scene footprint comparison and identification.
Objective
Footprints are among the most frequently left and extracted types of physical evidence at crime scenes, and footprint retrieval and comparison play an important role in criminal investigation. Footprint features are determined by the foot shape and bone structure of the person involved and are therefore both specific and stable. Footprints also reflect physiological and behavioral characteristics such as height, body shape, gender, age, and walking habits. Medical research shows that each person's footprint pressure distribution is unique. Improving the rate of discovery, extraction, and utilization of footprints in criminal investigation remains challenging, so footprint image retrieval is of great significance and can provide a theoretical basis and technical support for footprint comparison and identification. Footprint images occur in different modalities because of the diverse scenarios and tools of extraction. The global information of barefoot images is unique across modalities, which makes retrieval feasible: given a query in one modality, cross-modal retrieval returns the corresponding images of the other modality. Traditional cross-modal retrieval methods are mainly based on subspace methods and model-based methods, which struggle to obtain distinguishable features. Deep-learning-based retrieval methods construct a multi-modal public space with a convolutional neural network (CNN); the high-level semantic features of images are captured through iterative optimization of the network parameters, which lowers the multi-modal heterogeneity.
Method
A cross-modal barefoot footprint retrieval algorithm based on a non-local attention dual-branch network is proposed to address the problem of large intra-class distances and small inter-class distances in fine-grained images. The collected barefoot footprint images cover an optical modality and a pressure modality. A median filter is applied to remove noise from all images, and data augmentation is used to expand the footprint images of each modality. In the feature extraction module, a pre-trained ResNet50 serves as the base network of each branch to extract the inherent features of each modality. In the feature embedding module, parameter sharing is realized by splicing feature vectors, and a multi-modal shared space is constructed. All residual blocks in Layer2 and Layer3 of ResNet50 use the non-local attention mechanism to capture long-range dependencies, obtain a large receptive field, and quickly highlight common features. Cross-entropy loss and triplet loss are used jointly to learn the multi-modal shared space, reducing intra-class differences and increasing inter-class differences of the features. The experiments run on two NVIDIA 2070Ti graphics cards, and the network is built in PyTorch. The barefoot footprint images are resized to 224 × 224 pixels. The stochastic gradient descent (SGD) optimizer is used for training, with 81 iterations and an initial learning rate of 0.01. The trained network is validated on the validation set, and the mean average precision (mAP) and rank values are computed; the optimal model is saved according to the highest rank-1 value. The saved model is then evaluated on the test set, and the final experimental results are recorded.
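The non-local attention operation inserted into the residual blocks can be sketched in a few lines. This is a minimal numpy illustration of the embedded-Gaussian non-local block (Wang et al., 2018) on a flattened feature map; the weight names and shapes are illustrative, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    # x: (N, C) flattened feature map (N spatial positions, C channels).
    # Embedded-Gaussian non-local operation with a residual connection:
    # every position attends to every other, giving a global receptive field.
    theta = x @ w_theta            # queries, (N, C')
    phi = x @ w_phi                # keys,    (N, C')
    g = x @ w_g                    # values,  (N, C')
    attn = softmax(theta @ phi.T)  # (N, N) pairwise long-range dependencies
    y = attn @ g                   # aggregate features from all positions
    return x + y @ w_z             # project back to C channels, add residual

# Toy usage on a 4x4 feature map with 8 channels, flattened to 16 positions.
C, Cp, N = 8, 4, 16
x = rng.normal(size=(N, C))
w_theta, w_phi, w_g = (rng.normal(size=(C, Cp)) for _ in range(3))
w_z = rng.normal(size=(Cp, C))
z = non_local_block(x, w_theta, w_phi, w_g, w_z)
```

Because the attention map is N × N over all positions, a single block already lets the network attend to the whole pressure distribution of the footprint, which is why it is applied inside Layer2 and Layer3 rather than only at the network output.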
Result
A cross-modal retrieval dataset is collected and constructed from 138 subjects. Comparative experiments verify the effect on retrieval performance of the non-local attention mechanism, of multiple loss functions, and of the different pooling methods applied after the feature embedding module. The proposed algorithm is compared with the fine-grained cross-modal retrieval method FGC (fine-grained cross-modal) and the RGB-infrared cross-modal person re-identification method HC (hetero-center). The training, validation, and test sets contain 82, 28, and 28 subjects, with 16 400, 5 600, and 5 600 images, respectively. The ratio of query images to retrieval images in the validation and test sets is 1:2. The evaluation indexes are the mAP mean (mAP_Avg) and the rank-1 mean (rank1_Avg) over the two retrieval modes. The proposed algorithm achieves higher precision, with an mAP_Avg of 83.95% and a rank1_Avg of 96.5%, which are 40.01% and 36.50% higher than FGC and 26.07% and 19.32% higher than HC, respectively.
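The mAP and rank-1 metrics reported above can be computed as in the following minimal sketch for one retrieval direction (e.g., optical to pressure). The function name and the use of Euclidean distance are assumptions for illustration; the paper does not spell out its evaluation code:

```python
import numpy as np

def evaluate(query_feats, gallery_feats, query_ids, gallery_ids):
    # Rank the gallery by Euclidean distance to each query and score
    # identity matches: returns (mAP, rank-1 accuracy).
    aps, rank1_hits = [], 0
    for feat, qid in zip(query_feats, query_ids):
        dists = np.linalg.norm(gallery_feats - feat, axis=1)
        order = np.argsort(dists)               # nearest gallery images first
        matches = gallery_ids[order] == qid     # True where identity matches
        rank1_hits += int(matches[0])           # top-1 hit or miss
        # Average precision: mean of precision at each correct hit.
        hits = np.flatnonzero(matches)
        precisions = (np.arange(len(hits)) + 1) / (hits + 1)
        aps.append(precisions.mean())
    return float(np.mean(aps)), rank1_hits / len(query_ids)
```

Running this once with optical queries against the pressure gallery and once in the reverse direction, then averaging, yields the mAP_Avg and rank1_Avg figures used as evaluation indexes.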
Conclusion
A cross-modal barefoot footprint retrieval algorithm based on a non-local attention dual-branch network is proposed by integrating the non-local attention mechanism with a dual-constraint loss. The algorithm considers both the uniqueness and the correlation of intra-modal and inter-modal features and further improves the performance of cross-modal barefoot footprint retrieval, providing a theoretical basis and technical support for footprint comparison and identification.
Bao W X, Wang Y F, Wang N and Tang J. 2020a. Identification algorithm of optical footprint image based on metric learning kernel function. Journal of Huazhong University of Science and Technology (Natural Science Edition), 48(11): 11-16 [DOI: 10.13245/j.hust.201103]
Bao W X, Qu J J, Wang N, Tang J and Lu X L. 2020b. Force-tactile footprint recognition based on spatial aggregation weighted convolutional neural network. Journal of Southeast University (Natural Science Edition), 50(5): 959-964 [DOI: 10.3969/j.issn.1001-0505.2020.05.023]
Cao Y, Long M S, Wang J M and Zhu H. 2016. Correlation autoencoder hashing for supervised cross-modal search//Proceedings of 2016 ACM on International Conference on Multimedia Retrieval. New York, USA: ACM: 197-204 [DOI: 10.1145/2911996.2912000]
Dai Z Z, Chen M Q, Gu X D, Zhu S Y and Tan P. 2019. Batch DropBlock network for person re-identification and beyond//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 3690-3700 [DOI: 10.1109/ICCV.2019.00379]
Gurney J K, Kersting U G and Rosenbaum D. 2008. Between-day reliability of repeated plantar pressure distribution measurements in a normal population. Gait and Posture, 27(4): 706-709[DOI: 10.1016/j.gaitpost.2007.07.002]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
He X T, Peng Y X and Xie L. 2019. A new benchmark and approach for fine-grained cross-media retrieval//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: Association for Computing Machinery: 1740-1748 [DOI: 10.1145/3343031.3350974]
Heydarzadeh M, Birjandtalab J, Pouyan M B, Nourani M and Ostadabbas S. 2017. Gaits analysis using pressure image for subject identification//Proceedings of 2017 IEEE EMBS International Conference on Biomedical and Health Informatics. Orlando, USA: IEEE: 333-336 [DOI: 10.1109/BHI.2017.7897273]
Khokher R, Singh R C and Kumar R. 2015. Footprint recognition with principal component analysis and independent component analysis. Macromolecular Symposia, 347(1): 16-26[DOI: 10.1002/masy.201400045]
Kulkarni P S and Kulkarni V B. 2015. Human footprint classification using image parameters//Proceedings of 2015 International Conference on Pervasive Computing. Pune, India: IEEE: 1-5 [DOI: 10.1109/PERVASIVE.2015.7087011]
Liang J, He R, Sun Z N and Tan T N. 2016. Group-invariant cross-modal subspace learning//Proceedings of the 25th International Joint Conference on Artificial Intelligence. New York, USA: AAAI Press: 1739-1745
Liu H J, Cheng J, Wang W, Su Y Z and Bai H W. 2020. Enhancing the discriminative feature learning for visible-thermal cross-modality person re-identification. Neurocomputing, 398: 11-19[DOI: 10.1016/j.neucom.2020.01.089]
Nirenberg M S, Ansert E, Krishan K and Kanchan T. 2019. Two-dimensional metric comparison between dynamic bare and sock-clad footprints for its forensic implications——a pilot study. Science and Justice, 59(1): 46-51[DOI: 10.1016/j.scijus.2018.09.001]
Osisanwo F Y, Adetunmbi A O and Álese B K. 2014. Barefoot morphology: a person unique feature for forensic identification//Proceedings of the 9th International Conference for Internet Technology and Secured Transactions. London, UK: IEEE: 356-359 [DOI: 10.1109/ICITST.2014.7038837]
Radenović F, Tolias G and Chum O. 2019. Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7): 1655-1668[DOI: 10.1109/TPAMI.2018.2846566]
Wang C, Yang H J and Meinel C. 2015. Deep semantic mapping for cross-modal retrieval//Proceedings of the 27th IEEE International Conference on Tools with Artificial Intelligence. Vietri sul Mare, Italy: IEEE: 234-241 [DOI: 10.1109/ICTAI.2015.45]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Wang Z X, Wang Z, Zheng Y Q, Chuang Y Y and Satoh S. 2019. Learning to reduce dual-level discrepancy for infrared-visible person re-identification//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 618-626 [DOI: 10.1109/CVPR.2019.00071]
Wu A C, Zheng W S, Yu H X, Gong S G and Lai J H. 2017. RGB-infrared cross-modality person re-identification//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5390-5399 [DOI: 10.1109/ICCV.2017.575]
Xue Y L and Yue J. 2012. Effect of perpetrators' mentality on footprints. Journal of Fujian Police College, 26(4): 55-60
Ye M, Wang Z, Lan X Y and Yuen P C. 2018a. Visible thermal person re-identification via dual-constrained top-ranking//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: IJCAI: 1092-1099 [DOI: 10.24963/ijcai.2018/152]
Ye M, Lan X Y, Li J W and Yuen P C. 2018b. Hierarchical discriminative learning for visible thermal person re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1): 7501-7508
Zhao B W, Zhang L F and Pan Z F. 2020. Comparison of image filtering methods based on OpenCV. China Computer & Communication, 32(15): 78-80
Zhao Y B, Lin J W, Xuan Q and Xi X G. 2019. HPILN: a feature learning framework for cross-modality person re-identification. IET Image Processing, 13(14): 2897-2904[DOI: 10.1049/iet-ipr.2019.0699]
Zheng Y, Zhang Y J and Larochelle H. 2014. Topic modeling of multimodal data: an autoregressive approach//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 1370-1377 [DOI: 10.1109/CVPR.2014.178]
Zhu M, Wang T S, Wang N, Tang J and Lu X L. 2020. Footprint pressure image retrieval algorithm based on multi-scale self-attention convolution. Pattern Recognition and Artificial Intelligence, 33(12): 1097-1103 [DOI: 10.16451/j.cnki.issn1003-6059.202012004]
Zhu Y X, Yang Z, Wang L, Zhao S, Hu X and Tao D P. 2019. Hetero-center loss for cross-modality person re-identification. Neurocomputing, 386: 97-109 [DOI: 10.1016/j.neucom.2019.12.100]