An iris feature-encoding method by fusion of graph neural networks and convolutional neural networks
2024, Vol. 29, No. 9: 2764-2779
Print publication date: 2024-09-16
DOI: 10.11834/jig.230688
Sun Jintong, Shen Wenzhong. 2024. An iris feature-encoding method by fusion of graph neural networks and convolutional neural networks. Journal of Image and Graphics, 29(09):2764-2779
Objective
A more interpretable iris feature-encoding method has long been a key problem in iris recognition, and low-quality iris samples are difficult to recognize; the development of graph neural networks offers new ideas for encoding the features of such iris images. This paper proposes IrisFusionNet, an iris feature-encoding network that fuses graph neural networks with convolutional neural networks (CNNs).
Method
A pixel-level enhancement module is added before the backbone network to remove uncertainty in the input image, and a dual-branch backbone network extracts fused micro- and macro-level iris features. During training, a dedicated joint loss function optimizes the network parameters; during inference, a fused-feature matching strategy performs feature matching.
Result
Experimental results show that the feature extractor trained with IrisFusionNet, tested on several public low-quality iris datasets, achieves best values of 0.27% for EER (equal error rate) and 0.84% for FRR@FAR = 0.01%, and improves the discriminating index (DI) by more than 30%; its recognition accuracy and class compactness far surpass leading iris recognition algorithms based on convolutional neural networks or other graph-neural-network models.
Conclusion
The proposed IrisFusionNet is highly feasible and advantageous for iris recognition tasks.
Objective
Iris recognition is a prevalent biometric modality in identity recognition technology owing to its inherent advantages, including stability, uniqueness, noncontact acquisition, and live-body authentication. The complete iris recognition workflow comprises four main steps: iris image acquisition, image preprocessing, feature encoding, and feature matching. Feature encoding is the core component of iris recognition algorithms. Improving interpretable iris feature-encoding methods has become a pivotal concern in the field of iris recognition. Moreover, recognition of low-quality iris samples often relies on feature encoders that depend on dataset-specific parameters, which leads to poor generalization performance. The graph structure is a data form with an irregular topological arrangement, and graph neural networks (GNNs) effectively update and aggregate features within such structures. The advancement of GNNs has led to new approaches for feature encoding of these types of iris images. In this paper, a pioneering iris feature-fusion encoding network called IrisFusionNet, which integrates a GNN with a convolutional neural network (CNN), is proposed. This network eliminates the need for complex parameter-tuning steps and exhibits excellent generalization performance across various iris datasets.
Method
A pixel-level enhancement module inserted before the backbone network alleviated local uncertainty in the input image through median filtering, and global uncertainty was mitigated via Gaussian normalization. A dual-branch backbone network was proposed, in which the head comprised a shared stack of CONV modules and the neck was divided into two branches. The primary branch constructed a graph structure from the image using a graph converter. We designed a hard graph attention network that introduces an efficient channel attention mechanism to aggregate and update features by utilizing edge-associated information within the graph structure; this step extracted the microfeatures of iris textures. The auxiliary branch, on the other hand, used conventional CNN pipeline components, such as simple convolutional layers, pooling layers, and fully connected layers, to capture the macrostructural information of the iris. During the training phase, the fused features from the primary and auxiliary branches were optimized using a unified loss function, the graph triplet and additive angular margin unified loss (GTAU-Loss). The primary branch mapped iris images into a graph feature space, using cosine similarity to measure the semantic information in node feature vectors, the L2 norm to measure the spatial relationship information within the adjacency matrix, and a graph triplet loss to constrain feature distances within the feature space. The auxiliary branch applied an additive angular margin loss, which normalized the input image feature vectors and introduced an additional angular margin to constrain feature angle intervals, improving intraclass feature compactness and interclass separation. Ultimately, a dynamic learning method based on an exponential model was used to fuse the features extracted from the primary and auxiliary branches and obtain the GTAU-Loss.
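As a concrete illustration, the pixel-level enhancement step described above can be sketched as follows. The kernel size `ksize` and the zero-mean, unit-variance form of the Gaussian normalization are assumptions for illustration; this summary does not state the exact settings:

```python
import numpy as np

def enhance_iris(img: np.ndarray, ksize: int = 3) -> np.ndarray:
    """Pixel-level enhancement: median filtering to suppress local noise,
    followed by Gaussian (zero-mean, unit-variance) normalization.
    `ksize` is a hypothetical kernel size; the paper's value is not given here."""
    h, w = img.shape
    pad = ksize // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    # Median filter: each pixel becomes the median of its ksize x ksize window
    filtered = np.empty((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            filtered[i, j] = np.median(padded[i:i + ksize, j:j + ksize])
    # Gaussian normalization over the whole image (global uncertainty)
    return (filtered - filtered.mean()) / (filtered.std() + 1e-8)
```

The median filter handles local uncertainty (e.g., salt-and-pepper sensor noise), while the global normalization removes illumination-level shifts across images.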
The hyperparameter settings during training included the following: The optimization of network parameters involved the use of stochastic gradient descent (SGD) with a Nesterov momentum set to 0.9, an initial learning rate of 0.001, and a warm-up strategy adjusting the learning rate with a warm-up rate set to 0.1, conducted over 200 epochs. The iteration process of SGD was accelerated using NVIDIA RTX 3060 12 GB GPU devices, with 100 iterations lasting approximately one day. For feature matching concerning two distinct graph structures, the auxiliary branch calculated the cosine similarity between the output node features. Meanwhile, the primary branch applied a gate-based method and initially calculated the mean cosine similarity of all node pairs as the threshold for the gate, removed node pairs below this threshold, and retained node features above it to compute their cosine similarity. The similarity between these graph structures was represented as the weighted sum of cosine similarities from the primary and auxiliary branches. The similarity weights of the feature pairs computed using the primary and auxiliary branches were both set to 0.5. All experiments were conducted on a Windows 11 operating system, with PyTorch as the deep learning framework.
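The gate-based fused matching strategy above can be sketched as follows. Pairing corresponding nodes of the two graphs and the fallback when no pair survives the gate are assumptions not specified in this summary:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def graph_similarity(nodes_a, nodes_b, global_a, global_b, w_primary=0.5):
    """Fused matching: gate-based node similarity (primary branch) combined
    with global-feature cosine similarity (auxiliary branch) by a weighted sum."""
    # Primary branch: cosine similarity of node pairs; their mean acts as the gate
    sims = np.array([cosine(a, b) for a, b in zip(nodes_a, nodes_b)])
    gate = sims.mean()
    kept = sims[sims >= gate]          # discard node pairs below the gate threshold
    primary = kept.mean() if kept.size else gate
    # Auxiliary branch: cosine similarity of the global feature vectors
    auxiliary = cosine(global_a, global_b)
    # Weighted sum; both weights are set to 0.5 in the paper
    return w_primary * primary + (1.0 - w_primary) * auxiliary
```

Matching a graph against itself yields a similarity close to 1, and lowering either branch's similarity lowers the fused score, reflecting the equal 0.5/0.5 weighting.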
Result
To validate the effectiveness of integrating GNNs into the framework, this study conducted iris recognition experiments using a single-branch CNN framework and a dual-branch framework. The experimental outcomes substantiated the superior recognition performance of the structural design incorporating the GNN branch. Furthermore, to determine the optimal values of two crucial parameters, namely, the number of nearest neighbors (k) and the global feature dimension within the IrisFusionNet framework, we conducted detailed parameter experiments: k was set to 8, and the optimal global feature dimension was 256. We compared the present method with several state-of-the-art (SOTA) iris recognition methods, including CNN-based methods, such as ResNet, MobileNet, EfficientNet, and ConvNeXt, and GNN-based methods, such as dynamic graph representation. Comparative experimental results indicate that the feature extractor trained using IrisFusionNet, tested on three publicly available low-quality iris datasets (CASIA-Iris-V4-Distance, CASIA-Iris-V4-Lamp, and CASIA-Iris-Mobile-V1.0-S2), achieved equal error rates of 1.06%, 0.71%, and 0.27% and false rejection rates at a false acceptance rate of 0.01% (FRR@FAR = 0.01%) of 7.49%, 4.21%, and 0.84%, respectively. In addition, the discriminant index reached 6.102, 6.574, and 8.451, an improvement of over 30% compared with the baseline algorithm. The accuracy and clustering capability of iris recognition using the feature extractor derived from IrisFusionNet substantially outperformed SOTA iris recognition algorithms based on convolutional neural networks and other GNN models. Furthermore, the graph structures derived from the graph converter were visualized: the graph structures generated from similar iris images exhibited high similarity, whereas those of dissimilar iris images presented remarkable differences. This intuitive visualization explains the excellent iris recognition performance achieved by constructing graph structures and applying GNN methods.
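The discriminant index reported above is conventionally computed as the decidability index between the genuine (intraclass) and impostor (interclass) match-score distributions; the sketch below implements that standard formula, not code taken from this paper:

```python
import numpy as np

def discriminating_index(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Decidability index d' = |mu1 - mu2| / sqrt((var1 + var2) / 2).
    Larger values mean the genuine and impostor score distributions
    are better separated, i.e., better clustering of identities."""
    mu1, mu2 = genuine.mean(), impostor.mean()
    v1, v2 = genuine.var(), impostor.var()
    return float(abs(mu1 - mu2) / np.sqrt((v1 + v2) / 2.0))
```

For example, genuine scores with mean 0.9 and impostor scores with mean 0.2, each with variance 0.01, give d' = 0.7 / 0.1 = 7.0, on the order of the values reported above.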
Conclusion
In this paper, we proposed IrisFusionNet, a feature-fusion encoding method based on GNNs and CNNs. The macrofeatures of iris images were extracted using a CNN and the microfeatures using a GNN, yielding fused features that encompass comprehensive texture characteristics. The experimental results indicate that our method considerably improved the accuracy and clustering of iris recognition and achieved high feasibility and generalizability without requiring complex parameter tuning specific to particular datasets.
iris feature coding; graph neural network (GNN); hard graph attention operators; feature fusion; unified loss function
Chen Y, Wu W Q, Xu L and Guo S B. 2023. Iris-periocular features-fused non-collaborative authentication. Journal of Image and Graphics, 28(5): 1462-1476 [DOI: 10.11834/jig.220649]
Daugman J. 2009. How iris recognition works//Bovik A, ed. The Essential Guide to Image Processing. 2nd ed. Burlington, USA: Academic Press: 715-739 [DOI: 10.1016/B978-0-12-374457-9.00025-1]
Deng J K, Guo J, Xue N N and Zafeiriou S. 2019. ArcFace: additive angular margin loss for deep face recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4685-4694 [DOI: 10.1109/CVPR.2019.00482]
Gangwar A and Joshi A. 2016. DeepIrisNet: deep iris representation with applications in iris recognition and cross-sensor iris recognition//Proceedings of 2016 IEEE International Conference on Image Processing (ICIP). Phoenix, USA: IEEE: 2301-2305 [DOI: 10.1109/ICIP.2016.7532769]
Gangwar A, Joshi A, Joshi P and Ramachandra R. 2019. DeepIrisNet2: learning deep-IrisCodes from scratch for segmentation-robust visible wavelength and near infrared iris recognition [EB/OL]. [2023-09-21]. https://arxiv.org/ftp/arxiv/papers/1902/1902.05390.pdf
Gao H Y and Ji S W. 2019. Graph representation learning via hard and channel-wise attention networks//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Anchorage, USA: Association for Computing Machinery: 741-749 [DOI: 10.1145/3292500.3330897]
Hamilton W L, Ying R and Leskovec J. 2018. Inductive representation learning on large graphs [EB/OL]. [2023-09-21]. https://arxiv.org/pdf/1706.02216.pdf
Han K, Wang Y H, Guo J Y, Tang Y H and Wu E H. 2022. Vision GNN: an image is worth graph of nodes [EB/OL]. [2023-09-21]. https://arxiv.org/pdf/2206.00272.pdf
Han Y, Wang P H, Kundu S, Ding Y and Wang Z Y. 2023. Vision HGNN: an image is more than a graph of nodes//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 19821-19831 [DOI: 10.1109/ICCV51070.2023.01820]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hou L, Liu J H, Yu X and Du J W. 2023. Review of graph neural networks [J/OL]. Computer Science: 1-25 [2023-09-21]. https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CAPJ&dbname=CAPJ&filename=JSJA20231008002
Howard A, Sandler M, Chen B, Wang W J, Chen L C, Tan M X, Chu G, Vasudevan V, Zhu Y K, Pang R M, Adam H and Le Q. 2019. Searching for MobileNetV3//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1314-1324 [DOI: 10.1109/ICCV.2019.00140]
Institute of Automation, Chinese Academy of Sciences. 2018. CASIA-Iris-Mobile-V1.0 database [DB/OL]. [2023-09-21]. http://biometrics.idealtest.org
Institute of Automation, Chinese Academy of Sciences. 2020. CASIA-IrisV4 database [DB/OL]. http://biometrics.idealtest.org
Jia D D and Shen W Z. 2022. IrisCodeNet: iris feature coding network. Computer Engineering and Applications, 58(10): 185-192 [DOI: 10.3778/j.issn.1002-8331.2011-0466]
Kipf T N and Welling M. 2016. Semi-supervised classification with graph convolutional networks [EB/OL]. [2023-09-21]. https://arxiv.org/pdf/1609.02907.pdf
Li G H, Müller M, Qian G C, Delgadillo I C, Abualshour A, Thabet A and Ghanem B. 2023. DeepGCNs: making GCNs go as deep as CNNs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6): 6923-6939 [DOI: 10.1109/TPAMI.2021.3074057]
Liu Y C, Shao Z R, Teng Y Y and Hoffmann N. 2021. NAM: normalization-based attention module [EB/OL]. [2023-09-21]. https://arxiv.org/pdf/2111.12419.pdf
Liu Z, Mao H Z, Wu C Y, Feichtenhofer C, Darrell T and Xie S N. 2022. A ConvNet for the 2020s//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 11966-11976 [DOI: 10.1109/CVPR52688.2022.01167]
Miyazawa K, Ito K, Aoki T, Kobayashi K and Nakajima H. 2008. An effective approach for iris recognition using phase-based image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10): 1741-1756 [DOI: 10.1109/TPAMI.2007.70833]
Monro D M, Rakshit S and Zhang D X. 2007. DCT-based iris recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4): 586-595 [DOI: 10.1109/TPAMI.2007.1002]
Ren M, Wang Y L, Sun Z N and Tan T N. 2020. Dynamic graph representation for occlusion handling in biometrics//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 11940-11947 [DOI: 10.1609/aaai.v34i07.6869]
Schroff F, Kalenichenko D and Philbin J. 2015. FaceNet: a unified embedding for face recognition and clustering//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 815-823 [DOI: 10.1109/CVPR.2015.7298682]
Shankar S, Garg S and Sarawagi S. 2018. Surprisingly easy hard-attention for sequence to sequence learning//Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics: 640-645 [DOI: 10.18653/v1/D18-1065]
Sun Z N and Tan T N. 2009. Ordinal measures for iris recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12): 2211-2226 [DOI: 10.1109/TPAMI.2008.240]
Tan M X and Le Q V. 2019. EfficientNet: rethinking model scaling for convolutional neural networks//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR: 6105-6114
Teng T, Shen W Z and Mao Y F. 2020. Multi-task iris fast location method based on cascaded neural network. Computer Engineering and Applications, 56(12): 118-124 [DOI: 10.3778/j.issn.1002-8331.1903-0145]
Veličković P, Cucurull G, Casanova A, Romero A, Liò P and Bengio Y. 2018. Graph attention networks [EB/OL]. [2023-09-21]. https://arxiv.org/pdf/1710.10903.pdf
Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M and Hu Q H. 2020. ECA-Net: efficient channel attention for deep convolutional neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11531-11539 [DOI: 10.1109/CVPR42600.2020.01155]
Wang Y H, Zhu Y and Tan T N. 2002. Biometrics personal identification based on iris pattern. Acta Automatica Sinica, 28(1): 1-10
Xiao L H, Sun Z N, He R and Tan T N. 2013. Coupled feature selection for cross-sensor iris recognition//Proceedings of the 6th IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS). Arlington, USA: IEEE: 1-6 [DOI: 10.1109/BTAS.2013.6712752]
Yu B, Yin H T and Zhu Z X. 2021. ST-UNet: a spatio-temporal U-network for graph-structured time series modeling [EB/OL]. [2023-09-21]. https://arxiv.org/pdf/1903.05631.pdf
Yue H L, Wang H Y and Peng R Y. 2022. The research progress of graph neural network//Proceedings of the 17th Chinese Conference on Stereology and Image Analysis. Chinese Society of Stereology: 99-102 [DOI: 10.26914/c.cnkihy.2022.051480]
Zhao Z J and Kumar A. 2017. Towards more accurate iris recognition using deeply learned spatially corresponding features//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 3829-3838 [DOI: 10.1109/ICCV.2017.411]