An iris feature encoding method by fusion of graph neural networks and CNNs

Sun Jintong, Shen Wenzhong (Shanghai University of Electric Power)

Abstract
Objective More interpretable iris feature encoding has long been a key problem in iris recognition, and low-quality iris samples are particularly difficult to recognize. The development of graph neural networks brings new ideas for encoding the features of such iris images. This paper proposes IrisFusionNet, an iris feature encoding network that fuses graph neural networks with convolutional neural networks. Method A pixel-level enhancement module is added before the backbone network to eliminate uncertainty in the input image, and a dual-branch backbone network extracts fused micro- and macro-features of the iris. During training, a unique joint loss function optimizes the network parameters; during inference, a fused-feature matching strategy performs feature matching. Result Experimental results show that the feature extractor trained with IrisFusionNet, tested on several public low-quality iris datasets, achieves best values of 0.27% EER and 0.84% FRR@FAR = 0.01%, and improves the discriminant index (DI) by more than 30%; its recognition accuracy and intra-class clustering far surpass SOTA iris recognition algorithms based on convolutional neural networks and other graph-neural-network models. Conclusion The proposed IrisFusionNet is highly feasible and advantageous for iris recognition tasks.
Keywords
An iris feature encoding method by fusion of graph neural networks and CNNs

Sun Jintong, Shen Wenzhong (Shanghai University of Electric Power)

Abstract
Objective Iris recognition has emerged as a prevalent biometric modality within identity recognition technology owing to its inherent advantages in stability, uniqueness, non-contact acquisition, and live-body authentication. The complete iris recognition workflow typically comprises four main steps: iris image acquisition, image preprocessing, feature encoding, and feature matching. Among these, feature encoding stands out as the core component of iris recognition algorithms. Developing more interpretable iris feature encoding methods has become a pivotal concern within the field of iris recognition. Moreover, the recognition of low-quality iris samples often relies on feature encoders that depend on dataset-specific parameters, resulting in poor generalization performance. The graph structure represents a data form with an irregular topological arrangement, and Graph Neural Networks (GNNs) are effective at updating and aggregating features within such structures. The advancement of GNNs brings new approaches to feature encoding for these types of iris images. In this paper, a pioneering iris feature fusion encoding network called IrisFusionNet, which integrates Graph Neural Networks with Convolutional Neural Networks (CNNs), is proposed. It eliminates the need for complex parameter tuning and exhibits excellent generalization performance across multiple distinct iris datasets. Method A Pixel-Level Enhancement (PLE) module is inserted before the backbone network: local uncertainty in the input image is alleviated by median filtering, and global uncertainty is mitigated by Gaussian normalization. A dual-branch backbone network is proposed, in which the head consists of a shared stack of CONV modules and the neck divides into two branches; the primary branch constructs a graph structure from the image through the graph converter.
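The PLE step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 3x3 kernel size and the epsilon in the normalization are assumed values, and the median filter is written out explicitly with NumPy for self-containment.

```python
import numpy as np

def pixel_level_enhancement(img, k=3):
    """Sketch of a PLE step: median filtering to suppress local
    uncertainty (e.g. salt noise), then per-image Gaussian (z-score)
    normalization against global uncertainty. Kernel size k and the
    epsilon are illustrative assumptions, not the paper's settings."""
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.empty((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            # Median over the k x k neighborhood centered at (i, j).
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    # Gaussian normalization: zero mean, unit variance over the image.
    mu, sigma = out.mean(), out.std()
    return (out - mu) / (sigma + 1e-8)
```

A single bright outlier pixel is removed by the median step, and the output is standardized regardless of the input's intensity range.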
We design a hard graph attention network that introduces an efficient channel attention mechanism to aggregate and update features by exploiting edge-associated information within the graph structure, thereby extracting micro-features of iris textures. The auxiliary branch, on the other hand, employs conventional CNN pipeline components such as simple convolutional layers, pooling layers, and fully connected layers to capture macro-structural information of the iris. During the training phase, the fused features from the primary and auxiliary branches are optimized with a unified loss function, the Graph Triplet and Additive Angular Margin Unified Loss (GTAU-Loss). The primary branch maps iris images into a graph feature space, using cosine similarity to measure semantic information in the node feature vectors, the L2 norm to measure spatial relationship information in the adjacency matrix, and a graph triplet loss to constrain feature distances within that space. The auxiliary branch uses an additive angular margin loss: input feature vectors are normalized and an additional angular margin is introduced to constrain feature angle intervals, enhancing intra-class compactness and inter-class separation. Ultimately, a dynamic learning method based on an exponential model fuses the losses of the primary and auxiliary branches, yielding the GTAU-Loss. The hyperparameter settings during training are as follows: network parameters are optimized with Stochastic Gradient Descent (SGD) using Nesterov momentum set to 0.9, an initial learning rate of 0.001, and a warm-up strategy with a warm-up rate of 0.1, over 200 epochs. Training is accelerated on an NVIDIA RTX 3060 12 GB GPU and takes approximately one day per 100 iterations.
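The graph triplet term of GTAU-Loss can be sketched from the two distances named above: cosine dissimilarity over node features (semantic) and the L2 norm of the adjacency difference (spatial). This is a hedged sketch, not the paper's formula: the index-wise node pairing, the equal weighting of the two distances, and the margin value are assumptions.

```python
import numpy as np

def graph_triplet_loss(anchor, pos, neg, margin=0.2):
    """Illustrative graph triplet term. Each graph is (X, A): node
    features X (n x d) and adjacency A (n x n). Semantic distance is
    the mean cosine dissimilarity of index-matched node features;
    spatial distance is the L2 norm of the adjacency difference.
    Pairing scheme, weighting, and margin are assumptions."""
    def dist(g1, g2):
        X1, A1 = g1
        X2, A2 = g2
        num = (X1 * X2).sum(axis=1)
        den = np.linalg.norm(X1, axis=1) * np.linalg.norm(X2, axis=1) + 1e-8
        semantic = 1.0 - (num / den).mean()   # cosine dissimilarity
        spatial = np.linalg.norm(A1 - A2)     # adjacency L2 distance
        return semantic + spatial
    # Standard triplet hinge: pull positives closer than negatives.
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)
```

In the full GTAU-Loss this term would then be combined with the auxiliary branch's angular margin loss through the exponential weighting schedule.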
For feature matching between two distinct graph structures, the auxiliary branch computes the cosine similarity between the output feature vectors, while the primary branch uses a gate-based method: it first calculates the mean cosine similarity over all node pairs as the gate threshold, removes node pairs below this threshold, and computes the cosine similarity over the node features retained above it. The similarity between the two graph structures is then the weighted sum of the cosine similarities from the primary and auxiliary branches, with both weights set to 0.5. All experiments are conducted on the Windows 11 operating system with PyTorch as the deep learning framework. Result To validate the effectiveness of integrating graph neural networks into the framework, iris recognition experiments are conducted with both a single-branch CNN framework and the dual-branch framework. The results substantiate that the design incorporating the graph neural network branch yields superior recognition performance. Furthermore, to determine the optimal values of two crucial parameters of IrisFusionNet, the number of nearest neighbors k and the global feature dimension, detailed parameter experiments are conducted; the most favorable values are k = 8 and a global feature dimension of 256. We compare the proposed method with several state-of-the-art (SOTA) iris recognition methods, including CNN-based methods such as ResNet, MobileNet, EfficientNet, and ConvNeXt, and GNN-based methods such as Dynamic Graph Representation (DGR).
Comparative experimental results indicate that the feature extractor trained with IrisFusionNet, tested on three publicly available low-quality iris datasets (CASIA-Iris-V4-Distance, CASIA-Iris-V4-Lamp, and CASIA-Iris-Mobile-V1.0-S2), achieves Equal Error Rates (EER) of 1.06%, 0.71%, and 0.27% and False Rejection Rates at a False Acceptance Rate of 0.01% (FRR@FAR = 0.01%) of 7.49%, 4.21%, and 0.84%, respectively. Additionally, the Discriminant Index (DI) reaches 6.102, 6.574, and 8.451, an improvement of over 30% compared with the baseline algorithm. In both recognition accuracy and feature clustering, the extractor derived from IrisFusionNet significantly outperforms state-of-the-art iris recognition algorithms based on convolutional neural networks as well as other graph-neural-network models. Furthermore, this paper visualizes the graph structures produced by the graph converter: the graph structures generated from similar iris images exhibit high similarity, while those of dissimilar iris images show significant differences. This visualization offers an intuitive explanation for the excellent recognition performance achieved by constructing graph structures and applying graph neural network methods. Conclusion In this paper, we propose a feature fusion encoding method based on graph neural networks, named IrisFusionNet. Macro features of iris images are extracted by a CNN and micro features by graph neural networks, yielding fused features that encompass comprehensive texture characteristics. The experimental results show that our method significantly improves the accuracy and clustering of iris recognition, and achieves high feasibility and generalizability without requiring complex parameter tuning for particular datasets.
Keywords
