Low-quality 3D face recognition combining soft threshold denoising and video data fusion

Sang Gaoli; Xiao Shudi; Zhao Qijun

Published: 2023-05-16
DOI: 10.11834/jig.220695
2023 | Volume 28 | Number 5

Identity recognition based on face, iris, gait, etc.

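Sang Gaoli¹, Xiao Shudi², Zhao Qijun² (1. College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China; 2. College of Computer Science, Sichuan University, Chengdu 610065, China)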

Abstract

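Objective Low-quality 3D face recognition has become a hot topic in pattern recognition in recent years. Unlike traditional high-quality 3D face recognition, its main challenges are low data quality and high noise. To address the heavy noise in low-quality 3D face data and the difficulty of extracting effective features from a single depth image with limited information, we propose a low-quality 3D face recognition method that combines soft threshold denoising and video data fusion. Method First, to handle the noise in low-quality 3D faces, a plug-and-play soft threshold denoising module is proposed that denoises features while the network extracts them. Second, to make the extracted features more discriminative, a joint gradient loss function that combines softmax and Arcface (additive angular margin loss for deep face recognition) is proposed. Finally, to better exploit multiple frames of low-quality video data for improving face data quality, a video data fusion module based on gated recurrent units is proposed; it effectively fuses the complementary information across video frames and further improves the accuracy of low-quality 3D face recognition. Result The method is compared with recent approaches on two public datasets. Under the open-set and closed-set evaluation protocols of Lock3DFace (low-cost kinect 3D faces), the average recognition rate is 0.28% and 3.13% higher, respectively, than that of the second-best method; under the open-set evaluation protocol of Extended-Multi-Dim, it is 1.03% higher than that of the second-best method. Conclusion The proposed low-quality 3D face recognition method not only effectively mitigates the effect of noise in low-quality data but also effectively fuses the complementary information of multiple video frames, substantially improving the accuracy of low-quality 3D face recognition.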
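As a rough illustration of the soft thresholding operation that the denoising module is named after, the sketch below applies the standard shrinkage rule sign(x) * max(|x| - tau, 0) to a list of feature values. The feature values and the fixed threshold `tau` are hypothetical; the module described in the abstract operates inside a network, and this sketch makes no claim about how its threshold is obtained.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def denoise_features(features, tau):
    """Apply soft thresholding element-wise to a list of feature values."""
    return [soft_threshold(v, tau) for v in features]


# Hypothetical feature vector and threshold, for illustration only.
features = [0.9, -0.05, 0.2, -1.4, 0.1]
denoised = denoise_features(features, tau=0.25)
```

Small-magnitude entries, which are more likely to be noise, are set to zero, while large entries are kept but shrunk by `tau`.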

Keywords

3D face recognition; low-quality 3D face; soft threshold denoising; joint gradient loss function; video data fusion

Soft threshold denoising and video data fusion-relevant low-quality 3D face recognition

Sang Gaoli¹, Xiao Shudi², Zhao Qijun² (1. College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China; 2. College of Computer Science, Sichuan University, Chengdu 610065, China)

Abstract

Objective Portable 3D sensors have made 3D facial data more accessible, and low-quality 3D face recognition has attracted increasing attention in pattern recognition in recent years. Its main challenges are low data quality and high noise. To suppress the high noise in low-quality 3D face data and alleviate the difficulty of extracting effective features from a single depth image with limited information, we develop a novel low-quality 3D face recognition method based on soft threshold denoising and video data fusion. Method First, a trainable soft threshold denoising module is developed to denoise features during feature extraction. The module integrates soft thresholding into the deep network, so that the threshold is learned by a neural network model rather than set manually. Then, to make the extracted features more discriminative, a joint gradient loss function that combines softmax and Arcface (additive angular margin loss for deep face recognition) is used. Finally, to make use of multiple frames of low-quality video data, a video data fusion module based on gated recurrent units is proposed, which fuses the complementary information across video frames to improve the quality of the face data. Result To verify the effectiveness of the method, comparative experiments are carried out on two popular low-quality 3D face datasets, Lock3DFace (low-cost kinect 3D faces) and Extended-Multi-Dim. The experiments follow the training and testing protocols of prior work. Under each of the three protocols below, the method is compared with the second-best method. For the Lock3DFace closed-set protocol, the average recognition rate is increased by 3.13%; for the Lock3DFace open-set protocol, by 0.28%; for the Extended-Multi-Dim open-set protocol, by 1.03%. Furthermore, the ablation study demonstrates the effectiveness and feasibility of soft threshold denoising and video data fusion. Conclusion A trainable soft threshold denoising module is developed to denoise low-quality 3D faces. The joint gradient loss function, which combines softmax and Arcface, helps extract more discriminative features. Furthermore, a video data fusion module fuses the complementary information between video frames, further improving the accuracy of low-quality 3D face recognition. The proposed method mitigates the effect of noise and integrates more effective information from multiple video frames, and it has the potential to improve low-quality 3D face recognition.

CLC number: TP319.4  Document code: A  Article ID: 1006-8961(2023)05-1434-11
Citation format: Sang G L, Xiao S D and Zhao Q J. 2023. Soft threshold denoising and video data fusion-relevant low-quality 3D face recognition. Jour
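The abstract states that the joint gradient loss combines softmax and Arcface but does not say how. The sketch below is therefore only an assumption: it computes a plain softmax cross-entropy and an ArcFace cross-entropy (target logit s*cos(theta + m)) and blends them linearly with a hypothetical weight `alpha`, which a training schedule could move from 0 toward 1. The names `joint_loss` and `alpha` and the default `scale` and `margin` values are illustrative and are not taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def arcface_logits(feature, class_weights, target, scale=64.0, margin=0.5):
    """ArcFace logits: s*cos(theta + m) for the target class, s*cos(theta) otherwise."""
    f = normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(f, normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding
        if j == target:
            cos_t = math.cos(math.acos(cos_t) + margin)
        logits.append(scale * cos_t)
    return logits


def joint_loss(feature, class_weights, target, alpha):
    """Linear blend of the two losses; alpha in [0, 1] is an assumed schedule value."""
    plain = [sum(a * b for a, b in zip(feature, w)) for w in class_weights]
    l_soft = cross_entropy(plain, target)
    l_arc = cross_entropy(arcface_logits(feature, class_weights, target), target)
    return (1.0 - alpha) * l_soft + alpha * l_arc


# Hypothetical two-class example.
feature = [1.0, 0.0]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_start = joint_loss(feature, class_weights, target=0, alpha=0.0)
loss_end = joint_loss(feature, class_weights, target=0, alpha=1.0)
```

With `alpha = 0.0` the blend reduces to the softmax loss, and with `alpha = 1.0` it reduces to the ArcFace loss.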

Keywords

3D face recognition; low-quality 3D face; soft threshold denoising; joint gradient loss function; video data fusion