Face age synthesis fusing channel-coordinate attention mechanism and parallel dilated convolution

Zhang Ke; Yu Tingting; Shi Chaojun; Lou Wenshuo; Liu Yang

Published: 2023-12-21
DOI: 10.11834/jig.230007
2023 | Volume 28 | Number 12

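Zhang Ke¹,², Yu Tingting¹, Shi Chaojun¹,², Lou Wenshuo¹, Liu Yang¹ (1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; 2. Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China)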

Abstract

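Objective Face age synthesis aims to generate a face image at a specified age while preserving a highly credible likeness of the person, and it is an active research direction in computer vision. However, current mainstream face age synthesis models focus too heavily on texture information and neglect the multi-scale features of the face; in addition, these networks filter identity information poorly. To address these problems, this paper proposes a face age synthesis network that fuses a channel-coordinate attention mechanism with parallel dilated convolution (generative adversarial network (GAN) composed of the parallel dilated convolution and channel-coordinate attention mechanism, PDA-GAN). Method Building on the generative adversarial network, PDA-GAN introduces a parallel three-channel dilated convolution residual block and a channel-coordinate attention mechanism. The residual block fuses face features at different scales extracted by dilated convolutions with three dilation rates, which increases both the diversity of feature scales and the overall richness of the features. The channel-coordinate attention mechanism computes saliency along the length, width, and depth of the face features to locate the channels and spatial regions that are strongly related to age, which strengthens the network's ability to express sensitive features in the channel and spatial dimensions and resolves the problem of feature redundancy. Result The model is trained on the Flickr-Faces-High-Quality (FFHQ) dataset and tested on the large-scale CelebFaces attributes dataset-high quality (Celeba-HQ). PDA-GAN is compared qualitatively and quantitatively with three recent face age synthesis networks to verify the effectiveness of the proposed method. The experiments show that PDA-GAN markedly improves identity confidence and age estimation accuracy in face age synthesis, and that it preserves identity information and controls age well. Conclusion The proposed method can synthesize face images of the target age with high realism and accuracy.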
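As a small worked example of why three dilation rates give features at three scales: a dilated kernel of size k with dilation rate d spans k + (k - 1)(d - 1) pixels per side while keeping the same number of weights. The sketch below evaluates this for the rates 1, 2, and 3 named in the English abstract; the 3 × 3 kernel size and the function name `effective_kernel_size` are assumptions made for illustration, not details taken from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length spanned by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed 3x3 kernel; dilation rates 1, 2, 3 as stated in the abstract.
for d in (1, 2, 3):
    span = effective_kernel_size(3, d)
    print(f"dilation={d}: 9 weights cover a {span}x{span} window")
```

Under these assumptions the three branches see 3 × 3, 5 × 5, and 7 × 7 windows at the same parameter cost per kernel.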

Keywords

image synthesis; face age; generative adversarial network (GAN); dilated convolution; attention mechanism

Face age synthesis fusing channel-coordinate attention mechanism and parallel dilated convolution

Zhang Ke¹,², Yu Tingting¹, Shi Chaojun¹,², Lou Wenshuo¹, Liu Yang¹ (1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; 2. Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China)

Abstract

Objective Face age synthesis, which aims to synthesize face images at a specified age while maintaining high fidelity, is one of the most popular research directions in computer vision. It is gradually being applied in face recognition, film special effects, public security, and other fields, and it has a wide range of application scenarios. The generative adversarial network (GAN) is one of the most widely used deep learning models in face synthesis. The generator and the discriminator of a GAN compete with each other until the generated images are realistic enough to pass for real ones. Although GAN and its variants have achieved good synthesis results, some deficiencies remain. First, to synthesize images close to the target age, current face age synthesis models restrict the aging process to texture information and ignore multi-scale facial features such as contour, hair color, and texture. Second, the limited receptive field of the convolutional layer prevents a fully convolutional network from extracting multi-scale features from the image. These problems greatly restrict the quality of GAN-based face age synthesis. To solve them, this paper proposes a GAN composed of parallel dilated convolution and a channel-coordinate attention mechanism (PDA-GAN). Method PDA-GAN introduces a parallel three-channel dilated convolutional residual block (PTDCRB) and a channel-coordinate attention mechanism (CCAM) on the basis of generative adversarial networks. PTDCRB is added to the generator network of the baseline. Each PTDCRB comprises three parallel dilated convolution branches that extract features simultaneously. The dilated convolutions on the three branches use dilation rates (expansion coefficients) of [1, 2, 3], respectively. The branches share weights, which reduces the number of network parameters.
The first layer of each branch in PTDCRB is a 1 × 1 convolutional layer, the second layer is a dilated convolution whose dilation rate differs from branch to branch, and the third layer is a 1 × 1 convolutional layer that reduces dimensionality and improves computational efficiency. Meanwhile, CCAM filters the channel dimension of the feature vector, retains the meaningful channel information, and learns the importance of the different channels to avoid feature redundancy. CCAM then embeds position information into the channel-attended feature vector: it computes attention along the two orthogonal directions of length and width and fuses the results, which makes it easier to capture dependencies between features at different positions. Result The model is trained on the FFHQ dataset and tested on samples selected from the Celeba-HQ dataset, and PDA-GAN is compared qualitatively and quantitatively with three recent face age synthesis networks, HRFAE, LIFE, and SAM, to verify its effectiveness. Age accuracy and identity consistency are adopted as quantitative indicators. PDA-GAN achieves the best accuracy for synthesized age images, with an average prediction difference of 4.09. The identity confidence reaches 99.2% when synthesizing a 30-year-old face. In the age-independent attribute retention experiment, PDA-GAN outperforms the other models on both quantitative indicators, with a gender retention rate of 99.7% and an emotion retention rate of 93.2%. Ablation experiments are performed to further verify the effectiveness of each module of PDA-GAN. First, PTDCRB is introduced into different layers of the generator backbone network. Experimental results show that PTDCRB-3 significantly improves identity confidence and age estimation accuracy.
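The PTDCRB layout described in the Method part (1 × 1 convolution, dilated convolution, 1 × 1 convolution, three branches with shared weights) might be sketched in PyTorch roughly as follows. This is a minimal sketch, not the authors' implementation: the class name, the channel widths, the ReLU placement, the 3 × 3 kernel size, and fusing the branches by summation are all assumptions, chosen so that the residual addition is shape-compatible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PTDCRB(nn.Module):
    """Sketch of a parallel three-branch dilated residual block (assumed details)."""

    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        mid = channels // 2  # assumed bottleneck width
        self.dilations = dilations
        # One set of layers reused by every branch, which is how this
        # sketch realizes the weight sharing mentioned in the abstract.
        self.conv_in = nn.Conv2d(channels, mid, kernel_size=1)
        self.conv_dilated = nn.Conv2d(mid, mid, kernel_size=3)
        self.conv_out = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        h = F.relu(self.conv_in(x))
        branches = []
        for d in self.dilations:
            # Same 3x3 weights at a different dilation rate;
            # padding=d keeps the spatial size unchanged.
            b = F.conv2d(h, self.conv_dilated.weight, self.conv_dilated.bias,
                         padding=d, dilation=d)
            branches.append(self.conv_out(F.relu(b)))
        # Fuse by summation (an assumption) and add the residual connection.
        return x + sum(branches)


block = PTDCRB(channels=8)
features = torch.randn(2, 8, 16, 16)
print(block(features).shape)  # same shape as the input
```

Because the three branches reuse one 3 × 3 kernel, the parameter count in this sketch does not grow with the number of dilation rates.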
Four sets of PTDCRB expansion coefficients (dilation rates) are then used to train the network, and the set [1, 2, 3] yields the best identity confidence and predicted age distribution. The standard generator structure and the generator structure with the channel-coordinate attention mechanism are then compared in terms of age synthesis accuracy and identity verification confidence. Experimental results show that identity retention and age synthesis ability improve significantly after the channel-coordinate attention mechanism is added. Conclusion This study proposes a parallel three-channel dilated convolution residual block with shared weights that captures feature information at each scale and enriches the detail features of the model. To enhance the expressiveness of the model on sensitive features, this paper proposes a channel-coordinate attention mechanism that learns features in the channel and spatial dimensions simultaneously. Under the combined effect of the parallel three-channel dilated convolution residual block and the channel-coordinate attention mechanism, the identity preservation ability and age synthesis accuracy of the model are improved. Experimental results show that the proposed method outperforms other popular methods on face age synthesis and can synthesize natural, realistic face images of the target age with high fidelity and accuracy.
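The two-stage attention described for CCAM (channel attention first, then attention along the two orthogonal directions of length and width, fused back into the features) could look roughly like the PyTorch sketch below. The abstract does not give the layer details, so the squeeze-and-excitation style channel gate, the average pooling, the reduction ratio, and the multiplicative fusion are all assumptions, and the class name is hypothetical.

```python
import torch
import torch.nn as nn


class CCAM(nn.Module):
    """Sketch of channel attention followed by coordinate attention (assumed details)."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 4)  # assumed reduction ratio
        # Stage 1: channel gate learned from globally pooled features.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(mid, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Stage 2: shared transform, then one gate per spatial direction.
        self.shared = nn.Sequential(nn.Conv2d(channels, mid, kernel_size=1), nn.ReLU())
        self.gate_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.gate_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        x = x * self.channel_gate(x)            # reweight channels
        pooled_h = x.mean(dim=3, keepdim=True)  # (N, C, H, 1): average over width
        pooled_w = x.mean(dim=2, keepdim=True)  # (N, C, 1, W): average over height
        attn_h = torch.sigmoid(self.gate_h(self.shared(pooled_h)))
        attn_w = torch.sigmoid(self.gate_w(self.shared(pooled_w)))
        # Broadcasting the two directional maps gives a position-aware gate.
        return x * attn_h * attn_w


attention = CCAM(channels=16)
features = torch.randn(2, 16, 8, 12)
print(attention(features).shape)  # same shape as the input
```

Pooling along one axis at a time keeps the position along the other axis, which is what lets this kind of gate mark both which channels and which rows and columns matter.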

Keywords

image synthesis; face age; generative adversarial network (GAN); dilated convolution; attention mechanism
