Liao Yuanhong1, Qian Wenhua1, Cao Jinde2 (1. School of Information Science and Engineering, Yunnan University; 2. School of Mathematics, Southeast University)
Objective To address the shortcomings of face style transfer algorithms such as StarGAN and MSGAN, namely poor learning of detail styles, monotonous transfer effects, and distortion in the generated images, we propose MStarGAN (Multilayer StarGAN), a face style transfer algorithm that reduces distortion and generates images with different style intensities. Method First, a pre-encoder is constructed from a feature pyramid network (FPN) to generate multi-layer feature vectors that carry the detail features of the image, so that the generated image can learn the detail style of the style image during style transfer. Second, the pre-encoder generates one style vector each for the source image and the style image, and the two are combined; the combined style vector drives the style transfer, giving the generated image different style-transfer intensities. Finally, a weight demodulation algorithm is used as the style transfer module in the generator: by operating on the convolution weights instead of normalizing the feature maps, it eliminates feature artifacts in the feature maps and reduces distortion in the generated images. Results The model is implemented in Python and evaluated on the Celeba_hq dataset with an RTX 2080 Ti. Compared with MSGAN and StarGAN v2, in the reference-guided synthesis experiment the FID is reduced by 18.9 and 3.1 and the LPIPS is improved by 0.094 and 0.018, respectively; in the latent-guided synthesis experiment the FID is reduced by 20.2 and 0.8 and the LPIPS is improved by 0.155 and 0.092, respectively. The algorithm can also generate result images with different style intensities. Conclusion The proposed algorithm transfers the detail style of images, generates output images with different style intensities, and reduces distortion in the generated images.
MStarGAN: A face style transfer network with changeable style intensity
Liao Yuanhong1, Qian Wenhua1, Cao Jinde2 (1. School of Information Science and Engineering, Yunnan University; 2. School of Mathematics, Southeast University)
Objective Style transfer algorithms transfer the style of an art image to a natural image. The image that provides features such as style, texture, and brush strokes is called the style image, and the image that provides the contour structure is called the content image. The goal of a style transfer algorithm is to synthesize a new stylized image with the texture and strokes of the style image and the contour structure of the content image. Early face style transfer algorithms relied on mathematical modeling: a hand-built filter counted the local features of the target image to extract its style, and a statistical model was established to describe that style. Such algorithms can generate only a single style, the resulting style is not pronounced, and the statistical model must be constructed manually, so they are inefficient. With the rise of deep learning, style transfer algorithms began to adopt deep learning models as their core. Because a GAN (generative adversarial network) can generate images that follow a given distribution, training a GAN can produce target images that resemble real images, so GANs are widely used in image style transfer. Current image style transfer algorithms fall into two categories. The first type only improves the GAN itself without using a pre-encoder, such as pix2pix and CycleGAN. The second type adds a pre-encoder before the GAN; this network structure is more complex but achieves more realistic results, such as StyleGAN and StarGAN.
To overcome the shortcomings of face style transfer algorithms such as StarGAN and MSGAN (mode seeking generative adversarial networks), namely poor learning of detail styles, insignificant transfer effects, and distorted generated images, we present MStarGAN (multi-layer StarGAN), a face style transfer algorithm with controllable style intensity. Method First, a pre-encoder is constructed from a feature pyramid network (FPN) to generate multi-layer feature vectors containing image detail features. Compared with the original 1×64 feature vector, the FPN pre-encoder outputs a 6×256 feature vector that carries more details of the original image, so that the generated image can learn the detail style of the style image during style transfer. Next, the pre-encoder generates one style vector each for the source image and the style image, and the two are combined. The combined style vector drives the style transfer; by adjusting how many layers of the vector come from each image, the style of the generated image can be biased toward either the source image or the style image, giving the generated image different style-transfer intensities. In addition, a new loss function keeps the style of the generated image balanced, ensuring that it is not biased too strongly toward either the source image or the style image. Finally, a weight demodulation algorithm is used as the style transfer module in the generator. The traditional AdaIN method has been shown to distort the generated image; by replacing the normalization of feature maps with an operation on the convolution weights, we eliminate feature artifacts in the feature maps and reduce distortion in the generated images. Results Our model is implemented in Python and evaluated on the Celeba_hq dataset with an RTX 2080 Ti.
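The two key steps above, layer-wise combination of the style vectors and weight demodulation in place of AdaIN, can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the function names, tensor shapes, and the exact layer-wise mixing scheme are illustrative, not the paper's implementation.

```python
import numpy as np

def mix_styles(s_src, s_ref, k):
    """Combine multi-layer style vectors (shape: n_layers x dim).

    The first k layers are taken from the source image's style code and the
    remaining layers from the style (reference) image's code, so a larger k
    biases the result toward the source image and a smaller k toward the
    style image. (Hypothetical mixing scheme for illustration.)
    """
    mixed = s_ref.copy()
    mixed[:k] = s_src[:k]
    return mixed

def modulate_demodulate(weight, style, eps=1e-8):
    """Weight demodulation in the StyleGAN2 manner.

    weight: (out_ch, in_ch, kh, kw) convolution kernel
    style:  (in_ch,) per-input-channel scales derived from the style vector
    Instead of normalizing the feature map (AdaIN), the kernel itself is
    modulated by the style and then rescaled so that every output filter has
    unit L2 norm, which removes the droplet-like feature artifacts.
    """
    # Modulate: scale each input channel of the kernel by the style
    w = weight * style[None, :, None, None]
    # Demodulate: normalize each output filter to unit L2 norm
    demod = 1.0 / np.sqrt(np.sum(w ** 2, axis=(1, 2, 3), keepdims=True) + eps)
    return w * demod
```

The demodulated kernel is then used in an ordinary convolution, so the style enters through the weights rather than through a per-feature-map normalization.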
Our model not only generates high-quality random face images but also makes the generated images learn the style of the style images, such as hair color and skin color. Compared with the MUNIT (multimodal unsupervised image-to-image translation), DRIT (diverse image-to-image translation), MSGAN, and StarGAN v2 algorithms, in the latent-guided synthesis experiment the FID (Fréchet inception distance) is reduced by 18.5, 39.2, 20.2, and 0.8, respectively, and the LPIPS (learned perceptual image patch similarity) is increased by 0.181, 0.366, 0.155, and 0.092, respectively. In the reference-guided synthesis experiment, the FID is reduced by 86.4, 32.6, 18.9, and 3.1, respectively, and the LPIPS is increased by 0.23, 0.095, 0.094, and 0.018, respectively. In particular, our algorithm can generate result images with different style intensities. Conclusion The algorithm proposed in this paper transfers the detail style of the image, controls the style intensity of the output image, and reduces distortion in the generated image.