Image instance style transfer combining a fully convolutional network and CycleGAN
2019, Vol. 24, No. 8: 1283-1291
Received: 2018-11-19; Revised: 2019-01-15; Published in print: 2019-08-16
DOI: 10.11834/jig.180624
Objective
Traditional image style transfer is mainly performed between two paired images. Cycle-consistent adversarial networks (CycleGAN) first applied generative adversarial networks to image style transfer, enabling style transfer between unpaired images with some success; however, its generalization ability is weak, and when the test images differ substantially from the training images, the transfer results are poor. To address this problem, this paper proposes an image style transfer method that combines a fully convolutional network (FCN) with CycleGAN, enabling instance style transfer between specific target objects in an image. We also verify that the training dataset is not the cause of CycleGAN's poor instance style transfer.
Method
First, the image is semantically segmented with a fully convolutional network to determine the target of the style transfer. The stylized image produced by CycleGAN is then matched against this target so that only the target region is replaced, achieving local style transfer (see the sketch below). To verify that CycleGAN's poor performance when test images differ greatly from the training images is not caused by a missing training set, we built a corresponding training dataset and used it to retrain the original network.
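As a concrete illustration of the matching step, the following is a minimal sketch, assuming the FCN output has already been reduced to a binary mask of the target object and the CycleGAN output is pixel-aligned with the original image; the function and variable names are illustrative, not taken from the paper's code. All examples in this page use Python with NumPy.

```python
import numpy as np

def composite_instance_style(x, z, mask):
    """Paste the stylized target back into the original image.

    x    : original image, uint8 array of shape (H, W, 3)
    z    : CycleGAN-stylized image, same shape as x
    mask : binary FCN segmentation of the target, shape (H, W),
           1 inside the object and 0 elsewhere
    """
    m = mask[..., None].astype(x.dtype)   # broadcast the mask over the RGB channels
    # Stylized pixels inside the mask, original pixels outside it.
    return z * m + x * (1 - m)
```

The product `z * m` corresponds to the Hadamard product of the label image and the stylized image described in the extended Method section below; adding `x * (1 - m)` writes the result back into the unchanged background.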
Result
Experiments show that combining the fully convolutional network with CycleGAN adds recognition ability: the method performs local style transfer on the image while keeping the remaining elements intact. Compared with CycleGAN alone, it effectively suppresses style transfer outside the target region; over the four test images, only 4.03% of background pixels changed on average, a clear improvement in instance transfer quality. After training the original network on the self-built dataset, it still could not accurately transfer style between the target objects.
Conclusion
The method combining a fully convolutional network with CycleGAN achieves local style transfer while leaving elements outside the target object unchanged, whereas changing the training dataset has little effect on the accuracy of CycleGAN's instance style transfer.
Objective
Gatys et al. successfully used convolutional neural networks (CNNs) to render a content image in different styles, a process referred to as neural style transfer (NST). Their work was the first demonstration of deep learning in the field of style transfer. In the past, most style transfer problems were modeled manually, a time-consuming and laborious process. The goal of traditional NST is to learn the mapping between two different styles of paired images. Cycle-consistent adversarial networks (CycleGAN) was the first method to apply the generative adversarial network (GAN) to image style transfer. It performs well on unpaired training data but fails when the test image differs from the training images. Instance style transfer was developed to address this problem: it builds on image segmentation, and the style should be applied only to the object of interest. The main challenge is the transition between the stylized object and the non-stylized background. Most studies on instance style transfer have focused on CNNs. In this paper, some of these methods are extended to CycleGAN, and some steps are improved for practical conditions. We propose a method that achieves instance style transfer by combining a fully convolutional network (FCN) with CycleGAN. A self-built dataset is used to verify that the training data are not the reason CycleGAN fails at instance style transfer.
Method
This study is divided into two parts: the first improves CycleGAN so that it performs instance style transfer effectively; the second verifies the conjecture raised by CycleGAN's authors. In the first part, the FCN is used to obtain a semantic segmentation of the input image $\mathit{\boldsymbol{X}}$. The FCN must be trained in advance on a large amount of labeled data so that it can segment the object with high accuracy; its output is the label image $\mathit{\boldsymbol{Y}}$. Next, CycleGAN, trained on the prepared data, performs the style transfer and produces the stylized image $\mathit{\boldsymbol{Z}}$. The output $\mathit{\boldsymbol{Z}}$ is then matched with the label image $\mathit{\boldsymbol{Y}}$: the pixels outside the region of interest in $\mathit{\boldsymbol{Y}}$ are set to zero, and the Hadamard product $\mathit{\boldsymbol{R}} = \mathit{\boldsymbol{Y}} \circ \mathit{\boldsymbol{Z}}$ is computed. In this way, the region of interest is cut out of the stylized image $\mathit{\boldsymbol{Z}}$, and $\mathit{\boldsymbol{R}}$ replaces the pixels at the same locations in the original image $\mathit{\boldsymbol{X}}$. For the second part, we create training sets of people riding horses, people beside horses, people riding zebras, and people with zebras to examine the problem raised by CycleGAN's authors; this dataset is then used to train CycleGAN and the results are observed. Because suitable images are hard to find, data augmentation is performed, as sketched below.
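The paper does not list the exact augmentations used; the following is a plausible minimal sketch with mirroring and random resized crops, using Pillow. The folder layout and parameters are hypothetical.

```python
import random
from pathlib import Path
from PIL import Image, ImageOps

def augment(img, out_size=256, n_crops=4):
    """Yield simple variants of one training image:
    its horizontal mirror plus a few random resized crops."""
    yield ImageOps.mirror(img)
    w, h = img.size
    for _ in range(n_crops):
        # Crop a window covering 70-90% of each side, then resize.
        cw = int(w * random.uniform(0.7, 0.9))
        ch = int(h * random.uniform(0.7, 0.9))
        left, top = random.randint(0, w - cw), random.randint(0, h - ch)
        yield img.crop((left, top, left + cw, top + ch)).resize((out_size, out_size))

# Hypothetical layout: trainA holds the people-with-horses images.
for path in Path("rider_dataset/trainA").glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    for i, variant in enumerate(augment(img)):
        variant.save(path.with_name(f"{path.stem}_aug{i}.jpg"))
```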
Result
The first experiment shows that the recognition ability of CycleGAN improves considerably when it is combined with the FCN. The proposed method achieves instance style transfer while leaving the rest of the image almost untouched. To measure the performance of instance style transfer, we define an index that counts the pixels changed outside the target object; a smaller index means better instance style transfer (one way to compute it is sketched below). In the numerical experiments, the index of our method is smaller than that of style transfer using CycleGAN alone, showing that the proposed method is more effective for instance style transfer. In the second experiment, training CycleGAN on our dataset shows that it is still unable to achieve instance style transfer. The network is difficult to train, and the loss function oscillates violently because of the complex colors and backgrounds in the new training dataset. Nevertheless, training on the new dataset does improve instance style transfer performance somewhat.
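A sketch of one way to compute such an index is shown below, assuming the same binary mask as in the compositing step; the tolerance parameter is an assumption, since the paper does not state how a "changed" pixel is thresholded.

```python
import numpy as np

def background_change_ratio(x, result, mask, tol=0):
    """Fraction of background pixels (mask == 0) whose RGB value
    differs between the original image x and the final result."""
    background = mask == 0
    diff = np.abs(result.astype(np.int16) - x.astype(np.int16))
    changed = np.any(diff > tol, axis=-1)   # (H, W) boolean map of changed pixels
    return changed[background].mean()
```

For the four test images in the paper, this ratio averages 4.03% of background pixels under the proposed method.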
Conclusion
The FCN provides a semantic segmentation of the image. Combined with the FCN, CycleGAN can achieve instance style transfer while ensuring that the background and other objects remain unchanged. We also verify that CycleGAN alone cannot accurately achieve instance style transfer of a given target when the test image differs from the training images, even when trained on a tailored dataset.
References
Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV: IEEE, 2016: 2414-2423. [DOI: 10.1109/CVPR.2016.265]
Atarsaikhan G, Iwana B K, Narusawa A, et al. Neural font style transfer[C]//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE, 2017: 51-56. [DOI: 10.1109/ICDAR.2017.328]
Azadi S, Fisher M, Kim V, et al. Multi-content GAN for few-shot font style transfer[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT: IEEE, 2018. [DOI: 10.1109/CVPR.2018.00789]
Yang S, Liu J Y, Lian Z H, et al. Awesome typography: statistics-based text effects transfer[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI: IEEE, 2017: 2886-2895. [DOI: 10.1109/CVPR.2017.308]
Jing Y C, Yang Y Z, Feng Z L, et al. Neural style transfer: a review[EB/OL]. [2018-11-04]. https://arxiv.org/pdf/1705.04058.pdf
Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2242-2251. [DOI: 10.1109/ICCV.2017.244]
Li C, Wand M. Combining Markov random fields and convolutional neural networks for image synthesis[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV: IEEE, 2016: 2479-2486. [DOI: 10.1109/CVPR.2016.272]
Champandard A J. Semantic style transfer and turning two-bit doodles into fine artworks[EB/OL]. [2018-11-04]. https://arxiv.org/pdf/1603.01768.pdf
Chen Y L, Hsu C T. Towards deep style transfer: a content-aware perspective[C]//Proceedings of the British Machine Vision Conference. New York: BMVA Press, 2016: 8.1-8.11. [DOI: 10.5244/C.30.8]
Mechrez R, Talmi I, Zelnik-Manor L. The contextual loss for image transformation with non-aligned data[C]//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer, 2018: 768-783.
Lu M, Zhao H, Yao A B, et al. Decoder network over lightweight reconstructed feature for fast semantic style transfer[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2488-2496. [DOI: 10.1109/ICCV.2017.270]
Castillo C, De S, Han X T, et al. Son of Zorn's lemma: targeted style transfer using instance-aware semantic segmentation[C]//Proceedings of 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing. New Orleans, LA: IEEE, 2017: 1348-1352. [DOI: 10.1109/ICASSP.2017.7952376]
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA: IEEE, 2015: 3431-3440. [DOI: 10.1109/CVPR.2015.7298965]
Dai J F, He K M, Sun J. Instance-aware semantic segmentation via multi-task network cascades[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV: IEEE, 2016: 3150-3158. [DOI: 10.1109/CVPR.2016.343]
Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal: MIT Press, 2014: 2672-2680.