
Guo Dongsheng, Gu Zhaorui, Zheng Bing, Dong Junyu, Zheng Haiyong (Ocean University of China)

Abstract

Objective: Image inpainting and outpainting can be viewed as the problem of painting unknown regions from known ones, and both are research hotspots in computer vision. In recent years, deep neural networks have become the mainstream approach to inpainting and outpainting. However, most existing methods treat the two problems separately, making a unified treatment difficult; moreover, the models are mostly built on convolutional neural networks (CNNs), whose local receptive fields make it hard to paint long-range content. To address these two issues, this paper follows a divide-and-conquer strategy, combines a CNN with a Transformer to build a deep neural network, and proposes a unified framework and model for image inpainting and outpainting. Method: The solution process is decomposed into three stages: representation, prediction, and synthesis. Representation and synthesis are handled by CNNs, exploiting their local correlation for image-to-feature mapping and feature-to-image reconstruction. The core prediction is performed by a Transformer, which fully exploits its strong ability to model global context; a mask growth strategy is further proposed to predict features iteratively, reducing the difficulty of having the Transformer predict the features of large unknown regions all at once. Finally, adversarial learning is introduced to improve the fidelity of the painted images. Result: Experiments compare inpainting and outpainting on multiple datasets, and the results show that our method surpasses the compared methods on all metrics. Ablation studies show that the model performs better than a non-decomposed variant, confirming the effectiveness of the divide-and-conquer design. In addition, a detailed analysis of the mask growth strategy shows that iterative prediction effectively improves painting ability. Finally, the influence of key Transformer structural parameters on model performance is investigated. Conclusion: This paper proposes a unified iterative-prediction framework for image inpainting and outpainting. The proposed method outperforms both commonly used and state-of-the-art methods, and each component of the design contributes to the improvement, demonstrating the application value and potential of the unified iterative-prediction framework for image inpainting and outpainting.
A unified framework with iterative prediction for both image inpainting and image outpainting


Objective: Image inpainting and outpainting can be regarded as the problem of painting unknown regions from known regions, and both are research hotspots in computer vision. Recently, deep learning has become the mainstream approach to image inpainting and outpainting. However, most current solutions treat inpainting and outpainting separately, so a method designed for one is difficult to adapt to the other. Besides, these methods are mainly built on the convolutional neural network (CNN), whose locality makes it difficult to paint long-range content. In this paper, we propose a unified framework for tackling both image inpainting and image outpainting, with a model that combines a CNN and a Transformer following a divide-and-conquer strategy.
Method: We divide the problem-solving process into three stages: representation, prediction, and synthesis. Representation and synthesis are handled by the CNN, which respectively maps images to features and reconstructs images from features, leveraging the CNN's locality. Prediction is handled by the Transformer, which takes full advantage of its powerful ability to model global context. We further devise a mask growth strategy to predict features iteratively, reducing the difficulty of predicting the features of large unknown regions in parallel. Finally, adversarial learning is introduced to improve the fidelity of the synthesized images.
Result: We conduct comprehensive experiments on different datasets covering both objects and scenes for both image inpainting and image outpainting, and the results demonstrate that our method outperforms state-of-the-art methods in terms of various metrics. An ablation study validates the efficacy of each component of our method, including the framework structure and the mask growth strategy. Moreover, the impact of the numbers of Transformer layers and heads on performance is also studied empirically.
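As a concrete illustration of the mask growth strategy described above, the sketch below iteratively fills only the band of unknown cells bordering the known region, then merges that band into the known region before the next round. This is a minimal NumPy sketch of the iterative idea only: the `predictor` argument stands in for the paper's Transformer, and all function names and the 1 = known / 0 = unknown mask convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def grow_mask(mask, step=1):
    """Dilate the known region of a binary mask by `step` cells
    (4-neighborhood). Convention (assumed): 1 = known, 0 = unknown."""
    known = mask.astype(bool)
    for _ in range(step):
        p = np.pad(known, 1, constant_values=False)
        known = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                 | p[1:-1, :-2] | p[1:-1, 2:])
    return known.astype(mask.dtype)

def iterative_predict(features, mask, predictor, step=1):
    """Mask growth iterative prediction: each round, the predictor fills
    only the ring of unknown cells adjacent to the known region, which is
    then absorbed into the known region. Assumes at least one known cell."""
    feats = features.copy()
    known = mask.copy()
    while not known.all():
        grown = grow_mask(known, step)
        band = grown.astype(bool) & ~known.astype(bool)  # newly reachable ring
        feats[band] = predictor(feats, known)[band]      # fill only the band
        known = grown
    return feats
```

A trivial stand-in predictor, e.g. `lambda f, m: np.full_like(f, f[m.astype(bool)].mean())` (mean-fill in place of the Transformer), is enough to run the loop end to end; each round the unknown region shrinks by one ring, mirroring how the strategy spares the Transformer from predicting the whole unknown area at once.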
Conclusion: This paper proposes a unified framework with iterative prediction to address the problems of image inpainting and outpainting. The proposed method outperforms both commonly used and state-of-the-art methods, and each part of the design contributes to the performance improvement, demonstrating the application value and potential of the unified iterative-prediction framework for image inpainting and outpainting.
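The representation-prediction-synthesis decomposition summarized above can be wired as a simple pipeline. The sketch below uses trivial stand-ins for the paper's CNN encoder/decoder and Transformer predictor (all function names are hypothetical placeholders); its only point is that inpainting and outpainting differ merely in the shape of the known-pixel mask, so one pipeline serves both tasks.

```python
import numpy as np

def represent(image):
    """Stand-in for the CNN encoder: image -> normalized features."""
    return image.astype(np.float32) / 255.0

def predict(features, mask):
    """Stand-in for the Transformer predictor: fill unknown cells
    (here naively, with the mean of the known cells)."""
    filled = features.copy()
    filled[~mask] = features[mask].mean()
    return filled

def synthesize(features):
    """Stand-in for the CNN decoder: features -> 8-bit image."""
    return np.clip(np.rint(features * 255.0), 0, 255).astype(np.uint8)

def paint(image, mask):
    """Unified flow: representation -> prediction -> synthesis.
    `mask` is True on known pixels; an interior hole gives inpainting,
    a known center with unknown borders gives outpainting."""
    return synthesize(predict(represent(image), mask))
```

Swapping the mask from an interior hole to an unknown border is all that distinguishes the two tasks in this sketch, which is the sense in which the framework treats them uniformly.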