Generative adversarial network for image super-resolution combining perceptual loss
2019, Vol. 24, No. 8, pp. 1270-1282
Received: 2018-11-07; Revised: 2019-03-20; Published in print: 2019-08-16
DOI: 10.11834/jig.180613
Objective
Most existing deep-learning-based single-image super-resolution algorithms adopt mean square error loss as the optimization objective to obtain high image evaluation indices. However, the reconstructed images lose high-frequency information severely and show blurred texture edges, which hardly satisfies subjective visual perception. Meanwhile, existing deep models often pursue better reconstruction simply by deepening the network, which induces the vanishing-gradient problem and makes training harder. To solve these problems, this paper proposes a super-resolution reconstruction algorithm that fuses perceptual loss: a residual network model built within a generative adversarial framework improves the feature reconstruction capability for low-resolution images and faithfully restores the missing high-frequency semantic information.
Method
The proposed model contains two modules: a generator subnetwork and a discriminator subnetwork. The generator is mainly composed of a feature pyramid built from dense residual blocks, and every convolutional layer in each dense residual block uses 3×3 filters. The generator completes its reconstruction task by progressively extracting high-frequency features of the image at different scales. The discriminator introduces strided convolution and global average pooling into a multilayer feed-forward network, effectively learns the data distribution of the images reconstructed by the generator, judges the authenticity of the generated images, and feeds the judgment back to the generator. Finally, the algorithm optimizes an objective function fused with perceptual loss to update the network parameters.
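The paper gives no reference implementation; the following is a minimal PyTorch sketch of a dense residual block with 3×3 convolutions and an identity (residual) branch as described above. The class name, layer count, and growth rate are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """Dense residual block sketch: densely connected 3x3 convolutions
    plus an identity branch. Layer count and growth rate are assumptions."""
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.LeakyReLU(0.2, inplace=True)))
            in_ch += growth  # dense connectivity: each layer sees all earlier outputs
        # 1x1 fusion convolution back to the block's input width
        self.fuse = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        # identity (residual) branch stabilizes training and preserves information flow
        return x + self.fuse(torch.cat(features, dim=1))
```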
Result
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are adopted as objective evaluation criteria. On the Set5 and Set14 datasets, the PSNR values measured after 4× reconstruction are 31.72 dB and 28.34 dB, and the SSIM values are 0.892 4 and 0.785 6, respectively, a clear improvement over other methods.
Conclusion
The proposed generative adversarial super-resolution algorithm fused with perceptual loss accurately restores image texture details and reconstructs visually pleasing high-resolution images.
Objective
Single image super-resolution (SISR) is a research hotspot in computer vision. SISR aims to reconstruct a high-resolution image from its low-resolution counterpart and is widely used in video surveillance, remote sensing, and medical imaging. In recent years, many researchers have focused on convolutional SISR networks owing to the rapid development of deep learning. Early work constructed shallow convolutional networks, which perform poorly in improving the quality of reconstructed images. Moreover, these methods adopt mean square error as the objective function to obtain a high evaluation index. As a result, they cannot characterize edge details well and thus fail to infer plausible high-frequency content. To address this problem, we propose a novel generative adversarial network (GAN) for image super-resolution that combines perceptual loss to further improve SR performance. This method outperforms state-of-the-art methods by a large margin in terms of peak signal-to-noise ratio and structural similarity, resulting in a noticeable improvement of the reconstruction results.
Method
SISR is inherently ill-posed because many solutions exist for any given low-resolution pixel. In other words, it is an underdetermined inverse problem without a unique solution. Classical methods constrain the solution space by exploiting the prior information of natural-scene images, yet their results often deviate from real high-resolution images in color and contextual accuracy. With their strong feature representation ability, convolutional neural networks (CNNs) outperform conventional methods. However, most feed-forward CNNs for super-resolution are single-path models whose reconstruction performance is limited: they optimize the pixel-wise mean square error (MSE) between the super-resolved image and the ground truth, and pixel-wise differences cannot capture perceptual semantics well. Therefore, we propose a novel GAN for image super-resolution that integrates perceptual loss to boost visual performance. Our model consists of two modules. The generative subnetwork is mainly composed of a Laplacian feature pyramid whose fundamental components are dense residual blocks. We introduce global residual learning in the identity branch of each residual unit to construct the dense residual block; the full usage of all layers not only stabilizes the training process but also effectively preserves information flow through the network. As a result, the generative subnetwork progressively extracts high-frequency features of the reconstructed image at different scales.
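To illustrate the progressive, pyramid-style extraction described above, here is a hedged PyTorch sketch of a 4× generator that reuses the DenseResidualBlock sketched earlier; the level and block counts, the transposed-convolution upsampler, and the bilinear skip branch are assumptions rather than the paper's exact design.

```python
import torch.nn as nn
import torch.nn.functional as F

class PyramidGenerator(nn.Module):
    """Pyramid-style 4x generator sketch: each level stacks dense residual
    blocks (DenseResidualBlock from the earlier sketch), upsamples features
    2x with a transposed convolution, and adds a predicted high-frequency
    residual to a bilinearly upsampled image. Counts are assumptions."""
    def __init__(self, channels=64, blocks_per_level=2, levels=2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.levels = nn.ModuleList()
        self.to_rgb = nn.ModuleList()
        for _ in range(levels):
            body = [DenseResidualBlock(channels) for _ in range(blocks_per_level)]
            body.append(nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1))
            self.levels.append(nn.Sequential(*body))
            self.to_rgb.append(nn.Conv2d(channels, 3, 3, padding=1))

    def forward(self, lr):
        feat, img = self.head(lr), lr
        for level, to_rgb in zip(self.levels, self.to_rgb):
            feat = level(feat)                        # features at the next scale
            img = F.interpolate(img, scale_factor=2, mode='bilinear',
                                align_corners=False)  # coarse upsampled image
            img = img + to_rgb(feat)                  # add high-frequency residual
        return img
```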
The discriminative subnetwork is a feed-forward CNN that introduces strided convolution and global average pooling to enlarge the receptive field and reduce spatial dimensions over a large image region, ensuring efficient memory usage and fast inference. By inspecting feature maps, the discriminator estimates the probability that a generated high-resolution image came from the ground truth rather than from the generative subnetwork and feeds the result back to help the generator synthesize more perceptual high-frequency details.
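A minimal sketch of such a discriminator follows, assuming four strided 3×3 convolution stages followed by global average pooling and a single linear real/fake output; the channel widths are illustrative, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Feed-forward discriminator sketch: strided 3x3 convolutions shrink the
    spatial dimensions (enlarging the receptive field), and global average
    pooling replaces large fully connected layers for memory efficiency."""
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        layers, ch = [], in_channels
        for out_ch in (base, base * 2, base * 4, base * 8):
            layers += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.classifier = nn.Linear(ch, 1)   # real/fake logit

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)
        return torch.sigmoid(self.classifier(f))  # probability of "real"
```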
Finally, the algorithm optimizes the objective function fused with perceptual loss to update the network parameters.
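For concreteness, a hedged sketch of one such fused objective follows: a VGG-based perceptual term plus the standard non-saturating adversarial term. The VGG-19 layer choice (the first 35 layers, near relu5_4) and the weighting factor adv_weight are assumptions, not the paper's reported settings.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """Perceptual loss sketch: MSE between VGG-19 feature maps of the
    super-resolved image and the ground truth. Layer choice is an assumption."""
    def __init__(self):
        super().__init__()
        self.vgg = vgg19(pretrained=True).features[:35].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False  # VGG is a fixed feature extractor
        self.mse = nn.MSELoss()

    def forward(self, sr, hr):
        return self.mse(self.vgg(sr), self.vgg(hr))

def generator_loss(perceptual, sr, hr, fake_prob, adv_weight=1e-3):
    """Fused objective: perceptual term plus the non-saturating GAN term.
    fake_prob is the discriminator's output on generated images."""
    adversarial = -torch.log(fake_prob + 1e-8).mean()
    return perceptual(sr, hr) + adv_weight * adversarial
```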
Result
All experiments are implemented on the PyTorch framework. We train PSGAN (perceptual super-resolution generative adversarial network) for 100 epochs on the standard 291-image training set. Following previous work, we transform all RGB images into YCbCr format and super-resolve the Y channel only because the human eye is most sensitive to this channel. We choose two standard benchmarks (Set5 and Set14) to verify the effectiveness of the proposed network against other state-of-the-art methods.
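As a sketch of this evaluation protocol, the snippet below converts RGB to the BT.601 luma channel (the convention used by MATLAB's rgb2ycbcr) and computes PSNR on 8-bit images; the authors' exact conversion coefficients are not stated, so these are assumptions.

```python
import numpy as np

def rgb_to_y(img):
    """BT.601 luma (as in MATLAB's rgb2ycbcr for uint8 input).
    img is an HxWx3 RGB array with values in [0, 255]."""
    r = img[..., 0].astype(np.float64)
    g = img[..., 1].astype(np.float64)
    b = img[..., 2].astype(np.float64)
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr(sr_y, hr_y):
    """Peak signal-to-noise ratio between two Y-channel images in [0, 255]."""
    mse = np.mean((sr_y - hr_y) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(255.0 ** 2 / mse)
```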
For subjective visual evaluation, the results on all test samples are reasonable: the perceptual quality difference between the original ground truth and our generated high-resolution images is not significant. Overall, PSGAN achieves superior clarity and barely shows ripple artifacts. For objective evaluation, the average peak signal-to-noise ratio achieved by this method is 37.44 dB and 33.14 dB with scale factor 2 and 31.72 dB and 28.34 dB with scale factor 4 on Set5 and Set14, respectively. In terms of structural similarity, the proposed approach obtains 0.961 4/0.892 4 on Set5 and 0.919 3/0.785 6 on Set14 (scale factors 2/4), which indicates that PSGAN produces the best index results. In terms of perceptual measures, we calculate the FSIM of each method; our PSGAN obtains 0.92/0.91 on Set5 and 0.92/0.88 on Set14. Experimental results demonstrate that our method improves the upsampled image quality by a large margin.
Conclusion
We employ a compact and recurrent CNN that mainly consists of dense residual blocks to super-resolve high-resolution images progressively. Comprehensive experiments show that PSGAN achieves considerable improvement in quantitative metrics and visual perception over other state-of-the-art methods. The algorithm provides stronger supervision for brightness consistency and texture recovery and can be applied to photorealistic super-resolution of natural-scene images.