Deep learning based video-related super-resolution technique: a survey
2023, Vol. 28, No. 7, pp. 1927-1964
Print publication date: 2023-07-16
DOI: 10.11834/jig.220130
Jiang Junjun, Cheng Hao, Li Zhenyu, Liu Xianming, Wang Zhongyuan. 2023. Deep learning based video-related super-resolution technique: a survey. Journal of Image and Graphics, 28(07):1927-1964
Video super-resolution plays a key role in satellite remote sensing, video surveillance, medical imaging, and other fields, and its broad application prospects have attracted wide attention; however, traditional video super-resolution algorithms have clear limitations. As deep learning has matured, super-resolution algorithms based on deep neural networks have made substantial progress. By fully fusing the spatio-temporal information in videos, these algorithms can quickly and efficiently recover realistic and natural textures, and video super-resolution has therefore become a research hotspot. This paper systematically reviews the progress of deep learning based video super-resolution, comprehensively summarizes the datasets and evaluation metrics used in the field, and divides existing methods into two broad categories according to their research approach: alignment-based methods and non-alignment methods. Based on the model structure of deep convolutional neural networks, the history of model optimization, and the motion estimation and compensation techniques employed, video super-resolution networks are further subdivided into ten sub-classes, and the core idea of each method, together with the strengths and weaknesses of its network structure, is compared and analyzed with ample experimental data. Although the reconstruction quality of video super-resolution networks keeps improving, model parameter counts keep decreasing, and training and inference keep accelerating, existing models still have room for performance gains. Finally, the remaining challenges and future prospects of deep learning based video super-resolution are discussed.
Video super-resolution (VSR) aims to restore high-resolution video frames from their low-resolution counterparts. It plays a key role in domains such as satellite remote sensing, video surveillance, and medical imaging. To reconstruct high-resolution frames, conventional VSR methods must estimate motion and blur-kernel parameters, which is difficult across diverse scenes. By fully exploiting the spatio-temporal information in videos to recover realistic and natural textures, deep learning based VSR algorithms have developed rapidly, and this paper systematically reviews and analyzes their current state. First, popular YCbCr datasets such as YUV25, YUV21, and ultra video group (UVG) are introduced, together with RGB datasets such as video 4 (Vid4), realistic and dynamic scenes (REDS), and Vimeo90K. The profile of each dataset is summarized, including its name, year of publication, number of videos, number of frames, and resolution. Furthermore, the main evaluation metrics for VSR algorithms are described in detail: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), the video quality model for variable frame delay (VQM_VFD), and learned perceptual image patch similarity (LPIPS). The difference between video and single-image super-resolution is also clarified: the former can exploit the rich motion information shared among video frames, whereas applying a single-image super-resolution method to a video frame by frame produces a large number of artifacts in the reconstructed result.
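Of the metrics above, PSNR is the most widely reported. As an illustrative sketch (not code from any surveyed work), it can be computed from the mean squared error between a ground-truth frame and its reconstruction:

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized frames."""
    diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# example: frames differing by one gray level everywhere -> MSE = 1
gt = np.full((8, 8), 100.0)
sr = gt + 1.0
print(round(psnr(gt, sr), 2))  # about 48.13 dB for the 8-bit range
```

Higher PSNR indicates a reconstruction closer to the ground truth in a pixel-wise sense, which is why perceptual metrics such as LPIPS are reported alongside it.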
We then analyze deep learning based video super-resolution methods, which face two key technical challenges: image alignment and feature fusion. Because VSR methods differ greatly in their choice of alignment module, we categorize them into alignment-based and non-alignment methods; the fusion of multi-frame information depends instead on the network structure, such as generative adversarial networks (GANs), recurrent convolutional neural networks (RNNs), and Transformers. Alignment-based methods use motion estimation and motion compensation modules to align neighboring frames with the target frame, and can be further divided into three categories by alignment technique: optical flow, kernel-based, and deformable convolution. Optical flow alignment computes the motion between two frames from the temporal changes of pixel intensities, after which a motion compensation module warps the neighboring frames accordingly. According to the structure of the deep convolutional neural network (CNN), we further divide optical flow aligned models into four groups: 2D convolution, RNN, GAN, and Transformer. For 2D convolution methods with optical flow alignment, we mainly introduce the video efficient sub-pixel convolutional network (VESPCN) and methods that improve on its optical flow estimation and motion compensation networks, such as ToFlow and the spatio-temporal transformer network (STTN). For RNN methods with optical flow alignment, we analyze the residual recurrent convolutional network (RRCN), the recurrent back-projection network (RBPN), and related methods that use optical flow to align neighboring frames at the image level, which overcome the limitations of sliding-window methods.
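The warping step shared by optical flow aligned methods can be sketched generically: each pixel of the target frame samples the neighboring frame at its position displaced by the estimated flow, with bilinear interpolation for sub-pixel motion. This NumPy sketch is an illustration of the general technique, not the implementation of any specific surveyed network:

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp `frame` toward the target frame.
    frame: (H, W) grayscale image; flow: (H, W, 2) per-pixel (dx, dy)."""
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    # target pixel (x, y) samples the neighbor frame at (x + dx, y + dy)
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    # bilinear interpolation of the four nearest neighbors
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy

frame = np.arange(16.0).reshape(4, 4)
shift_right = np.zeros((4, 4, 2))
shift_right[..., 0] = 1.0          # uniform one-pixel horizontal motion
aligned = warp(frame, shift_right)  # each pixel now reads its right neighbor
```

Feature-level alignment, discussed next for BasicVSR-style networks, applies the same sampling operation to feature maps rather than to input frames.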
To obtain better reconstruction performance, we then focus on BasicVSR (basic video super-resolution), IconVSR (information-refill mechanism and coupled propagation video super-resolution), and other networks that warp neighboring frames at the feature level. The optical flow aligned TecoGAN (temporal coherence via self-supervision for GAN-based video generation) and the VSR Transformer are also introduced in detail. Because kernel-based and deformable convolution based alignment methods are relatively few, classifying them by network structure remains difficult. Since the convolution kernel size limits the range of motion that can be estimated, the reconstruction performance of kernel-based alignment methods is comparatively poor. Deformable convolution, which generalizes the sampling pattern of conventional convolution, still has gaps to bridge, such as high computational complexity and demanding convergence conditions. Non-alignment methods instead rely on the network structure itself to model the correlation between video frames to a certain extent; we review non-aligned 3D convolution, non-aligned RNN, non-aligned GAN, and non-local methods. The non-aligned RNN methods, including recurrent latent space propagation (RLSP), the recurrent residual network (RRN), and omniscient video super-resolution (OVSR), demonstrate that a balance can be struck between reconstruction speed and visual quality. When introducing non-aligned non-local methods, we focus on improved non-local modules that reduce the computational cost. All models are tested at a 4× scale factor under two degradations: bicubic interpolation (BI) and blur downsampling (BD). Quantitative results and speed comparisons of the super-resolution methods on multiple datasets, including REDS4, UDM10, and Vid4, are also summarized.
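The BD degradation used in these benchmarks can be sketched as a Gaussian blur followed by s-fold subsampling. The kernel settings below (13×13 window, σ = 1.6, common for ×4 BD tests) are assumptions of this illustration rather than details fixed by the survey:

```python
import numpy as np

def gaussian_kernel(size=13, sigma=1.6):
    """Normalized 2D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def bd_degrade(hr, scale=4, sigma=1.6):
    """Blur downsampling (BD): Gaussian blur, then keep every `scale`-th pixel."""
    k = gaussian_kernel(sigma=sigma)
    pad = k.shape[0] // 2
    padded = np.pad(hr, pad, mode="edge")
    H, W = hr.shape
    blurred = np.zeros((H, W), dtype=float)
    # direct 2D convolution via shifted, weighted sums
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            blurred += k[i, j] * padded[i:i + H, j:j + W]
    return blurred[::scale, ::scale]

hr = np.full((16, 16), 5.0)
lr = bd_degrade(hr)  # a 4x4 low-resolution frame, still constant
```

The BI degradation replaces the blur-and-subsample step with direct bicubic downsampling; since the two degradations distribute frequency content differently, the survey reports results under both.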
Although the reconstruction performance of these video super-resolution networks keeps improving, model parameter counts are gradually shrinking, and training and inference keep accelerating, the application of deep learning to video super-resolution still has considerable room to grow; in particular, the adaptability of the networks needs to be improved and their results validated. Finally, we discuss promising future directions in nine aspects: network training and optimization, ultra-high-resolution video super-resolution, compressed video super-resolution, video rescaling, self-supervised video super-resolution, arbitrary-scale video super-resolution, spatio-temporal video super-resolution, auxiliary task guided video super-resolution, and scenario-customized video super-resolution.
Keywords: deep learning; video super-resolution (VSR); image alignment; motion estimation; motion compensation
Ahmadi A and Patras I. 2016. Unsupervised convolutional neural networks for motion estimation//Proceedings of 2016 IEEE International Conference on Image Processing (ICIP). Phoenix, USA: IEEE: 1629-1633 [DOI: 10.1109/ICIP.2016.7532634]
Banham M R and Katsaggelos A K. 1997. Digital image restoration. IEEE Signal Processing Magazine, 14(2): 24-41 [DOI: 10.1109/79.581363]
Bao W B, Lai W S, Ma C, Zhang X Y, Gao Z Y and Yang M H. 2019a. Depth-aware video frame interpolation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3698-3707 [DOI: 10.1109/CVPR.2019.00382]
Bao W B, Lai W S, Zhang X Y, Gao Z Y and Yang M H. 2019b. MEMC-Net: motion estimation and motion compensation driven neural network for video interpolation and enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(3): 933-948 [DOI: 10.1109/TPAMI.2019.2941941]
Bare B, Yan B, Ma C X and Li K. 2019. Real-time video super-resolution via motion convolution kernel estimation. Neurocomputing, 367: 236-245 [DOI: 10.1016/j.neucom.2019.07.089]
Bertasius G, Torresani L and Shi J B. 2018. Object detection in video with spatiotemporal sampling networks//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 342-357 [DOI: 10.1007/978-3-030-01258-8_21]
Bouguet J Y. 2001. Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm. Intel Corporation, 5(4): 1-10
Brox T, Bruhn A, Papenberg N and Weickert J. 2004. High accuracy optical flow estimation based on a theory for warping//Proceedings of the 8th European Conference on Computer Vision (ECCV). Prague, Czech Republic: Springer: 25-36 [DOI: 10.1007/978-3-540-24673-2_3]
Caballero J, Ledig C, Aitken A, Acosta A, Totz J, Wang Z H and Shi W Z. 2017. Real-time video super-resolution with spatio-temporal networks and motion compensation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2848-2857 [DOI: 10.1109/CVPR.2017.304]
Cao J Z, Li Y W, Zhang K and van Gool L. 2021. Video super-resolution transformer [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/2106.06847.pdf
Chan K C K, Wang X T, Yu K, Dong C and Loy C C. 2021a. BasicVSR: the search for essential components in video super-resolution and beyond//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 4945-4954 [DOI: 10.1109/CVPR46437.2021.00491]
Chan K C K, Zhou S C, Xu X Y and Loy C C. 2021b. BasicVSR++: improving video super-resolution with enhanced propagation and alignment//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 5972-5981 [DOI: 10.1109/CVPR52688.2022.00588]
Chen J L, Tan X, Shan C W, Liu S and Chen Z B. 2020. VESR-Net: the winning solution to YouKu video enhancement and super-resolution challenge [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/2003.02115.pdf
Chen P L, Yang W H, Wang M, Sun L, Hu K K and Wang S Q. 2021. Compressed domain deep video super-resolution. IEEE Transactions on Image Processing, 30: 7156-7169 [DOI: 10.1109/TIP.2021.3101826]
Chen Y, Tai Y, Liu X M, Shen C H and Yang J. 2018. FSRNet: end-to-end learning face super-resolution with facial priors//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2492-2501 [DOI: 10.1109/CVPR.2018.00264]
Cheng D Q, Guo X, Chen L L, Kou Q Q, Zhao K and Gao R. 2021. Image super-resolution reconstruction from multi-channel recursive residual network. Journal of Image and Graphics, 26(3): 605-618 [DOI: 10.11834/jig.200108]
Cheng M H, Lin N W, Hwang K S and Jeng J H. 2012. Fast video super-resolution using artificial neural networks//Proceedings of the 8th International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP). Poznan, Poland: IEEE: 1-4 [DOI: 10.1109/CSNDSP.2012.6292646]
Chu M Y, Xie Y, Mayer J, Leal-Taixé L and Thuerey N. 2020. Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics, 39(4): #75 [DOI: 10.1145/3386569.3392457]
Cui Z, Chang H, Shan S G, Zhong B N and Chen X L. 2014. Deep network cascade for image super-resolution//Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer: 49-64 [DOI: 10.1007/978-3-319-10602-1_4]
Dai J F, Qi H Z, Xiong Y W, Li Y, Zhang G D, Hu H and Wei Y C. 2017. Deformable convolutional networks//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 764-773 [DOI: 10.1109/ICCV.2017.89]
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer: 184-199 [DOI: 10.1007/978-3-319-10593-2_13]
Dong C, Loy C C, He K M and Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI: 10.1109/TPAMI.2015.2439281]
Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V, van der Smagt P, Cremers D and Brox T. 2015. FlowNet: learning optical flow with convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 2758-2766 [DOI: 10.1109/ICCV.2015.316]
Drulea M and Nedevschi S. 2011. Total variation regularization of local-global optical flow//Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC). Washington, USA: IEEE: 318-323 [DOI: 10.1109/ITSC.2011.6082986]
Dutta S, Shah N A and Mittal A. 2021. Efficient space-time video super resolution using low-resolution flow and mask upsampling//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, USA: IEEE: 314-323 [DOI: 10.1109/CVPRW53098.2021.00041]
Ebadi S E, Ones V G and Izquierdo E. 2017. UHD video super-resolution using low-rank and sparse decomposition//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Venice, Italy: IEEE: 1889-1897 [DOI: 10.1109/ICCVW.2017.223]
Farnebäck G. 2003. Two-frame motion estimation based on polynomial expansion//Proceedings of the 13th Scandinavian Conference on Image Analysis. Halmstad, Sweden: Springer: 363-370 [DOI: 10.1007/3-540-45103-X_50]
Fuoli D, Gu S H and Timofte R. 2019. Efficient video super-resolution through recurrent latent space propagation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE: 3476-3485 [DOI: 10.1109/ICCVW.2019.00431]
Fuoli D, Huang Z W, Gu S H, Timofte R, Raventos A, Esfandiari A, Karout S, Xu X, Li X, Xiong X, Wang J G, Michelini P N, Zhang W H, Zhang D Y, Zhu H W, Xia D, Chen H Y, Gu J J, Zhang Z, Zhao T T, Zhao S S, Akita K, Ukita N, Hrishikesh P S, Puthussery D and Jiji C V. 2020. AIM 2020 challenge on video extreme super-resolution: methods and results//Proceedings of 2020 European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 57-81 [DOI: 10.1007/978-3-030-66823-5_4]
Ganin Y, Kononenko D, Sungatullina D and Lempitsky V. 2016. DeepWarp: photorealistic image resynthesis for gaze manipulation//Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, the Netherlands: Springer: 311-326 [DOI: 10.1007/978-3-319-46475-6_20]
Gao H, Zhu X Z, Lin S and Dai J F. 2019. Deformable kernels: adapting effective receptive fields for object deformation [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/1910.02940v1.pdf
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial networks. Communications of the ACM, 63(11): 139-144 [DOI: 10.1145/3422622]
Graves A, Fernández S and Schmidhuber J. 2005. Bidirectional LSTM networks for improved phoneme classification and recognition//Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications. Warsaw, Poland: Springer: 799-804 [DOI: 10.1007/11550907_126]
Gunturk B K, Batur A U, Altunbasak Y, Hayes M H and Mersereau R M. 2003. Eigenface-domain super-resolution for face recognition. IEEE Transactions on Image Processing, 12(5): 597-606 [DOI: 10.1109/TIP.2003.811513]
Guo J and Chao H Y. 2017. Building an end-to-end spatial-temporal convolutional network for video super-resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1): 4053-4060 [DOI: 10.1609/aaai.v31i1.11228]
Handa A, Bloesch M, Pătrăucean V, Stent S, McCormac J and Davison A. 2016. Gvnn: neural network library for geometric computer vision//Proceedings of 2016 European Conference on Computer Vision (ECCV). Amsterdam, the Netherlands: Springer: 67-82 [DOI: 10.1007/978-3-319-49409-8_9]
Haris M, Shakhnarovich G and Ukita N. 2019. Recurrent back-projection network for video super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3892-3901 [DOI: 10.1109/CVPR.2019.00402]
Haris M, Shakhnarovich G and Ukita N. 2020. Space-time-aware multi-resolution video enhancement//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Atlanta, USA: IEEE: 2856-2865 [DOI: 10.1109/CVPR42600.2020.00293]
Harris J L. 1964. Diffraction and resolving power. Journal of the Optical Society of America, 54(7): 931-936 [DOI: 10.1364/josa.54.000931]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1026-1034 [DOI: 10.1109/ICCV.2015.123]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
He X H, Wu Y Y, Chen W L and Qing L B. 2011. A survey of video super-resolution reconstruction technology. Information and Electronic Engineering, 9(1): 1-6 [DOI: 10.3969/j.issn.1672-2892.2011.01.001]
Hochreiter S and Schmidhuber J. 1997. Long short-term memory. Neural Computation, 9(8): 1735-1780 [DOI: 10.1162/neco.1997.9.8.1735]
Huang G, Liu Z, van der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243]
Huang T S and Tsai R Y. 1984. Multiframe image restoration and registration//Advances in Computer Vision and Image Processing. Greenwich, UK: JAI Press: 317-339
Huang Y, Wang W and Wang L. 2015. Bidirectional recurrent convolutional networks for multi-frame super-resolution [EB/OL]. [2022-02-08]. http://cognn.com/papers/24%20NIPS%202015%20Yan%20bidirecional-recurrent-convolutional-networks-for-multi-frame-super-resolution-Paper.pdf
Huang Y, Wang W and Wang L. 2018. Video super-resolution via bidirectional recurrent convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 1015-1028 [DOI: 10.1109/TPAMI.2017.2701380]
Huang Y C, Chen Y H, Lu C Y, Wang H P, Peng W H and Huang C C. 2021. Video rescaling networks with joint optimization strategies for downscaling and upscaling//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 3526-3535 [DOI: 10.1109/CVPR46437.2021.00353]
Hui Z, Li J, Gao X B and Wang X M. 2021. Progressive perception-oriented network for single image super-resolution. Information Sciences, 546: 769-786 [DOI: 10.1016/j.ins.2020.08.114]
Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A and Brox T. 2017. FlowNet 2.0: evolution of optical flow estimation with deep networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 1647-1655 [DOI: 10.1109/CVPR.2017.179]
Isobe T, Jia X, Gu S H, Li S J, Wang S J and Tian Q. 2020a. Video super-resolution with recurrent structure-detail network//Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 645-660 [DOI: 10.1007/978-3-030-58610-2_38]
Isobe T, Li S J, Jia X, Yuan S X, Slabaugh G, Xu C J, Li Y L, Wang S J and Tian Q. 2020b. Video super-resolution with temporal group attention//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 8005-8014 [DOI: 10.1109/CVPR42600.2020.00803]
Isobe T, Zhu F, Jia X and Wang S J. 2020c. Revisiting temporal modeling for video super-resolution [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/2008.05765.pdf
Jaderberg M, Simonyan K, Zisserman A and Kavukcuoglu K. 2016. Spatial transformer networks [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/1506.02025.pdf
Ji S W, Xu W, Yang M and Yu K. 2013. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 221-231 [DOI: 10.1109/TPAMI.2012.59]
Jing Y C, Yang Y D, Wang X C, Song M L and Tao D C. 2021. Turning frequency to resolution: video super-resolution via event cameras//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 7768-7777 [DOI: 10.1109/CVPR46437.2021.00768]
Jo Y, Oh S W, Kang J and Kim S J. 2018. Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3224-3232 [DOI: 10.1109/CVPR.2018.00340]
Kalarot R and Porikli F. 2019. MultiBoot VSR: multi-stage multi-reference bootstrapping for video super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Long Beach, USA: IEEE: 2060-2069 [DOI: 10.1109/CVPRW.2019.00258]
Kappeler A, Yoo S, Dai Q Q and Katsaggelos A K. 2016. Video super-resolution with convolutional neural networks. IEEE Transactions on Computational Imaging, 2(2): 109-122 [DOI: 10.1109/TCI.2016.2532323]
Kim H, Hong S, Han B, Myeong H and Lee K M. 2019a. Fine-grained neural architecture search [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/1911.07478.pdf
Kim J, Lee J K and Lee K M. 2016. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1646-1654 [DOI: 10.1109/CVPR.2016.182]
Kim S Y, Lim J, Na T and Kim M. 2019b. Video super-resolution based on 3D-CNNs with consideration of scene change//Proceedings of 2019 IEEE International Conference on Image Processing (ICIP). Taipei, China: IEEE: 2831-2835 [DOI: 10.1109/ICIP.2019.8803297]
Kim T H, Sajjadi M S M, Hirsch M and Schölkopf B. 2018. Spatio-temporal transformer network for video restoration//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 111-127 [DOI: 10.1007/978-3-030-01219-9_7]
Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5835-5843 [DOI: 10.1109/CVPR.2017.618]
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z H and Shi W Z. 2017. Photo-realistic single image super-resolution using a generative adversarial network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 105-114 [DOI: 10.1109/CVPR.2017.19]
Lertrattanapanich S and Bose N K. 1999. Latest results on high-resolution reconstruction from video sequences [EB/OL]. [2022-02-08]. https://www.semanticscholar.org/paper/Latest-Results-on-High-Resolution-ReconstructionLertrattanapanich/bd8bc32eaf0ffd502d008c36f2c1d870e12ea238
Li D Y, Liu Y and Wang Z F. 2019a. Video super-resolution using non-simultaneous fully recurrent convolutional network. IEEE Transactions on Image Processing, 28(3): 1342-1355 [DOI: 10.1109/TIP.2018.2877334]
Li D Y and Wang Z F. 2017. Video superresolution via motion compensation and deep residual learning. IEEE Transactions on Computational Imaging, 3(4): 749-762 [DOI: 10.1109/TCI.2017.2671360]
Li K, Bare B, Yan B, Feng B L and Yao C F. 2018. Face hallucination based on key parts enhancement//Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Calgary, Canada: IEEE: 1378-1382 [DOI: 10.1109/ICASSP.2018.8462170]
Li S, He F X, Du B, Zhang L F, Xu Y H and Tao D C. 2019b. Fast spatio-temporal residual network for video super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10514-10523 [DOI: 10.1109/CVPR.2019.01077]
Li S L, Feng C L, Yu K, Liu X, Jiang X and Zhao D Z. 2022. Critical review of human cardiac magnetic resonance image super resolution reconstruction based on deep learning method. Journal of Image and Graphics, 27(3): 704-721 [DOI: 10.11834/jig.210150]
Li W B, Tao X, Guo T A, Qi L, Lu J B and Jia J Y. 2020. MuCAN: multi-correspondence aggregation network for video super-resolution//Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 335-351 [DOI: 10.1007/978-3-030-58607-2_20]
Li Y, Jin P, Yang F, Liu C, Yang M H and Milanfar P. 2021. COMISR: compression-informed video super-resolution//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 2543-2552 [DOI: 10.1109/ICCV48922.2021.00254]
Liao R J, Tao X, Li R Y, Ma Z Y and Jia J Y. 2015. Video super-resolution via deep draft-ensemble learning//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 531-539 [DOI: 10.1109/ICCV.2015.68]
Liu C and Sun D Q. 2011. A Bayesian approach to adaptive video super resolution//Proceedings of 2011 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Colorado Springs, USA: IEEE: 209-216 [DOI: 10.1109/CVPR.2011.5995614]
Liu C and Sun D Q. 2014. On Bayesian adaptive video super resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2): 346-360 [DOI: 10.1109/TPAMI.2013.127]
Liu D, Wang Z W, Fan Y C, Liu X M, Wang Z Y, Chang S Y and Huang T. 2017. Robust video super-resolution with learned temporal dynamics//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2526-2534 [DOI: 10.1109/ICCV.2017.274]
Liu H Y, Ruan Z B, Zhao P, Dong C, Shang F H, Liu Y Y, Yang L L and Timofte R. 2022. Video super resolution based on deep learning: a comprehensive survey. Artificial Intelligence Review, 55(8): 5981-6035 [DOI: 10.1007/s10462-022-10147-y]
Liu S L, Zheng C J, Lu K D, Gao S, Wang N, Wang B F, Zhang D K, Zhang X F and Xu T Y. 2021. EVSRNet: efficient video super-resolution with neural architecture search//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, USA: IEEE: 2480-2485 [DOI: 10.1109/CVPRW53098.2021.00281]
Lucas A, Lopez-Tapia S, Molina R and Katsaggelos A K. 2019. Generative adversarial networks and perceptual losses for video super-resolution. IEEE Transactions on Image Processing, 28(7): 3312-3327 [DOI: 10.1109/TIP.2019.2895768]
Lucas B D and Kanade T. 1981. An iterative image registration technique with an application to stereo vision//Proceedings of the 7th international joint conference on Artificial intelligence. Vancouver BC, Canada: Morgan Kaufmann Publishers Inc: 674-679
Mao X J, Shen C H and Yang Y B. 2016. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 2810-2818 [DOI: 10.5555/3157382.3157412http://dx.doi.org/10.5555/3157382.3157412]
Nah S, Baik S, Hong S, Moon G, Son S, Timofte R and Lee K M. 2019a. NTIRE 2019 challenge on video deblurring and super-resolution: dataset and study//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Long Beach, USA: IEEE: 1996-2005 [DOI: 10.1109/CVPRW.2019.00251http://dx.doi.org/10.1109/CVPRW.2019.00251]
Nah S, Timofte R, Gu S H, Baik S, Hong S, Moon G, Son S, Lee K M, Wang X T, Chan K C K, Yu K, Dong C, Loy C C, Fan Y C, Yu J H, Liu D, Huang T S, Liu X, Li C, He D L, Ding Y K, Wen S L, Porikli F, Kalarot R, Haris M, Shakhnarovich G, Ukita N, Yi P, Wang Z Y, Jiang K, Jiang J J, Ma J Y, Dong H, Zhang X Y, Hu Z, Kim K, Kang D U, Chun S Y, Purohit K, Rajagopalan A N, Tian Y P, Zhang Y L, Fu Y, Xu C L, Tekalp A M, Yilmaz M A, Korkmaz C, Sharma M, Makwana M, Badhwar A, Singh A P, Upadhyay A, Mukhopadhyay R, Shukla A, Khanna D, Mandal A S, Chaudhury S, Miao S, Zhu Y X and Huo X. 2019b. NTIRE 2019 challenge on video super-resolution: methods and results//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Long Beach, USA: IEEE: 1985-1995 [DOI: 10.1109/CVPRW.2019.00250http://dx.doi.org/10.1109/CVPRW.2019.00250]
Nazeri K, Thasarathan H and Ebrahimi M. 2019. Edge-informed single image super-resolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE: 3275-3284 [DOI: 10.1109/ICCVW.2019.00409]
Niklaus S, Mai L and Liu F. 2017a. Video frame interpolation via adaptive separable convolution//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 261-270 [DOI: 10.1109/ICCV.2017.37]
Niklaus S, Mai L and Liu F. 2017b. Video frame interpolation via adaptive convolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2270-2279 [DOI: 10.1109/CVPR.2017.244]
Patraucean V, Handa A and Cipolla R. 2016. Spatio-temporal video autoencoder with differentiable memory [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/1511.06309.pdf
Protter M, Elad M, Takeda H and Milanfar P. 2009. Generalizing the nonlocal-means to super-resolution reconstruction. IEEE Transactions on Image Processing, 18(1): 36-51 [DOI: 10.1109/TIP.2008.2008067]
Ranjan A and Black M J. 2017. Optical flow estimation using a spatial pyramid network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2720-2729 [DOI: 10.1109/CVPR.2017.291]
Revaud J, Weinzaepfel P, Harchaoui Z and Schmid C. 2015. EpicFlow: edge-preserving interpolation of correspondences for optical flow//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1164-1172 [DOI: 10.1109/CVPR.2015.7298720]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Sajjadi M S M, Vemulapalli R and Brown M. 2018. Frame-recurrent video super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6626-6634 [DOI: 10.1109/CVPR.2018.00693]
Schuster M and Paliwal K K. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11): 2673-2681 [DOI: 10.1109/78.650093]
Seshadrinathan K and Bovik A C. 2010. Motion tuned spatio-temporal quality assessment of natural videos. IEEE Transactions on Image Processing, 19(2): 335-350 [DOI: 10.1109/TIP.2009.2034992]
Shahar O, Faktor A and Irani M. 2011. Space-time super-resolution from a single video//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Colorado Springs, USA: IEEE: 3353-3360 [DOI: 10.1109/CVPR.2011.5995360]
Sheikh H R, Sabir M F and Bovik A C. 2006. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing, 15(11): 3440-3451 [DOI: 10.1109/TIP.2006.881959]
Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1874-1883 [DOI: 10.1109/CVPR.2016.207]
Shi X J, Chen Z R, Wang H, Yeung D Y, Wong W K and Woo W C. 2015. Convolutional LSTM network: a machine learning approach for precipitation nowcasting//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 802-810
Singh A and Singh J. 2020. Survey on single image based super-resolution—implementation challenges and solutions. Multimedia Tools and Applications, 79(3): 1641-1672 [DOI: 10.1007/s11042-019-08254-0]
Sun D Q, Yang X D, Liu M Y and Kautz J. 2018a. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 8934-8943 [DOI: 10.1109/CVPR.2018.00931]
Sun X, Xiao B, Wei F Y, Liang S and Wei Y C. 2018b. Integral human pose regression//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 536-553 [DOI: 10.1007/978-3-030-01231-1_33]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1-9 [DOI: 10.1109/CVPR.2015.7298594]
Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z. 2016. Rethinking the inception architecture for computer vision//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2818-2826 [DOI: 10.1109/CVPR.2016.308]
Tao X, Gao H Y, Liao R J, Wang J and Jia J Y. 2017. Detail-revealing deep video super-resolution//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 4482-4490 [DOI: 10.1109/ICCV.2017.479]
Tian Y P, Zhang Y L, Fu Y and Xu C L. 2020. TDAN: temporally-deformable alignment network for video super-resolution//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 3357-3366 [DOI: 10.1109/CVPR42600.2020.00342]
Toderici G, O'Malley S M, Hwang S J, Vincent D, Minnen D, Baluja S, Covell M and Sukthankar R. 2016. Variable rate image compression with recurrent neural networks [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/1511.06085v5.pdf
Tong T, Li G, Liu X J and Gao Q Q. 2017. Image super-resolution using dense skip connections//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 4809-4817 [DOI: 10.1109/ICCV.2017.514]
Tran D, Bourdev L, Fergus R, Torresani L and Paluri M. 2015. Learning spatiotemporal features with 3D convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 4489-4497 [DOI: 10.1109/ICCV.2015.510]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/1706.03762.pdf
Wang H, Su D W, Liu C C, Jin L C, Sun X F and Peng X Y. 2019a. Deformable non-local network for video super-resolution. IEEE Access, 7: 177734-177744 [DOI: 10.1109/ACCESS.2019.2958030]
Wang L G, Guo Y L, Lin Z P, Deng X P and An W. 2018a. Learning for video super-resolution through HR optical flow estimation//Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer: 514-529 [DOI: 10.1007/978-3-030-20887-5_32]
Wang X L, Girshick R, Gupta A and He K M. 2018b. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Wang X T, Chan K C K, Yu K, Dong C and Loy C C. 2019b. EDVR: video restoration with enhanced deformable convolutional networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Long Beach, USA: IEEE: 1954-1963 [DOI: 10.1109/CVPRW.2019.00247]
Wang X T, Yu K, Dong C and Loy C C. 2018c. Recovering realistic texture in image super-resolution by deep spatial feature transform//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 606-615 [DOI: 10.1109/CVPR.2018.00070]
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/TIP.2003.819861]
Wang Z H, Chen J and Hoi S C H. 2021. Deep learning for image super-resolution: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10): 3365-3387 [DOI: 10.1109/TPAMI.2020.2982166]
Wang Z W, Liu D, Yang J C, Han W and Huang T. 2015a. Deep networks for image super-resolution with sparse prior//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 370-378 [DOI: 10.1109/ICCV.2015.50]
Wang Z Y, Yang Y Z, Wang Z W, Chang S Y, Han W, Yang J C and Huang T. 2015b. Self-tuned deep super resolution//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Boston, USA: IEEE: 1-8 [DOI: 10.1109/CVPRW.2015.7301266]
Wang Z Y, Yi P, Jiang K, Jiang J J, Han Z, Lu T and Ma J Y. 2019c. Multi-memory convolutional neural network for video super-resolution. IEEE Transactions on Image Processing, 28(5): 2530-2544 [DOI: 10.1109/TIP.2018.2887017]
Wolf S and Pinson M H. 2011. Video quality model for variable frame delay (VQM_VFD) [EB/OL]. [2022-02-08]. https://last.hit.bme.hu/download/vidtechlab/fcc/literature/video/ntia_tm-11-482.pdf
Wu Y and Fan G H. 2017. Survey of super-resolution reconstruction techniques for video sequences. Computer Engineering and Software, 38(4): 154-160 [DOI: 10.3969/j.issn.1003-6970.2017.04.030]
Xiang X Y, Tian Y P, Zhang Y L, Fu Y, Allebach J P and Xu C L. 2020. Zooming Slow-Mo: fast and accurate one-stage space-time video super-resolution//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 3367-3376 [DOI: 10.1109/CVPR42600.2020.00343]
Xiao Z Y, Fu X Y, Huang J, Cheng Z and Xiong Z W. 2021. Space-time distillation for video super-resolution//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 2113-2122 [DOI: 10.1109/CVPR46437.2021.00215]
Xu G, Xu J, Li Z, Wang L, Sun X and Cheng M M. 2021. Temporal modulation network for controllable space-time video super-resolution//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 6384-6393 [DOI: 10.1109/CVPR46437.2021.00632]
Xu J, Chae Y, Stenger B and Datta A. 2018. Dense ByNet: residual dense network for image super resolution//Proceedings of the 25th IEEE International Conference on Image Processing (ICIP). Athens, Greece: IEEE: 71-75 [DOI: 10.1109/ICIP.2018.8451696]
Xu L, Jia J Y and Matsushita Y. 2012. Motion detail preserving optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9): 1744-1757 [DOI: 10.1109/TPAMI.2011.236]
Xu X, Xiong X, Wang J G and Li X. 2020. Deformable kernel convolutional network for video extreme super-resolution//Proceedings of 2020 European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 82-98 [DOI: 10.1007/978-3-030-66823-5_5]
Xue T F, Chen B A, Wu J J, Wei D L and Freeman W T. 2019. Video enhancement with task-oriented flow. International Journal of Computer Vision, 127(8): 1106-1125 [DOI: 10.1007/s11263-018-01144-2]
Yang W M, Zhang X C, Tian Y P, Wang W, Xue J H and Liao Q M. 2019. Deep learning for single image super-resolution: a brief review. IEEE Transactions on Multimedia, 21(12): 3106-3121 [DOI: 10.1109/TMM.2019.2919431]
Yang X, Xiang W M, Zeng H and Zhang L. 2021. Real-world video super-resolution: a benchmark dataset and a decomposition based learning scheme//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 4761-4770 [DOI: 10.1109/ICCV48922.2021.00474]
Yi P, Wang Z Y, Jiang K, Jiang J J, Lu T, Tian X and Ma J Y. 2021. Omniscient video super-resolution//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 4429-4438 [DOI: 10.1109/ICCV48922.2021.00439]
Yi P, Wang Z Y, Jiang K, Jiang J J and Ma J Y. 2019. Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 3106-3115 [DOI: 10.1109/ICCV.2019.00320]
Ying X Y, Wang L G, Wang Y Q, Sheng W D, An W and Guo Y L. 2020. Deformable 3D convolution for video super-resolution. IEEE Signal Processing Letters, 27: 1500-1504 [DOI: 10.1109/LSP.2020.3013518]
Yu F and Koltun V. 2016. Multi-scale context aggregation by dilated convolutions [EB/OL]. [2022-02-08]. https://arxiv.org/pdf/1511.07122v2.pdf
Zhang L P, Zhang H Y, Shen H F and Li P X. 2010. A super-resolution reconstruction algorithm for surveillance images. Signal Processing, 90(3): 848-859 [DOI: 10.1016/j.sigpro.2009.09.002]
Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018a. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 586-595 [DOI: 10.1109/CVPR.2018.00068]
Zhang Y, Li J Z, Li D L and Du Y L. 2016. Super-resolution reconstruction for UAV video. Journal of Image and Graphics, 21(7): 967-976 [DOI: 10.11834/jig.20160715]
Zhang Y L, Gan Z L and Zhu X C. 2013. Video super-resolution method based on similarity constraints. Journal of Image and Graphics, 18(7): 761-767 [DOI: 10.11834/jig.20130712]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018b. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 294-310 [DOI: 10.1007/978-3-030-01234-2_18]
Zhou B, Li C H and Chen W. 2021. Region-level channel attention for single image super-resolution combining high frequency loss. Journal of Image and Graphics, 26(12): 2836-2847 [DOI: 10.11834/jig.200582]
Zhou L and Zhu X C. 2006. Algorithm of compressed video super-resolution restoration based on Bayesian theory. Journal of Image and Graphics, 11(5): 730-735 [DOI: 10.11834/jig.200605121]
Zhu X B, Li Z Z, Zhang X Y, Li C S, Liu Y Q and Xue Z Y. 2019a. Residual invertible spatio-temporal network for video super-resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1): 5981-5988 [DOI: 10.1609/aaai.v33i01.33015981]
Zhu X Z, Hu H, Lin S and Dai J F. 2019b. Deformable ConvNets v2: more deformable, better results//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9300-9308 [DOI: 10.1109/CVPR.2019.00953]