Video processing and compression technologies

Jia Chuanmin; Ma Haichuan; Yang Wenhan; Ren Wenqi; Pan Jinshan; Liu Dong; Liu Jiaying; Ma Siwei

Published: 2021-06-15
DOI: 10.11834/jig.200861
2021 | Volume 26 | Number 6

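Jia Chuanmin¹, Ma Haichuan², Yang Wenhan³, Ren Wenqi⁴, Pan Jinshan⁵, Liu Dong², Liu Jiaying⁶, Ma Siwei¹ (1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; 2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230027; 3. Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China; 4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100196; 5. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094; 6. Wangxuan Institute of Computer Technology, Peking University, Beijing 100871)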

Abstract

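Video processing and compression is one of the core topics in multimedia computing and communication. It is the key bridge connecting video acquisition and transmission with visual analysis and understanding, and it is the foundation of many video applications. The combination of "5G + ultra-high definition + AI" is now triggering a new round of major technological innovation in multimedia computing and communication. Video processing and compression technologies are undergoing profound change, and theories and methods for the efficient, compact representation of video big data are urgently needed. To this end, academic research institutions and industry have conducted extensive, in-depth research on frontier areas such as the visual representation mechanisms of video big data, compact expression of visual information, video signal reconstruction and restoration, processing methods that fuse high-level and low-level vision, and the corresponding hardware technologies. Starting from the fundamental theory of digital signal processing, this paper analyzes the current hot issues and research topics in video processing and compression, including statistical prior model-based video data representation models and processing methods, video processing techniques that incorporate deep network models, video compression techniques, and progress in video compression standards. It describes in detail the frontier developments, trends, technical bottlenecks, and standardization progress in video super-resolution, video reconstruction and restoration, and video compression; comprehensively compares and analyzes the state of research and development internationally and in China; and looks ahead to the development and evolution of video processing and compression technologies. Higher-quality visual effects and higher-efficiency visual representation will no longer be studied in isolation, and video processing and compression technologies that integrate brain-inspired visual systems and coding mechanisms will be one of the important areas of future research.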

Keywords

multimedia technology; video signal processing; video compression; artificial intelligence; deep learning

Video processing and compression technologies

Jia Chuanmin¹, Ma Haichuan², Yang Wenhan³, Ren Wenqi⁴, Pan Jinshan⁵, Liu Dong², Liu Jiaying⁶, Ma Siwei¹(1.School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;2.School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China;3.Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China;4.Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100196, China;5.School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China;6.Wangxuan Institute of Computer Technology, Peking University, Beijing 100871, China)

Abstract

Video processing and compression are among the most fundamental research areas in multimedia computing and communication technologies. They bridge video acquisition, streaming, and delivery with visual information analysis and visual understanding. Video processing and compression also underpin applied multimedia technologies and support a wide range of downstream video applications. Digital video constitutes the largest category of big data in contemporary society, and the multimedia industry is a core component of the intelligent information era. With the continuous development of artificial intelligence and a new generation of information technology revolution, humankind is entering the intelligent information era, in which many emerging interdisciplinary research topics interact and merge. Currently, the combination of 5G, ultra-high definition (UHD), and artificial intelligence (AI) is driving a new wave of major technological change in multimedia computing and communication, and video processing and compression techniques face profound and challenging transformation in this context. Demand is growing for theoretical and applied breakthroughs in compact video data representation, highly efficient processing pipelines, and high-performance algorithms. To address these issues, academia and industry have conducted extensive research in several cutting-edge areas, including the visual signal representation mechanisms of video big data, compact expression of visual information, video signal restoration and reconstruction, methods that fuse high-level and low-level vision, and their hardware implementations. Building on fundamental theories of discrete signal processing, this paper systematically reviews and analyzes the active research topics and the corresponding state-of-the-art methodologies in video processing and compression.
This paper provides a comprehensive review of statistical prior model-based video data representation learning and its processing methods, deep network-based video processing and compression solutions, video coding techniques, and the video compression standardization process. More importantly, it covers, from top to bottom, the challenges, future trends, state-of-the-art approaches, and standardization progress in these research areas. Specifically, video processing algorithms are reviewed first, including model-based and deep learning-based video super-resolution and video restoration solutions. Video super-resolution covers both spatial and temporal super-resolution methods, while video restoration focuses on video deblurring and video deraining; prior model-based and neural approaches are reviewed and compared. The paper then reviews video compression from two aspects: the development of conventional coding tools and learning-based video coding approaches. The former focuses on modular improvements to predictive coding, transform and quantization, filtering, and entropy coding; with the development of multiple next-generation video coding standards, the scope and depth of coding tool research within the conventional hybrid coding framework have broadened considerably. The latter introduces deep learning-based video coding methods for both the hybrid coding framework and the end-to-end coding framework. Deep neural network-based coding is poised to become the next leap in high-dimensional multimedia signal coding. For both aspects, the relevant technologies and standardization activities are described to outline the overall development of video compression.
In addition, an extensive comparative study of these areas between the international and domestic (Chinese) research communities is conducted, revealing the similarities and differences in their current status. Finally, future theoretical and applied research in video processing and compression is envisioned. In particular, high-quality visual effects and high-efficiency visual representation will no longer be studied as separate areas. Video processing and compression that integrate brain-inspired visual systems and coding mechanisms will be a key direction of future research.
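The conventional hybrid coding framework mentioned in the abstract chains predictive coding, transform, quantization, and entropy coding, with the decoder-side reconstruction fed back as the next prediction reference. A minimal Python sketch of coding one 8×8 inter block is given below. It assumes NumPy, full-search block matching with the sum of absolute differences (SAD), an orthonormal 2-D DCT-II, and a uniform scalar quantizer. The function names (`dct_matrix`, `motion_search`, `code_block`) and parameters (`search`, `qstep`) are hypothetical illustrations, not taken from the paper, any standard, or any reference encoder, and entropy coding and in-loop filtering are deliberately omitted.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis C, so that C @ x @ C.T is the 2-D DCT of block x."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller normalization factor
    return c


def motion_search(cur, ref, y, x, size, search):
    """Full-search block matching: return the reference block with minimum SAD."""
    best, best_sad = None, np.inf
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                cand = ref[ry:ry + size, rx:rx + size]
                sad = np.abs(cur - cand).sum()
                if sad < best_sad:
                    best, best_sad = cand, sad
    return best


def code_block(cur_frame, ref_frame, y, x, size=8, search=2, qstep=10.0):
    """Hypothetical hybrid coding of one block: predict, transform, quantize, reconstruct."""
    cur = cur_frame[y:y + size, x:x + size].astype(np.float64)
    # Predictive coding: motion-compensated prediction from the reference frame.
    pred = motion_search(cur, ref_frame.astype(np.float64), y, x, size, search)
    resid = cur - pred
    # Transform: decorrelate the residual with a 2-D DCT.
    c = dct_matrix(size)
    coeffs = c @ resid @ c.T
    # Quantization: the only lossy step; these levels would go to an entropy coder.
    levels = np.round(coeffs / qstep)
    # Decoder-side reconstruction: dequantize, inverse transform, add the prediction.
    recon = pred + c.T @ (levels * qstep) @ c
    return levels, recon


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(32, 32))
    cur = np.roll(ref, shift=(1, 1), axis=(0, 1))  # simulate global motion of one pixel
    for s in (0, 2):
        levels, recon = code_block(cur, ref, 8, 8, search=s)
        err = np.abs(recon - cur[8:16, 8:16]).max()
        print(f"search={s}: nonzero levels={np.count_nonzero(levels)}, max error={err:.2f}")
```

With `search=2` the one-pixel motion is found exactly, the residual vanishes, and no nonzero levels remain; with `search=0` the prediction is poor and many coefficients survive quantization. The nonzero-level count is only a crude stand-in for bitrate: a real encoder would entropy-code the levels (for example with CABAC in H.265/HEVC and H.266/VVC) and apply in-loop filters to the reconstruction.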

Keywords

multimedia technology; video signal processing; video compression; artificial intelligence; deep learning