Advances in mathematical theory for multimedia signal processing

Xiong Hongkai; Dai Wenrui; Lin Zhouchen; Wu Fei; Yu Junqing; Shen Yangmei; Xu Mingxing

Published: 2020-01-16
DOI: 10.11834/jig.190468
2020 | Volume 25 | Number 1

Abstract (translated from Chinese)

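Deep learning models are widely used in multimedia signal processing. By introducing nonlinearity they can greatly improve performance, but their black-box structure cannot yield optimal points or optimality conditions analytically. How to use classical signal processing theory to approximate deep learning models with transform/basis-mapping models and to analyze the optimization problem has therefore become a frontier research question. Starting from the fundamental theory of signal processing, this paper analyzes the mathematical models and theoretical bounds of current methods for high-dimensional, nonlinear, irregular structures, covering structured sparse representation models, frame-theory-based deep network models, multilayer convolutional sparse coding models, and graph signal processing theory. It describes in detail the representation models and optimization methods based on group sparsity and hierarchical sparsity, analyzes the construction of deep/multilayer network models from semi-discrete frames and convolutional sparse coding, extends these to non-Euclidean spaces to form graph signal processing models, and compares domestic and international research progress on memory networks. Finally, it looks ahead to the development of theoretical models for multimedia signal processing and argues that graph signal processing, by analyzing the mathematical properties of spectral graph models and explaining the correlations within them, provides a theoretical basis for building generalized models of large-scale irregular multimedia signal processing and is one of the important areas for future research.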
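As a concrete illustration of the spectral graph model referred to in the abstract, the following minimal sketch computes a graph Fourier transform from the eigendecomposition of the combinatorial Laplacian L = D - A. It is not taken from the paper: the function names, the 4-node path graph, and the choice of the unnormalized Laplacian are assumptions made only for this example.

```python
import numpy as np

def graph_fourier_basis(adjacency):
    """Eigendecomposition of the combinatorial Laplacian L = D - A.

    Assumes an undirected graph, i.e. a symmetric adjacency matrix.
    Returns graph frequencies (ascending eigenvalues) and the basis U.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues, eigenvectors

def gft(signal, eigenvectors):
    """Graph Fourier transform: project the signal onto the Laplacian eigenvectors."""
    return eigenvectors.T @ signal

def igft(coeffs, eigenvectors):
    """Inverse transform: resynthesize the vertex-domain signal."""
    return eigenvectors @ coeffs

# Hypothetical example: a path graph with 4 vertices and one value per vertex.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
freqs, U = graph_fourier_basis(A)
x = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = gft(x, U)       # spectral coefficients, ordered by graph frequency
x_rec = igft(x_hat, U)  # recovers x up to floating-point error
```

Because the Laplacian of an undirected graph is symmetric, `np.linalg.eigh` returns an orthonormal basis, so the inverse transform is simply multiplication by `U`. On a connected graph the lowest frequency is zero and its eigenvector is constant, which mirrors the DC component of the classical Fourier transform.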

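Keywords (translated from Chinese)

structured sparse representation; frame-theory-based deep convolutional network; multilayer convolutional sparse coding; graph signal processing; multimedia signal processing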

Advances in mathematical theory for multimedia signal processing

Xiong Hongkai¹, Dai Wenrui¹, Lin Zhouchen², Wu Fei³, Yu Junqing⁴, Shen Yangmei¹, Xu Mingxing¹ (1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. School of Electronic Engineering and Computer Science, Peking University, Beijing 100080, China; 3. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 4. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)

Abstract

Deep learning models have been widely used in multimedia signal processing. By introducing nonlinearities, they considerably improve the performance of signal processing tasks, but their black-box architectures lack an analytical formulation of the optimum and of the optimality conditions. In recent years, approximating deep learning models with transform/basis projection-based models from classical signal processing theory, and analyzing the resulting optimization problems, has become an active topic in multimedia research. This paper presents and analyzes the mathematical models and theoretical bounds of methods for high-dimensional, nonlinear, and irregularly structured signals, starting from the fundamental theories of signal processing. The main content covers structured sparse representation, frame-based deep networks, multilayer convolutional sparse coding, and graph signal processing. We begin with sparse representation models based on group and hierarchical sparsity and their optimization methods, and then analyze the deep/multilayer networks developed using semi-discrete frames and convolutional sparse coding. We also present graph signal processing models, which extend classical signal processing to non-Euclidean geometry. Recent advances in these topics by domestic and foreign researchers are compared and discussed. Structured sparse representation introduces mixed norms to formulate a group Lasso problem that captures structural information, which can be solved using proximal methods or network flow optimization. Because structured sparse representation is still based on linear projection onto dictionary atoms, frame-based deep networks are developed to extend the semi-discrete frames of multiscale geometric analysis. They inherit the scale and directional decomposition provided by frame theory and introduce nonlinearities to guarantee deformation stability. Inspired by scattering networks, multilayer convolutional sparse coding introduces combined regularization into sparse representation to fit the max pooling operation. Sparse representation of irregular multiscale structures can be achieved recursively with the trained overcomplete dictionary. Graph signal processing extends conventional signal processing into non-Euclidean spaces. When integrated with convolutional neural networks, graph neural networks learn complex relational networks and are well suited to data-driven processing of large-scale, high-dimensional, irregular signals. Finally, this paper discusses future work on mathematical theories and models for multimedia signal processing. By analyzing the mathematical properties of conventional signal processing and graph spectral models and the links between them, this research is useful for developing a generalized graph signal processing model for large-scale irregular multimedia signals.
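The group Lasso formulation and proximal method mentioned in the abstract can be illustrated with a minimal sketch. It assumes non-overlapping groups and the objective 0.5·‖y − Da‖² + λ·Σ_g ‖a_g‖₂. The function names, the identity dictionary, and the toy data are hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np

def prox_group_lasso(x, groups, threshold):
    """Proximal operator of threshold * sum_g ||x_g||_2 (block soft-thresholding).

    `groups` is a list of index lists that do not overlap (an assumption here).
    Each group is shrunk toward zero as a whole, or zeroed if its norm is small.
    """
    out = np.zeros_like(x, dtype=float)
    for idx in groups:
        block = x[idx]
        norm = np.linalg.norm(block)
        if norm > threshold:
            out[idx] = (1.0 - threshold / norm) * block
    return out

def group_lasso_ista(D, y, groups, lam, n_iter=200):
    """Proximal gradient descent for 0.5*||y - D a||^2 + lam * sum_g ||a_g||_2."""
    # Step size 1/L, where L is the squared spectral norm of the dictionary D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)  # gradient of the quadratic data term
        a = prox_group_lasso(a - step * grad, groups, step * lam)
    return a

# Hypothetical toy problem: identity dictionary, two groups of two coefficients.
D = np.eye(4)
y = np.array([3.0, 4.0, 0.1, 0.1])
groups = [[0, 1], [2, 3]]
a_hat = group_lasso_ista(D, y, groups, lam=1.0)
# The first group is shrunk as a block; the weak second group is set to zero.
```

The mixed ℓ₂/ℓ₁ penalty acts on whole groups, so sparsity appears at the group level rather than coefficient by coefficient. Overlapping or hierarchical groups, as in the structured models the paper surveys, need a different proximal computation, such as the network flow approach the abstract mentions.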

Keywords

structured sparse representation; frame-based deep convolutional network; multi-layer convolutional sparse coding; graph signal processing; multimedia signal processing
