目的 深度学习模型广泛应用于多媒体信号处理领域，通过引入非线性能够获得极大地性能提升，但是其黑箱结构无法解析地给出最优点和优化条件。因此如何基于传统信号处理理论，利用基于变换/基映射模型逼近深度学习模型，解析优化问题，成为当前研究的前沿问题。方法 本文从信号处理的基础理论出发，分析了当前针对高维非线性非规则结构方法的数学模型和理论边界，主要包括：结构化稀疏表示模型、基于框架理论的深度网络模型、多层卷积稀疏编码模型以及图信号处理理论。具体地，详细描述了基于组稀疏性和层次化稀疏性的表示模型和优化方法，分析基于半离散框架和卷积稀疏编码构建深度/多层网络模型，进一步在非欧空间上扩展形成图信号处理模型，并对国内外关于记忆网络的研究进展进行了比较。结果 结构化稀疏方法通过混合的联合范式正则项，利用近端算法或网络流优化求解组Lasso目标问题，获取包含结构信息的更优的稀疏表示。鉴于结构化稀疏表示仍然是立足于基集的线性映射，基于框架理论的深度网络成为焦点。它通过延拓多尺度几何分析中的半离散框架来生成深度卷积网络，引入非线性形成对于形变稳定的特征表示，并继承框架理论良好的尺度和方向分解性质。受散射网络启发，多层卷积稀疏编码在稀疏编码基础上，引入组合正则约束，拟合最大化池化操作，并递归逐层学习和分解稀疏表示的过完备字典，形成表示多尺度非规则结构的稀疏表示。图信号处理理论基于非欧空间上的拓扑结构扩展传统信号处理理论，通过结合卷积神经网络学习复杂关系网络，适用于数据驱动的高维不规则信号的分析和处理。结论 最后，本文展望了多媒体信号处理的理论模型发展，认为图信号处理通过解析谱图模型的数学性质，解释其中的关联性，为建立广义的大规模非规则多媒体信号处理模型提供理论基础，是未来研究的重要领域之一。
Objective Deep learning models have been widely used in multimedia signal processing. They significantly improve the performance of signal processing tasks by introducing nonlinearities, but their black-box architectures do not admit an analytical characterization of the optimum or of the optimality conditions. How to approximate deep learning models with the transform/basis projection models of classical signal processing theory, so that the optimization problem can be analyzed, has therefore become a frontier research problem. Method Starting from the fundamental theories of signal processing, this paper analyzes the mathematical models and theoretical bounds of current methods for high-dimensional, nonlinear, and irregularly structured signals. The main topics are structured sparse representation, frame-based deep networks, multi-layer convolutional sparse coding, and graph signal processing. We first describe sparse representation models based on group sparsity and hierarchical sparsity together with their optimization methods, and then analyze the deep/multi-layer networks built on semi-discrete frames and convolutional sparse coding. We also present graph signal processing models, which extend classical signal processing to non-Euclidean spaces. Recent advances on these topics by researchers in China and abroad are compared and discussed. Result Structured sparse representation uses mixed-norm regularizers to formulate a group Lasso problem, which can be solved with proximal methods or network flow optimization and yields sparse representations that carry structural information. Because structured sparse representation is still a linear projection onto dictionary atoms, attention has turned to frame-based deep networks, which generate deep convolutional networks by extending the semi-discrete frames of multiscale geometric analysis. These networks inherit the scale and directional decomposition properties of frame theory and introduce nonlinearities to obtain feature representations that are stable to deformation.
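As an illustration of the proximal approach to the group Lasso problem mentioned above, the following is a minimal sketch of the block soft-thresholding step for non-overlapping groups. The function name `prox_group_lasso`, the group layout, and the threshold value are assumptions chosen for illustration; they are not taken from the paper, and overlapping or hierarchical groups would need a different treatment.

```python
import numpy as np


def prox_group_lasso(x, groups, threshold):
    """Proximal operator of threshold * sum_g ||x_g||_2.

    Assumes the groups are non-overlapping lists of indices into x.
    Each group is shrunk toward zero as a block, and set to zero
    entirely when its Euclidean norm does not exceed the threshold.
    """
    out = np.zeros_like(x, dtype=float)
    for idx in groups:
        block = x[idx]
        norm = np.linalg.norm(block)
        if norm > threshold:
            out[idx] = (1.0 - threshold / norm) * block
    return out


# Hypothetical example: one strong group and one weak group.
x = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]
shrunk = prox_group_lasso(x, groups, threshold=1.0)
print(shrunk)
```

In this example the first group has norm 5 and is scaled by 0.8, while the second group falls below the threshold and is removed as a whole, which is the group-level sparsity that the mixed norm is meant to encourage.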
Inspired by scattering networks, multi-layer convolutional sparse coding adds combined regularization constraints to sparse coding in order to model the max-pooling operation. It recursively learns and decomposes overcomplete dictionaries layer by layer, producing sparse representations of multiscale irregular structures. Graph signal processing extends conventional signal processing theory to non-Euclidean spaces on the basis of their topological structure. Combined with convolutional neural networks, it can learn complex relational networks and is suited to data-driven processing of large-scale, high-dimensional irregular signals. Conclusion Finally, this paper discusses the outlook for theoretical models of multimedia signal processing. By analyzing the mathematical properties of graph spectral models and their links to conventional signal processing, graph signal processing can provide a theoretical basis for generalized models of large-scale irregular multimedia signals, and it is an important direction for future research.
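To make the graph spectral viewpoint concrete, the following is a minimal sketch of a graph Fourier transform. It assumes an undirected 4-node path graph and the combinatorial Laplacian L = D - A, which is one common choice; the abstract does not specify which graph operator is used, and the helper names `gft` and `igft` are hypothetical.

```python
import numpy as np

# Adjacency matrix of an undirected path graph with 4 nodes (an assumed toy graph).
A = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

# Combinatorial graph Laplacian: degree matrix minus adjacency matrix.
L = np.diag(A.sum(axis=1)) - A

# The eigenvectors of the symmetric Laplacian act as the graph Fourier basis;
# the eigenvalues (returned in ascending order) act as graph frequencies.
eigvals, U = np.linalg.eigh(L)


def gft(signal):
    """Graph Fourier transform: project the signal onto the Laplacian eigenvectors."""
    return U.T @ signal


def igft(coeffs):
    """Inverse graph Fourier transform."""
    return U @ coeffs


x = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = gft(x)
x_rec = igft(x_hat)

# A constant signal does not vary across edges, so on a connected graph
# all of its energy sits at graph frequency zero.
const_hat = gft(np.ones(4))
print(eigvals)
print(x_hat)
```

The eigendecomposition here plays the role that the discrete Fourier basis plays for regularly sampled signals, which is the sense in which classical transform-based analysis carries over to irregular domains.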