柳世禹, 戴文睿, 李成林, 熊红凯(上海交通大学)
From Graph Convolution Networks to Graph Scattering Networks: A Survey
Liu Shiyu, Dai Wenrui, Li Chenglin, Xiong Hongkai(Shanghai Jiao Tong University)
In Image processing and computer graphics, non-Euclidean data such as graphs gain increasing attention in recent years since Euclidean data such as images and videos fail to represent data with structure information. Compared with traditional Euclidean data, the scale of graph can be arbitrary large. Meanwhile, the structure of a graph usually contains information such as the relation between vertices, the logical sequences and the properties of graph itself. While images can be easily converted to graphs based on Euclidean position of pixels, graphs (especially for irregular graphs) can merely be converted to images. Therefore, graphs require higher level of representation learning compared with traditional Euclidean data. In the era of deep learning, however, traditional convolution neural networks (CNNs) fail to learn representations for graphs due to permutation covariance for nodewise features and permutation invariance for outputs such as classification labels. The performance of CNNs in graph representation learning is still limited even if inputs are augmented by arbitrary permutation during training process to learn permutation covariance. The development of graph neural networks and graph convolution operations achieves milestone in representation learning of non-Euclidean data such as graphs. Commonly, graph convolution neural networks (GCNs) can be divided into two categories: spatial GCNs and spectral GCNs. Spatial GCNs focus on the establishment of neighborhood, and update with aggregation functions which combine the features of the center vertex and its neighbors. Though GCNs based on neighborhood feature aggregation encourage the propagation of nodewise features, deep GCNs usually suffer from over-smoothness issue and the features of vertices become indistinguishable. Therefore, later works consider introducing skip connections in deep GCNs or constructing shallow GCNs with multi-scale neighborhood considered within each convolution to alleviate this issue. Spectral GCNs focus on the graph spectral theorem and update their parameters by signal filtering in spectral domain with designed filters. However, eigen-decomposition of graph shift operator is costly for large graphs since its computation complexity is O(N^3). Therefore, spectral GCNs usually apply k-order polynomials (i.e. Chebyshev polynomials) to approximate the target filters and avoid eigen-decomposition. Though spectral GCNs may avoid over-smoothness issue with graph filter design, the limited number of learnable parameters and filter responses of spectral GCNs usually limit their expression ability. Spatial GCNs and spectral GCNs are not necessarily independent from each other. For example, 1-order Chebyshev polynomials with diffusion matrix are equivalent to feature aggregation within 1-hop neighborhood. Therefore, spatial GCNs based on feature aggregation with diffusion Laplacian matrix or lazy random walk matrix usually have the spectral form, which bridges the spatial GCNs and spectral GCNs. The rapid development in graph representation learning gives rise to the demand for survey and review that summarize existing works and serve as guidance for beginners. Currently, there have been reviews for graph neural networks such as graph convolution neural networks, graph embeddings and graph autoencoders. However, we notice that current surveys and reviews lack one domain in graph representation learning: graph scattering transforms (GSTs) and graph scattering networks (GSNs). In essence, graph scattering networks are non-trainable spectral GCNs based on wavelet decomposition. With the benefit of multi-scale wavelets and the structure of networks, GSNs generate diverse features with nearly non-overlapping frequency responses in the spectral domain. As one of the newly developed graph representation learning method, graph scattering networks are used in tasks such as graph classification and node classification. Recent works employed graph scattering transform to spatial GCNs to overcome the over-smoothness issue. Compared with spectral GCNs, GSNs generate diverse features that strengthen the expressive capability of model without introducing the over-smoothness issue. However, the nontrainable property of GSNs may limit the flexibility in graph representation learning on different graph dataset with different distribution of spectrum. Meanwhile, GSNs suffer from exponential growth of diffusion paths with the growth of scattering layers, which limit the depth of GSNs in practice. In this paper, we provide a survey that comprehensively reviews the designs from GCNs to GSNs. We first discuss about GCNs by dividing GCNs into two categories: spatial GCNs and spectral GCNs. Spatial GCNs are categorized into the following types: (1) diffusion based GCNs, (2) GCNs on large graphs with neighbor sampling or sub-graph sampling, (3) GCNs with attention mechanism, and (4) GCNs with dynamic neighborhood construction. Meanwhile, we review the spectral GCNs according to different filters (filter kernels): (1) Chebyshev polynomials, (2) Cayley polynomials, (3) k-order polynomials. After addressing the drawbacks of both spatial GCNs and spatial GCNs, we introduce the definition of GSTs, the structure of classical GSNs and the advantages of GSNs compared with GCNs. We further elaborate the current arts of GSNs from the perspectives of network design along with application and stability in theory. We review the networks and application of graph scattering transform in the following categories: (1) classical GSNs, (2) graph scattering transforms in GCNs to solve the over-smoothness issue, (3) graph attention networks with GSTs, (4) graph scattering transform on spatial-temporal graphs, (5) reducing scattering paths of GSNs via pruning to increase the efficiency of graph scattering transform, (6) GSNs with multi-resolution graphs, (7) trainable GSNs with wavelet scale selection and learnable spectral filters. In theory, we conclude the frame theorem and the stability theorem under signal and topology perturbation respectively. We further analyze the limitations of GSNs (GSTs) in current works and propose possible directions for the development of graph scattering technics in the future.