Structure of the graph现有工作告诉我们,在分子图、点云和网格(图像也可以看作是一种网格状的图)上使用深度图网络是有效的,这些图跟我们常用以评估图神经网络效果的引用网络(例如:Cora、PubMed 或 CoauthorCS)等有何不同呢?下图就给出一个很清晰的示例:我们常用的引用网络这一类图往往属于直径较小的小世界网络,简单点讲就是图中任意两个结点的距离都比较近,几跳就能到达,使用更深的网络架构并不会扩大感受野;相比之下,在网格(比如图像)这一类图中,增加层数能让感受野成倍增长,从而更好地捕捉到上下文。使用不同颜色标注不同层数所能到达的结点,可以看到,同样是12个结点的图,左图仅用两种颜色就够了,右图则用了6种颜色。
02
Long-range vs short-range problems.根据问题的特性,有的问题则可以通过短距离的信息解决,比如社交网络中,预测往往仅依赖于结点的局部邻居;有的问题可能需要更长距离的信息,比如分子图种分子的化学性质可能取决于另一侧的原子组合,这就需要深度图网络来组合这些远程信息。但是随着层数增多,如果图结构会引起结点感受野的指数级增长,那么瓶颈现象就会阻止远程信息的有效传播:上图就展示了一个例子,在使用深度图网络时,我们需要把多个结点的信息整合到一个结点中并迭代此过程,所以传播过程中可能会丢失不少有用的信息,也就解释了为什么深度模型难以提高性能。
Evaluation最后但或许也很重要的一点就是评估方法,一些常见的基准数据集和方法未必能准确评估图神经网络的效果,我们观察到深度图网络在一些数据集上性能随深度下降,或许仅仅是因为数据集太小,发生了过拟合。斯坦福新推出的 Open Graph Benchmark 可以解决部分问题,它提供了大规模的图数据,并给定了训练和测试数据的划分方式。
06
参考文献[1] More precisely, over-smoothing makes node feature vector collapse into a subspace, see K. Oono and T. Suzuki, Graph neural networks exponentially loose expressive power for node classification (2019). arXiv:1905.10947, which provides asymptotic analysis using dynamic systems formalist.[2] Q. Li, Z. Han, X.-M. Wu, Deeper insights into graph convolutional networks for semi-supervised learning (2019). Proc. AAAI. Draws the analogy between the GCN model and Laplacian smoothing and points to the over-smoothing phenomenon.[3] H. Nt and T. Maehara, Revisiting graph neural networks: All we have is low-pass filters (2019). arXiv:1905.09550. Uses spectral analysis on graphs to answer when GCNs perform well.[4] U. Alon and E. Yahav, On the bottleneck of graph neural networks and its practical implications (2020). arXiv:2006.05205. Identified the over-squashing phenomenon in graph neural networks, which is similar to one observed in sequential recurrent models.[5] Y. Rong et al. DropEdge: Towards deep graph convolutional networks on node classification (2020). In Proc. ICLR. An idea similar to DropOut where a random subset of edges is used during training.[6] L. Zhao and L. Akoglu. PairNorm: Tackling oversmoothing in GNNs (2020). Proc. ICLR. Proposes normalising the sum of pairwise distances between node features in order to prevent them collapsing into a single point.[7] K. Zhou et al. Effective training strategies for deep graph neural networks (2020). arXiv:2006.07107.[8] K. Xu et al., Representation learning on graphs with jumping knowledge networks (2018). Proc. ICML 2018.[9] S. Gong et al. Geometrically principled connections in graph neural networks (2020). Proc. CVPR.[10] C. Szegedy et al. Going deeper with convolutions (2015). Proc. CVPR.[11] K. He et al., Deep residual learning for image recognition (2016). Proc. CVPR.[12] G. Li et al. DeepGCNs: Can GCNs go as deep as CNNs? (2019). Proc. ICCV. Shows the advantages of depth for geometric point-cloud data.[13] Alon and Yahav refer to the case when a node is unable to receive information from nodes that are farther away than the number of layers as “under-reaching”. This phenomenon was first pointed out by P Barceló et al., The logical expressiveness of graph neural networks (2020). Proc. ICLR. Alon and Yahav show experimentally on the problem of chemical properties prediction in molecular graphs (using GNNs with more layers than the diameter of the graphs) that the source of poor performance is not under-reaching but over-squashing.[14] André Araujo and co-authors have an excellent blog post about receptive fields in convolutional neural networks. As CNN models evolved in computer vision applications, from AlexNet, to VGG, ResNet, and Inception, their receptive fields increased as a natural consequence of the increased number of layers. In modern architectures, the receptive field usually covers the entire input image, i.e., the context used by each feature in the final output feature map includes all of the input pixels. Araujo et al observe a logarithmic relationship between classification accuracy and receptive field size, which suggests that large receptive fields are necessary for high-level recognition tasks, but with diminishing return.[15] M. K. Matlock et al. Deep learning long-range information in undirected graphs with wave networks (2019). Proc. IJCNN. Observes the failure of graph neural networks to capture long-distance interactions in molecular graphs.[16] This stems from message-passing GNN equivalence to the Weisfeiler-Lehman graph isomorphism test, see e.g. V. Arvind et al. On Weisfeiler-Leman invariance: subgraph counts and related graph properties (2018). arXiv:1811.04801 and Z. Chen et al. Can graph neural networks count substructures? (2020). arXiv:2002.04025.[17] N. Dehmamy, A.-L. Barabási, R. Yu, Understanding the representation power of graph neural networks in learning graph topology (2019). Proc. NeurIPS. Shows that learning graph moments of certain order requires GNNs of certain depth.[18] F. Monti, K. Otness, M. M. Bronstein, MotifNet: a motif-based Graph Convolutional Network for directed graphs (2018). arXiv:1802.01572.[19] G. Bouritsas et al. Improving graph neural network expressivity via subgraph isomorphism counting (2020). arXiv:2006.09252.[20] E. Rossi et al. SIGN: Scalable inception graph neural networks (2020). arXiv:2004.11198[21] O. Shchur et al. Pitfalls of graph neural network evaluation (2018). Workshop on Relational Representation Learning. Shows that simple GNN models perform on par with more complex ones.