GraphiT: Encoding Graph Structure in Transformers (arXiv 2021)
https://arxiv.org/abs/2106.05667
This work shows that incorporating structural and positional information into a transformer can outperform existing classical GNNs. GraphiT (1) uses relative positional encodings based on kernel functions on the graph to influence the attention scores, and (2) enumerates and encodes local sub-structures and feeds them into the model. The experiments show that each strategy helps on its own, and combining them also works well.
(i) leveraging relative positional encoding strategies in self-attention scores based on positive definite kernels on graphs, and (ii) enumerating and encoding local sub-structures such as paths of short length
Earlier, GT observed that self-attention works well when it only attends to neighboring nodes, but performance drops when every node attends to all nodes. This paper shows that a transformer with global communication can also perform well, provided local graph structure is encoded into the model. GraphiT does this with two strategies: (1) a relative positional encoding strategy that re-weights attention scores with a positive definite kernel on the graph, and (2) encoding small sub-structures (e.g., paths or subtree patterns) with graph convolutional kernel networks (GCKN) and using them as transformer inputs; a simplified sketch of the kernel-weighted attention is given below.
Transformer Architectures
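To make the first strategy concrete, here is a minimal single-head sketch in PyTorch of self-attention whose unnormalized scores are re-weighted by a precomputed positive definite kernel matrix on the graph. This is a simplification of the GraphiT idea rather than the authors' exact layer (which, among other things, uses multiple heads), and the class and parameter names are illustrative.

```python
import math
import torch
import torch.nn as nn


class KernelModulatedSelfAttention(nn.Module):
    """Single-head self-attention whose scores are re-weighted elementwise
    by a precomputed graph kernel before row-normalization. A simplified
    sketch of the GraphiT idea, not the authors' exact layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = 1.0 / math.sqrt(dim)

    def forward(self, x: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) node features; kernel: (n, n) positive definite kernel on the graph
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.exp((q @ k.t()) * self.scale)       # exp of scaled dot products
        scores = scores * kernel                           # modulate by the graph kernel
        attn = scores / scores.sum(dim=-1, keepdim=True)   # row-normalize into attention weights
        return attn @ v
```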
Encoding Node Positions
Relative Position Encoding Strategies by Using Kernels on Graphs
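One family of kernels considered for this relative positional encoding is the heat/diffusion kernel exp(-βL) of the graph Laplacian. The sketch below shows one way such a kernel matrix could be computed from a dense adjacency matrix and passed to the attention sketch above; the function name, the symmetric normalization, and the default β are assumptions of this sketch.

```python
import torch


def diffusion_kernel(adj: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Heat/diffusion kernel K = exp(-beta * L) built from the symmetric
    normalized Laplacian. `adj` is a dense (n, n) adjacency matrix; the
    function name and normalization choice are illustrative."""
    deg = adj.sum(dim=-1)
    d_inv_sqrt = torch.where(deg > 0, deg.pow(-0.5), torch.zeros_like(deg))
    lap = torch.eye(adj.size(0)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return torch.matrix_exp(-beta * lap)


# Usage with the attention sketch above (shapes only):
#   attn_layer = KernelModulatedSelfAttention(dim)
#   out = attn_layer(x, diffusion_kernel(adj, beta=1.0))
```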
Existing methods mostly target small graphs (at most a few hundred nodes). Graph-BERT does address node classification, but it first samples subgraphs, which hurts performance (it has far more parameters than GAT yet comparable accuracy). Designing a transformer that scales to large graphs therefore remains a difficult open problem.
References
[1] GRAPH-BERT: Only Attention is Needed for Learning Graph Representations: https://github.com/jwzhanggy/Graph-Bert
[2] (GROVER) Self-Supervised Graph Transformer on Large-Scale Molecular Data: https://github.com/tencent-ailab/grover
[3] (GT) A Generalization of Transformer Networks to Graphs: https://github.com/graphdeeplearning/graphtransformer
[4] GraphiT: Encoding Graph Structure in Transformers [Code is unavailable]
[5] (GraphTrans) Representing Long-Range Context for Graph Neural Networks with Global Attention: https://github.com/ucbrise/graphtrans
[6] (SAN) Rethinking Graph Transformers with Spectral Attention [Code is unavailable]
[7] (Graphormer) Do Transformers Really Perform Bad for Graph Representation?: https://github.com/microsoft/Graphormer
[8] (SAT) Structure-Aware Transformer for Graph Representation Learning [Code is unavailable]
[9] (GraphGPS) Recipe for a General, Powerful, Scalable Graph Transformer: https://github.com/rampasek/GraphGPS
Other Resources
[1] Graph Transformer survey: https://arxiv.org/abs/2202.08455 [Code]
[2] Tutorial: [arXiv 2022.06] A Bird's-Eye Tutorial of Graph Attention Architectures: https://arxiv.org/pdf/2206.02849.pdf
[3] Dataset: [arXiv 2022.06] Long Range Graph Benchmark [Code]: https://arxiv.org/pdf/2206.08164.pdf
Brief description: GNNs generally capture only k-hop neighborhoods and may miss long-range dependencies, which transformers can address. The benchmark contains five datasets (PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and Peptides-struct) on which models need to capture long-range dependencies to perform well; it is mainly intended to test a model's ability to capture long-range interactions.
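For reference, a minimal loading sketch, assuming a recent PyTorch Geometric release that ships an LRGBDataset wrapper; the root path and batch size are arbitrary choices.

```python
# Assumes torch_geometric >= 2.2, which provides LRGBDataset.
from torch_geometric.datasets import LRGBDataset
from torch_geometric.loader import DataLoader

train_set = LRGBDataset(root="data/lrgb", name="Peptides-func", split="train")
loader = DataLoader(train_set, batch_size=32, shuffle=True)

for batch in loader:
    # batch.x: node features, batch.edge_index: connectivity, batch.y: labels
    print(batch)
    break
```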
There are also some works on Graph Transformers for homogeneous graphs; interested readers can explore these on their own:
[1] [KDD 2022] Global Self-Attention as a Replacement for Graph Convolution: https://arxiv.org/pdf/2108.03348.pdf
[2] [ICOMV 2022] Experimental analysis of position embedding in graph transformer networks: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12173/121731O/Experimental-analysis-of-position-embedding-in-graph-transformer-networks/10.1117/12.2634427.short
[3] [ICLR Workshop MLDD] GRPE: Relative Positional Encoding for Graph Transformer [Code]: https://arxiv.org/abs/2201.12787
[4] [arXiv 2022.05] Your Transformer May Not be as Powerful as You Expect [Code]: https://arxiv.org/pdf/2205.13401.pdf
[5] [arXiv 2022.06] NAGphormer: Neighborhood Aggregation Graph Transformer for Node Classification in Large Graphs: https://arxiv.org/abs/2206.04910