
[NeurIPS 2024] Long-range Brain Graph Transformer

Paper page: NeurIPS Poster, Long-range Brain Graph Transformer

Paper code: Page not found · GitHub

A fine "Page not found" indeed

The English is all hand-typed! It is a summarizing and paraphrasing of the original paper, so some spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments! This post leans toward personal notes, so read with caution.

Contents

1. Thoughts

2. Close Reading of the Paper

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Brain Network Analysis

2.3.2. Graph Transformer

2.4. Method

2.4.1. Adaptive Long-range Aware (ALGA) Strategy

2.4.2. Long-range Brain Graph Transformer

2.5. Experiments

2.5.1. Experimental Settings

2.5.2. Performance Comparison

2.5.3. Ablation Study

2.5.4. In-depth Analysis of ALTER and ALGA Strategy

2.6. Discussions and Conclusion

3. Background Notes

3.1. Random walk kernel

4. References


1. Thoughts

(1)What can I say? Gotta love 中中, I suppose

2. Close Reading of the Paper

2.1. Abstract

        ①Er, nothing much to say here; it matches the paper title exactly

2.2. Introduction

        ①The brain exhibits both short-range and long-range connectivity at the same time:

        ②They propose the Adaptive Long-range aware TransformER (ALTER) to capture such long-range dependencies

2.3. Related Work

2.3.1. Brain Network Analysis

        ①Lists GNN-based methods for neuropsychiatric disorder diagnosis as well as graph pooling methods

2.3.2. Graph Transformer

        ①Introduces Transformer-related works and points out that they are more competitive

        ②They argue that existing Transformer-based methods for neuropsychiatric disorder diagnosis fail to capture long-range dependencies

2.4. Method

        ①Consider a set of brain networks \left \{ G_1,...,G_L \right \}\subseteq \mathcal{G} with labels \left \{ y_1,...,y_L \right \}\subseteq Y, where L denotes the number of subjects

        ②A single brain graph is G=(V,X,A), where V is the node set, X \in \mathbb{R}^{N \times d} denotes the node feature matrix with N ROIs and feature dimension d, and A \in \mathbb{R}^{N \times N} denotes the adjacency matrix

        ③Learning task: learn a vector h_G for each brain that can predict the disease state y_G=f\left ( h_G \right ), where f denotes the prediction function

        ④Framework of ALTER:


2.4.1. Adaptive Long-range Aware (ALGA) Strategy

(1)Adaptive Factors

        ①The adaptive factor F_G \in \mathbb{R}^{N \times N} is calculated from correlations:

where t_i denotes the original fMRI feature of the i-th node (e.g., the BOLD signal)

        ②Compared with a plain random walk, the adaptive-factor-based walk is more likely to move to more relevant nodes (see the sketch below)
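To make ① concrete: below is a minimal NumPy sketch. The exact equation is an image in the original, so the (hedged) assumption here is that F_G is the Pearson correlation matrix of the per-ROI BOLD time series t_i; the function name and shapes are illustrative.

```python
import numpy as np

def adaptive_factor(bold: np.ndarray) -> np.ndarray:
    # bold: (N, T) array holding one BOLD time series t_i per ROI.
    # np.corrcoef treats each row as one variable, so this returns the
    # (N, N) Pearson correlation matrix -- our assumed F_G.
    return np.corrcoef(bold)

# Example: 90 ROIs observed over 200 time points.
F_G = adaptive_factor(np.random.randn(90, 200))
print(F_G.shape)  # (90, 90)
```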

(2)Adaptive Long-range Encoding

        ①The probability of walking between nodes (the transfer matrix):

        ②State vector:

where k denotes the number of hops and t_j\left ( k \right ) is the probability that the walker stops at node j after k steps

        ③And:

where K denotes the total number of hops of the random walk

        ④General recursive formula:

T(k)=T(0)P^k_G

        ⑤Fine-tune the transfer probability with the adaptive factor:

\hat{P}_G=F_G \odot P_G

where \odot denotes the element-wise (Hadamard) product

        ⑥Random walk kernel:

R=\left ( F_G \odot A_G \right )D^{-1}_G

        ⑦The long-range embedding E_G:

where I is the identity matrix and e_i denotes the long-range embedding associated with the i-th node (what on earth is this R? see the sketch below)
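Putting ① through ⑦ together, and answering the aside: R is the adaptive random walk kernel from ⑥, i.e. the degree-normalized walk operator reweighted by F_G. The NumPy sketch below rests on two assumptions the excerpt does not confirm: that the plain transfer matrix is P_G = A_G D_G^{-1} (consistent with the form of R), and that E_G stacks each node's k-step return probabilities, in the spirit of random-walk positional encodings.

```python
import numpy as np

def alga_long_range_embedding(A: np.ndarray, F: np.ndarray, K: int) -> np.ndarray:
    # Assumed: D_G is the diagonal degree matrix of A.
    D_inv = np.diag(1.0 / np.maximum(A.sum(axis=0), 1e-12))
    R = (F * A) @ D_inv                  # step 6: R = (F_G ⊙ A_G) D_G^{-1}
    # The plain walk of steps 1-4 would use P = A @ D_inv instead, with
    # the recursion T(k) = T(0) @ np.linalg.matrix_power(P, k).
    E, R_k = [], np.eye(A.shape[0])
    for _ in range(K):
        R_k = R_k @ R                    # R, R^2, ..., R^K
        E.append(np.diag(R_k))           # assumed e_i: k-step return probabilities
    return np.stack(E, axis=1)           # E_G: one K-dimensional row per node

# Example: random graph with 90 nodes, K = 16 hops as in the experiments.
A = (np.random.rand(90, 90) > 0.7).astype(float)
F = np.abs(np.corrcoef(np.random.randn(90, 200)))
E_G = alga_long_range_embedding(A, F, K=16)   # shape (90, 16)
```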

2.4.2. Long-range Brain Graph Transformer

(1)Injecting Long-range Embedding

        ①E_G is remapped by a linear layer to:

where W_G \in \mathbb{R}^{k' \times k} and b_G \in \mathbb{R}^{k'} are a learnable weight matrix and bias vector, respectively
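The remapping equation itself is an image in the original; judging only from the stated shapes, it is presumably the affine map

\hat{E}_G = E_G W_G^{T} + \mathbf{1}_N b_G^{T} \in \mathbb{R}^{N \times k'}

i.e. each node's k-dimensional long-range embedding is projected to k' dimensions.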

(2)Self-attention Module

        ①The input token matrix is calculated by:

        ②These tokens are then fed into a Transformer encoder with L layers of nonlinear mapping and M attention heads:

Z_{G}=W_{o}\left(\Vert_{m=1}^{M}Z_{G}^{m,l}\right)\in\mathbb{R}^{N\times d_{out}},\quad Z_{G}^{m,l}=\mathrm{softmax}\left(\frac{Q^{m,l}\,{K^{m,l}}^{T}}{\sqrt{d_{out}^{m,l}}}\right)V^{m,l}\in\mathbb{R}^{N\times d_{out}^{m,l}}

where Q^{m,l}=W_{q}Z_{G}^{m,l-1}, {K^{m,l}}^{T}=\left(W_{k}Z_{G}^{m,l-1}\right)^{T}, and V^{m,l}=W_{v}Z_{G}^{m,l-1}, with Z_{G}^{0}=\hat{X}_{G}; both \Vert and \left [ \cdot | \cdot \right ] are concatenation operators (the authors actually wrote "concentrate"), l denotes the layer index, m the head index, and W_{q},W_{k},W_{v}\in\mathbb{R}^{d_{out}^{m,l}\times d_{out}^{m,l-1}} and W_{o}\in\mathbb{R}^{d_{out}\times d_{out}^{m}} are learnable projection matrices
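A hedged PyTorch sketch of the injection and attention steps. The token equation in ① is also an image; from Z_{G}^{0}=\hat{X}_{G} and the \left [ \cdot | \cdot \right ] operator it is presumably \hat{X}_G=\left [ X_G | \hat{E}_G \right ] (feature-wise concatenation). The stock nn.TransformerEncoder stands in for the authors' encoder, and the input projection is an illustrative addition:

```python
import torch
import torch.nn as nn

class LongRangeTokenAttention(nn.Module):
    # Sketch only: inject long-range embeddings into node tokens, then
    # run a standard Transformer encoder (not the authors' exact code).
    def __init__(self, d_node: int, k_prime: int, d_out: int,
                 n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(d_node + k_prime, d_out)  # map tokens to d_out
        layer = nn.TransformerEncoderLayer(
            d_model=d_out, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, X: torch.Tensor, E_hat: torch.Tensor) -> torch.Tensor:
        # X: (B, N, d_node) node features; E_hat: (B, N, k') embeddings.
        tokens = torch.cat([X, E_hat], dim=-1)  # \hat{X}_G = [X_G | \hat{E}_G]
        return self.encoder(self.proj(tokens))  # Z_G: (B, N, d_out)

# Example: batch of 16 graphs, 90 nodes, d = 90 features, k' = 32.
Z_G = LongRangeTokenAttention(90, 32, 64)(torch.randn(16, 90, 90),
                                          torch.randn(16, 90, 32))
```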

(3)Readout Module

        ①Classification:

Y_G=\text{Softmax}\left(\text{MLP}\left(\text{Readout}\left(Z_G\right)\right)\right)

where they choose a clustering-based pooling as the readout function (a simple stand-in is sketched below)
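And a minimal sketch of the head itself; the paper's clustering-based pooling is replaced by plain mean pooling here, so this is explicitly not the authors' readout:

```python
import torch
import torch.nn as nn

class ReadoutHead(nn.Module):
    def __init__(self, d_out: int, n_classes: int = 2):
        super().__init__()
        # MLP classifier applied to the pooled graph vector h_G.
        self.mlp = nn.Sequential(
            nn.Linear(d_out, d_out), nn.ReLU(), nn.Linear(d_out, n_classes))

    def forward(self, Z: torch.Tensor) -> torch.Tensor:
        # Z: (B, N, d_out) node representations from the encoder.
        h_G = Z.mean(dim=1)                          # stand-in for Readout(Z_G)
        return torch.softmax(self.mlp(h_G), dim=-1)  # Y_G: class probabilities
```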

2.5. Experiments

2.5.1. Experimental Settings

(1)Datasets and Preprocessing

        ①ABIDE: 519 subjects with ASD and 439 healthy controls (HC)

        ②ADNI: 54 subjects with AD and 76 HC

        ③Preprocessing tool: DPARSF

        ④Node feature matrix: Pearson correlation matrix

        ⑤Adjacency matrix: functional connectivity thresholded at 0.3 (sketched below together with ④)
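Items ④ and ⑤ in code form, a small NumPy sketch; whether the 0.3 threshold is applied to the absolute correlation is an assumption:

```python
import numpy as np

def build_brain_graph(bold: np.ndarray, tau: float = 0.3):
    # bold: (N, T) preprocessed ROI time series (e.g., output of DPARSF).
    fc = np.corrcoef(bold)                 # Pearson FC matrix
    X = fc                                 # node features: each ROI's FC profile
    A = (np.abs(fc) > tau).astype(float)   # assumed: threshold |FC| at tau
    np.fill_diagonal(A, 0.0)               # drop self-loops
    return X, A
```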

(2)Baselines

        ①Baselines include generalized graph learning methods and brain-graph-specific methods

(3)Metrics

        ①Evaluation metrics: ACC, AUC, F1, SEN, and SPE

        ②Number of runs: 10

(4)Implementation Details

        ①Number of steps K=16

        ②Number of nonlinear mapping layers: L=2

        ③Number of attention heads: M=4

        ④Data split: 7:2:1

        ⑤Optimizer: Adam

        ⑥Scheduler: CosLR

        ⑦Initial learning rate: 1e-4, with weight decay 1e-4

        ⑧Batch size: 16

        ⑨Epochs: 200
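The hyperparameters above, wired together in a hedged PyTorch sketch; model and train_loader are placeholders, not taken from the paper's code:

```python
import torch

def train(model, train_loader, epochs: int = 200):
    # Adam with lr = 1e-4 and weight decay = 1e-4, cosine LR schedule.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    for _ in range(epochs):               # 200 epochs; loader uses batch size 16
        for X, E_hat, y in train_loader:  # drawn from the 7:2:1 train split
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(X, E_hat), y)
            loss.backward()
            opt.step()
        sched.step()
```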

2.5.2. Performance Comparison

(1)Results

        ①Performance table:

(Heh. On ABIDE, A-GCL scores only 53 and BrainNetGNN 51, yet when my GCN, GAT, GraphSAGE, and GIN baselines hit 55-65 ACC, I get told I am lowballing the baselines. Make of that what you will. Or are we supposed to say the baselines are simply that strong and none of the newer models can beat them?)

(2)Vairous Readout Function

        ①Readout ablation:

(Dude, the figure really is this blurry in the paper itself)

2.5.3. Ablation Study

(1)Adaptive Long-range Aware with Varying Architectures

        ①ALGA ablation:

        ②Readout ablation across different frameworks:

2.5.4. In-depth Analysis of ALTER and ALGA Strategy

        ①Hop and adaptive factor ablation:

        ②Attention score map:

2.6. Discussions and Conclusion

(1)Limitations

        ①A better balance between short-range and long-range dependencies is needed

        ②Multi-modal data support is required

(2)Conclusion

        ①They say their model works great

3. Background Notes

3.1. Random walk kernel

(1)Definition: the random walk kernel is a graph-based similarity measure, mainly used to compute the similarity between nodes or between graphs. It measures the similarity of nodes or subgraphs by simulating the behavior of random walks on the graph. One standard formulation is given below.
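(2)For reference, the classic direct-product-graph formulation (that this is exactly the variant intended above is an assumption): for graphs G and G' with direct product graph adjacency matrix A_\times,

K_\times(G, G')=\sum_{i,j=1}^{|V_\times|}\left [ \sum_{k=0}^{\infty} \lambda^{k} A_\times^{k} \right ]_{ij}

where \lambda<1 down-weights long walks so the series converges; a length-k walk in the product graph corresponds to a pair of matching length-k walks in G and G'.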

4. References

Yu, S. et al. (2024) 'Long-range Brain Graph Transformer', NeurIPS.  

