
Removing Variables from GPU as Mentioned in 'Algorithm 3' of Roland's Paper #12

Open
ssong915 opened this issue Apr 30, 2024 · 1 comment

@ssong915

Hello, I found Roland's paper to be quite insightful and have been exploring the accompanying code.

I have a question regarding "Algorithm 3" in the paper, specifically the instruction on line 7 that states "Remove GNN, $G_t$, $H_{t-1}$ from GPU."
Could you please point me to where I might find the corresponding code for this step?

Thank you.

@TianyuDu
Collaborator

TianyuDu commented May 4, 2024

The training loop in Algorithm 3 can be found in the train_live_update.py script.

Note that lines 2 and 7 in Algorithm 3 are only necessary when GPU memory is a bottleneck (e.g., a large graph on a GPU with only 12 GiB of memory).
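
If you are unsure whether memory is actually the bottleneck on your hardware, a quick check with PyTorch's built-in counters can tell you. This is a generic snippet, not part of the ROLAND repository:

```python
import torch

# Quick check of GPU memory pressure (a generic PyTorch snippet, not part
# of the ROLAND repository). If peak usage approaches the device total,
# the offloading in lines 2 and 7 of Algorithm 3 becomes worthwhile.
if torch.cuda.is_available():
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated() / gib   # GiB currently held by tensors
    peak = torch.cuda.max_memory_allocated() / gib    # peak GiB since start/reset
    total = torch.cuda.get_device_properties(0).total_memory / gib
    print(f"allocated={allocated:.2f} GiB, peak={peak:.2f} GiB, total={total:.2f} GiB")
```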

The while-loop in Algorithm 3 starts at line 286 of the script; it was implemented as a for-loop with early stopping. At line 309, we call train_step to train the model on the snapshot graph $G_t$. Please look into the definition of train_step, especially line 158, where get_task_batch is called. The get_task_batch method generates the "graph snapshot" $G_t$ described in Algorithm 3. On line 109, inside get_task_batch, we move the batch object to the GPU (specified by cfg.device). Because the batch object contains both $G_t$ and $H_{t-1}$ (the prev_node_states variable in the code), both $G_t$ and $H_{t-1}$ are moved to the GPU together.
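
The underlying pattern is essentially the following (a minimal sketch with illustrative names and shapes; prev_node_states and cfg.device come from the description above, the rest is not the verbatim repository code):

```python
import torch
from torch_geometric.data import Data

# Minimal sketch of the step described above (illustrative names/shapes;
# not the verbatim repository code).
g_t = Data(x=torch.randn(2, 4), edge_index=torch.tensor([[0], [1]]))  # snapshot G_t
h_prev = torch.zeros(2, 8)                   # H_{t-1}, i.e. prev_node_states

device = 'cuda' if torch.cuda.is_available() else 'cpu'
g_t.node_states = h_prev                     # attach H_{t-1} to the batch object
g_t = g_t.to(device)                         # one call moves G_t *and* H_{t-1}
print(g_t.x.device, g_t.node_states.device)  # both on the same device
```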

The memory footprint of the GNN is relatively small compared to the graph datasets. Since we had a large enough GPU while running these experiments, we kept the GNN on the GPU at all times to minimize the cost of transferring data between the GPU and CPU. If you want to offload the GNN as well, you can call model.to('cpu') just before line 296; you would then need to move the model back to the GPU before the train_step call on line 309.
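
In outline, that change would look like this (a hedged sketch: the loop and names paraphrase the explanation above, and the model is a stand-in, not the repository code):

```python
import torch

# Hedged sketch of the offloading pattern described above; the loop and
# names paraphrase the explanation, they are not the repository code.
model = torch.nn.Linear(8, 8)    # stand-in for the GNN
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

for t in range(3):               # stands in for the while-loop over snapshots
    model = model.to('cpu')      # Algorithm 3, line 7: drop the GNN from GPU
    # ... load / preprocess the next snapshot G_t on CPU ...
    model = model.to(device)     # move the GNN back before training
    # train_step(model, ...)     # around line 309; trains on the snapshot G_t
```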

I wrote this code a couple of years ago and am a bit rusty about the code structure; please let me know if anything is confusing.
