This repository is the official implementation of MSPipe: Efficient Temporal GNN Training via Staleness-aware Pipeline.
Our development environment:
- Ubuntu 20.04LTS
- g++ 9.4
- CUDA 11.3 / 11.6
- cmake 3.23
Dependencies:
- torch >= 1.10
- dgl (CUDA version)
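Before building, it can help to compare the locally installed tools with the versions listed above. The following is a minimal sketch, not part of MSPipe: `version_ge` and `report` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils). Tools that are not installed are skipped.

```shell
#!/usr/bin/env bash
# Sketch: compare installed versions against the versions listed above.
# version_ge and report are hypothetical helpers, not part of MSPipe.

# Succeeds if version $1 >= version $2 (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print whether an installed version meets the listed one.
# Usage: report NAME INSTALLED LISTED
report() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: ok (listed: $3)"
  else
    echo "$1 $2: older than listed $3"
  fi
}

# Query each tool only if it is available on this machine.
if command -v cmake >/dev/null 2>&1; then
  report cmake "$(cmake --version | head -n 1 | awk '{print $3}')" 3.23
fi
if command -v g++ >/dev/null 2>&1; then
  # -dumpfullversion prints the full version on GCC 7 and later.
  report g++ "$(g++ -dumpfullversion -dumpversion 2>/dev/null)" 9.4
fi
if python -c 'import torch' >/dev/null 2>&1; then
  # Strip a local build suffix such as "+cu113" before comparing.
  report torch "$(python -c 'import torch; print(torch.__version__.split("+")[0])')" 1.10
fi
```

The listed versions describe the development environment, so an "older than listed" line is a hint to investigate rather than a hard failure.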
Compile and install MSPipe:
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install
For debug mode:
DEBUG=1 pip install -v -e .
Compile and install TGL (presample version):
cd tgl
python setup_tgl.py build_ext --inplace
Download the datasets:
cd scripts/ && ./download_data.sh
MSPipe
Train the TGN model on the REDDIT dataset with MSPipe on 4 GPUs:
cd scripts
./run_offline.sh TGN REDDIT 4
Presample (TGL)
Train the TGN model on the REDDIT dataset with Presample on 4 GPUs:
cd tgl
./run_tgl.sh TGN REDDIT 4
Distributed training
To train the TGN model on the GDELT dataset across more than one server, perform the following steps on each server:
- change INTERFACE to your netcard name (can be found using ifconfig)
- change the following settings:
  - HOST_NODE_ADDR: IP address of the host machine
  - HOST_NODE_PORT: the port of the host machine
  - NNODES: total number of servers
  - NPROC_PER_NODE: the number of GPUs on each server
cd scripts
./run_offline_dist.sh TGN GDELT
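As an illustration of the settings described above, a two-server setup with 4 GPUs per server might use values like the ones below. This is only a sketch: the interface name, address, and port are placeholders, and how `run_offline_dist.sh` reads these variables is not shown here. The process count assumes one training process per GPU, and the interface lookup assumes a Linux machine, where `/sys/class/net` lists the same names that `ifconfig` reports.

```shell
# Placeholder values only; replace them with the settings of your own cluster.
INTERFACE=eth0            # hypothetical netcard name
HOST_NODE_ADDR=10.0.0.1   # hypothetical IP address of the host machine
HOST_NODE_PORT=29500      # hypothetical free port on the host machine
NNODES=2                  # total number of servers
NPROC_PER_NODE=4          # number of GPUs on each server

# Total number of processes across all servers (assuming one per GPU).
TOTAL_PROCS=$((NNODES * NPROC_PER_NODE))
echo "total processes: ${TOTAL_PROCS}"

# On Linux, the available interface names are listed under /sys/class/net.
if [ -d /sys/class/net ]; then
  ls /sys/class/net
fi

# Warn if the chosen interface does not exist on this machine.
if [ ! -e "/sys/class/net/${INTERFACE}" ]; then
  echo "warning: interface ${INTERFACE} not found on this machine" >&2
fi
```

HOST_NODE_ADDR, HOST_NODE_PORT, and NNODES should be identical on every server, while INTERFACE may differ from machine to machine.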