Merge pull request NAG-DevOps#35 from carlos-encs/multigpu
Multigpu, Multinode training
smokhov authored Jan 21, 2024
2 parents 18b6d47 + b0e661f commit 0a30bcb
Showing 5 changed files with 50 additions and 1 deletion.
19 changes: 19 additions & 0 deletions doc/scheduler-job-examples.tex
@@ -264,6 +264,25 @@ \subsection{Scheduling On The GPU Nodes}
%And that there are no more GPUs available on that node (\texttt{hc:gpu=0}).
%Note that no more than two GPUs can be requested for any one job.

% ------------------------------------------------------------------------------
\subsubsection{P6 on Multi-GPU, Multi-Node}

As described above, P6 cards are not compatible with the Distributed and DataParallel functions
of \texttt{PyTorch} and \texttt{TensorFlow} when running on multiple GPUs.
One workaround is to run the job on multiple nodes with a single GPU per node; for example:

\begin{verbatim}
#SBATCH --nodes=2
#SBATCH --gpus-per-node=1
\end{verbatim}

The P6 nodes are: \texttt{speed-05}, \texttt{speed-17}, and \texttt{speed-01}.

The example
\href{https://github.com/NAG-DevOps/speed-hpc/blob/master/src/pytorch-multinode-multigpu.sh}
{pytorch-multinode-multigpu.sh}
illustrates a job for multi-node, multi-GPU training.
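
The heart of that script is a single \texttt{torchrun} launch under \tool{srun}:
one process per node, with a \texttt{c10d} rendezvous hosted on the first node
(a sketch; \texttt{RDZV\_HOST} holds the first node's hostname, \texttt{29400} is the
rendezvous port, and \texttt{main\_multinode.py} stands in for your training script):

\begin{verbatim}
srun torchrun --nnodes=$SLURM_JOB_NUM_NODES --nproc_per_node=1 \
     --rdzv_id=$SLURM_JOB_ID --rdzv_backend=c10d \
     --rdzv_endpoint=${RDZV_HOST}:29400 main_multinode.py
\end{verbatim}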

% ------------------------------------------------------------------------------
\subsubsection{CUDA}

2 changes: 1 addition & 1 deletion doc/scheduler-scripting.tex
@@ -601,7 +601,7 @@ \subsubsection{Jupyter Notebooks}
Create an \tool{ssh} tunnel between your computer and the node (\texttt{speed-XX}) where Jupyter is
running, using \texttt{speed-submit} as a ``jump server'' (preferably with PuTTY; see \xf{fig:putty1} and \xf{fig:putty2}):
\begin{verbatim}
- ssh -L 8888:localhost:8888 speed-XX
+ ssh -L 8888:speed-XX:8888 YOUR_USER@speed-submit.encs.concordia.ca
\end{verbatim}
Don't close the tunnel.
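Once the tunnel is up, point your browser at \texttt{http://localhost:8888} and log in with the
token shown in Jupyter's startup output.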
Binary file modified doc/speed-manual.pdf
1 change: 1 addition & 0 deletions src/README.md
@@ -21,6 +21,7 @@ These are examples either trivial or some are more elaborate. Some are described
- `efficientdet.sh` -- `efficientdet` with Conda environment described below
- `gurobi-with-python.sh` -- using Gurobi with Python and a Python virtual environment
- `pytorch-multicpu.txt` -- using PyTorch with a Python virtual environment to run on CPUs; with instructions and code ready to paste.
- `pytorch-multinode-multigpu.sh` -- using PyTorch with a Python virtual environment to run on multiple nodes and multiple GPUs
- `lambdal-singularity.sh` -- an example use of the Singularity container to run LambdaLabs software stack on the GPU node. The container was built from the docker image as a [source](https://github.com/NAG-DevOps/lambda-stack-dockerfiles).
- `openiss-reid-speed.sh` -- OpenISS computer vision example for re-identification, see [more](https://github.com/NAG-DevOps/speed-hpc/tree/master/src#openiss-reid-tfk) in its section
- `openiss-yolo-cpu.sh`, `openiss-yolo-gpu.sh`, and `openiss-yolo-interactive.sh` -- OpenISS examples with YOLO, related to `reid`, see [more](https://github.com/NAG-DevOps/speed-hpc/tree/master/src#openiss-yolov3) in the corresponding section
29 changes: 29 additions & 0 deletions src/pytorch-multinode-multigpu.sh
@@ -0,0 +1,29 @@
#!/encs/bin/tcsh
#SBATCH --job-name=pytorch_multinode_multigpu_train

#SBATCH --nodes=2
#SBATCH --gpus-per-node=1 #On P6 cards this value MUST be: 1
#SBATCH --cpus-per-task=8
#SBATCH --ntasks-per-node=1
#SBATCH --mem=128G ## Assign memory per node

# Match the OpenMP thread count to the CPUs allocated per task
if ( $?SLURM_CPUS_PER_TASK ) then
    setenv omp_threads $SLURM_CPUS_PER_TASK
else
    setenv omp_threads 1
endif
setenv OMP_NUM_THREADS $omp_threads

# Rendezvous endpoint for torchrun; the first (batch) node hosts the c10d store
setenv RDZV_HOST `hostname -s`
setenv RDZV_PORT 29400
setenv endpoint ${RDZV_HOST}:${RDZV_PORT}

setenv CUDA_LAUNCH_BLOCKING 1  # synchronous CUDA launches for clearer error reports
setenv NCCL_BLOCKING_WAIT 1    # make NCCL collectives fail with an error instead of hanging
#setenv NCCL_DEBUG INFO        # uncomment for verbose NCCL logging
setenv NCCL_P2P_DISABLE 1      # disable GPU peer-to-peer transport
setenv NCCL_IB_DISABLE 1       # disable the InfiniBand transport

source /speed-scratch/$USER/tmp/Venv-Name/bin/activate.csh  # path where you created your Python venv
unsetenv CUDA_VISIBLE_DEVICES  # let Slurm and torchrun manage GPU visibility

# nproc_per_node MUST be 1 on P6 cards
srun torchrun --nnodes=$SLURM_JOB_NUM_NODES --nproc_per_node=1 --rdzv_id=$SLURM_JOB_ID --rdzv_backend=c10d --rdzv_endpoint=$endpoint main_multinode.py
deactivate
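
To try this job (a usage sketch; adjust the venv path and training script to your setup):

    sbatch pytorch-multinode-multigpu.sh
    squeue -u $USER    # confirm that two nodes were allocated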
