While reading the NVIDIA NCCL documentation, I found that NCCL does not define specific verbs for sendrecv, gather, gatherv, scatter, scatterv, alltoall, alltoallv, alltoallw, or neighbor collectives. All of these operations can be expressed by combining `ncclSend`, `ncclRecv`, and `ncclGroupStart`/`ncclGroupEnd`, much as they can be expressed with `MPI_Isend`, `MPI_Irecv`, and `MPI_Waitall`.
So I used the NCCL APIs `ncclSend`, `ncclRecv`, `ncclGroupStart`, and `ncclGroupEnd` to implement the following functions:
- NCCLSendrecv
- NCCLGather
- NCCLScatter
- NCCLAlltoall
I used Open MPI's API as a reference when designing these functions.

To build and run the examples:
- Use a Linux PC.
- Make sure that Open MPI and NCCL are installed on your PC.
- Make sure your CUDA version (i.e. nvcc) supports C++17 via `-std=c++17` (my CUDA version is 11.4).
- Clone this repo to your local disk.
Then `cd` into any one of the three directories and run `make` (or `make all`) to build the binary, and `make test` to run the examples.
All of the newly added functions are in `ncclEnhance.h`.