Cricket is a virtualization layer for CUDA applications that enables remote execution and checkpoint/restart without the need to recompile applications. Cricket isolates CUDA applications from the CUDA APIs by using ONC Remote Procedure Calls, so user code and the CUDA APIs are executed in separate processes.
For Cricket to be able to insert the virtualization layer, the CUDA application has to link dynamically to the CUDA APIs. For this, pass `-cudart shared` to `nvcc` during linking.
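A minimal sketch of such a build (the source and binary names are placeholders):

```sh
# Link the CUDA runtime dynamically so Cricket can interpose on it
nvcc -o my_app my_app.cu -cudart shared

# Optionally verify that libcudart is now a dynamic dependency
ldd my_app | grep libcudart
```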
Supported transports for `cudaMemcpy`:
- TCP (slow, for pageable memory)
- InfiniBand (fast, for pinned memory)
- Shared memory (fastest, for pinned memory; local execution only)
Cricket requires:
- CUDA Toolkit (e.g., CUDA 12.1)
- rpcbind
- libcrypto
- libtirpc (built as part of the main Makefile)
On the system where the Cricket server is executed, the appropriate NVIDIA drivers must be installed.
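As a sketch, the non-bundled dependencies might be installed like this on a Debian-based system (the package names are assumptions and vary by distribution):

```sh
# rpcbind provides the ONC RPC port mapper; libssl-dev provides libcrypto
sudo apt install rpcbind libssl-dev
```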
```sh
git clone https://github.com/RWTH-ACS/cricket.git
cd cricket && git submodule update --init
LOG=INFO make
```
Environment variables for the Makefile:
- `LOG`: Log level. Can be one of `DEBUG`, `INFO`, `WARNING`, `ERROR`.
- `WITH_IB`: If set to `YES`, build with InfiniBand support.
- `WITH_DEBUG`: Use gcc debug flags for compilation.
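For example, a verbose build with InfiniBand support enabled combines the documented variables:

```sh
# Build with debug-level logging and InfiniBand transport support
LOG=DEBUG WITH_IB=YES make
```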
By default, Cricket uses TCP/IP as the transport for the Remote Procedure Calls. This enables both remote execution, where server and client execute on different systems, and local execution, where server and client execute on the same system.
To support Cricket, the CUDA libraries must be linked dynamically to the CUDA application. For the runtime library, this can be done using the `-cudart shared` flag of `nvcc`.
The Cricket library has to be preloaded into the CUDA application. To start the server:
```sh
<path-to-cricket>/bin/cricket-rpc-server [optional rpc id]
```
The client can be started like this:
```sh
CRICKET_RPCID=[optional rpc id] REMOTE_GPU_ADDRESS=<address-of-server> LD_PRELOAD=<path-to-cricket>/bin/cricket-client.so <cuda-binary>
```
For example, for local execution on a single system:

```sh
# Shell 1: start the server locally
/opt/cricket/bin/cricket-rpc-server

# Shell 2: run a test application through the Cricket client
REMOTE_GPU_ADDRESS=127.0.0.1 LD_PRELOAD=/opt/cricket/bin/cricket-client.so /opt/cricket/tests/test_kernel
```
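The optional RPC id from the commands above can also be set explicitly, which presumably helps keep multiple server instances apart; a sketch, assuming id 1 is free:

```sh
# Server: register under RPC id 1
/opt/cricket/bin/cricket-rpc-server 1

# Client: connect to the server registered under id 1
CRICKET_RPCID=1 REMOTE_GPU_ADDRESS=127.0.0.1 LD_PRELOAD=/opt/cricket/bin/cricket-client.so /opt/cricket/tests/test_kernel
```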
Compile the application:

```sh
cd /nfs_share/cuda/samples/5_Simulations/nbody
make NVCCFLAGS="-m64 -cudart shared" GENCODE_FLAGS="-arch=sm_61"
```

Start the Cricket server on the remote system:

```sh
/opt/cricket/bin/cricket-rpc-server
```

Run the application:

```sh
REMOTE_GPU_ADDRESS=remoteSystem.my-domain.com LD_PRELOAD=/nfs_share/cricket/bin/cricket-client.so /nfs_share/cuda/samples/5_Simulations/nbody/nbody -benchmark
```
- cpu: The virtualization layer
- gpu: Experimental in-kernel checkpoint/restart
- submodules: Submodules are located here.
  - cuda-gdb: A modified GDB for use with CUDA. This is only required for in-kernel checkpoint/restart.
  - libtirpc: Transport Independent Remote Procedure Calls, required for the virtualization layer
- tests: Various CUDA applications to test Cricket.
- utils: A Dockerfile for our CI.
Please agree to the DCO by signing off your commits.
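With git, the sign-off trailer can be added automatically (the commit message here is just a placeholder):

```sh
# -s appends a "Signed-off-by: Name <email>" trailer to the commit message
git commit -s -m "cpu: fix typo in RPC server log message"
```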
Eiling et al.: A virtualization layer for distributed execution of CUDA applications with checkpoint/restart support. Concurrency and Computation: Practice and Experience. 2022. https://doi.org/10.1002/cpe.6474
Eiling et al.: Checkpoint/Restart for CUDA Kernels. In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23). 2023. ACM. https://doi.org/10.1145/3624062.3624254
Eiling et al.: GPU Acceleration in Unikernels Using Cricket GPU Virtualization. In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23). 2023. ACM. https://doi.org/10.1145/3624062.3624236
Eiling et al.: An Open-Source Virtualization Layer for CUDA Applications. In Euro-Par 2020: Parallel Processing Workshops. 2021. Lecture Notes in Computer Science, vol 12480. Springer. https://doi.org/10.1007/978-3-030-71593-9_13